Benchmark Lisp And Ada Programs
NASA Technical Reports Server (NTRS)
Davis, Gloria; Galant, David; Lim, Raymond; Stutz, John; Gibson, J.; Raghavan, B.; Cheesema, P.; Taylor, W.
1992-01-01
Suite of nonparallel benchmark programs, ELAPSE, designed for three tests: comparing efficiency of computer processing via Lisp vs. Ada; comparing efficiencies of several computers processing via Lisp; or comparing several computers processing via Ada. Tests efficiency which computer executes routines in each language. Available for computer equipped with validated Ada compiler and/or Common Lisp system.
NASA Technical Reports Server (NTRS)
Walston, W. H., Jr.
1986-01-01
The comparative computational efficiencies of the finite element (FEM), boundary element (BEM), and hybrid boundary element-finite element (HVFEM) analysis techniques are evaluated for representative bounded domain interior and unbounded domain exterior problems in elastostatics. Computational efficiency is carefully defined in this study as the computer time required to attain a specified level of solution accuracy. The study found the FEM superior to the BEM for the interior problem, while the reverse was true for the exterior problem. The hybrid analysis technique was found to be comparable or superior to both the FEM and BEM for both the interior and exterior problems.
Application of microarray analysis on computer cluster and cloud platforms.
Bernau, C; Boulesteix, A-L; Knaus, J
2013-01-01
Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
T.Z. Ye; K.J.S. Jayawickrama; G.R. Johnson
2006-01-01
Using computer simulation, we evaluated the impact of using first-generation information to increase selection efficiency in a second-generation breeding program. Selection efficiency was compared in terms of increase in rank correlation between estimated and true breeding values (i.e., ranking accuracy), reduction in coefficient of variation of correlation...
NASA Technical Reports Server (NTRS)
Dayton, J. A., Jr.; Kosmahl, H. G.; Ramins, P.; Stankiewicz, N.
1979-01-01
Experimental and analytical results are compared for two high performance, octave bandwidth TWT's that use depressed collectors (MDC's) to improve the efficiency. The computations were carried out with advanced, multidimensional computer programs that are described here in detail. These programs model the electron beam as a series of either disks or rings of charge and follow their multidimensional trajectories from the RF input of the ideal TWT, through the slow wave structure, through the magnetic refocusing system, to their points of impact in the depressed collector. Traveling wave tube performance, collector efficiency, and collector current distribution were computed and the results compared with measurements for a number of TWT-MDC systems. Power conservation and correct accounting of TWT and collector losses were observed. For the TWT's operating at saturation, very good agreement was obtained between the computed and measured collector efficiencies. For a TWT operating 3 and 6 dB below saturation, excellent agreement between computed and measured collector efficiencies was obtained in some cases but only fair agreement in others. However, deviations can largely be explained by small differences in the computed and actual spent beam energy distributions. The analytical tools used here appear to be sufficiently refined to design efficient collectors for this class of TWT. However, for maximum efficiency, some experimental optimization (e.g., collector voltages and aperture sizes) will most likely be required.
NASA Astrophysics Data System (ADS)
Beck, Jeffrey; Bos, Jeremy P.
2017-05-01
We compare several modifications to the open-source wave optics package, WavePy, intended to improve execution time. Specifically, we compare the relative performance of the Intel MKL, a CPU based OpenCV distribution, and GPU-based version. Performance is compared between distributions both on the same compute platform and between a fully-featured computing workstation and the NVIDIA Jetson TX1 platform. Comparisons are drawn in terms of both execution time and power consumption. We have found that substituting the Fast Fourier Transform operation from OpenCV provides a marked improvement on all platforms. In addition, we show that embedded platforms offer some possibility for extensive improvement in terms of efficiency compared to a fully featured workstation.
Interhemispheric Resource Sharing: Decreasing Benefits with Increasing Processing Efficiency
ERIC Educational Resources Information Center
Maertens, M.; Pollmann, S.
2005-01-01
Visual matches are sometimes faster when stimuli are presented across visual hemifields, compared to within-field matching. Using a cued geometric figure matching task, we investigated the influence of computational complexity vs. processing efficiency on this bilateral distribution advantage (BDA). Computational complexity was manipulated by…
Secure Multiparty Quantum Computation for Summation and Multiplication.
Shi, Run-hua; Mu, Yi; Zhong, Hong; Cui, Jie; Zhang, Shun
2016-01-21
As a fundamental primitive, Secure Multiparty Summation and Multiplication can be used to build complex secure protocols for other multiparty computations, specially, numerical computations. However, there is still lack of systematical and efficient quantum methods to compute Secure Multiparty Summation and Multiplication. In this paper, we present a novel and efficient quantum approach to securely compute the summation and multiplication of multiparty private inputs, respectively. Compared to classical solutions, our proposed approach can ensure the unconditional security and the perfect privacy protection based on the physical principle of quantum mechanics.
Secure Multiparty Quantum Computation for Summation and Multiplication
Shi, Run-hua; Mu, Yi; Zhong, Hong; Cui, Jie; Zhang, Shun
2016-01-01
As a fundamental primitive, Secure Multiparty Summation and Multiplication can be used to build complex secure protocols for other multiparty computations, specially, numerical computations. However, there is still lack of systematical and efficient quantum methods to compute Secure Multiparty Summation and Multiplication. In this paper, we present a novel and efficient quantum approach to securely compute the summation and multiplication of multiparty private inputs, respectively. Compared to classical solutions, our proposed approach can ensure the unconditional security and the perfect privacy protection based on the physical principle of quantum mechanics. PMID:26792197
Prakash, Jaya; Yalavarthy, Phaneendra K
2013-03-01
Developing a computationally efficient automated method for the optimal choice of regularization parameter in diffuse optical tomography. The least-squares QR (LSQR)-type method that uses Lanczos bidiagonalization is known to be computationally efficient in performing the reconstruction procedure in diffuse optical tomography. The same is effectively deployed via an optimization procedure that uses the simplex method to find the optimal regularization parameter. The proposed LSQR-type method is compared with the traditional methods such as L-curve, generalized cross-validation (GCV), and recently proposed minimal residual method (MRM)-based choice of regularization parameter using numerical and experimental phantom data. The results indicate that the proposed LSQR-type and MRM-based methods performance in terms of reconstructed image quality is similar and superior compared to L-curve and GCV-based methods. The proposed method computational complexity is at least five times lower compared to MRM-based method, making it an optimal technique. The LSQR-type method was able to overcome the inherent limitation of computationally expensive nature of MRM-based automated way finding the optimal regularization parameter in diffuse optical tomographic imaging, making this method more suitable to be deployed in real-time.
Comparing Server Energy Use and Efficiency Using Small Sample Sizes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coles, Henry C.; Qin, Yong; Price, Phillip N.
This report documents a demonstration that compared the energy consumption and efficiency of a limited sample size of server-type IT equipment from different manufacturers by measuring power at the server power supply power cords. The results are specific to the equipment and methods used. However, it is hoped that those responsible for IT equipment selection can used the methods described to choose models that optimize energy use efficiency. The demonstration was conducted in a data center at Lawrence Berkeley National Laboratory in Berkeley, California. It was performed with five servers of similar mechanical and electronic specifications; three from Intel andmore » one each from Dell and Supermicro. Server IT equipment is constructed using commodity components, server manufacturer-designed assemblies, and control systems. Server compute efficiency is constrained by the commodity component specifications and integration requirements. The design freedom, outside of the commodity component constraints, provides room for the manufacturer to offer a product with competitive efficiency that meets market needs at a compelling price. A goal of the demonstration was to compare and quantify the server efficiency for three different brands. The efficiency is defined as the average compute rate (computations per unit of time) divided by the average energy consumption rate. The research team used an industry standard benchmark software package to provide a repeatable software load to obtain the compute rate and provide a variety of power consumption levels. Energy use when the servers were in an idle state (not providing computing work) were also measured. At high server compute loads, all brands, using the same key components (processors and memory), had similar results; therefore, from these results, it could not be concluded that one brand is more efficient than the other brands. The test results show that the power consumption variability caused by the key components as a group is similar to all other components as a group. However, some differences were observed. The Supermicro server used 27 percent more power at idle compared to the other brands. The Intel server had a power supply control feature called cold redundancy, and the data suggest that cold redundancy can provide energy savings at low power levels. Test and evaluation methods that might be used by others having limited resources for IT equipment evaluation are explained in the report.« less
NASA Technical Reports Server (NTRS)
Kwak, Dochan; Kiris, C.; Smith, Charles A. (Technical Monitor)
1998-01-01
Performance of the two commonly used numerical procedures, one based on artificial compressibility method and the other pressure projection method, are compared. These formulations are selected primarily because they are designed for three-dimensional applications. The computational procedures are compared by obtaining steady state solutions of a wake vortex and unsteady solutions of a curved duct flow. For steady computations, artificial compressibility was very efficient in terms of computing time and robustness. For an unsteady flow which requires small physical time step, pressure projection method was found to be computationally more efficient than an artificial compressibility method. This comparison is intended to give some basis for selecting a method or a flow solution code for large three-dimensional applications where computing resources become a critical issue.
Time and Learning Efficiency in Internet-Based Learning: A Systematic Review and Meta-Analysis
ERIC Educational Resources Information Center
Cook, David A.; Levinson, Anthony J.; Garside, Sarah
2010-01-01
Authors have claimed that Internet-based instruction promotes greater learning efficiency than non-computer methods. Objectives Determine, through a systematic synthesis of evidence in health professions education, how Internet-based instruction compares with non-computer instruction in time spent learning, and what features of Internet-based…
ERIC Educational Resources Information Center
Anglin, Linda; Anglin, Kenneth; Schumann, Paul L.; Kaliski, John A.
2008-01-01
This study tests the use of computer-assisted grading rubrics compared to other grading methods with respect to the efficiency and effectiveness of different grading processes for subjective assignments. The test was performed on a large Introduction to Business course. The students in this course were randomly assigned to four treatment groups…
Gradient gravitational search: An efficient metaheuristic algorithm for global optimization.
Dash, Tirtharaj; Sahu, Prabhat K
2015-05-30
The adaptation of novel techniques developed in the field of computational chemistry to solve the concerned problems for large and flexible molecules is taking the center stage with regard to efficient algorithm, computational cost and accuracy. In this article, the gradient-based gravitational search (GGS) algorithm, using analytical gradients for a fast minimization to the next local minimum has been reported. Its efficiency as metaheuristic approach has also been compared with Gradient Tabu Search and others like: Gravitational Search, Cuckoo Search, and Back Tracking Search algorithms for global optimization. Moreover, the GGS approach has also been applied to computational chemistry problems for finding the minimal value potential energy of two-dimensional and three-dimensional off-lattice protein models. The simulation results reveal the relative stability and physical accuracy of protein models with efficient computational cost. © 2015 Wiley Periodicals, Inc.
Solving groundwater flow problems by conjugate-gradient methods and the strongly implicit procedure
Hill, Mary C.
1990-01-01
The performance of the preconditioned conjugate-gradient method with three preconditioners is compared with the strongly implicit procedure (SIP) using a scalar computer. The preconditioners considered are the incomplete Cholesky (ICCG) and the modified incomplete Cholesky (MICCG), which require the same computer storage as SIP as programmed for a problem with a symmetric matrix, and a polynomial preconditioner (POLCG), which requires less computer storage than SIP. Although POLCG is usually used on vector computers, it is included here because of its small storage requirements. In this paper, published comparisons of the solvers are evaluated, all four solvers are compared for the first time, and new test cases are presented to provide a more complete basis by which the solvers can be judged for typical groundwater flow problems. Based on nine test cases, the following conclusions are reached: (1) SIP is actually as efficient as ICCG for some of the published, linear, two-dimensional test cases that were reportedly solved much more efficiently by ICCG; (2) SIP is more efficient than other published comparisons would indicate when common convergence criteria are used; and (3) for problems that are three-dimensional, nonlinear, or both, and for which common convergence criteria are used, SIP is often more efficient than ICCG, and is sometimes more efficient than MICCG.
Fuel Injector Design Optimization for an Annular Scramjet Geometry
NASA Technical Reports Server (NTRS)
Steffen, Christopher J., Jr.
2003-01-01
A four-parameter, three-level, central composite experiment design has been used to optimize the configuration of an annular scramjet injector geometry using computational fluid dynamics. The computational fluid dynamic solutions played the role of computer experiments, and response surface methodology was used to capture the simulation results for mixing efficiency and total pressure recovery within the scramjet flowpath. An optimization procedure, based upon the response surface results of mixing efficiency, was used to compare the optimal design configuration against the target efficiency value of 92.5%. The results of three different optimization procedures are presented and all point to the need to look outside the current design space for different injector geometries that can meet or exceed the stated mixing efficiency target.
Automated Measurement of Patient-Specific Tibial Slopes from MRI
Amerinatanzi, Amirhesam; Summers, Rodney K.; Ahmadi, Kaveh; Goel, Vijay K.; Hewett, Timothy E.; Nyman, Edward
2017-01-01
Background: Multi-planar proximal tibial slopes may be associated with increased likelihood of osteoarthritis and anterior cruciate ligament injury, due in part to their role in checking the anterior-posterior stability of the knee. Established methods suffer repeatability limitations and lack computational efficiency for intuitive clinical adoption. The aims of this study were to develop a novel automated approach and to compare the repeatability and computational efficiency of the approach against previously established methods. Methods: Tibial slope geometries were obtained via MRI and measured using an automated Matlab-based approach. Data were compared for repeatability and evaluated for computational efficiency. Results: Mean lateral tibial slope (LTS) for females (7.2°) was greater than for males (1.66°). Mean LTS in the lateral concavity zone was greater for females (7.8° for females, 4.2° for males). Mean medial tibial slope (MTS) for females was greater (9.3° vs. 4.6°). Along the medial concavity zone, female subjects demonstrated greater MTS. Conclusion: The automated method was more repeatable and computationally efficient than previously identified methods and may aid in the clinical assessment of knee injury risk, inform surgical planning, and implant design efforts. PMID:28952547
Development of a Compact Eleven Feed Cryostat for the Patriot 12-m Antenna System
NASA Technical Reports Server (NTRS)
Beaudoin, Christopher; Kildal, Per-Simon; Yang, Jian; Pantaleev, Miroslav
2010-01-01
The Eleven antenna has constant beam width, constant phase center location, and low spillover over a decade bandwidth. Therefore, it can feed a reflector for high aperture efficiency (also called feed efficiency). It is equally important that the feed efficiency and its subefficiencies not be degraded significantly by installing the feed in a cryostat. The MIT Haystack Observatory, with guidance from Onsala Space Observatory and Chalmers University, has been working to integrate the Eleven antenna into a compact cryostat suitable for the Patriot 12-m antenna. Since the analysis of the feed efficiencies in this presentation is purely computational, we first demonstrate the validity of the computed results by comparing them to measurements. Subsequently, we analyze the dependence of the cryostat size on the feed efficiencies, and, lastly, the Patriot 12-m subreflector is incorporated into the computational model to assess the overall broadband efficiency of the antenna system.
USSR Report: Cybernetics, Computers and Automation Technology. No. 69.
1983-05-06
computers in multiprocessor and multistation design , control and scientific research automation systems. The results of comparing the efficiency of...Podvizhnaya, Scientific Research Institute of Control Computers, Severodonetsk] [Text] The most significant change in the design of the SM-2M compared to...UPRAVLYAYUSHCHIYE SISTEMY I MASHINY, Nov-Dec 82) 95 APPLICATIONS Kiev Automated Control System, Design Features and Prospects for Development (V. A
A hydrological emulator for global applications - HE v1.0.0
NASA Astrophysics Data System (ADS)
Liu, Yaling; Hejazi, Mohamad; Li, Hongyi; Zhang, Xuesong; Leng, Guoyong
2018-03-01
While global hydrological models (GHMs) are very useful in exploring water resources and interactions between the Earth and human systems, their use often requires numerous model inputs, complex model calibration, and high computation costs. To overcome these challenges, we construct an efficient open-source and ready-to-use hydrological emulator (HE) that can mimic complex GHMs at a range of spatial scales (e.g., basin, region, globe). More specifically, we construct both a lumped and a distributed scheme of the HE based on the monthly abcd model to explore the tradeoff between computational cost and model fidelity. Model predictability and computational efficiency are evaluated in simulating global runoff from 1971 to 2010 with both the lumped and distributed schemes. The results are compared against the runoff product from the widely used Variable Infiltration Capacity (VIC) model. Our evaluation indicates that the lumped and distributed schemes present comparable results regarding annual total quantity, spatial pattern, and temporal variation of the major water fluxes (e.g., total runoff, evapotranspiration) across the global 235 basins (e.g., correlation coefficient r between the annual total runoff from either of these two schemes and the VIC is > 0.96), except for several cold (e.g., Arctic, interior Tibet), dry (e.g., North Africa) and mountainous (e.g., Argentina) regions. Compared against the monthly total runoff product from the VIC (aggregated from daily runoff), the global mean Kling-Gupta efficiencies are 0.75 and 0.79 for the lumped and distributed schemes, respectively, with the distributed scheme better capturing spatial heterogeneity. Notably, the computation efficiency of the lumped scheme is 2 orders of magnitude higher than the distributed one and 7 orders more efficient than the VIC model. A case study of uncertainty analysis for the world's 16 basins with top annual streamflow is conducted using 100 000 model simulations, and it demonstrates the lumped scheme's extraordinary advantage in computational efficiency. Our results suggest that the revised lumped abcd model can serve as an efficient and reasonable HE for complex GHMs and is suitable for broad practical use, and the distributed scheme is also an efficient alternative if spatial heterogeneity is of more interest.
Printed Multi-Turn Loop Antenna for RF Bio-Telemetry
NASA Technical Reports Server (NTRS)
Simons, Rainee N.; Hall, David G.; Miranda, Felix A.
2004-01-01
In this paper, a novel printed multi-turn loop antenna for contact-less powering and RF telemetry from implantable bio- MEMS sensors at a design frequency of 300 MHz is demonstrated. In addition, computed values of input reactance, radiation resistance, skin effect resistance, and radiation efficiency for the printed multi-turn loop antenna are presented. The computed input reactance is compared with the measured values and shown to be in fair agreement. The computed radiation efficiency at the design frequency is about 24 percent.
Computation of unsteady transonic aerodynamics with steady state fixed by truncation error injection
NASA Technical Reports Server (NTRS)
Fung, K.-Y.; Fu, J.-K.
1985-01-01
A novel technique is introduced for efficient computations of unsteady transonic aerodynamics. The steady flow corresponding to body shape is maintained by truncation error injection while the perturbed unsteady flows corresponding to unsteady body motions are being computed. This allows the use of different grids comparable to the characteristic length scales of the steady and unsteady flows and, hence, allows efficient computation of the unsteady perturbations. An example of typical unsteady computation of flow over a supercritical airfoil shows that substantial savings in computation time and storage without loss of solution accuracy can easily be achieved. This technique is easy to apply and requires very few changes to existing codes.
A novel patient-specific model to compute coronary fractional flow reserve.
Kwon, Soon-Sung; Chung, Eui-Chul; Park, Jin-Seo; Kim, Gook-Tae; Kim, Jun-Woo; Kim, Keun-Hong; Shin, Eun-Seok; Shim, Eun Bo
2014-09-01
The fractional flow reserve (FFR) is a widely used clinical index to evaluate the functional severity of coronary stenosis. A computer simulation method based on patients' computed tomography (CT) data is a plausible non-invasive approach for computing the FFR. This method can provide a detailed solution for the stenosed coronary hemodynamics by coupling computational fluid dynamics (CFD) with the lumped parameter model (LPM) of the cardiovascular system. In this work, we have implemented a simple computational method to compute the FFR. As this method uses only coronary arteries for the CFD model and includes only the LPM of the coronary vascular system, it provides simpler boundary conditions for the coronary geometry and is computationally more efficient than existing approaches. To test the efficacy of this method, we simulated a three-dimensional straight vessel using CFD coupled with the LPM. The computed results were compared with those of the LPM. To validate this method in terms of clinically realistic geometry, a patient-specific model of stenosed coronary arteries was constructed from CT images, and the computed FFR was compared with clinically measured results. We evaluated the effect of a model aorta on the computed FFR and compared this with a model without the aorta. Computationally, the model without the aorta was more efficient than that with the aorta, reducing the CPU time required for computing a cardiac cycle to 43.4%. Copyright © 2014. Published by Elsevier Ltd.
Monte Carlo simulation of Ising models by multispin coding on a vector computer
NASA Astrophysics Data System (ADS)
Wansleben, Stephan; Zabolitzky, John G.; Kalle, Claus
1984-11-01
Rebbi's efficient multispin coding algorithm for Ising models is combined with the use of the vector computer CDC Cyber 205. A speed of 21.2 million updates per second is reached. This is comparable to that obtained by special- purpose computers.
Gradient-free MCMC methods for dynamic causal modelling
Sengupta, Biswa; Friston, Karl J.; Penny, Will D.
2015-03-14
Here, we compare the performance of four gradient-free MCMC samplers (random walk Metropolis sampling, slice-sampling, adaptive MCMC sampling and population-based MCMC sampling with tempering) in terms of the number of independent samples they can produce per unit computational time. For the Bayesian inversion of a single-node neural mass model, both adaptive and population-based samplers are more efficient compared with random walk Metropolis sampler or slice-sampling; yet adaptive MCMC sampling is more promising in terms of compute time. Slice-sampling yields the highest number of independent samples from the target density -- albeit at almost 1000% increase in computational time, in comparisonmore » to the most efficient algorithm (i.e., the adaptive MCMC sampler).« less
STAR adaptation of QR algorithm. [program for solving over-determined systems of linear equations
NASA Technical Reports Server (NTRS)
Shah, S. N.
1981-01-01
The QR algorithm used on a serial computer and executed on the Control Data Corporation 6000 Computer was adapted to execute efficiently on the Control Data STAR-100 computer. How the scalar program was adapted for the STAR-100 and why these adaptations yielded an efficient STAR program is described. Program listings of the old scalar version and the vectorized SL/1 version are presented in the appendices. Execution times for the two versions applied to the same system of linear equations, are compared.
An efficient and accurate 3D displacements tracking strategy for digital volume correlation
NASA Astrophysics Data System (ADS)
Pan, Bing; Wang, Bo; Wu, Dafang; Lubineau, Gilles
2014-07-01
Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need of updating Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure the 3D IC-GN algorithm that converges accurately and rapidly and avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer accurate and complete initial guess of deformation for each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexity of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost.
Kaya, Mine; Hajimirza, Shima
2018-05-25
This paper uses surrogate modeling for very fast design of thin film solar cells with improved solar-to-electricity conversion efficiency. We demonstrate that the wavelength-specific optical absorptivity of a thin film multi-layered amorphous-silicon-based solar cell can be modeled accurately with Neural Networks and can be efficiently approximated as a function of cell geometry and wavelength. Consequently, the external quantum efficiency can be computed by averaging surrogate absorption and carrier recombination contributions over the entire irradiance spectrum in an efficient way. Using this framework, we optimize a multi-layer structure consisting of ITO front coating, metallic back-reflector and oxide layers for achieving maximum efficiency. Our required computation time for an entire model fitting and optimization is 5 to 20 times less than the best previous optimization results based on direct Finite Difference Time Domain (FDTD) simulations, therefore proving the value of surrogate modeling. The resulting optimization solution suggests at least 50% improvement in the external quantum efficiency compared to bare silicon, and 25% improvement compared to a random design.
Improvements in the efficiency of turboexpanders in cryogenic applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agahi, R.R.; Lin, M.C.; Ershaghi, B.
1996-12-31
Process designers have utilized turboexpanders in cryogenic processes because of their higher thermal efficiencies when compared with conventional refrigeration cycles. Process design and equipment performance have improved substantially through the utilization of modern technologies. Turboexpander manufacturers have also adopted Computational Fluid Dynamic Software, Computer Numerical Control Technology and Holography Techniques to further improve an already impressive turboexpander efficiency performance. In this paper, the authors explain the design process of the turboexpander utilizing modern technology. Two cases of turboexpanders processing helium (4.35{degrees}K) and hydrogen (56{degrees}K) will be presented.
ERIC Educational Resources Information Center
Agasisti, Tommaso; Johnes, Geraint
2009-01-01
We employ Data Envelopment Analysis to compute the technical efficiency of Italian and English higher education institutions. Our results show that, in relation to the country-specific frontier, institutions in both countries are typically very efficient. However, institutions in England are more efficient than those in Italy when we compare…
Efficient path-based computations on pedigree graphs with compact encodings
2012-01-01
A pedigree is a diagram of family relationships, and it is often used to determine the mode of inheritance (dominant, recessive, etc.) of genetic diseases. Along with rapidly growing knowledge of genetics and accumulation of genealogy information, pedigree data is becoming increasingly important. In large pedigree graphs, path-based methods for efficiently computing genealogical measurements, such as inbreeding and kinship coefficients of individuals, depend on efficient identification and processing of paths. In this paper, we propose a new compact path encoding scheme on large pedigrees, accompanied by an efficient algorithm for identifying paths. We demonstrate the utilization of our proposed method by applying it to the inbreeding coefficient computation. We present time and space complexity analysis, and also manifest the efficiency of our method for evaluating inbreeding coefficients as compared to previous methods by experimental results using pedigree graphs with real and synthetic data. Both theoretical and experimental results demonstrate that our method is more scalable and efficient than previous methods in terms of time and space requirements. PMID:22536898
An effective and secure key-management scheme for hierarchical access control in E-medicine system.
Odelu, Vanga; Das, Ashok Kumar; Goswami, Adrijit
2013-04-01
Recently several hierarchical access control schemes are proposed in the literature to provide security of e-medicine systems. However, most of them are either insecure against 'man-in-the-middle attack' or they require high storage and computational overheads. Wu and Chen proposed a key management method to solve dynamic access control problems in a user hierarchy based on hybrid cryptosystem. Though their scheme improves computational efficiency over Nikooghadam et al.'s approach, it suffers from large storage space for public parameters in public domain and computational inefficiency due to costly elliptic curve point multiplication. Recently, Nikooghadam and Zakerolhosseini showed that Wu-Chen's scheme is vulnerable to man-in-the-middle attack. In order to remedy this security weakness in Wu-Chen's scheme, they proposed a secure scheme which is again based on ECC (elliptic curve cryptography) and efficient one-way hash function. However, their scheme incurs huge computational cost for providing verification of public information in the public domain as their scheme uses ECC digital signature which is costly when compared to symmetric-key cryptosystem. In this paper, we propose an effective access control scheme in user hierarchy which is only based on symmetric-key cryptosystem and efficient one-way hash function. We show that our scheme reduces significantly the storage space for both public and private domains, and computational complexity when compared to Wu-Chen's scheme, Nikooghadam-Zakerolhosseini's scheme, and other related schemes. Through the informal and formal security analysis, we further show that our scheme is secure against different attacks and also man-in-the-middle attack. Moreover, dynamic access control problems in our scheme are also solved efficiently compared to other related schemes, making our scheme is much suitable for practical applications of e-medicine systems.
Computationally efficient algorithm for Gaussian Process regression in case of structured samples
NASA Astrophysics Data System (ADS)
Belyaev, M.; Burnaev, E.; Kapushev, Y.
2016-04-01
Surrogate modeling is widely used in many engineering problems. Data sets often have Cartesian product structure (for instance factorial design of experiments with missing points). In such case the size of the data set can be very large. Therefore, one of the most popular algorithms for approximation-Gaussian Process regression-can be hardly applied due to its computational complexity. In this paper a computationally efficient approach for constructing Gaussian Process regression in case of data sets with Cartesian product structure is presented. Efficiency is achieved by using a special structure of the data set and operations with tensors. Proposed algorithm has low computational as well as memory complexity compared to existing algorithms. In this work we also introduce a regularization procedure allowing to take into account anisotropy of the data set and avoid degeneracy of regression model.
Enhancing battery efficiency for pervasive health-monitoring systems based on electronic textiles.
Zheng, Nenggan; Wu, Zhaohui; Lin, Man; Yang, Laurence Tianruo
2010-03-01
Electronic textiles are regarded as one of the most important computation platforms for future computer-assisted health-monitoring applications. In these novel systems, multiple batteries are used in order to prolong their operational lifetime, which is a significant metric for system usability. However, due to the nonlinear features of batteries, computing systems with multiple batteries cannot achieve the same battery efficiency as those powered by a monolithic battery of equal capacity. In this paper, we propose an algorithm aiming to maximize battery efficiency globally for the computer-assisted health-care systems with multiple batteries. Based on an accurate analytical battery model, the concept of weighted battery fatigue degree is introduced and the novel battery-scheduling algorithm called predicted weighted fatigue degree least first (PWFDLF) is developed. Besides, we also discuss our attempts during search PWFDLF: a weighted round-robin (WRR) and a greedy algorithm achieving highest local battery efficiency, which reduces to the sequential discharging policy. Evaluation results show that a considerable improvement in battery efficiency can be obtained by PWFDLF under various battery configurations and current profiles compared to conventional sequential and WRR discharging policies.
A hydrological emulator for global applications – HE v1.0.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Yaling; Hejazi, Mohamad; Li, Hongyi
While global hydrological models (GHMs) are very useful in exploring water resources and interactions between the Earth and human systems, their use often requires numerous model inputs, complex model calibration, and high computation costs. To overcome these challenges, we construct an efficient open-source and ready-to-use hydrological emulator (HE) that can mimic complex GHMs at a range of spatial scales (e.g., basin, region, globe). More specifically, we construct both a lumped and a distributed scheme of the HE based on the monthly abcd model to explore the tradeoff between computational cost and model fidelity. Model predictability and computational efficiency are evaluatedmore » in simulating global runoff from 1971 to 2010 with both the lumped and distributed schemes. The results are compared against the runoff product from the widely used Variable Infiltration Capacity (VIC) model. Our evaluation indicates that the lumped and distributed schemes present comparable results regarding annual total quantity, spatial pattern, and temporal variation of the major water fluxes (e.g., total runoff, evapotranspiration) across the global 235 basins (e.g., correlation coefficient r between the annual total runoff from either of these two schemes and the VIC is > 0.96), except for several cold (e.g., Arctic, interior Tibet), dry (e.g., North Africa) and mountainous (e.g., Argentina) regions. Compared against the monthly total runoff product from the VIC (aggregated from daily runoff), the global mean Kling–Gupta efficiencies are 0.75 and 0.79 for the lumped and distributed schemes, respectively, with the distributed scheme better capturing spatial heterogeneity. Notably, the computation efficiency of the lumped scheme is 2 orders of magnitude higher than the distributed one and 7 orders more efficient than the VIC model. A case study of uncertainty analysis for the world's 16 basins with top annual streamflow is conducted using 100 000 model simulations, and it demonstrates the lumped scheme's extraordinary advantage in computational efficiency. Lastly, our results suggest that the revised lumped abcd model can serve as an efficient and reasonable HE for complex GHMs and is suitable for broad practical use, and the distributed scheme is also an efficient alternative if spatial heterogeneity is of more interest.« less
Preliminary study of the use of the STAR-100 computer for transonic flow calculations
NASA Technical Reports Server (NTRS)
Keller, J. D.; Jameson, A.
1977-01-01
An explicit method for solving the transonic small-disturbance potential equation is presented. This algorithm, which is suitable for the new vector-processor computers such as the CDC STAR-100, is compared to successive line over-relaxation (SLOR) on a simple test problem. The convergence rate of the explicit scheme is slower than that of SLOR, however, the efficiency of the explicit scheme on the STAR-100 computer is sufficient to overcome the slower convergence rate and allow an overall speedup compared to SLOR on the CYBER 175 computer.
Computer-based learning: interleaving whole and sectional representation of neuroanatomy.
Pani, John R; Chariker, Julia H; Naaz, Farah
2013-01-01
The large volume of material to be learned in biomedical disciplines requires optimizing the efficiency of instruction. In prior work with computer-based instruction of neuroanatomy, it was relatively efficient for learners to master whole anatomy and then transfer to learning sectional anatomy. It may, however, be more efficient to continuously integrate learning of whole and sectional anatomy. A study of computer-based learning of neuroanatomy was conducted to compare a basic transfer paradigm for learning whole and sectional neuroanatomy with a method in which the two forms of representation were interleaved (alternated). For all experimental groups, interactive computer programs supported an approach to instruction called adaptive exploration. Each learning trial consisted of time-limited exploration of neuroanatomy, self-timed testing, and graphical feedback. The primary result of this study was that interleaved learning of whole and sectional neuroanatomy was more efficient than the basic transfer method, without cost to long-term retention or generalization of knowledge to recognizing new images (Visible Human and MRI). Copyright © 2012 American Association of Anatomists.
Computer-Based Learning: Interleaving Whole and Sectional Representation of Neuroanatomy
Pani, John R.; Chariker, Julia H.; Naaz, Farah
2015-01-01
The large volume of material to be learned in biomedical disciplines requires optimizing the efficiency of instruction. In prior work with computer-based instruction of neuroanatomy, it was relatively efficient for learners to master whole anatomy and then transfer to learning sectional anatomy. It may, however, be more efficient to continuously integrate learning of whole and sectional anatomy. A study of computer-based learning of neuroanatomy was conducted to compare a basic transfer paradigm for learning whole and sectional neuroanatomy with a method in which the two forms of representation were interleaved (alternated). For all experimental groups, interactive computer programs supported an approach to instruction called adaptive exploration. Each learning trial consisted of time-limited exploration of neuroanatomy, self-timed testing, and graphical feedback. The primary result of this study was that interleaved learning of whole and sectional neuroanatomy was more efficient than the basic transfer method, without cost to long-term retention or generalization of knowledge to recognizing new images (Visible Human and MRI). PMID:22761001
NASA Astrophysics Data System (ADS)
Liu, Chen; Han, Runze; Zhou, Zheng; Huang, Peng; Liu, Lifeng; Liu, Xiaoyan; Kang, Jinfeng
2018-04-01
In this work we present a novel convolution computing architecture based on metal oxide resistive random access memory (RRAM) to process the image data stored in the RRAM arrays. The proposed image storage architecture shows performances of better speed-device consumption efficiency compared with the previous kernel storage architecture. Further we improve the architecture for a high accuracy and low power computing by utilizing the binary storage and the series resistor. For a 28 × 28 image and 10 kernels with a size of 3 × 3, compared with the previous kernel storage approach, the newly proposed architecture shows excellent performances including: 1) almost 100% accuracy within 20% LRS variation and 90% HRS variation; 2) more than 67 times speed boost; 3) 71.4% energy saving.
Gradient-free MCMC methods for dynamic causal modelling.
Sengupta, Biswa; Friston, Karl J; Penny, Will D
2015-05-15
In this technical note we compare the performance of four gradient-free MCMC samplers (random walk Metropolis sampling, slice-sampling, adaptive MCMC sampling and population-based MCMC sampling with tempering) in terms of the number of independent samples they can produce per unit computational time. For the Bayesian inversion of a single-node neural mass model, both adaptive and population-based samplers are more efficient compared with random walk Metropolis sampler or slice-sampling; yet adaptive MCMC sampling is more promising in terms of compute time. Slice-sampling yields the highest number of independent samples from the target density - albeit at almost 1000% increase in computational time, in comparison to the most efficient algorithm (i.e., the adaptive MCMC sampler). Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
A Computationally Efficient Method for Polyphonic Pitch Estimation
NASA Astrophysics Data System (ADS)
Zhou, Ruohua; Reiss, Joshua D.; Mattavelli, Marco; Zoia, Giorgio
2009-12-01
This paper presents a computationally efficient method for polyphonic pitch estimation. The method employs the Fast Resonator Time-Frequency Image (RTFI) as the basic time-frequency analysis tool. The approach is composed of two main stages. First, a preliminary pitch estimation is obtained by means of a simple peak-picking procedure in the pitch energy spectrum. Such spectrum is calculated from the original RTFI energy spectrum according to harmonic grouping principles. Then the incorrect estimations are removed according to spectral irregularity and knowledge of the harmonic structures of the music notes played on commonly used music instruments. The new approach is compared with a variety of other frame-based polyphonic pitch estimation methods, and results demonstrate the high performance and computational efficiency of the approach.
Propulsive efficiency of the underwater dolphin kick in humans.
von Loebbecke, Alfred; Mittal, Rajat; Fish, Frank; Mark, Russell
2009-05-01
Three-dimensional fully unsteady computational fluid dynamic simulations of five Olympic-level swimmers performing the underwater dolphin kick are used to estimate the swimmer's propulsive efficiencies. These estimates are compared with those of a cetacean performing the dolphin kick. The geometries of the swimmers and the cetacean are based on laser and CT scans, respectively, and the stroke kinematics is based on underwater video footage. The simulations indicate that the propulsive efficiency for human swimmers varies over a relatively wide range from about 11% to 29%. The efficiency of the cetacean is found to be about 56%, which is significantly higher than the human swimmers. The computed efficiency is found not to correlate with either the slender body theory or with the Strouhal number.
Higher-order adaptive finite-element methods for Kohn–Sham density functional theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Motamarri, P.; Nowak, M.R.; Leiter, K.
2013-11-15
We present an efficient computational approach to perform real-space electronic structure calculations using an adaptive higher-order finite-element discretization of Kohn–Sham density-functional theory (DFT). To this end, we develop an a priori mesh-adaption technique to construct a close to optimal finite-element discretization of the problem. We further propose an efficient solution strategy for solving the discrete eigenvalue problem by using spectral finite-elements in conjunction with Gauss–Lobatto quadrature, and a Chebyshev acceleration technique for computing the occupied eigenspace. The proposed approach has been observed to provide a staggering 100–200-fold computational advantage over the solution of a generalized eigenvalue problem. Using the proposedmore » solution procedure, we investigate the computational efficiency afforded by higher-order finite-element discretizations of the Kohn–Sham DFT problem. Our studies suggest that staggering computational savings—of the order of 1000-fold—relative to linear finite-elements can be realized, for both all-electron and local pseudopotential calculations, by using higher-order finite-element discretizations. On all the benchmark systems studied, we observe diminishing returns in computational savings beyond the sixth-order for accuracies commensurate with chemical accuracy, suggesting that the hexic spectral-element may be an optimal choice for the finite-element discretization of the Kohn–Sham DFT problem. A comparative study of the computational efficiency of the proposed higher-order finite-element discretizations suggests that the performance of finite-element basis is competing with the plane-wave discretization for non-periodic local pseudopotential calculations, and compares to the Gaussian basis for all-electron calculations to within an order of magnitude. Further, we demonstrate the capability of the proposed approach to compute the electronic structure of a metallic system containing 1688 atoms using modest computational resources, and good scalability of the present implementation up to 192 processors.« less
An efficient graph theory based method to identify every minimal reaction set in a metabolic network
2014-01-01
Background Development of cells with minimal metabolic functionality is gaining importance due to their efficiency in producing chemicals and fuels. Existing computational methods to identify minimal reaction sets in metabolic networks are computationally expensive. Further, they identify only one of the several possible minimal reaction sets. Results In this paper, we propose an efficient graph theory based recursive optimization approach to identify all minimal reaction sets. Graph theoretical insights offer systematic methods to not only reduce the number of variables in math programming and increase its computational efficiency, but also provide efficient ways to find multiple optimal solutions. The efficacy of the proposed approach is demonstrated using case studies from Escherichia coli and Saccharomyces cerevisiae. In case study 1, the proposed method identified three minimal reaction sets each containing 38 reactions in Escherichia coli central metabolic network with 77 reactions. Analysis of these three minimal reaction sets revealed that one of them is more suitable for developing minimal metabolism cell compared to other two due to practically achievable internal flux distribution. In case study 2, the proposed method identified 256 minimal reaction sets from the Saccharomyces cerevisiae genome scale metabolic network with 620 reactions. The proposed method required only 4.5 hours to identify all the 256 minimal reaction sets and has shown a significant reduction (approximately 80%) in the solution time when compared to the existing methods for finding minimal reaction set. Conclusions Identification of all minimal reactions sets in metabolic networks is essential since different minimal reaction sets have different properties that effect the bioprocess development. The proposed method correctly identified all minimal reaction sets in a both the case studies. The proposed method is computationally efficient compared to other methods for finding minimal reaction sets and useful to employ with genome-scale metabolic networks. PMID:24594118
Novel scheme to compute chemical potentials of chain molecules on a lattice
NASA Astrophysics Data System (ADS)
Mooij, G. C. A. M.; Frenkel, D.
We present a novel method that allows efficient computation of the total number of allowed conformations of a chain molecule in a dense phase. Using this method, it is possible to estimate the chemical potential of such a chain molecule. We have tested the present method in simulations of a two-dimensional monolayer of chain molecules on a lattice (Whittington-Chapman model) and compared it with existing schemes to compute the chemical potential. We find that the present approach is two to three orders of magnitude faster than the most efficient of the existing methods.
NASA Technical Reports Server (NTRS)
Chaderjian, Neal M.
1991-01-01
Computations from two Navier-Stokes codes, NSS and F3D, are presented for a tangent-ogive-cylinder body at high angle of attack. Features of this steady flow include a pair of primary vortices on the leeward side of the body as well as secondary vortices. The topological and physical plausibility of this vortical structure is discussed. The accuracy of these codes are assessed by comparison of the numerical solutions with experimental data. The effects of turbulence model, numerical dissipation, and grid refinement are presented. The overall efficiency of these codes are also assessed by examining their convergence rates, computational time per time step, and maximum allowable time step for time-accurate computations. Overall, the numerical results from both codes compared equally well with experimental data, however, the NSS code was found to be significantly more efficient than the F3D code.
An efficient method for hybrid density functional calculation with spin-orbit coupling
NASA Astrophysics Data System (ADS)
Wang, Maoyuan; Liu, Gui-Bin; Guo, Hong; Yao, Yugui
2018-03-01
In first-principles calculations, hybrid functional is often used to improve accuracy from local exchange correlation functionals. A drawback is that evaluating the hybrid functional needs significantly more computing effort. When spin-orbit coupling (SOC) is taken into account, the non-collinear spin structure increases computing effort by at least eight times. As a result, hybrid functional calculations with SOC are intractable in most cases. In this paper, we present an approximate solution to this problem by developing an efficient method based on a mixed linear combination of atomic orbital (LCAO) scheme. We demonstrate the power of this method using several examples and we show that the results compare very well with those of direct hybrid functional calculations with SOC, yet the method only requires a computing effort similar to that without SOC. The presented technique provides a good balance between computing efficiency and accuracy, and it can be extended to magnetic materials.
Patwary, Nurmohammed; Preza, Chrysanthe
2015-01-01
A depth-variant (DV) image restoration algorithm for wide field fluorescence microscopy, using an orthonormal basis decomposition of DV point-spread functions (PSFs), is investigated in this study. The efficient PSF representation is based on a previously developed principal component analysis (PCA), which is computationally intensive. We present an approach developed to reduce the number of DV PSFs required for the PCA computation, thereby making the PCA-based approach computationally tractable for thick samples. Restoration results from both synthetic and experimental images show consistency and that the proposed algorithm addresses efficiently depth-induced aberration using a small number of principal components. Comparison of the PCA-based algorithm with a previously-developed strata-based DV restoration algorithm demonstrates that the proposed method improves performance by 50% in terms of accuracy and simultaneously reduces the processing time by 64% using comparable computational resources. PMID:26504634
A Survey of Methods for Analyzing and Improving GPU Energy Efficiency
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh; Vetter, Jeffrey S
2014-01-01
Recent years have witnessed a phenomenal growth in the computational capabilities and applications of GPUs. However, this trend has also led to dramatic increase in their power consumption. This paper surveys research works on analyzing and improving energy efficiency of GPUs. It also provides a classification of these techniques on the basis of their main research idea. Further, it attempts to synthesize research works which compare energy efficiency of GPUs with other computing systems, e.g. FPGAs and CPUs. The aim of this survey is to provide researchers with knowledge of state-of-the-art in GPU power management and motivate them to architectmore » highly energy-efficient GPUs of tomorrow.« less
Image detection and compression for memory efficient system analysis
NASA Astrophysics Data System (ADS)
Bayraktar, Mustafa
2015-02-01
The advances in digital signal processing have been progressing towards efficient use of memory and processing. Both of these factors can be utilized efficiently by using feasible techniques of image storage by computing the minimum information of image which will enhance computation in later processes. Scale Invariant Feature Transform (SIFT) can be utilized to estimate and retrieve of an image. In computer vision, SIFT can be implemented to recognize the image by comparing its key features from SIFT saved key point descriptors. The main advantage of SIFT is that it doesn't only remove the redundant information from an image but also reduces the key points by matching their orientation and adding them together in different windows of image [1]. Another key property of this approach is that it works on highly contrasted images more efficiently because it`s design is based on collecting key points from the contrast shades of image.
Computational Study of the CC3 Impeller and Vaneless Diffuser Experiment
NASA Technical Reports Server (NTRS)
Kulkarni, Sameer; Beach, Timothy A.; Skoch, Gary J.
2013-01-01
Centrifugal compressors are compatible with the low exit corrected flows found in the high pressure compressor of turboshaft engines and may play an increasing role in turbofan engines as engine overall pressure ratios increase. Centrifugal compressor stages are difficult to model accurately with RANS CFD solvers. A computational study of the CC3 centrifugal impeller in its vaneless diffuser configuration was undertaken as part of an effort to understand potential causes of RANS CFD mis-prediction in these types of geometries. Three steady, periodic cases of the impeller and diffuser were modeled using the TURBO Parallel Version 4 code: 1) a k-epsilon turbulence model computation on a 6.8 million point grid using wall functions, 2) a k-epsilon turbulence model computation on a 14 million point grid integrating to the wall, and 3) a k-omega turbulence model computation on the 14 million point grid integrating to the wall. It was found that all three cases compared favorably to data from inlet to impeller trailing edge, but the k-epsilon and k-omega computations had disparate results beyond the trailing edge and into the vaneless diffuser. A large region of reversed flow was observed in the k-epsilon computations which extended from 70% to 100% span at the exit rating plane, whereas the k-omega computation had reversed flow from 95% to 100% span. Compared to experimental data at near-peak-efficiency, the reversed flow region in the k-epsilon case resulted in an under-prediction in adiabatic efficiency of 8.3 points, whereas the k-omega case was 1.2 points lower in efficiency.
Computational Study of the CC3 Impeller and Vaneless Diffuser Experiment
NASA Technical Reports Server (NTRS)
Kulkarni, Sameer; Beach, Timothy A.; Skoch, Gary J.
2013-01-01
Centrifugal compressors are compatible with the low exit corrected flows found in the high pressure compressor of turboshaft engines and may play an increasing role in turbofan engines as engine overall pressure ratios increase. Centrifugal compressor stages are difficult to model accurately with RANS CFD solvers. A computational study of the CC3 centrifugal impeller in its vaneless diffuser configuration was undertaken as part of an effort to understand potential causes of RANS CFD mis-prediction in these types of geometries. Three steady, periodic cases of the impeller and diffuser were modeled using the TURBO Parallel Version 4 code: (1) a k-e turbulence model computation on a 6.8 million point grid using wall functions, (2) a k-e turbulence model computation on a 14 million point grid integrating to the wall, and (3) a k-? turbulence model computation on the 14 million point grid integrating to the wall. It was found that all three cases compared favorably to data from inlet to impeller trailing edge, but the k-e and k-? computations had disparate results beyond the trailing edge and into the vaneless diffuser. A large region of reversed flow was observed in the k-e computations which extended from 70 to 100 percent span at the exit rating plane, whereas the k-? computation had reversed flow from 95 to 100 percent span. Compared to experimental data at near-peak-efficiency, the reversed flow region in the k-e case resulted in an underprediction in adiabatic efficiency of 8.3 points, whereas the k-? case was 1.2 points lower in efficiency.
Estimating Skin Cancer Risk: Evaluating Mobile Computer-Adaptive Testing.
Djaja, Ngadiman; Janda, Monika; Olsen, Catherine M; Whiteman, David C; Chien, Tsair-Wei
2016-01-22
Response burden is a major detriment to questionnaire completion rates. Computer adaptive testing may offer advantages over non-adaptive testing, including reduction of numbers of items required for precise measurement. Our aim was to compare the efficiency of non-adaptive (NAT) and computer adaptive testing (CAT) facilitated by Partial Credit Model (PCM)-derived calibration to estimate skin cancer risk. We used a random sample from a population-based Australian cohort study of skin cancer risk (N=43,794). All 30 items of the skin cancer risk scale were calibrated with the Rasch PCM. A total of 1000 cases generated following a normal distribution (mean [SD] 0 [1]) were simulated using three Rasch models with three fixed-item (dichotomous, rating scale, and partial credit) scenarios, respectively. We calculated the comparative efficiency and precision of CAT and NAT (shortening of questionnaire length and the count difference number ratio less than 5% using independent t tests). We found that use of CAT led to smaller person standard error of the estimated measure than NAT, with substantially higher efficiency but no loss of precision, reducing response burden by 48%, 66%, and 66% for dichotomous, Rating Scale Model, and PCM models, respectively. CAT-based administrations of the skin cancer risk scale could substantially reduce participant burden without compromising measurement precision. A mobile computer adaptive test was developed to help people efficiently assess their skin cancer risk.
Comparison of a degree-day computer and a recording thermograph in a forest environment.
Boyd E. Wickman
1985-01-01
A field test showed that degree-days accumulated by a miniature computer and a recording thermograph in early spring and summer were comparable. For phenological studies the biophenometer was as accurate and more efficient than the hygrothermograph.
NASA Astrophysics Data System (ADS)
Wang, Yuan; Chen, Zhidong; Sang, Xinzhu; Li, Hui; Zhao, Linmin
2018-03-01
Holographic displays can provide the complete optical wave field of a three-dimensional (3D) scene, including the depth perception. However, it often takes a long computation time to produce traditional computer-generated holograms (CGHs) without more complex and photorealistic rendering. The backward ray-tracing technique is able to render photorealistic high-quality images, which noticeably reduce the computation time achieved from the high-degree parallelism. Here, a high-efficiency photorealistic computer-generated hologram method is presented based on the ray-tracing technique. Rays are parallelly launched and traced under different illuminations and circumstances. Experimental results demonstrate the effectiveness of the proposed method. Compared with the traditional point cloud CGH, the computation time is decreased to 24 s to reconstruct a 3D object of 100 ×100 rays with continuous depth change.
ERIC Educational Resources Information Center
Coleman, Mari Beth; Hurley, Kevin J.; Cihak, David F.
2012-01-01
The purpose of this study was to compare the effectiveness and efficiency of teacher-directed and computer-assisted constant time delay strategies for teaching three students with moderate intellectual disability to read functional sight words. Target words were those found in recipes and were taught via teacher-delivered constant time delay or…
ERIC Educational Resources Information Center
Daniels, Mindy A.
2012-01-01
The purpose of this case study was to compare the pedagogical and affective efficiency and efficacy of creative prose fiction writing workshops taught via asynchronous computer-mediated online distance education with creative prose fiction writing workshops taught face-to-face in order to better understand their operational pedagogy and…
Efficient Geometric Sound Propagation Using Visibility Culling
NASA Astrophysics Data System (ADS)
Chandak, Anish
2011-07-01
Simulating propagation of sound can improve the sense of realism in interactive applications such as video games and can lead to better designs in engineering applications such as architectural acoustics. In this thesis, we present geometric sound propagation techniques which are faster than prior methods and map well to upcoming parallel multi-core CPUs. We model specular reflections by using the image-source method and model finite-edge diffraction by using the well-known Biot-Tolstoy-Medwin (BTM) model. We accelerate the computation of specular reflections by applying novel visibility algorithms, FastV and AD-Frustum, which compute visibility from a point. We accelerate finite-edge diffraction modeling by applying a novel visibility algorithm which computes visibility from a region. Our visibility algorithms are based on frustum tracing and exploit recent advances in fast ray-hierarchy intersections, data-parallel computations, and scalable, multi-core algorithms. The AD-Frustum algorithm adapts its computation to the scene complexity and allows small errors in computing specular reflection paths for higher computational efficiency. FastV and our visibility algorithm from a region are general, object-space, conservative visibility algorithms that together significantly reduce the number of image sources compared to other techniques while preserving the same accuracy. Our geometric propagation algorithms are an order of magnitude faster than prior approaches for modeling specular reflections and two to ten times faster for modeling finite-edge diffraction. Our algorithms are interactive, scale almost linearly on multi-core CPUs, and can handle large, complex, and dynamic scenes. We also compare the accuracy of our sound propagation algorithms with other methods. Once sound propagation is performed, it is desirable to listen to the propagated sound in interactive and engineering applications. We can generate smooth, artifact-free output audio signals by applying efficient audio-processing algorithms. We also present the first efficient audio-processing algorithm for scenarios with simultaneously moving source and moving receiver (MS-MR) which incurs less than 25% overhead compared to static source and moving receiver (SS-MR) or moving source and static receiver (MS-SR) scenario.
DANoC: An Efficient Algorithm and Hardware Codesign of Deep Neural Networks on Chip.
Zhou, Xichuan; Li, Shengli; Tang, Fang; Hu, Shengdong; Lin, Zhi; Zhang, Lei
2017-07-18
Deep neural networks (NNs) are the state-of-the-art models for understanding the content of images and videos. However, implementing deep NNs in embedded systems is a challenging task, e.g., a typical deep belief network could exhaust gigabytes of memory and result in bandwidth and computational bottlenecks. To address this challenge, this paper presents an algorithm and hardware codesign for efficient deep neural computation. A hardware-oriented deep learning algorithm, named the deep adaptive network, is proposed to explore the sparsity of neural connections. By adaptively removing the majority of neural connections and robustly representing the reserved connections using binary integers, the proposed algorithm could save up to 99.9% memory utility and computational resources without undermining classification accuracy. An efficient sparse-mapping-memory-based hardware architecture is proposed to fully take advantage of the algorithmic optimization. Different from traditional Von Neumann architecture, the deep-adaptive network on chip (DANoC) brings communication and computation in close proximity to avoid power-hungry parameter transfers between on-board memory and on-chip computational units. Experiments over different image classification benchmarks show that the DANoC system achieves competitively high accuracy and efficiency comparing with the state-of-the-art approaches.
An efficient sparse matrix multiplication scheme for the CYBER 205 computer
NASA Technical Reports Server (NTRS)
Lambiotte, Jules J., Jr.
1988-01-01
This paper describes the development of an efficient algorithm for computing the product of a matrix and vector on a CYBER 205 vector computer. The desire to provide software which allows the user to choose between the often conflicting goals of minimizing central processing unit (CPU) time or storage requirements has led to a diagonal-based algorithm in which one of four types of storage is selected for each diagonal. The candidate storage types employed were chosen to be efficient on the CYBER 205 for diagonals which have nonzero structure which is dense, moderately sparse, very sparse and short, or very sparse and long; however, for many densities, no diagonal type is most efficient with respect to both resource requirements, and a trade-off must be made. For each diagonal, an initialization subroutine estimates the CPU time and storage required for each storage type based on results from previously performed numerical experimentation. These requirements are adjusted by weights provided by the user which reflect the relative importance the user places on the two resources. The adjusted resource requirements are then compared to select the most efficient storage and computational scheme.
Teodoro, George; Kurc, Tahsin; Andrade, Guilherme; Kong, Jun; Ferreira, Renato; Saltz, Joel
2015-01-01
We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core-MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of computing devices and data access patterns, computation complexities, and parallelization forms of the operations. The results show a significant variability in the performance of operations with respect to the device used. The performances of operations with regular data access are comparable or sometimes better on a MIC than that on a GPU. GPUs are more efficient than MICs for operations that access data irregularly, because of the lower bandwidth of the MIC for random data accesses. We propose new performance-aware scheduling strategies that consider variabilities in operation speedups. Our scheduling strategies significantly improve application performance compared to classic strategies in hybrid configurations. PMID:28239253
Stone, John E; Hallock, Michael J; Phillips, James C; Peterson, Joseph R; Luthey-Schulten, Zaida; Schulten, Klaus
2016-05-01
Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers.
NASA Technical Reports Server (NTRS)
Van Dalsem, W. R.; Steger, J. L.
1985-01-01
A simple and computationally efficient algorithm for solving the unsteady three-dimensional boundary-layer equations in the time-accurate or relaxation mode is presented. Results of the new algorithm are shown to be in quantitative agreement with detailed experimental data for flow over a swept infinite wing. The separated flow over a 6:1 ellipsoid at angle of attack, and the transonic flow over a finite-wing with shock-induced 'mushroom' separation are also computed and compared with available experimental data. It is concluded that complex, separated, three-dimensional viscous layers can be economically and routinely computed using a time-relaxation boundary-layer algorithm.
NASA Astrophysics Data System (ADS)
Singh, Harendra
2018-04-01
The key purpose of this article is to introduce an efficient computational method for the approximate solution of the homogeneous as well as non-homogeneous nonlinear Lane-Emden type equations. Using proposed computational method given nonlinear equation is converted into a set of nonlinear algebraic equations whose solution gives the approximate solution to the Lane-Emden type equation. Various nonlinear cases of Lane-Emden type equations like standard Lane-Emden equation, the isothermal gas spheres equation and white-dwarf equation are discussed. Results are compared with some well-known numerical methods and it is observed that our results are more accurate.
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures
Manolakos, Elias S.
2015-01-01
Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.
Sharma, Anuj; Manolakos, Elias S
2015-01-01
Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
Fast Reduction Method in Dominance-Based Information Systems
NASA Astrophysics Data System (ADS)
Li, Yan; Zhou, Qinghua; Wen, Yongchuan
2018-01-01
In real world applications, there are often some data with continuous values or preference-ordered values. Rough sets based on dominance relations can effectively deal with these kinds of data. Attribute reduction can be done in the framework of dominance-relation based approach to better extract decision rules. However, the computational cost of the dominance classes greatly affects the efficiency of attribute reduction and rule extraction. This paper presents an efficient method of computing dominance classes, and further compares it with traditional method with increasing attributes and samples. Experiments on UCI data sets show that the proposed algorithm obviously improves the efficiency of the traditional method, especially for large-scale data.
Efficient computation of kinship and identity coefficients on large pedigrees.
Cheng, En; Elliott, Brendan; Ozsoyoglu, Z Meral
2009-06-01
With the rapidly expanding field of medical genetics and genetic counseling, genealogy information is becoming increasingly abundant. An important computation on pedigree data is the calculation of identity coefficients, which provide a complete description of the degree of relatedness of a pair of individuals. The areas of application of identity coefficients are numerous and diverse, from genetic counseling to disease tracking, and thus, the computation of identity coefficients merits special attention. However, the computation of identity coefficients is not done directly, but rather as the final step after computing a set of generalized kinship coefficients. In this paper, we first propose a novel Path-Counting Formula for calculating generalized kinship coefficients, which is motivated by Wright's path-counting method for computing inbreeding coefficient. We then present an efficient and scalable scheme for calculating generalized kinship coefficients on large pedigrees using NodeCodes, a special encoding scheme for expediting the evaluation of queries on pedigree graph structures. Furthermore, we propose an improved scheme using Family NodeCodes for the computation of generalized kinship coefficients, which is motivated by the significant improvement of using Family NodeCodes for inbreeding coefficient over the use of NodeCodes. We also perform experiments for evaluating the efficiency of our method, and compare it with the performance of the traditional recursive algorithm for three individuals. Experimental results demonstrate that the resulting scheme is more scalable and efficient than the traditional recursive methods for computing generalized kinship coefficients.
Precise and fast spatial-frequency analysis using the iterative local Fourier transform.
Lee, Sukmock; Choi, Heejoo; Kim, Dae Wook
2016-09-19
The use of the discrete Fourier transform has decreased since the introduction of the fast Fourier transform (fFT), which is a numerically efficient computing process. This paper presents the iterative local Fourier transform (ilFT), a set of new processing algorithms that iteratively apply the discrete Fourier transform within a local and optimal frequency domain. The new technique achieves 210 times higher frequency resolution than the fFT within a comparable computation time. The method's superb computing efficiency, high resolution, spectrum zoom-in capability, and overall performance are evaluated and compared to other advanced high-resolution Fourier transform techniques, such as the fFT combined with several fitting methods. The effectiveness of the ilFT is demonstrated through the data analysis of a set of Talbot self-images (1280 × 1024 pixels) obtained with an experimental setup using grating in a diverging beam produced by a coherent point source.
NASA Technical Reports Server (NTRS)
Lin, Z.; Stamnes, S.; Jin, Z.; Laszlo, I.; Tsay, S. C.; Wiscombe, W. J.; Stamnes, K.
2015-01-01
A successor version 3 of DISORT (DISORT3) is presented with important upgrades that improve the accuracy, efficiency, and stability of the algorithm. Compared with version 2 (DISORT2 released in 2000) these upgrades include (a) a redesigned BRDF computation that improves both speed and accuracy, (b) a revised treatment of the single scattering correction, and (c) additional efficiency and stability upgrades for beam sources. In DISORT3 the BRDF computation is improved in the following three ways: (i) the Fourier decomposition is prepared "off-line", thus avoiding the repeated internal computations done in DISORT2; (ii) a large enough number of terms in the Fourier expansion of the BRDF is employed to guarantee accurate values of the expansion coefficients (default is 200 instead of 50 in DISORT2); (iii) in the post processing step the reflection of the direct attenuated beam from the lower boundary is included resulting in a more accurate single scattering correction. These improvements in the treatment of the BRDF have led to improved accuracy and a several-fold increase in speed. In addition, the stability of beam sources has been improved by removing a singularity occurring when the cosine of the incident beam angle is too close to the reciprocal of any of the eigenvalues. The efficiency for beam sources has been further improved from reducing by a factor of 2 (compared to DISORT2) the dimension of the linear system of equations that must be solved to obtain the particular solutions, and by replacing the LINPAK routines used in DISORT2 by LAPACK 3.5 in DISORT3. These beam source stability and efficiency upgrades bring enhanced stability and an additional 5-7% improvement in speed. Numerical results are provided to demonstrate and quantify the improvements in accuracy and efficiency of DISORT3 compared to DISORT2.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling
NASA Astrophysics Data System (ADS)
Galelli, S.; Castelletti, A.
2013-02-01
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (Marina catchment (Singapore) and Canning River (Western Australia)) representing two different morphoclimatic contexts comparatively with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.
Macías-Díaz, J E; Macías, Siegfried; Medina-Ramírez, I E
2013-12-01
In this manuscript, we present a computational model to approximate the solutions of a partial differential equation which describes the growth dynamics of microbial films. The numerical technique reported in this work is an explicit, nonlinear finite-difference methodology which is computationally implemented using Newton's method. Our scheme is compared numerically against an implicit, linear finite-difference discretization of the same partial differential equation, whose computer coding requires an implementation of the stabilized bi-conjugate gradient method. Our numerical results evince that the nonlinear approach results in a more efficient approximation to the solutions of the biofilm model considered, and demands less computer memory. Moreover, the positivity of initial profiles is preserved in the practice by the nonlinear scheme proposed. Copyright © 2013 Elsevier Ltd. All rights reserved.
Future computing platforms for science in a power constrained era
Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; ...
2015-12-23
Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing solutions. For this paper, we have done a wide survey of current and emerging architectures becoming available on the market including x86-64 variants, ARMv7 32-bit, ARMv8 64-bit, Many-Core and GPU solutions, as well as newer System-on-Chip (SoC) solutions. We compare performance and energy efficiency using an evolving set of standardized HEP-related benchmarks and power measurement techniques we have been developing. In conclusion, we evaluate the potentialmore » for use of such computing solutions in the context of DHTC systems, such as the Worldwide LHC Computing Grid (WLCG).« less
NASA Astrophysics Data System (ADS)
Jimenez, Edward S.; Goodman, Eric L.; Park, Ryeojin; Orr, Laurel J.; Thompson, Kyle R.
2014-09-01
This paper will investigate energy-efficiency for various real-world industrial computed-tomography reconstruction algorithms, both CPU- and GPU-based implementations. This work shows that the energy required for a given reconstruction is based on performance and problem size. There are many ways to describe performance and energy efficiency, thus this work will investigate multiple metrics including performance-per-watt, energy-delay product, and energy consumption. This work found that irregular GPU-based approaches1 realized tremendous savings in energy consumption when compared to CPU implementations while also significantly improving the performance-per- watt and energy-delay product metrics. Additional energy savings and other metric improvement was realized on the GPU-based reconstructions by improving storage I/O by implementing a parallel MIMD-like modularization of the compute and I/O tasks.
Job Scheduling with Efficient Resource Monitoring in Cloud Datacenter
Loganathan, Shyamala; Mukherjee, Saswati
2015-01-01
Cloud computing is an on-demand computing model, which uses virtualization technology to provide cloud resources to users in the form of virtual machines through internet. Being an adaptable technology, cloud computing is an excellent alternative for organizations for forming their own private cloud. Since the resources are limited in these private clouds maximizing the utilization of resources and giving the guaranteed service for the user are the ultimate goal. For that, efficient scheduling is needed. This research reports on an efficient data structure for resource management and resource scheduling technique in a private cloud environment and discusses a cloud model. The proposed scheduling algorithm considers the types of jobs and the resource availability in its scheduling decision. Finally, we conducted simulations using CloudSim and compared our algorithm with other existing methods, like V-MCT and priority scheduling algorithms. PMID:26473166
Job Scheduling with Efficient Resource Monitoring in Cloud Datacenter.
Loganathan, Shyamala; Mukherjee, Saswati
2015-01-01
Cloud computing is an on-demand computing model, which uses virtualization technology to provide cloud resources to users in the form of virtual machines through internet. Being an adaptable technology, cloud computing is an excellent alternative for organizations for forming their own private cloud. Since the resources are limited in these private clouds maximizing the utilization of resources and giving the guaranteed service for the user are the ultimate goal. For that, efficient scheduling is needed. This research reports on an efficient data structure for resource management and resource scheduling technique in a private cloud environment and discusses a cloud model. The proposed scheduling algorithm considers the types of jobs and the resource availability in its scheduling decision. Finally, we conducted simulations using CloudSim and compared our algorithm with other existing methods, like V-MCT and priority scheduling algorithms.
78 FR 39617 - Data Practices, Computer III Further Remand: BOC Provision of Enhanced Services
Federal Register 2010, 2011, 2012, 2013, 2014
2013-07-02
... Docket No. 10-132; FCC 13-69] Data Practices, Computer III Further Remand: BOC Provision of Enhanced... eliminates comparably efficient interconnection (CEI) and open network architecture (ONA) narrowband... disseminates data, including by altering or eliminating collections that are no longer useful or necessary to...
Hamiltonian Monte Carlo acceleration using surrogate functions with random bases.
Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai
2017-11-01
For big data analysis, high computational cost for Bayesian methods often limits their applications in practice. In recent years, there have been many attempts to improve computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo methods, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.
Efficient classical simulation of the Deutsch-Jozsa and Simon's algorithms
NASA Astrophysics Data System (ADS)
Johansson, Niklas; Larsson, Jan-Åke
2017-09-01
A long-standing aim of quantum information research is to understand what gives quantum computers their advantage. This requires separating problems that need genuinely quantum resources from those for which classical resources are enough. Two examples of quantum speed-up are the Deutsch-Jozsa and Simon's problem, both efficiently solvable on a quantum Turing machine, and both believed to lack efficient classical solutions. Here we present a framework that can simulate both quantum algorithms efficiently, solving the Deutsch-Jozsa problem with probability 1 using only one oracle query, and Simon's problem using linearly many oracle queries, just as expected of an ideal quantum computer. The presented simulation framework is in turn efficiently simulatable in a classical probabilistic Turing machine. This shows that the Deutsch-Jozsa and Simon's problem do not require any genuinely quantum resources, and that the quantum algorithms show no speed-up when compared with their corresponding classical simulation. Finally, this gives insight into what properties are needed in the two algorithms and calls for further study of oracle separation between quantum and classical computation.
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2OOO, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2OOO, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
A learnable parallel processing architecture towards unity of memory and computing
NASA Astrophysics Data System (ADS)
Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.
2015-08-01
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
A learnable parallel processing architecture towards unity of memory and computing.
Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J
2015-08-14
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
GPU-accelerated element-free reverse-time migration with Gauss points partition
NASA Astrophysics Data System (ADS)
Zhou, Zhen; Jia, Xiaofeng; Qiang, Xiaodong
2018-06-01
An element-free method (EFM) has been demonstrated successfully in elasticity, heat conduction and fatigue crack growth problems. We present the theory of EFM and its numerical applications in seismic modelling and reverse time migration (RTM). Compared with the finite difference method and the finite element method, the EFM has unique advantages: (1) independence of grids in computation and (2) lower expense and more flexibility (because only the information of the nodes and the boundary of the concerned area is required). However, in EFM, due to improper computation and storage of some large sparse matrices, such as the mass matrix and the stiffness matrix, the method is difficult to apply to seismic modelling and RTM for a large velocity model. To solve the problem of storage and computation efficiency, we propose a concept of Gauss points partition and utilise the graphics processing unit to improve the computational efficiency. We employ the compressed sparse row format to compress the intermediate large sparse matrices and attempt to simplify the operations by solving the linear equations with CULA solver. To improve the computation efficiency further, we introduce the concept of the lumped mass matrix. Numerical experiments indicate that the proposed method is accurate and more efficient than the regular EFM.
McEwan, Phil; Bergenheim, Klas; Yuan, Yong; Tetlow, Anthony P; Gordon, Jason P
2010-01-01
Simulation techniques are well suited to modelling diseases yet can be computationally intensive. This study explores the relationship between modelled effect size, statistical precision, and efficiency gains achieved using variance reduction and an executable programming language. A published simulation model designed to model a population with type 2 diabetes mellitus based on the UKPDS 68 outcomes equations was coded in both Visual Basic for Applications (VBA) and C++. Efficiency gains due to the programming language were evaluated, as was the impact of antithetic variates to reduce variance, using predicted QALYs over a 40-year time horizon. The use of C++ provided a 75- and 90-fold reduction in simulation run time when using mean and sampled input values, respectively. For a series of 50 one-way sensitivity analyses, this would yield a total run time of 2 minutes when using C++, compared with 155 minutes for VBA when using mean input values. The use of antithetic variates typically resulted in a 53% reduction in the number of simulation replications and run time required. When drawing all input values to the model from distributions, the use of C++ and variance reduction resulted in a 246-fold improvement in computation time compared with VBA - for which the evaluation of 50 scenarios would correspondingly require 3.8 hours (C++) and approximately 14.5 days (VBA). The choice of programming language used in an economic model, as well as the methods for improving precision of model output can have profound effects on computation time. When constructing complex models, more computationally efficient approaches such as C++ and variance reduction should be considered; concerns regarding model transparency using compiled languages are best addressed via thorough documentation and model validation.
Stone, John E.; Hallock, Michael J.; Phillips, James C.; Peterson, Joseph R.; Luthey-Schulten, Zaida; Schulten, Klaus
2016-01-01
Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers. PMID:27516922
NASA Technical Reports Server (NTRS)
Elbanna, Hesham M.; Carlson, Leland A.
1992-01-01
The quasi-analytical approach is applied to the three-dimensional full potential equation to compute wing aerodynamic sensitivity coefficients in the transonic regime. Symbolic manipulation is used to reduce the effort associated with obtaining the sensitivity equations, and the large sensitivity system is solved using 'state of the art' routines. Results are compared to those obtained by the direct finite difference approach and both methods are evaluated to determine their computational accuracy and efficiency. The quasi-analytical approach is shown to be accurate and efficient for large aerodynamic systems.
Computational efficiency and Amdahl’s law for the adaptive resolution simulation technique
DOE Office of Scientific and Technical Information (OSTI.GOV)
Junghans, Christoph; Agarwal, Animesh; Delle Site, Luigi
Here, we discuss the computational performance of the adaptive resolution technique in molecular simulation when it is compared with equivalent full coarse-grained and full atomistic simulations. We show that an estimate of its efficiency, within 10%–15% accuracy, is given by the Amdahl’s Law adapted to the specific quantities involved in the problem. The derivation of the predictive formula is general enough that it may be applied to the general case of molecular dynamics approaches where a reduction of degrees of freedom in a multi scale fashion occurs.
Chen, A Y; Liu, Y-W H; Sheu, R J
2008-01-01
This study investigates the radiation shielding design of the treatment room for boron neutron capture therapy at Tsing Hua Open-pool Reactor using "TORT-coupled MCNP" method. With this method, the computational efficiency is improved significantly by two to three orders of magnitude compared to the analog Monte Carlo MCNP calculation. This makes the calculation feasible using a single CPU in less than 1 day. Further optimization of the photon weight windows leads to additional 50-75% improvement in the overall computational efficiency.
Computational efficiency and Amdahl’s law for the adaptive resolution simulation technique
Junghans, Christoph; Agarwal, Animesh; Delle Site, Luigi
2017-06-01
Here, we discuss the computational performance of the adaptive resolution technique in molecular simulation when it is compared with equivalent full coarse-grained and full atomistic simulations. We show that an estimate of its efficiency, within 10%–15% accuracy, is given by the Amdahl’s Law adapted to the specific quantities involved in the problem. The derivation of the predictive formula is general enough that it may be applied to the general case of molecular dynamics approaches where a reduction of degrees of freedom in a multi scale fashion occurs.
Solution of steady and unsteady transonic-vortex flows using Euler and full-potential equations
NASA Technical Reports Server (NTRS)
Kandil, Osama A.; Chuang, Andrew H.; Hu, Hong
1989-01-01
Two methods are presented for inviscid transonic flows: unsteady Euler equations in a rotating frame of reference for transonic-vortex flows and integral solution of full-potential equation with and without embedded Euler domains for transonic airfoil flows. The computational results covered: steady and unsteady conical vortex flows; 3-D steady transonic vortex flow; and transonic airfoil flows. The results are in good agreement with other computational results and experimental data. The rotating frame of reference solution is potentially efficient as compared with the space fixed reference formulation with dynamic gridding. The integral equation solution with embedded Euler domain is computationally efficient and as accurate as the Euler equations.
CFD comparison with centrifugal compressor measurements on a wide operating range
NASA Astrophysics Data System (ADS)
Le Sausse, P.; Fabrie, P.; Arnou, D.; Clunet, F.
2013-04-01
Centrifugal compressors are widely used in industrial applications thanks to their high efficiency. They are able to provide a wide operating range before reaching the flow barrier or surge limits. Performances and range are described by compressor maps obtained experimentally. After a description of performance test rig, this article compares measured centrifugal compressor performances with computational fluid dynamics results. These computations are performed at steady conditions with R134a refrigerant as fluid. Navier-Stokes equations, coupled with k-ɛ turbulence model, are solved by the commercial software ANSYS-CFX by means of volume finite method. Input conditions are varied in order to calculate several speed lines. Theoretical isentropic efficiency and theoretical surge line are finally compared to experimental data.
Dynamic water allocation policies improve the global efficiency of storage systems
NASA Astrophysics Data System (ADS)
Niayifar, Amin; Perona, Paolo
2017-06-01
Water impoundment by dams strongly affects the river natural flow regime, its attributes and the related ecosystem biodiversity. Fostering the sustainability of water uses e.g., hydropower systems thus implies searching for innovative operational policies able to generate Dynamic Environmental Flows (DEF) that mimic natural flow variability. The objective of this study is to propose a Direct Policy Search (DPS) framework based on defining dynamic flow release rules to improve the global efficiency of storage systems. The water allocation policies proposed for dammed systems are an extension of previously developed flow redistribution rules for small hydropower plants by Razurel et al. (2016).The mathematical form of the Fermi-Dirac statistical distribution applied to lake equations for the stored water in the dam is used to formulate non-proportional redistribution rules that partition the flow for energy production and environmental use. While energy production is computed from technical data, riverine ecological benefits associated with DEF are computed by integrating the Weighted Usable Area (WUA) for fishes with Richter's hydrological indicators. Then, multiobjective evolutionary algorithms (MOEAs) are applied to build ecological versus economic efficiency plot and locate its (Pareto) frontier. This study benchmarks two MOEAs (NSGA II and Borg MOEA) and compares their efficiency in terms of the quality of Pareto's frontier and computational cost. A detailed analysis of dam characteristics is performed to examine their impact on the global system efficiency and choice of the best redistribution rule. Finally, it is found that non-proportional flow releases can statistically improve the global efficiency, specifically the ecological one, of the hydropower system when compared to constant minimal flows.
Efficient Algorithms for Estimating the Absorption Spectrum within Linear Response TDDFT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brabec, Jiri; Lin, Lin; Shao, Meiyue
We present two iterative algorithms for approximating the absorption spectrum of molecules within linear response of time-dependent density functional theory (TDDFT) framework. These methods do not attempt to compute eigenvalues or eigenvectors of the linear response matrix. They are designed to approximate the absorption spectrum as a function directly. They take advantage of the special structure of the linear response matrix. Neither method requires the linear response matrix to be constructed explicitly. They only require a procedure that performs the multiplication of the linear response matrix with a vector. These methods can also be easily modified to efficiently estimate themore » density of states (DOS) of the linear response matrix without computing the eigenvalues of this matrix. We show by computational experiments that the methods proposed in this paper can be much more efficient than methods that are based on the exact diagonalization of the linear response matrix. We show that they can also be more efficient than real-time TDDFT simulations. We compare the pros and cons of these methods in terms of their accuracy as well as their computational and storage cost.« less
Progress in a novel architecture for high performance processing
NASA Astrophysics Data System (ADS)
Zhang, Zhiwei; Liu, Meng; Liu, Zijun; Du, Xueliang; Xie, Shaolin; Ma, Hong; Ding, Guangxin; Ren, Weili; Zhou, Fabiao; Sun, Wenqin; Wang, Huijuan; Wang, Donglin
2018-04-01
The high performance processing (HPP) is an innovative architecture which targets on high performance computing with excellent power efficiency and computing performance. It is suitable for data intensive applications like supercomputing, machine learning and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores which is the first generation of HPP cores has been taped out successfully under Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low power process. The innovative architecture shows great energy efficiency over the traditional central processing unit (CPU) and general-purpose computing on graphics processing units (GPGPU). Compared with MaPU, HPP has made great improvement in architecture. The chip with 32 HPP cores is being developed under TSMC 16 nm field effect transistor (FFC) technology process and is planed to use commercially. The peak performance of this chip can reach 4.3 teraFLOPS (TFLOPS) and its power efficiency is up to 89.5 gigaFLOPS per watt (GFLOPS/W).
Lou, Der-Chyuan; Lee, Tian-Fu; Lin, Tsung-Hung
2015-05-01
Authenticated key agreements for telecare medicine information systems provide patients, doctors, nurses and health visitors with accessing medical information systems and getting remote services efficiently and conveniently through an open network. In order to have higher security, many authenticated key agreement schemes appended biometric keys to realize identification except for using passwords and smartcards. Due to too many transmissions and computational costs, these authenticated key agreement schemes are inefficient in communication and computation. This investigation develops two secure and efficient authenticated key agreement schemes for telecare medicine information systems by using biometric key and extended chaotic maps. One scheme is synchronization-based, while the other nonce-based. Compared to related approaches, the proposed schemes not only retain the same security properties with previous schemes, but also provide users with privacy protection and have fewer transmissions and lower computational cost.
Biyikli, Emre; To, Albert C.
2015-01-01
A new topology optimization method called the Proportional Topology Optimization (PTO) is presented. As a non-sensitivity method, PTO is simple to understand, easy to implement, and is also efficient and accurate at the same time. It is implemented into two MATLAB programs to solve the stress constrained and minimum compliance problems. Descriptions of the algorithm and computer programs are provided in detail. The method is applied to solve three numerical examples for both types of problems. The method shows comparable efficiency and accuracy with an existing optimality criteria method which computes sensitivities. Also, the PTO stress constrained algorithm and minimum compliance algorithm are compared by feeding output from one algorithm to the other in an alternative manner, where the former yields lower maximum stress and volume fraction but higher compliance compared to the latter. Advantages and disadvantages of the proposed method and future works are discussed. The computer programs are self-contained and publicly shared in the website www.ptomethod.org. PMID:26678849
Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU
NASA Astrophysics Data System (ADS)
Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.
2016-09-01
The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor.
On the Use of Electrooculogram for Efficient Human Computer Interfaces
Usakli, A. B.; Gurkan, S.; Aloise, F.; Vecchiato, G.; Babiloni, F.
2010-01-01
The aim of this study is to present electrooculogram signals that can be used for human computer interface efficiently. Establishing an efficient alternative channel for communication without overt speech and hand movements is important to increase the quality of life for patients suffering from Amyotrophic Lateral Sclerosis or other illnesses that prevent correct limb and facial muscular responses. We have made several experiments to compare the P300-based BCI speller and EOG-based new system. A five-letter word can be written on average in 25 seconds and in 105 seconds with the EEG-based device. Giving message such as “clean-up” could be performed in 3 seconds with the new system. The new system is more efficient than P300-based BCI system in terms of accuracy, speed, applicability, and cost efficiency. Using EOG signals, it is possible to improve the communication abilities of those patients who can move their eyes. PMID:19841687
ERIC Educational Resources Information Center
Zaidel, Mark; Luo, XiaoHui
2010-01-01
This study investigates the efficiency of multimedia instruction at the college level by comparing the effectiveness of multimedia elements used in the computer supported learning with the cost of their preparation. Among the various technologies that advance learning, instructors and students generally identify interactive multimedia elements as…
Neptune: a bioinformatics tool for rapid discovery of genomic variation in bacterial populations
Marinier, Eric; Zaheer, Rahat; Berry, Chrystal; Weedmark, Kelly A.; Domaratzki, Michael; Mabon, Philip; Knox, Natalie C.; Reimer, Aleisha R.; Graham, Morag R.; Chui, Linda; Patterson-Fortin, Laura; Zhang, Jian; Pagotto, Franco; Farber, Jeff; Mahony, Jim; Seyer, Karine; Bekal, Sadjia; Tremblay, Cécile; Isaac-Renton, Judy; Prystajecky, Natalie; Chen, Jessica; Slade, Peter
2017-01-01
Abstract The ready availability of vast amounts of genomic sequence data has created the need to rethink comparative genomics algorithms using ‘big data’ approaches. Neptune is an efficient system for rapidly locating differentially abundant genomic content in bacterial populations using an exact k-mer matching strategy, while accommodating k-mer mismatches. Neptune’s loci discovery process identifies sequences that are sufficiently common to a group of target sequences and sufficiently absent from non-targets using probabilistic models. Neptune uses parallel computing to efficiently identify and extract these loci from draft genome assemblies without requiring multiple sequence alignments or other computationally expensive comparative sequence analyses. Tests on simulated and real datasets showed that Neptune rapidly identifies regions that are both sensitive and specific. We demonstrate that this system can identify trait-specific loci from different bacterial lineages. Neptune is broadly applicable for comparative bacterial analyses, yet will particularly benefit pathogenomic applications, owing to efficient and sensitive discovery of differentially abundant genomic loci. The software is available for download at: http://github.com/phac-nml/neptune. PMID:29048594
Software Design Strategies for Multidisciplinary Computational Fluid Dynamics
2012-07-01
on the left-hand-side of Figure 3. The resulting unstructured grid system does a good job of representing the flowfield locally around the solid... Laboratory [16–19]. It uses Cartesian block structured grids, which lead to a substantially more efficient computational execution compared to the...including blade sectional lift and pitching moment. These Helios-computed airloads show good agreement with the experimental data. Many of the
A high performance scientific cloud computing environment for materials simulations
NASA Astrophysics Data System (ADS)
Jorissen, K.; Vila, F. D.; Rehr, J. J.
2012-09-01
We describe the development of a scientific cloud computing (SCC) platform that offers high performance computation capability. The platform consists of a scientific virtual machine prototype containing a UNIX operating system and several materials science codes, together with essential interface tools (an SCC toolset) that offers functionality comparable to local compute clusters. In particular, our SCC toolset provides automatic creation of virtual clusters for parallel computing, including tools for execution and monitoring performance, as well as efficient I/O utilities that enable seamless connections to and from the cloud. Our SCC platform is optimized for the Amazon Elastic Compute Cloud (EC2). We present benchmarks for prototypical scientific applications and demonstrate performance comparable to local compute clusters. To facilitate code execution and provide user-friendly access, we have also integrated cloud computing capability in a JAVA-based GUI. Our SCC platform may be an alternative to traditional HPC resources for materials science or quantum chemistry applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trędak, Przemysław, E-mail: przemyslaw.tredak@fuw.edu.pl; Rudnicki, Witold R.; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw
The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPUmore » to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor.« less
An, Gao; Hong, Li; Zhou, Xiao-Bing; Yang, Qiong; Li, Mei-Qing; Tang, Xiang-Yang
2017-03-01
We investigated and compared the functionality of two 3D visualization software provided by a CT vendor and a third-party vendor, respectively. Using surgical anatomical measurement as baseline, we evaluated the accuracy of 3D visualization and verified their utility in computer-aided anatomical analysis. The study cohort consisted of 50 adult cadavers fixed with the classical formaldehyde method. The computer-aided anatomical analysis was based on CT images (in DICOM format) acquired by helical scan with contrast enhancement, using a CT vendor provided 3D visualization workstation (Syngo) and a third-party 3D visualization software (Mimics) that was installed on a PC. Automated and semi-automated segmentations were utilized in the 3D visualization workstation and software, respectively. The functionality and efficiency of automated and semi-automated segmentation methods were compared. Using surgical anatomical measurement as a baseline, the accuracy of 3D visualization based on automated and semi-automated segmentations was quantitatively compared. In semi-automated segmentation, the Mimics 3D visualization software outperformed the Syngo 3D visualization workstation. No significant difference was observed in anatomical data measurement by the Syngo 3D visualization workstation and the Mimics 3D visualization software (P>0.05). Both the Syngo 3D visualization workstation provided by a CT vendor and the Mimics 3D visualization software by a third-party vendor possessed the needed functionality, efficiency and accuracy for computer-aided anatomical analysis. Copyright © 2016 Elsevier GmbH. All rights reserved.
An 81.6 μW FastICA processor for epileptic seizure detection.
Yang, Chia-Hsiang; Shih, Yi-Hsin; Chiueh, Herming
2015-02-01
To improve the performance of epileptic seizure detection, independent component analysis (ICA) is applied to multi-channel signals to separate artifacts and signals of interest. FastICA is an efficient algorithm to compute ICA. To reduce the energy dissipation, eigenvalue decomposition (EVD) is utilized in the preprocessing stage to reduce the convergence time of iterative calculation of ICA components. EVD is computed efficiently through an array structure of processing elements running in parallel. Area-efficient EVD architecture is realized by leveraging the approximate Jacobi algorithm, leading to a 77.2% area reduction. By choosing proper memory element and reduced wordlength, the power and area of storage memory are reduced by 95.6% and 51.7%, respectively. The chip area is minimized through fixed-point implementation and architectural transformations. Given a latency constraint of 0.1 s, an 86.5% area reduction is achieved compared to the direct-mapped architecture. Fabricated in 90 nm CMOS, the core area of the chip is 0.40 mm(2). The FastICA processor, part of an integrated epileptic control SoC, dissipates 81.6 μW at 0.32 V. The computation delay of a frame of 256 samples for 8 channels is 84.2 ms. Compared to prior work, 0.5% power dissipation, 26.7% silicon area, and 3.4 × computation speedup are achieved. The performance of the chip was verified by human dataset.
GPU-Accelerated Voxelwise Hepatic Perfusion Quantification
Wang, H; Cao, Y
2012-01-01
Voxelwise quantification of hepatic perfusion parameters from dynamic contrast enhanced (DCE) imaging greatly contributes to assessment of liver function in response to radiation therapy. However, the efficiency of the estimation of hepatic perfusion parameters voxel-by-voxel in the whole liver using a dual-input single-compartment model requires substantial improvement for routine clinical applications. In this paper, we utilize the parallel computation power of a graphics processing unit (GPU) to accelerate the computation, while maintaining the same accuracy as the conventional method. Using CUDA-GPU, the hepatic perfusion computations over multiple voxels are run across the GPU blocks concurrently but independently. At each voxel, non-linear least squares fitting the time series of the liver DCE data to the compartmental model is distributed to multiple threads in a block, and the computations of different time points are performed simultaneously and synchronically. An efficient fast Fourier transform in a block is also developed for the convolution computation in the model. The GPU computations of the voxel-by-voxel hepatic perfusion images are compared with ones by the CPU using the simulated DCE data and the experimental DCE MR images from patients. The computation speed is improved by 30 times using a NVIDIA Tesla C2050 GPU compared to a 2.67 GHz Intel Xeon CPU processor. To obtain liver perfusion maps with 626400 voxels in a patient’s liver, it takes 0.9 min with the GPU-accelerated voxelwise computation, compared to 110 min with the CPU, while both methods result in perfusion parameters differences less than 10−6. The method will be useful for generating liver perfusion images in clinical settings. PMID:22892645
An energy-efficient failure detector for vehicular cloud computing.
Liu, Jiaxi; Wu, Zhibo; Dong, Jian; Wu, Jin; Wen, Dongxin
2018-01-01
Failure detectors are one of the fundamental components for maintaining the high availability of vehicular cloud computing. In vehicular cloud computing, lots of RSUs are deployed along the road to improve the connectivity. Many of them are equipped with solar battery due to the unavailability or excess expense of wired electrical power. So it is important to reduce the battery consumption of RSU. However, the existing failure detection algorithms are not designed to save battery consumption RSU. To solve this problem, a new energy-efficient failure detector 2E-FD has been proposed specifically for vehicular cloud computing. 2E-FD does not only provide acceptable failure detection service, but also saves the battery consumption of RSU. Through the comparative experiments, the results show that our failure detector has better performance in terms of speed, accuracy and battery consumption.
An energy-efficient failure detector for vehicular cloud computing
Liu, Jiaxi; Wu, Zhibo; Wu, Jin; Wen, Dongxin
2018-01-01
Failure detectors are one of the fundamental components for maintaining the high availability of vehicular cloud computing. In vehicular cloud computing, lots of RSUs are deployed along the road to improve the connectivity. Many of them are equipped with solar battery due to the unavailability or excess expense of wired electrical power. So it is important to reduce the battery consumption of RSU. However, the existing failure detection algorithms are not designed to save battery consumption RSU. To solve this problem, a new energy-efficient failure detector 2E-FD has been proposed specifically for vehicular cloud computing. 2E-FD does not only provide acceptable failure detection service, but also saves the battery consumption of RSU. Through the comparative experiments, the results show that our failure detector has better performance in terms of speed, accuracy and battery consumption. PMID:29352282
A CFD Study on the Prediction of Cyclone Collection Efficiency
NASA Astrophysics Data System (ADS)
Gimbun, Jolius; Chuah, T. G.; Choong, Thomas S. Y.; Fakhru'L-Razi, A.
2005-09-01
This work presents a Computational Fluid Dynamics calculation to predict and to evaluate the effects of temperature, operating pressure and inlet velocity on the collection efficiency of gas cyclones. The numerical solutions were carried out using spreadsheet and commercial CFD code FLUENT 6.0. This paper also reviews four empirical models for the prediction of cyclone collection efficiency, namely Lapple [1], Koch and Licht [2], Li and Wang [3], and Iozia and Leith [4]. All the predictions proved to be satisfactory when compared with the presented experimental data. The CFD simulations predict the cyclone cut-off size for all operating conditions with a deviation of 3.7% from the experimental data. Specifically, results obtained from the computer modelling exercise have demonstrated that CFD model is the best method of modelling the cyclones collection efficiency.
Parallel, adaptive finite element methods for conservation laws
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Devine, Karen D.; Flaherty, Joseph E.
1994-01-01
We construct parallel finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. A posteriori estimates of spatial errors are obtained by a p-refinement technique using superconvergence at Radau points. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We compare results using different limiting schemes and demonstrate parallel efficiency through computations on an NCUBE/2 hypercube. We also present results using adaptive h- and p-refinement to reduce the computational cost of the method.
Desktop supercomputer: what can it do?
NASA Astrophysics Data System (ADS)
Bogdanov, A.; Degtyarev, A.; Korkhov, V.
2017-12-01
The paper addresses the issues of solving complex problems that require using supercomputers or multiprocessor clusters available for most researchers nowadays. Efficient distribution of high performance computing resources according to actual application needs has been a major research topic since high-performance computing (HPC) technologies became widely introduced. At the same time, comfortable and transparent access to these resources was a key user requirement. In this paper we discuss approaches to build a virtual private supercomputer available at user's desktop: a virtual computing environment tailored specifically for a target user with a particular target application. We describe and evaluate possibilities to create the virtual supercomputer based on light-weight virtualization technologies, and analyze the efficiency of our approach compared to traditional methods of HPC resource management.
Efficient tiled calculation of over-10-gigapixel holograms using ray-wavefront conversion.
Igarashi, Shunsuke; Nakamura, Tomoya; Matsushima, Kyoji; Yamaguchi, Masahiro
2018-04-16
In the calculation of large-scale computer-generated holograms, an approach called "tiling," which divides the hologram plane into small rectangles, is often employed due to limitations on computational memory. However, the total amount of computational complexity severely increases with the number of divisions. In this paper, we propose an efficient method for calculating tiled large-scale holograms using ray-wavefront conversion. In experiments, the effectiveness of the proposed method was verified by comparing its calculation cost with that using the previous method. Additionally, a hologram of 128K × 128K pixels was calculated and fabricated by a laser-lithography system, and a high-quality 105 mm × 105 mm 3D image including complicated reflection and translucency was optically reconstructed.
Adding computationally efficient realism to Monte Carlo turbulence simulation
NASA Technical Reports Server (NTRS)
Campbell, C. W.
1985-01-01
Frequently in aerospace vehicle flight simulation, random turbulence is generated using the assumption that the craft is small compared to the length scales of turbulence. The turbulence is presumed to vary only along the flight path of the vehicle but not across the vehicle span. The addition of the realism of three-dimensionality is a worthy goal, but any such attempt will not gain acceptance in the simulator community unless it is computationally efficient. A concept for adding three-dimensional realism with a minimum of computational complexity is presented. The concept involves the use of close rational approximations to irrational spectra and cross-spectra so that systems of stable, explicit difference equations can be used to generate the turbulence.
Median Robust Extended Local Binary Pattern for Texture Classification.
Liu, Li; Lao, Songyang; Fieguth, Paul W; Guo, Yulan; Wang, Xiaogang; Pietikäinen, Matti
2016-03-01
Local binary patterns (LBP) are considered among the most computationally efficient high-performance texture features. However, the LBP method is very sensitive to image noise and is unable to capture macrostructure information. To best address these disadvantages, in this paper, we introduce a novel descriptor for texture classification, the median robust extended LBP (MRELBP). Different from the traditional LBP and many LBP variants, MRELBP compares regional image medians rather than raw image intensities. A multiscale LBP type descriptor is computed by efficiently comparing image medians over a novel sampling scheme, which can capture both microstructure and macrostructure texture information. A comprehensive evaluation on benchmark data sets reveals MRELBP's high performance-robust to gray scale variations, rotation changes and noise-but at a low computational cost. MRELBP produces the best classification scores of 99.82%, 99.38%, and 99.77% on three popular Outex test suites. More importantly, MRELBP is shown to be highly robust to image noise, including Gaussian noise, Gaussian blur, salt-and-pepper noise, and random pixel corruption.
Efficient least angle regression for identification of linear-in-the-parameters models
Beach, Thomas H.; Rezgui, Yacine
2017-01-01
Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods, in that it is neither too greedy nor too slow. It is closely related to L1 norm optimization, which has the advantage of low prediction variance through sacrificing part of model bias property in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes. The direct involvement of matrix inversions is thereby relieved. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach where the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm. PMID:28293140
ERIC Educational Resources Information Center
Owusu, K. A.; Monney, K. A.; Appiah, J. Y.; Wilmot, E. M.
2010-01-01
This study investigated the comparative efficiency of computer-assisted instruction (CAI) and conventional teaching method in biology on senior high school students. A science class was selected in each of two randomly selected schools. The pretest-posttest non equivalent quasi experimental design was used. The students in the experimental group…
32 CFR Appendix E to Part 323 - OMB Guidelines for Matching Programs
Code of Federal Regulations, 2010 CFR
2010-07-01
... concern expressed by the Congress in the Privacy Act of 1974 that “the increasing use of computers and sophisticated information technology, while essential to the efficient operation of the Government, has greatly... which a computer is used to compare two or more automated systems of records or a system of records with...
32 CFR Appendix E to Part 323 - OMB Guidelines for Matching Programs
Code of Federal Regulations, 2011 CFR
2011-07-01
... concern expressed by the Congress in the Privacy Act of 1974 that “the increasing use of computers and sophisticated information technology, while essential to the efficient operation of the Government, has greatly... which a computer is used to compare two or more automated systems of records or a system of records with...
32 CFR Appendix E to Part 323 - OMB Guidelines for Matching Programs
Code of Federal Regulations, 2012 CFR
2012-07-01
... concern expressed by the Congress in the Privacy Act of 1974 that “the increasing use of computers and sophisticated information technology, while essential to the efficient operation of the Government, has greatly... which a computer is used to compare two or more automated systems of records or a system of records with...
The Effects of Routing and Scoring within a Computer Adaptive Multi-Stage Framework
ERIC Educational Resources Information Center
Dallas, Andrew
2014-01-01
This dissertation examined the overall effects of routing and scoring within a computer adaptive multi-stage framework (ca-MST). Testing in a ca-MST environment has become extremely popular in the testing industry. Testing companies enjoy its efficiency benefits as compared to traditionally linear testing and its quality-control features over…
Lattice Boltzmann Method for 3-D Flows with Curved Boundary
NASA Technical Reports Server (NTRS)
Mei, Renwei; Shyy, Wei; Yu, Dazhi; Luo, Li-Shi
2002-01-01
In this work, we investigate two issues that are important to computational efficiency and reliability in fluid dynamics applications of the lattice, Boltzmann equation (LBE): (1) Computational stability and accuracy of different lattice Boltzmann models and (2) the treatment of the boundary conditions on curved solid boundaries and their 3-D implementations. Three athermal 3-D LBE models (D3QI5, D3Ql9, and D3Q27) are studied and compared in terms of efficiency, accuracy, and robustness. The boundary treatment recently developed by Filippova and Hanel and Met et al. in 2-D is extended to and implemented for 3-D. The convergence, stability, and computational efficiency of the 3-D LBE models with the boundary treatment for curved boundaries were tested in simulations of four 3-D flows: (1) Fully developed flows in a square duct, (2) flow in a 3-D lid-driven cavity, (3) fully developed flows in a circular pipe, and (4) a uniform flow over a sphere. We found that while the fifteen-velocity 3-D (D3Ql5) model is more prone to numerical instability and the D3Q27 is more computationally intensive, the 63Q19 model provides a balance between computational reliability and efficiency. Through numerical simulations, we demonstrated that the boundary treatment for 3-D arbitrary curved geometry has second-order accuracy and possesses satisfactory stability characteristics.
Multi-GPU hybrid programming accelerated three-dimensional phase-field model in binary alloy
NASA Astrophysics Data System (ADS)
Zhu, Changsheng; Liu, Jieqiong; Zhu, Mingfang; Feng, Li
2018-03-01
In the process of dendritic growth simulation, the computational efficiency and the problem scales have extremely important influence on simulation efficiency of three-dimensional phase-field model. Thus, seeking for high performance calculation method to improve the computational efficiency and to expand the problem scales has a great significance to the research of microstructure of the material. A high performance calculation method based on MPI+CUDA hybrid programming model is introduced. Multi-GPU is used to implement quantitative numerical simulations of three-dimensional phase-field model in binary alloy under the condition of multi-physical processes coupling. The acceleration effect of different GPU nodes on different calculation scales is explored. On the foundation of multi-GPU calculation model that has been introduced, two optimization schemes, Non-blocking communication optimization and overlap of MPI and GPU computing optimization, are proposed. The results of two optimization schemes and basic multi-GPU model are compared. The calculation results show that the use of multi-GPU calculation model can improve the computational efficiency of three-dimensional phase-field obviously, which is 13 times to single GPU, and the problem scales have been expanded to 8193. The feasibility of two optimization schemes is shown, and the overlap of MPI and GPU computing optimization has better performance, which is 1.7 times to basic multi-GPU model, when 21 GPUs are used.
Exponential Methods for the Time Integration of Schroedinger Equation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cano, B.; Gonzalez-Pachon, A.
2010-09-30
We consider exponential methods of second order in time in order to integrate the cubic nonlinear Schroedinger equation. We are interested in taking profit of the special structure of this equation. Therefore, we look at symmetry, symplecticity and approximation of invariants of the proposed methods. That will allow to integrate till long times with reasonable accuracy. Computational efficiency is also our aim. Therefore, we make numerical computations in order to compare the methods considered and so as to conclude that explicit Lawson schemes projected on the norm of the solution are an efficient tool to integrate this equation.
Numerical calculations of two dimensional, unsteady transonic flows with circulation
NASA Technical Reports Server (NTRS)
Beam, R. M.; Warming, R. F.
1974-01-01
The feasibility of obtaining two-dimensional, unsteady transonic aerodynamic data by numerically integrating the Euler equations is investigated. An explicit, third-order-accurate, noncentered, finite-difference scheme is used to compute unsteady flows about airfoils. Solutions for lifting and nonlifting airfoils are presented and compared with subsonic linear theory. The applicability and efficiency of the numerical indicial function method are outlined. Numerically computed subsonic and transonic oscillatory aerodynamic coefficients are presented and compared with those obtained from subsonic linear theory and transonic wind-tunnel data.
Efficient integration method for fictitious domain approaches
NASA Astrophysics Data System (ADS)
Duczek, Sascha; Gabbert, Ulrich
2015-10-01
In the current article, we present an efficient and accurate numerical method for the integration of the system matrices in fictitious domain approaches such as the finite cell method (FCM). In the framework of the FCM, the physical domain is embedded in a geometrically larger domain of simple shape which is discretized using a regular Cartesian grid of cells. Therefore, a spacetree-based adaptive quadrature technique is normally deployed to resolve the geometry of the structure. Depending on the complexity of the structure under investigation this method accounts for most of the computational effort. To reduce the computational costs for computing the system matrices an efficient quadrature scheme based on the divergence theorem (Gauß-Ostrogradsky theorem) is proposed. Using this theorem the dimension of the integral is reduced by one, i.e. instead of solving the integral for the whole domain only its contour needs to be considered. In the current paper, we present the general principles of the integration method and its implementation. The results to several two-dimensional benchmark problems highlight its properties. The efficiency of the proposed method is compared to conventional spacetree-based integration techniques.
An imperialist competitive algorithm for virtual machine placement in cloud computing
NASA Astrophysics Data System (ADS)
Jamali, Shahram; Malektaji, Sepideh; Analoui, Morteza
2017-05-01
Cloud computing, the recently emerged revolution in IT industry, is empowered by virtualisation technology. In this paradigm, the user's applications run over some virtual machines (VMs). The process of selecting proper physical machines to host these virtual machines is called virtual machine placement. It plays an important role on resource utilisation and power efficiency of cloud computing environment. In this paper, we propose an imperialist competitive-based algorithm for the virtual machine placement problem called ICA-VMPLC. The base optimisation algorithm is chosen to be ICA because of its ease in neighbourhood movement, good convergence rate and suitable terminology. The proposed algorithm investigates search space in a unique manner to efficiently obtain optimal placement solution that simultaneously minimises power consumption and total resource wastage. Its final solution performance is compared with several existing methods such as grouping genetic and ant colony-based algorithms as well as bin packing heuristic. The simulation results show that the proposed method is superior to other tested algorithms in terms of power consumption, resource wastage, CPU usage efficiency and memory usage efficiency.
Efficient option valuation of single and double barrier options
NASA Astrophysics Data System (ADS)
Kabaivanov, Stanimir; Milev, Mariyan; Koleva-Petkova, Dessislava; Vladev, Veselin
2017-12-01
In this paper we present an implementation of pricing algorithm for single and double barrier options using Mellin transformation with Maximum Entropy Inversion and its suitability for real-world applications. A detailed analysis of the applied algorithm is accompanied by implementation in C++ that is then compared to existing solutions in terms of efficiency and computational power. We then compare the applied method with existing closed-form solutions and well known methods of pricing barrier options that are based on finite differences.
NASA Astrophysics Data System (ADS)
Lin, Y.; O'Malley, D.; Vesselinov, V. V.
2015-12-01
Inverse modeling seeks model parameters given a set of observed state variables. However, for many practical problems due to the facts that the observed data sets are often large and model parameters are often numerous, conventional methods for solving the inverse modeling can be computationally expensive. We have developed a new, computationally-efficient Levenberg-Marquardt method for solving large-scale inverse modeling. Levenberg-Marquardt methods require the solution of a dense linear system of equations which can be prohibitively expensive to compute for large-scale inverse problems. Our novel method projects the original large-scale linear problem down to a Krylov subspace, such that the dimensionality of the measurements can be significantly reduced. Furthermore, instead of solving the linear system for every Levenberg-Marquardt damping parameter, we store the Krylov subspace computed when solving the first damping parameter and recycle it for all the following damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by using these computational techniques. We apply this new inverse modeling method to invert for a random transitivity field. Our algorithm is fast enough to solve for the distributed model parameters (transitivity) at each computational node in the model domain. The inversion is also aided by the use regularization techniques. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programing language that allows for efficient memory management and utilization of high-performance computational resources. By comparing with a Levenberg-Marquardt method using standard linear inversion techniques, our Levenberg-Marquardt method yields speed-up ratio of 15 in a multi-core computational environment and a speed-up ratio of 45 in a single-core computational environment. Therefore, our new inverse modeling method is a powerful tool for large-scale applications.
Indoor Pedestrian Localization Using iBeacon and Improved Kalman Filter.
Sung, Kwangjae; Lee, Dong Kyu 'Roy'; Kim, Hwangnam
2018-05-26
The reliable and accurate indoor pedestrian positioning is one of the biggest challenges for location-based systems and applications. Most pedestrian positioning systems have drift error and large bias due to low-cost inertial sensors and random motions of human being, as well as unpredictable and time-varying radio-frequency (RF) signals used for position determination. To solve this problem, many indoor positioning approaches that integrate the user's motion estimated by dead reckoning (DR) method and the location data obtained by RSS fingerprinting through Bayesian filter, such as the Kalman filter (KF), unscented Kalman filter (UKF), and particle filter (PF), have recently been proposed to achieve higher positioning accuracy in indoor environments. Among Bayesian filtering methods, PF is the most popular integrating approach and can provide the best localization performance. However, since PF uses a large number of particles for the high performance, it can lead to considerable computational cost. This paper presents an indoor positioning system implemented on a smartphone, which uses simple dead reckoning (DR), RSS fingerprinting using iBeacon and machine learning scheme, and improved KF. The core of the system is the enhanced KF called a sigma-point Kalman particle filter (SKPF), which localize the user leveraging both the unscented transform of UKF and the weighting method of PF. The SKPF algorithm proposed in this study is used to provide the enhanced positioning accuracy by fusing positional data obtained from both DR and fingerprinting with uncertainty. The SKPF algorithm can achieve better positioning accuracy than KF and UKF and comparable performance compared to PF, and it can provide higher computational efficiency compared with PF. iBeacon in our positioning system is used for energy-efficient localization and RSS fingerprinting. We aim to design the localization scheme that can realize the high positioning accuracy, computational efficiency, and energy efficiency through the SKPF and iBeacon indoors. Empirical experiments in real environments show that the use of the SKPF algorithm and iBeacon in our indoor localization scheme can achieve very satisfactory performance in terms of localization accuracy, computational cost, and energy efficiency.
Step-by-step magic state encoding for efficient fault-tolerant quantum computation
Goto, Hayato
2014-01-01
Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation. PMID:25511387
Step-by-step magic state encoding for efficient fault-tolerant quantum computation.
Goto, Hayato
2014-12-16
Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation.
Adaptation of a program for nonlinear finite element analysis to the CDC STAR 100 computer
NASA Technical Reports Server (NTRS)
Pifko, A. B.; Ogilvie, P. L.
1978-01-01
The conversion of a nonlinear finite element program to the CDC STAR 100 pipeline computer is discussed. The program called DYCAST was developed for the crash simulation of structures. Initial results with the STAR 100 computer indicated that significant gains in computation time are possible for operations on gloval arrays. However, for element level computations that do not lend themselves easily to long vector processing, the STAR 100 was slower than comparable scalar computers. On this basis it is concluded that in order for pipeline computers to impact the economic feasibility of large nonlinear analyses it is absolutely essential that algorithms be devised to improve the efficiency of element level computations.
A 60 GOPS/W, -1.8 V to 0.9 V body bias ULP cluster in 28 nm UTBB FD-SOI technology
NASA Astrophysics Data System (ADS)
Rossi, Davide; Pullini, Antonio; Loi, Igor; Gautschi, Michael; Gürkaynak, Frank K.; Bartolini, Andrea; Flatresse, Philippe; Benini, Luca
2016-03-01
Ultra-low power operation and extreme energy efficiency are strong requirements for a number of high-growth application areas, such as E-health, Internet of Things, and wearable Human-Computer Interfaces. A promising approach to achieve up to one order of magnitude of improvement in energy efficiency over current generation of integrated circuits is near-threshold computing. However, frequency degradation due to aggressive voltage scaling may not be acceptable across all performance-constrained applications. Thread-level parallelism over multiple cores can be used to overcome the performance degradation at low voltage. Moreover, enabling the processors to operate on-demand and over a wide supply voltage and body bias ranges allows to achieve the best possible energy efficiency while satisfying a large spectrum of computational demands. In this work we present the first ever implementation of a 4-core cluster fabricated using conventional-well 28 nm UTBB FD-SOI technology. The multi-core architecture we present in this work is able to operate on a wide range of supply voltages starting from 0.44 V to 1.2 V. In addition, the architecture allows a wide range of body bias to be applied from -1.8 V to 0.9 V. The peak energy efficiency 60 GOPS/W is achieved at 0.5 V supply voltage and 0.5 V forward body bias. Thanks to the extended body bias range of conventional-well FD-SOI technology, high energy efficiency can be guaranteed for a wide range of process and environmental conditions. We demonstrate the ability to compensate for up to 99.7% of chips for process variation with only ±0.2 V of body biasing, and compensate temperature variation in the range -40 °C to 120 °C exploiting -1.1 V to 0.8 V body biasing. When compared to leading-edge near-threshold RISC processors optimized for extremely low power applications, the multi-core architecture we propose has 144× more performance at comparable energy efficiency levels. Even when compared to other low-power processors with comparable performance, including those implemented in 28 nm technology, our platform provides 1.4× to 3.7× better energy efficiency.
Evolving binary classifiers through parallel computation of multiple fitness cases.
Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni
2005-06-01
This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
NASA Astrophysics Data System (ADS)
Lachinova, Svetlana L.; Vorontsov, Mikhail A.; Filimonov, Grigory A.; LeMaster, Daniel A.; Trippel, Matthew E.
2017-07-01
Computational efficiency and accuracy of wave-optics-based Monte-Carlo and brightness function numerical simulation techniques for incoherent imaging of extended objects through atmospheric turbulence are evaluated. Simulation results are compared with theoretical estimates based on known analytical solutions for the modulation transfer function of an imaging system and the long-exposure image of a Gaussian-shaped incoherent light source. It is shown that the accuracy of both techniques is comparable over the wide range of path lengths and atmospheric turbulence conditions, whereas the brightness function technique is advantageous in terms of the computational speed.
Enhancements in medicine by integrating content based image retrieval in computer-aided diagnosis
NASA Astrophysics Data System (ADS)
Aggarwal, Preeti; Sardana, H. K.
2010-02-01
Computer-aided diagnosis (CAD) has become one of the major research subjects in medical imaging and diagnostic radiology. With cad, radiologists use the computer output as a "second opinion" and make the final decisions. Retrieving images is a useful tool to help radiologist to check medical image and diagnosis. The impact of contentbased access to medical images is frequently reported but existing systems are designed for only a particular context of diagnosis. The challenge in medical informatics is to develop tools for analyzing the content of medical images and to represent them in a way that can be efficiently searched and compared by the physicians. CAD is a concept established by taking into account equally the roles of physicians and computers. To build a successful computer aided diagnostic system, all the relevant technologies, especially retrieval need to be integrated in such a manner that should provide effective and efficient pre-diagnosed cases with proven pathology for the current case at the right time. In this paper, it is suggested that integration of content-based image retrieval (CBIR) in cad can bring enormous results in medicine especially in diagnosis. This approach is also compared with other approaches by highlighting its advantages over those approaches.
Development and validation of a two-dimensional fast-response flood estimation model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Judi, David R; Mcpherson, Timothy N; Burian, Steven J
2009-01-01
A finite difference formulation of the shallow water equations using an upwind differencing method was developed maintaining computational efficiency and accuracy such that it can be used as a fast-response flood estimation tool. The model was validated using both laboratory controlled experiments and an actual dam breach. Through the laboratory experiments, the model was shown to give good estimations of depth and velocity when compared to the measured data, as well as when compared to a more complex two-dimensional model. Additionally, the model was compared to high water mark data obtained from the failure of the Taum Sauk dam. Themore » simulated inundation extent agreed well with the observed extent, with the most notable differences resulting from the inability to model sediment transport. The results of these validation studies complex two-dimensional model. Additionally, the model was compared to high water mark data obtained from the failure of the Taum Sauk dam. The simulated inundation extent agreed well with the observed extent, with the most notable differences resulting from the inability to model sediment transport. The results of these validation studies show that a relatively numerical scheme used to solve the complete shallow water equations can be used to accurately estimate flood inundation. Future work will focus on further reducing the computation time needed to provide flood inundation estimates for fast-response analyses. This will be accomplished through the efficient use of multi-core, multi-processor computers coupled with an efficient domain-tracking algorithm, as well as an understanding of the impacts of grid resolution on model results.« less
All-atom calculation of protein free-energy profiles
NASA Astrophysics Data System (ADS)
Orioli, S.; Ianeselli, A.; Spagnolli, G.; Faccioli, P.
2017-10-01
The Bias Functional (BF) approach is a variational method which enables one to efficiently generate ensembles of reactive trajectories for complex biomolecular transitions, using ordinary computer clusters. For example, this scheme was applied to simulate in atomistic detail the folding of proteins consisting of several hundreds of amino acids and with experimental folding time of several minutes. A drawback of the BF approach is that it produces trajectories which do not satisfy microscopic reversibility. Consequently, this method cannot be used to directly compute equilibrium observables, such as free energy landscapes or equilibrium constants. In this work, we develop a statistical analysis which permits us to compute the potential of mean-force (PMF) along an arbitrary collective coordinate, by exploiting the information contained in the reactive trajectories calculated with the BF approach. We assess the accuracy and computational efficiency of this scheme by comparing its results with the PMF obtained for a small protein by means of plain molecular dynamics.
Blanson Henkemans, O. A.; Rogers, W. A.; Fisk, A. D.; Neerincx, M. A.; Lindenberg, J.; van der Mast, C. A. P. G.
2014-01-01
Summary Objectives We developed an adaptive computer assistant for the supervision of diabetics’ self-care, to support limiting illness and need for acute treatment, and improve health literacy. This assistant monitors self-care activities logged in the patient’s electronic diary. Accordingly, it provides context-aware feedback. The objective was to evaluate whether older adults in general can make use of the computer assistant and to compare an adaptive computer assistant with a fixed one, concerning its usability and contribution to health literacy. Methods We conducted a laboratory experiment in the Georgia Tech Aware Home wherein 28 older adults participated in a usability evaluation of the computer assistant, while engaged in scenarios reflecting normal and health-critical situations. We evaluated the assistant on effectiveness, efficiency, satisfaction, and educational value. Finally, we studied the moderating effects of the subjects’ personal characteristics. Results Logging self-care tasks and receiving feedback from the computer assistant enhanced the subjects’ knowledge of diabetes. The adaptive assistant was more effective in dealing with normal and health-critical situations, and, generally, it led to more time efficiency. Subjects’ personal characteristics had substantial effects on the effectiveness and efficiency of the two computer assistants. Conclusions Older adults were able to use the adaptive computer assistant. In addition, it had a positive effect on the development of health literacy. The assistant has the potential to support older diabetics’ self care while maintaining quality of life. PMID:18213433
Pattern-based integer sample motion search strategies in the context of HEVC
NASA Astrophysics Data System (ADS)
Maier, Georg; Bross, Benjamin; Grois, Dan; Marpe, Detlev; Schwarz, Heiko; Veltkamp, Remco C.; Wiegand, Thomas
2015-09-01
The H.265/MPEG-H High Efficiency Video Coding (HEVC) standard provides a significant increase in coding efficiency compared to its predecessor, the H.264/MPEG-4 Advanced Video Coding (AVC) standard, which however comes at the cost of a high computational burden for a compliant encoder. Motion estimation (ME), which is a part of the inter-picture prediction process, typically consumes a high amount of computational resources, while significantly increasing the coding efficiency. In spite of the fact that both H.265/MPEG-H HEVC and H.264/MPEG-4 AVC standards allow processing motion information on a fractional sample level, the motion search algorithms based on the integer sample level remain to be an integral part of ME. In this paper, a flexible integer sample ME framework is proposed, thereby allowing to trade off significant reduction of ME computation time versus coding efficiency penalty in terms of bit rate overhead. As a result, through extensive experimentation, an integer sample ME algorithm that provides a good trade-off is derived, incorporating a combination and optimization of known predictive, pattern-based and early termination techniques. The proposed ME framework is implemented on a basis of the HEVC Test Model (HM) reference software, further being compared to the state-of-the-art fast search algorithm, which is a native part of HM. It is observed that for high resolution sequences, the integer sample ME process can be speed-up by factors varying from 3.2 to 7.6, resulting in the bit-rate overhead of 1.5% and 0.6% for Random Access (RA) and Low Delay P (LDP) configurations, respectively. In addition, the similar speed-up is observed for sequences with mainly Computer-Generated Imagery (CGI) content while trading off the bit rate overhead of up to 5.2%.
Parallel Implicit Runge-Kutta Methods Applied to Coupled Orbit/Attitude Propagation
NASA Astrophysics Data System (ADS)
Hatten, Noble; Russell, Ryan P.
2017-12-01
A variable-step Gauss-Legendre implicit Runge-Kutta (GLIRK) propagator is applied to coupled orbit/attitude propagation. Concepts previously shown to improve efficiency in 3DOF propagation are modified and extended to the 6DOF problem, including the use of variable-fidelity dynamics models. The impact of computing the stage dynamics of a single step in parallel is examined using up to 23 threads and 22 associated GLIRK stages; one thread is reserved for an extra dynamics function evaluation used in the estimation of the local truncation error. Efficiency is found to peak for typical examples when using approximately 8 to 12 stages for both serial and parallel implementations. Accuracy and efficiency compare favorably to explicit Runge-Kutta and linear-multistep solvers for representative scenarios. However, linear-multistep methods are found to be more efficient for some applications, particularly in a serial computing environment, or when parallelism can be applied across multiple trajectories.
NASA Astrophysics Data System (ADS)
Zhuang, Wei; Mountrakis, Giorgos
2014-09-01
Large footprint waveform LiDAR sensors have been widely used for numerous airborne studies. Ground peak identification in a large footprint waveform is a significant bottleneck in exploring full usage of the waveform datasets. In the current study, an accurate and computationally efficient algorithm was developed for ground peak identification, called Filtering and Clustering Algorithm (FICA). The method was evaluated on Land, Vegetation, and Ice Sensor (LVIS) waveform datasets acquired over Central NY. FICA incorporates a set of multi-scale second derivative filters and a k-means clustering algorithm in order to avoid detecting false ground peaks. FICA was tested in five different land cover types (deciduous trees, coniferous trees, shrub, grass and developed area) and showed more accurate results when compared to existing algorithms. More specifically, compared with Gaussian decomposition, the RMSE ground peak identification by FICA was 2.82 m (5.29 m for GD) in deciduous plots, 3.25 m (4.57 m for GD) in coniferous plots, 2.63 m (2.83 m for GD) in shrub plots, 0.82 m (0.93 m for GD) in grass plots, and 0.70 m (0.51 m for GD) in plots of developed areas. FICA performance was also relatively consistent under various slope and canopy coverage (CC) conditions. In addition, FICA showed better computational efficiency compared to existing methods. FICA's major computational and accuracy advantage is a result of the adopted multi-scale signal processing procedures that concentrate on local portions of the signal as opposed to the Gaussian decomposition that uses a curve-fitting strategy applied in the entire signal. The FICA algorithm is a good candidate for large-scale implementation on future space-borne waveform LiDAR sensors.
Cost-effective cloud computing: a case study using the comparative genomics tool, roundup.
Kudtarkar, Parul; Deluca, Todd F; Fusaro, Vincent A; Tonellato, Peter J; Wall, Dennis P
2010-12-22
Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource-Roundup-using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon's Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon's computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure.
Time and learning efficiency in Internet-based learning: a systematic review and meta-analysis.
Cook, David A; Levinson, Anthony J; Garside, Sarah
2010-12-01
Authors have claimed that Internet-based instruction promotes greater learning efficiency than non-computer methods. determine, through a systematic synthesis of evidence in health professions education, how Internet-based instruction compares with non-computer instruction in time spent learning, and what features of Internet-based instruction are associated with improved learning efficiency. we searched databases including MEDLINE, CINAHL, EMBASE, and ERIC from 1990 through November 2008. STUDY SELECTION AND DATA ABSTRACTION we included all studies quantifying learning time for Internet-based instruction for health professionals, compared with other instruction. Reviewers worked independently, in duplicate, to abstract information on interventions, outcomes, and study design. we identified 20 eligible studies. Random effects meta-analysis of 8 studies comparing Internet-based with non-Internet instruction (positive numbers indicating Internet longer) revealed pooled effect size (ES) for time -0.10 (p = 0.63). Among comparisons of two Internet-based interventions, providing feedback adds time (ES 0.67, p =0.003, two studies), and greater interactivity generally takes longer (ES 0.25, p = 0.089, five studies). One study demonstrated that adapting to learner prior knowledge saves time without significantly affecting knowledge scores. Other studies revealed that audio narration, video clips, interactive models, and animations increase learning time but also facilitate higher knowledge and/or satisfaction. Across all studies, time correlated positively with knowledge outcomes (r = 0.53, p = 0.021). on average, Internet-based instruction and non-computer instruction require similar time. Instructional strategies to enhance feedback and interactivity typically prolong learning time, but in many cases also enhance learning outcomes. Isolated examples suggest potential for improving efficiency in Internet-based instruction.
The Problem With the Placement Study.
ERIC Educational Resources Information Center
Miner, Norris
This study compared the effectiveness and efficiency of two alternative methods for determining the status of graduates of Seminole Community College. The first method involved the identification of graduates, design and mailing of a questionnaire, and analysis of response data, as mandated by the state. The second method compared computer data…
Do Computers Improve the Drawing of a Geometrical Figure for 10 Year-Old Children?
ERIC Educational Resources Information Center
Martin, Perrine; Velay, Jean-Luc
2012-01-01
Nowadays, computer aided design (CAD) is widely used by designers. Would children learn to draw more easily and more efficiently if they were taught with computerised tools? To answer this question, we made an experiment designed to compare two methods for children to do the same drawing: the classical "pen and paper" method and a CAD…
Extending Moore's Law via Computationally Error Tolerant Computing.
Deng, Bobin; Srikanth, Sriseshan; Hein, Eric R.; ...
2018-03-01
Dennard scaling has ended. Lowering the voltage supply (V dd) to sub-volt levels causes intermittent losses in signal integrity, rendering further scaling (down) no longer acceptable as a means to lower the power required by a processor core. However, it is possible to correct the occasional errors caused due to lower V dd in an efficient manner and effectively lower power. By deploying the right amount and kind of redundancy, we can strike a balance between overhead incurred in achieving reliability and energy savings realized by permitting lower V dd. One promising approach is the Redundant Residue Number System (RRNS)more » representation. Unlike other error correcting codes, RRNS has the important property of being closed under addition, subtraction and multiplication, thus enabling computational error correction at a fraction of an overhead compared to conventional approaches. We use the RRNS scheme to design a Computationally-Redundant, Energy-Efficient core, including the microarchitecture, Instruction Set Architecture (ISA) and RRNS centered algorithms. Finally, from the simulation results, this RRNS system can reduce the energy-delay-product by about 3× for multiplication intensive workloads and by about 2× in general, when compared to a non-error-correcting binary core.« less
Extending Moore's Law via Computationally Error Tolerant Computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deng, Bobin; Srikanth, Sriseshan; Hein, Eric R.
Dennard scaling has ended. Lowering the voltage supply (V dd) to sub-volt levels causes intermittent losses in signal integrity, rendering further scaling (down) no longer acceptable as a means to lower the power required by a processor core. However, it is possible to correct the occasional errors caused due to lower V dd in an efficient manner and effectively lower power. By deploying the right amount and kind of redundancy, we can strike a balance between overhead incurred in achieving reliability and energy savings realized by permitting lower V dd. One promising approach is the Redundant Residue Number System (RRNS)more » representation. Unlike other error correcting codes, RRNS has the important property of being closed under addition, subtraction and multiplication, thus enabling computational error correction at a fraction of an overhead compared to conventional approaches. We use the RRNS scheme to design a Computationally-Redundant, Energy-Efficient core, including the microarchitecture, Instruction Set Architecture (ISA) and RRNS centered algorithms. Finally, from the simulation results, this RRNS system can reduce the energy-delay-product by about 3× for multiplication intensive workloads and by about 2× in general, when compared to a non-error-correcting binary core.« less
Shafai, Fakhri; Oruc, Ipek
2018-02-01
The other-race effect is the finding of diminished performance in recognition of other-race faces compared to those of own-race. It has been suggested that the other-race effect stems from specialized expert processes being tuned exclusively to own-race faces. In the present study, we measured recognition contrast thresholds for own- and other-race faces as well as houses for Caucasian observers. We have factored face recognition performance into two invariant aspects of visual function: efficiency, which is related to neural computations and processing demanded by the task, and equivalent input noise, related to signal degradation within the visual system. We hypothesized that if expert processes are available only to own-race faces, this should translate into substantially greater recognition efficiencies for own-race compared to other-race faces. Instead, we found similar recognition efficiencies for both own- and other-race faces. The other-race effect manifested as increased equivalent input noise. These results argue against qualitatively distinct perceptual processes. Instead they suggest that for Caucasian observers, similar neural computations underlie recognition of own- and other-race faces. Copyright © 2018 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sun, Jianwei; Remsing, Richard C.; Zhang, Yubo
2016-06-13
One atom or molecule binds to another through various types of bond, the strengths of which range from several meV to several eV. Although some computational methods can provide accurate descriptions of all bond types, those methods are not efficient enough for many studies (for example, large systems, ab initio molecular dynamics and high-throughput searches for functional materials). Here, we show that the recently developed non-empirical strongly constrained and appropriately normed (SCAN) meta-generalized gradient approximation (meta-GGA) within the density functional theory framework predicts accurate geometries and energies of diversely bonded molecules and materials (including covalent, metallic, ionic, hydrogen and vanmore » der Waals bonds). This represents a significant improvement at comparable efficiency over its predecessors, the GGAs that currently dominate materials computation. Often, SCAN matches or improves on the accuracy of a computationally expensive hybrid functional, at almost-GGA cost. SCAN is therefore expected to have a broad impact on chemistry and materials science.« less
Gui, Jiang; Andrew, Angeline S.; Andrews, Peter; Nelson, Heather M.; Kelsey, Karl T.; Karagas, Margaret R.; Moore, Jason H.
2010-01-01
Epistasis or gene-gene interaction is a fundamental component of the genetic architecture of complex traits such as disease susceptibility. Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free method to detect epistasis when there are no significant marginal genetic effects. However, in many studies of complex disease, other covariates like age of onset and smoking status could have a strong main effect and may potentially interfere with MDR's ability to achieve its goal. In this paper, we present a simple and computationally efficient sampling method to adjust for covariate effects in MDR. We use simulation to show that after adjustment, MDR has sufficient power to detect true gene-gene interactions. We also compare our method with the state-of-art technique in covariate adjustment. The results suggest that our proposed method performs similarly, but is more computationally efficient. We then apply this new method to an analysis of a population-based bladder cancer study in New Hampshire. PMID:20924193
NASA Astrophysics Data System (ADS)
Tong, Minh Q.; Hasan, M. Monirul; Gregory, Patrick D.; Shah, Jasmine; Park, B. Hyle; Hirota, Koji; Liu, Junze; Choi, Andy; Low, Karen; Nam, Jin
2017-02-01
We demonstrate a computationally-efficient optical coherence elastography (OCE) method based on fringe washout. By introducing ultrasound in alternating depth profile, we can obtain information on the mechanical properties of a sample within acquisition of a single image. This can be achieved by simply comparing the intensity in adjacent depth profiles in order to quantify the degree of fringe washout. Phantom agar samples with various densities were measured and quantified by our OCE technique, the correlation to Young's modulus measurement by atomic force micrscopy (AFM) were observed. Knee cartilage samples of monoiodo acetate-induced arthiritis (MIA) rat models were utilized to replicate cartilage damages where our proposed OCE technique along with intensity and birefringence analyses and AFM measurements were applied. The results indicate that our OCE technique shows a correlation to the techniques as polarization-sensitive OCT, AFM Young's modulus measurements and histology were promising. Our OCE is applicable to any of existing OCT systems and demonstrated to be computationally-efficient.
Sun, Jianwei; Remsing, Richard C; Zhang, Yubo; Sun, Zhaoru; Ruzsinszky, Adrienn; Peng, Haowei; Yang, Zenghui; Paul, Arpita; Waghmare, Umesh; Wu, Xifan; Klein, Michael L; Perdew, John P
2016-09-01
One atom or molecule binds to another through various types of bond, the strengths of which range from several meV to several eV. Although some computational methods can provide accurate descriptions of all bond types, those methods are not efficient enough for many studies (for example, large systems, ab initio molecular dynamics and high-throughput searches for functional materials). Here, we show that the recently developed non-empirical strongly constrained and appropriately normed (SCAN) meta-generalized gradient approximation (meta-GGA) within the density functional theory framework predicts accurate geometries and energies of diversely bonded molecules and materials (including covalent, metallic, ionic, hydrogen and van der Waals bonds). This represents a significant improvement at comparable efficiency over its predecessors, the GGAs that currently dominate materials computation. Often, SCAN matches or improves on the accuracy of a computationally expensive hybrid functional, at almost-GGA cost. SCAN is therefore expected to have a broad impact on chemistry and materials science.
NASA Astrophysics Data System (ADS)
Dimitrakopoulos, Panagiotis
2018-03-01
The calculation of polytropic efficiencies is a very important task, especially during the development of new compression units, like compressor impellers, stages and stage groups. Such calculations are also crucial for the determination of the performance of a whole compressor. As processors and computational capacities have substantially been improved in the last years, the need for a new, rigorous, robust, accurate and at the same time standardized method merged, regarding the computation of the polytropic efficiencies, especially based on thermodynamics of real gases. The proposed method is based on the rigorous definition of the polytropic efficiency. The input consists of pressure and temperature values at the end points of the compression path (suction and discharge), for a given working fluid. The average relative error for the studied cases was 0.536 %. Thus, this high-accuracy method is proposed for efficiency calculations related with turbocompressors and their compression units, especially when they are operating at high power levels, for example in jet engines and high-power plants.
A multilevel finite element method for Fredholm integral eigenvalue problems
NASA Astrophysics Data System (ADS)
Xie, Hehu; Zhou, Tao
2015-12-01
In this work, we proposed a multigrid finite element (MFE) method for solving the Fredholm integral eigenvalue problems. The main motivation for such studies is to compute the Karhunen-Loève expansions of random fields, which play an important role in the applications of uncertainty quantification. In our MFE framework, solving the eigenvalue problem is converted to doing a series of integral iterations and eigenvalue solving in the coarsest mesh. Then, any existing efficient integration scheme can be used for the associated integration process. The error estimates are provided, and the computational complexity is analyzed. It is noticed that the total computational work of our method is comparable with a single integration step in the finest mesh. Several numerical experiments are presented to validate the efficiency of the proposed numerical method.
A sub-space greedy search method for efficient Bayesian Network inference.
Zhang, Qing; Cao, Yong; Li, Yong; Zhu, Yanming; Sun, Samuel S M; Guo, Dianjing
2011-09-01
Bayesian network (BN) has been successfully used to infer the regulatory relationships of genes from microarray dataset. However, one major limitation of BN approach is the computational cost because the calculation time grows more than exponentially with the dimension of the dataset. In this paper, we propose a sub-space greedy search method for efficient Bayesian Network inference. Particularly, this method limits the greedy search space by only selecting gene pairs with higher partial correlation coefficients. Using both synthetic and real data, we demonstrate that the proposed method achieved comparable results with standard greedy search method yet saved ∼50% of the computational time. We believe that sub-space search method can be widely used for efficient BN inference in systems biology. Copyright © 2011 Elsevier Ltd. All rights reserved.
Computing the Baker-Campbell-Hausdorff series and the Zassenhaus product
NASA Astrophysics Data System (ADS)
Weyrauch, Michael; Scholz, Daniel
2009-09-01
The Baker-Campbell-Hausdorff (BCH) series and the Zassenhaus product are of fundamental importance for the theory of Lie groups and their applications in physics and physical chemistry. Standard methods for the explicit construction of the BCH and Zassenhaus terms yield polynomial representations, which must be translated into the usually required commutator representation. We prove that a new translation proposed recently yields a correct representation of the BCH and Zassenhaus terms. This representation entails fewer terms than the well-known Dynkin-Specht-Wever representation, which is of relevance for practical applications. Furthermore, various methods for the computation of the BCH and Zassenhaus terms are compared, and a new efficient approach for the calculation of the Zassenhaus terms is proposed. Mathematica implementations for the most efficient algorithms are provided together with comparisons of efficiency.
NASA Astrophysics Data System (ADS)
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significant reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
NASA Astrophysics Data System (ADS)
Larger, Laurent; Baylón-Fuentes, Antonio; Martinenghi, Romain; Udaltsov, Vladimir S.; Chembo, Yanne K.; Jacquot, Maxime
2017-01-01
Reservoir computing, originally referred to as an echo state network or a liquid state machine, is a brain-inspired paradigm for processing temporal information. It involves learning a "read-out" interpretation for nonlinear transients developed by high-dimensional dynamics when the latter is excited by the information signal to be processed. This novel computational paradigm is derived from recurrent neural network and machine learning techniques. It has recently been implemented in photonic hardware for a dynamical system, which opens the path to ultrafast brain-inspired computing. We report on a novel implementation involving an electro-optic phase-delay dynamics designed with off-the-shelf optoelectronic telecom devices, thus providing the targeted wide bandwidth. Computational efficiency is demonstrated experimentally with speech-recognition tasks. State-of-the-art speed performances reach one million words per second, with very low word error rate. Additionally, to record speed processing, our investigations have revealed computing-efficiency improvements through yet-unexplored temporal-information-processing techniques, such as simultaneous multisample injection and pitched sampling at the read-out compared to information "write-in".
Efficient method for computing the electronic transport properties of a multiterminal system
NASA Astrophysics Data System (ADS)
Lima, Leandro R. F.; Dusko, Amintor; Lewenkopf, Caio
2018-04-01
We present a multiprobe recursive Green's function method to compute the transport properties of mesoscopic systems using the Landauer-Büttiker approach. By introducing an adaptive partition scheme, we map the multiprobe problem into the standard two-probe recursive Green's function method. We apply the method to compute the longitudinal and Hall resistances of a disordered graphene sample, a system of current interest. We show that the performance and accuracy of our method compares very well with other state-of-the-art schemes.
A Comparative Study of Multi-material Data Structures for Computational Physics Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garimella, Rao Veerabhadra; Robey, Robert W.
The data structures used to represent the multi-material state of a computational physics application can have a drastic impact on the performance of the application. We look at efficient data structures for sparse applications where there may be many materials, but only one or few in most computational cells. We develop simple performance models for use in selecting possible data structures and programming patterns. We verify the analytic models of performance through a small test program of the representative cases.
A novel iris localization algorithm using correlation filtering
NASA Astrophysics Data System (ADS)
Pohit, Mausumi; Sharma, Jitu
2015-06-01
Fast and efficient segmentation of iris from the eye images is a primary requirement for robust database independent iris recognition. In this paper we have presented a new algorithm for computing the inner and outer boundaries of the iris and locating the pupil centre. Pupil-iris boundary computation is based on correlation filtering approach, whereas iris-sclera boundary is determined through one dimensional intensity mapping. The proposed approach is computationally less extensive when compared with the existing algorithms like Hough transform.
Queueing Network Models for Parallel Processing of Task Systems: an Operational Approach
NASA Technical Reports Server (NTRS)
Mak, Victor W. K.
1986-01-01
Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. An efficient procedure is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. The procedure is based on the concept of hierarchical decomposition and a new operational approach. Numerical results for three test cases are presented and compared to those of simulations.
Efficient self-consistent viscous-inviscid solutions for unsteady transonic flow
NASA Technical Reports Server (NTRS)
Howlett, J. T.
1985-01-01
An improved method is presented for coupling a boundary layer code with an unsteady inviscid transonic computer code in a quasi-steady fashion. At each fixed time step, the boundary layer and inviscid equations are successively solved until the process converges. An explicit coupling of the equations is described which greatly accelerates the convergence process. Computer times for converged viscous-inviscid solutions are about 1.8 times the comparable inviscid values. Comparison of the results obtained with experimental data on three airfoils are presented. These comparisons demonstrate that the explicitly coupled viscous-inviscid solutions can provide efficient predictions of pressure distributions and lift for unsteady two-dimensional transonic flows.
Efficient self-consistent viscous-inviscid solutions for unsteady transonic flow
NASA Technical Reports Server (NTRS)
Howlett, J. T.
1985-01-01
An improved method is presented for coupling a boundary layer code with an unsteady inviscid transonic computer code in a quasi-steady fashion. At each fixed time step, the boundary layer and inviscid equations are successively solved until the process converges. An explicit coupling of the equations is described which greatly accelerates the convergence process. Computer times for converged viscous-inviscid solutions are about 1.8 times the comparable inviscid values. Comparison of the results obtained with experimental data on three airfoils are presented. These comparisons demonstrate that the explicitly coupled viscous-inviscid solutions can provide efficient predictions of pressure distributions and lift for unsteady two-dimensional transonic flow.
NASA Astrophysics Data System (ADS)
Caillol, J. M.; Levesque, D.
1992-01-01
The reliability and the efficiency of a new method suitable for the simulations of dielectric fluids and ionic solutions is established by numerical computations. The efficiency depends on the use of a simulation cell which is the surface of a four-dimensional sphere. The reliability originates from a charge-charge potential solution of the Poisson equation in this confining volume. The computation time, for systems of a few hundred molecules, is reduced by a factor of 2 or 3 compared to this of a simulation performed in a cubic volume with periodic boundary conditions and the Ewald charge-charge potential.
Efficient scatter model for simulation of ultrasound images from computed tomography data
NASA Astrophysics Data System (ADS)
D'Amato, J. P.; Lo Vercio, L.; Rubi, P.; Fernandez Vera, E.; Barbuzza, R.; Del Fresno, M.; Larrabide, I.
2015-12-01
Background and motivation: Real-time ultrasound simulation refers to the process of computationally creating fully synthetic ultrasound images instantly. Due to the high value of specialized low cost training for healthcare professionals, there is a growing interest in the use of this technology and the development of high fidelity systems that simulate the acquisitions of echographic images. The objective is to create an efficient and reproducible simulator that can run either on notebooks or desktops using low cost devices. Materials and methods: We present an interactive ultrasound simulator based on CT data. This simulator is based on ray-casting and provides real-time interaction capabilities. The simulation of scattering that is coherent with the transducer position in real time is also introduced. Such noise is produced using a simplified model of multiplicative noise and convolution with point spread functions (PSF) tailored for this purpose. Results: The computational efficiency of scattering maps generation was revised with an improved performance. This allowed a more efficient simulation of coherent scattering in the synthetic echographic images while providing highly realistic result. We describe some quality and performance metrics to validate these results, where a performance of up to 55fps was achieved. Conclusion: The proposed technique for real-time scattering modeling provides realistic yet computationally efficient scatter distributions. The error between the original image and the simulated scattering image was compared for the proposed method and the state-of-the-art, showing negligible differences in its distribution.
Saito, Atsushi; Nawano, Shigeru; Shimizu, Akinobu
2017-05-01
This paper addresses joint optimization for segmentation and shape priors, including translation, to overcome inter-subject variability in the location of an organ. Because a simple extension of the previous exact optimization method is too computationally complex, we propose a fast approximation for optimization. The effectiveness of the proposed approximation is validated in the context of gallbladder segmentation from a non-contrast computed tomography (CT) volume. After spatial standardization and estimation of the posterior probability of the target organ, simultaneous optimization of the segmentation, shape, and location priors is performed using a branch-and-bound method. Fast approximation is achieved by combining sampling in the eigenshape space to reduce the number of shape priors and an efficient computational technique for evaluating the lower bound. Performance was evaluated using threefold cross-validation of 27 CT volumes. Optimization in terms of translation of the shape prior significantly improved segmentation performance. The proposed method achieved a result of 0.623 on the Jaccard index in gallbladder segmentation, which is comparable to that of state-of-the-art methods. The computational efficiency of the algorithm is confirmed to be good enough to allow execution on a personal computer. Joint optimization of the segmentation, shape, and location priors was proposed, and it proved to be effective in gallbladder segmentation with high computational efficiency.
Scemama, Anthony; Caffarel, Michel; Oseret, Emmanuel; Jalby, William
2013-04-30
Various strategies to implement efficiently quantum Monte Carlo (QMC) simulations for large chemical systems are presented. These include: (i) the introduction of an efficient algorithm to calculate the computationally expensive Slater matrices. This novel scheme is based on the use of the highly localized character of atomic Gaussian basis functions (not the molecular orbitals as usually done), (ii) the possibility of keeping the memory footprint minimal, (iii) the important enhancement of single-core performance when efficient optimization tools are used, and (iv) the definition of a universal, dynamic, fault-tolerant, and load-balanced framework adapted to all kinds of computational platforms (massively parallel machines, clusters, or distributed grids). These strategies have been implemented in the QMC=Chem code developed at Toulouse and illustrated with numerical applications on small peptides of increasing sizes (158, 434, 1056, and 1731 electrons). Using 10-80 k computing cores of the Curie machine (GENCI-TGCC-CEA, France), QMC=Chem has been shown to be capable of running at the petascale level, thus demonstrating that for this machine a large part of the peak performance can be achieved. Implementation of large-scale QMC simulations for future exascale platforms with a comparable level of efficiency is expected to be feasible. Copyright © 2013 Wiley Periodicals, Inc.
Jagiello, Karolina; Grzonkowska, Monika; Swirog, Marta; ...
2016-08-29
In this contribution, the advantages and limitations of two computational techniques that can be used for the investigation of nanoparticles activity and toxicity: classic nano-QSAR (Quantitative Structure–Activity Relationships employed for nanomaterials) and 3D nano-QSAR (three-dimensional Quantitative Structure–Activity Relationships, such us Comparative Molecular Field Analysis, CoMFA/Comparative Molecular Similarity Indices Analysis, CoMSIA analysis employed for nanomaterials) have been briefly summarized. Both approaches were compared according to the selected criteria, including: efficiency, type of experimental data, class of nanomaterials, time required for calculations and computational cost, difficulties in the interpretation. Taking into account the advantages and limitations of each method, we provide themore » recommendations for nano-QSAR modellers and QSAR model users to be able to determine a proper and efficient methodology to investigate biological activity of nanoparticles in order to describe the underlying interactions in the most reliable and useful manner.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jagiello, Karolina; Grzonkowska, Monika; Swirog, Marta
In this contribution, the advantages and limitations of two computational techniques that can be used for the investigation of nanoparticles activity and toxicity: classic nano-QSAR (Quantitative Structure–Activity Relationships employed for nanomaterials) and 3D nano-QSAR (three-dimensional Quantitative Structure–Activity Relationships, such us Comparative Molecular Field Analysis, CoMFA/Comparative Molecular Similarity Indices Analysis, CoMSIA analysis employed for nanomaterials) have been briefly summarized. Both approaches were compared according to the selected criteria, including: efficiency, type of experimental data, class of nanomaterials, time required for calculations and computational cost, difficulties in the interpretation. Taking into account the advantages and limitations of each method, we provide themore » recommendations for nano-QSAR modellers and QSAR model users to be able to determine a proper and efficient methodology to investigate biological activity of nanoparticles in order to describe the underlying interactions in the most reliable and useful manner.« less
Coleman, Mari Beth; Cherry, Rebecca A; Moore, Tara C; Park, Yujeong; Cihak, David F
2015-06-01
The purpose of this study was to compare the effects of teacher-directed simultaneous prompting to computer-assisted simultaneous prompting for teaching sight words to 3 elementary school students with intellectual disability. Activities in the computer-assisted condition were designed with Intellitools Classroom Suite software whereas traditional materials (i.e., flashcards) were used in the teacher-directed condition. Treatment conditions were compared using an adapted alternating treatments design. Acquisition of sight words occurred in both conditions for all 3 participants; however, each participant either clearly responded better in the teacher-directed condition or reported a preference for the teacher-directed condition when performance was similar with computer-assisted instruction being more efficient. Practical implications and directions for future research are discussed.
ERIC Educational Resources Information Center
Petscher, Yaacov; Foorman, Barbara R.; Truckenmiller, Adrea J.
2017-01-01
The objective of the present study was to evaluate the extent to which students who took a computer adaptive test of reading comprehension accounting for testlet effects were administered fewer passages and had a more precise estimate of their reading comprehension ability compared to students in the control condition. A randomized controlled…
Belle II grid computing: An overview of the distributed data management system.
NASA Astrophysics Data System (ADS)
Bansal, Vikas; Schram, Malachi; Belle Collaboration, II
2017-01-01
The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, will start physics data taking in 2018 and will accumulate 50/ab of e +e- collision data, about 50 times larger than the data set of the Belle experiment. The computing requirements of Belle II are comparable to those of a Run I LHC experiment. Computing at this scale requires efficient use of the compute grids in North America, Asia and Europe and will take advantage of upgrades to the high-speed global network. We present the architecture of data flow and data handling as a part of the Belle II computing infrastructure.
Solving the Coupled System Improves Computational Efficiency of the Bidomain Equations
Southern, James A.; Plank, Gernot; Vigmond, Edward J.; Whiteley, Jonathan P.
2017-01-01
The bidomain equations are frequently used to model the propagation of cardiac action potentials across cardiac tissue. At the whole organ level the size of the computational mesh required makes their solution a significant computational challenge. As the accuracy of the numerical solution cannot be compromised, efficiency of the solution technique is important to ensure that the results of the simulation can be obtained in a reasonable time whilst still encapsulating the complexities of the system. In an attempt to increase efficiency of the solver, the bidomain equations are often decoupled into one parabolic equation that is computationally very cheap to solve and an elliptic equation that is much more expensive to solve. In this study the performance of this uncoupled solution method is compared with an alternative strategy in which the bidomain equations are solved as a coupled system. This seems counter-intuitive as the alternative method requires the solution of a much larger linear system at each time step. However, in tests on two 3-D rabbit ventricle benchmarks it is shown that the coupled method is up to 80% faster than the conventional uncoupled method — and that parallel performance is better for the larger coupled problem. PMID:19457741
Development of efficient time-evolution method based on three-term recurrence relation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Akama, Tomoko, E-mail: a.tomo---s-b-l-r@suou.waseda.jp; Kobayashi, Osamu; Nanbu, Shinkoh, E-mail: shinkoh.nanbu@sophia.ac.jp
The advantage of the real-time (RT) propagation method is a direct solution of the time-dependent Schrödinger equation which describes frequency properties as well as all dynamics of a molecular system composed of electrons and nuclei in quantum physics and chemistry. Its applications have been limited by computational feasibility, as the evaluation of the time-evolution operator is computationally demanding. In this article, a new efficient time-evolution method based on the three-term recurrence relation (3TRR) was proposed to reduce the time-consuming numerical procedure. The basic formula of this approach was derived by introducing a transformation of the operator using the arcsine function.more » Since this operator transformation causes transformation of time, we derived the relation between original and transformed time. The formula was adapted to assess the performance of the RT time-dependent Hartree-Fock (RT-TDHF) method and the time-dependent density functional theory. Compared to the commonly used fourth-order Runge-Kutta method, our new approach decreased computational time of the RT-TDHF calculation by about factor of four, showing the 3TRR formula to be an efficient time-evolution method for reducing computational cost.« less
Improving robustness and computational efficiency using modern C++
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paterno, M.; Kowalkowski, J.; Green, C.
2014-01-01
For nearly two decades, the C++ programming language has been the dominant programming language for experimental HEP. The publication of ISO/IEC 14882:2011, the current version of the international standard for the C++ programming language, makes available a variety of language and library facilities for improving the robustness, expressiveness, and computational efficiency of C++ code. However, much of the C++ written by the experimental HEP community does not take advantage of the features of the language to obtain these benefits, either due to lack of familiarity with these features or concern that these features must somehow be computationally inefficient. In thismore » paper, we address some of the features of modern C+-+, and show how they can be used to make programs that are both robust and computationally efficient. We compare and contrast simple yet realistic examples of some common implementation patterns in C, currently-typical C++, and modern C++, and show (when necessary, down to the level of generated assembly language code) the quality of the executable code produced by recent C++ compilers, with the aim of allowing the HEP community to make informed decisions on the costs and benefits of the use of modern C++.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krueger, Jens; Micikevicius, Paulius; Williams, Samuel
Reverse Time Migration (RTM) is one of the main approaches in the seismic processing industry for imaging the subsurface structure of the Earth. While RTM provides qualitative advantages over its predecessors, it has a high computational cost warranting implementation on HPC architectures. We focus on three progressively more complex kernels extracted from RTM: for isotropic (ISO), vertical transverse isotropic (VTI) and tilted transverse isotropic (TTI) media. In this work, we examine performance optimization of forward wave modeling, which describes the computational kernels used in RTM, on emerging multi- and manycore processors and introduce a novel common subexpression elimination optimization formore » TTI kernels. We compare attained performance and energy efficiency in both the single-node and distributed memory environments in order to satisfy industry’s demands for fidelity, performance, and energy efficiency. Moreover, we discuss the interplay between architecture (chip and system) and optimizations (both on-node computation) highlighting the importance of NUMA-aware approaches to MPI communication. Ultimately, our results show we can improve CPU energy efficiency by more than 10× on Magny Cours nodes while acceleration via multiple GPUs can surpass the energy-efficient Intel Sandy Bridge by as much as 3.6×.« less
Efficiencies of joint non-local update moves in Monte Carlo simulations of coarse-grained polymers
NASA Astrophysics Data System (ADS)
Austin, Kieran S.; Marenz, Martin; Janke, Wolfhard
2018-03-01
In this study four update methods are compared in their performance in a Monte Carlo simulation of polymers in continuum space. The efficiencies of the update methods and combinations thereof are compared with the aid of the autocorrelation time with a fixed (optimal) acceptance ratio. Results are obtained for polymer lengths N = 14, 28 and 42 and temperatures below, at and above the collapse transition. In terms of autocorrelation, the optimal acceptance ratio is approximately 0.4. Furthermore, an overview of the step sizes of the update methods that correspond to this optimal acceptance ratio is given. This shall serve as a guide for future studies that rely on efficient computer simulations.
Comparative analysis of autofocus functions in digital in-line phase-shifting holography.
Fonseca, Elsa S R; Fiadeiro, Paulo T; Pereira, Manuela; Pinheiro, António
2016-09-20
Numerical reconstruction of digital holograms relies on a precise knowledge of the original object position. However, there are a number of relevant applications where this parameter is not known in advance and an efficient autofocusing method is required. This paper addresses the problem of finding optimal focusing methods for use in reconstruction of digital holograms of macroscopic amplitude and phase objects, using digital in-line phase-shifting holography in transmission mode. Fifteen autofocus measures, including spatial-, spectral-, and sparsity-based methods, were evaluated for both synthetic and experimental holograms. The Fresnel transform and the angular spectrum reconstruction methods were compared. Evaluation criteria included unimodality, accuracy, resolution, and computational cost. Autofocusing under angular spectrum propagation tends to perform better with respect to accuracy and unimodality criteria. Phase objects are, generally, more difficult to focus than amplitude objects. The normalized variance, the standard correlation, and the Tenenbaum gradient are the most reliable spatial-based metrics, combining computational efficiency with good accuracy and resolution. A good trade-off between focus performance and computational cost was found for the Fresnelet sparsity method.
Efficient and Robust Optimization for Building Energy Simulation
Pourarian, Shokouh; Kearsley, Anthony; Wen, Jin; Pertzborn, Amanda
2016-01-01
Efficiently, robustly and accurately solving large sets of structured, non-linear algebraic and differential equations is one of the most computationally expensive steps in the dynamic simulation of building energy systems. Here, the efficiency, robustness and accuracy of two commonly employed solution methods are compared. The comparison is conducted using the HVACSIM+ software package, a component based building system simulation tool. The HVACSIM+ software presently employs Powell’s Hybrid method to solve systems of nonlinear algebraic equations that model the dynamics of energy states and interactions within buildings. It is shown here that the Powell’s method does not always converge to a solution. Since a myriad of other numerical methods are available, the question arises as to which method is most appropriate for building energy simulation. This paper finds considerable computational benefits result from replacing the Powell’s Hybrid method solver in HVACSIM+ with a solver more appropriate for the challenges particular to numerical simulations of buildings. Evidence is provided that a variant of the Levenberg-Marquardt solver has superior accuracy and robustness compared to the Powell’s Hybrid method presently used in HVACSIM+. PMID:27325907
Efficient and Robust Optimization for Building Energy Simulation.
Pourarian, Shokouh; Kearsley, Anthony; Wen, Jin; Pertzborn, Amanda
2016-06-15
Efficiently, robustly and accurately solving large sets of structured, non-linear algebraic and differential equations is one of the most computationally expensive steps in the dynamic simulation of building energy systems. Here, the efficiency, robustness and accuracy of two commonly employed solution methods are compared. The comparison is conducted using the HVACSIM+ software package, a component based building system simulation tool. The HVACSIM+ software presently employs Powell's Hybrid method to solve systems of nonlinear algebraic equations that model the dynamics of energy states and interactions within buildings. It is shown here that the Powell's method does not always converge to a solution. Since a myriad of other numerical methods are available, the question arises as to which method is most appropriate for building energy simulation. This paper finds considerable computational benefits result from replacing the Powell's Hybrid method solver in HVACSIM+ with a solver more appropriate for the challenges particular to numerical simulations of buildings. Evidence is provided that a variant of the Levenberg-Marquardt solver has superior accuracy and robustness compared to the Powell's Hybrid method presently used in HVACSIM+.
Towards Dynamic Remote Data Auditing in Computational Clouds
Khurram Khan, Muhammad; Anuar, Nor Badrul
2014-01-01
Cloud computing is a significant shift of computational paradigm where computing as a utility and storing data remotely have a great potential. Enterprise and businesses are now more interested in outsourcing their data to the cloud to lessen the burden of local data storage and maintenance. However, the outsourced data and the computation outcomes are not continuously trustworthy due to the lack of control and physical possession of the data owners. To better streamline this issue, researchers have now focused on designing remote data auditing (RDA) techniques. The majority of these techniques, however, are only applicable for static archive data and are not subject to audit the dynamically updated outsourced data. We propose an effectual RDA technique based on algebraic signature properties for cloud storage system and also present a new data structure capable of efficiently supporting dynamic data operations like append, insert, modify, and delete. Moreover, this data structure empowers our method to be applicable for large-scale data with minimum computation cost. The comparative analysis with the state-of-the-art RDA schemes shows that the proposed scheme is secure and highly efficient in terms of the computation and communication overhead on the auditor and server. PMID:25121114
The multifacet graphically contracted function method. I. Formulation and implementation
NASA Astrophysics Data System (ADS)
Shepard, Ron; Gidofalvi, Gergely; Brozell, Scott R.
2014-08-01
The basic formulation for the multifacet generalization of the graphically contracted function (MFGCF) electronic structure method is presented. The analysis includes the discussion of linear dependency and redundancy of the arc factor parameters, the computation of reduced density matrices, Hamiltonian matrix construction, spin-density matrix construction, the computation of optimization gradients for single-state and state-averaged calculations, graphical wave function analysis, and the efficient computation of configuration state function and Slater determinant expansion coefficients. Timings are given for Hamiltonian matrix element and analytic optimization gradient computations for a range of model problems for full-CI Shavitt graphs, and it is observed that both the energy and the gradient computation scale as O(N2n4) for N electrons and n orbitals. The important arithmetic operations are within dense matrix-matrix product computational kernels, resulting in a computationally efficient procedure. An initial implementation of the method is used to present applications to several challenging chemical systems, including N2 dissociation, cubic H8 dissociation, the symmetric dissociation of H2O, and the insertion of Be into H2. The results are compared to the exact full-CI values and also to those of the previous single-facet GCF expansion form.
The multifacet graphically contracted function method. I. Formulation and implementation.
Shepard, Ron; Gidofalvi, Gergely; Brozell, Scott R
2014-08-14
The basic formulation for the multifacet generalization of the graphically contracted function (MFGCF) electronic structure method is presented. The analysis includes the discussion of linear dependency and redundancy of the arc factor parameters, the computation of reduced density matrices, Hamiltonian matrix construction, spin-density matrix construction, the computation of optimization gradients for single-state and state-averaged calculations, graphical wave function analysis, and the efficient computation of configuration state function and Slater determinant expansion coefficients. Timings are given for Hamiltonian matrix element and analytic optimization gradient computations for a range of model problems for full-CI Shavitt graphs, and it is observed that both the energy and the gradient computation scale as O(N(2)n(4)) for N electrons and n orbitals. The important arithmetic operations are within dense matrix-matrix product computational kernels, resulting in a computationally efficient procedure. An initial implementation of the method is used to present applications to several challenging chemical systems, including N2 dissociation, cubic H8 dissociation, the symmetric dissociation of H2O, and the insertion of Be into H2. The results are compared to the exact full-CI values and also to those of the previous single-facet GCF expansion form.
Towards dynamic remote data auditing in computational clouds.
Sookhak, Mehdi; Akhunzada, Adnan; Gani, Abdullah; Khurram Khan, Muhammad; Anuar, Nor Badrul
2014-01-01
Cloud computing is a significant shift of computational paradigm where computing as a utility and storing data remotely have a great potential. Enterprise and businesses are now more interested in outsourcing their data to the cloud to lessen the burden of local data storage and maintenance. However, the outsourced data and the computation outcomes are not continuously trustworthy due to the lack of control and physical possession of the data owners. To better streamline this issue, researchers have now focused on designing remote data auditing (RDA) techniques. The majority of these techniques, however, are only applicable for static archive data and are not subject to audit the dynamically updated outsourced data. We propose an effectual RDA technique based on algebraic signature properties for cloud storage system and also present a new data structure capable of efficiently supporting dynamic data operations like append, insert, modify, and delete. Moreover, this data structure empowers our method to be applicable for large-scale data with minimum computation cost. The comparative analysis with the state-of-the-art RDA schemes shows that the proposed scheme is secure and highly efficient in terms of the computation and communication overhead on the auditor and server.
Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.
Yin, Zekun; Lan, Haidong; Tan, Guangming; Lu, Mian; Vasilakos, Athanasios V; Liu, Weiguo
2017-01-01
The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the biological data amount is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in life sciences. As a result, both biologists and computer scientists are facing the challenge of gaining a profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are highly needed as well as efficient and scalable algorithms that can take advantage of these platforms. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. Then we provide a taxonomy of different biological data analysis applications and a survey of the way they have been mapped onto various computing platforms. After that, we present a case study to compare the efficiency of different computing platforms for handling the classical biological sequence alignment problem. At last we discuss the open issues in big biological data analytics.
Computationally efficient multibody simulations
NASA Technical Reports Server (NTRS)
Ramakrishnan, Jayant; Kumar, Manoj
1994-01-01
Computationally efficient approaches to the solution of the dynamics of multibody systems are presented in this work. The computational efficiency is derived from both the algorithmic and implementational standpoint. Order(n) approaches provide a new formulation of the equations of motion eliminating the assembly and numerical inversion of a system mass matrix as required by conventional algorithms. Computational efficiency is also gained in the implementation phase by the symbolic processing and parallel implementation of these equations. Comparison of this algorithm with existing multibody simulation programs illustrates the increased computational efficiency.
NASA Astrophysics Data System (ADS)
Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.
2016-09-01
Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace such that the dimensionality of the problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2-D and a random hydraulic conductivity field in 3-D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ˜101 to ˜102 in a multicore computational environment. Therefore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate to large-scale problems.
NASA Astrophysics Data System (ADS)
Sizov, Gennadi Y.
In this dissertation, a model-based multi-objective optimal design of permanent magnet ac machines, supplied by sine-wave current regulated drives, is developed and implemented. The design procedure uses an efficient electromagnetic finite element-based solver to accurately model nonlinear material properties and complex geometric shapes associated with magnetic circuit design. Application of an electromagnetic finite element-based solver allows for accurate computation of intricate performance parameters and characteristics. The first contribution of this dissertation is the development of a rapid computational method that allows accurate and efficient exploration of large multi-dimensional design spaces in search of optimum design(s). The computationally efficient finite element-based approach developed in this work provides a framework of tools that allow rapid analysis of synchronous electric machines operating under steady-state conditions. In the developed modeling approach, major steady-state performance parameters such as, winding flux linkages and voltages, average, cogging and ripple torques, stator core flux densities, core losses, efficiencies and saturated machine winding inductances, are calculated with minimum computational effort. In addition, the method includes means for rapid estimation of distributed stator forces and three-dimensional effects of stator and/or rotor skew on the performance of the machine. The second contribution of this dissertation is the development of the design synthesis and optimization method based on a differential evolution algorithm. The approach relies on the developed finite element-based modeling method for electromagnetic analysis and is able to tackle large-scale multi-objective design problems using modest computational resources. Overall, computational time savings of up to two orders of magnitude are achievable, when compared to current and prevalent state-of-the-art methods. These computational savings allow one to expand the optimization problem to achieve more complex and comprehensive design objectives. The method is used in the design process of several interior permanent magnet industrial motors. The presented case studies demonstrate that the developed finite element-based approach practically eliminates the need for using less accurate analytical and lumped parameter equivalent circuit models for electric machine design optimization. The design process and experimental validation of the case-study machines are detailed in the dissertation.
On modelling three-dimensional piezoelectric smart structures with boundary spectral element method
NASA Astrophysics Data System (ADS)
Zou, Fangxin; Aliabadi, M. H.
2017-05-01
The computational efficiency of the boundary element method in elastodynamic analysis can be significantly improved by employing high-order spectral elements for boundary discretisation. In this work, for the first time, the so-called boundary spectral element method is utilised to formulate the piezoelectric smart structures that are widely used in structural health monitoring (SHM) applications. The resultant boundary spectral element formulation has been validated by the finite element method (FEM) and physical experiments. The new formulation has demonstrated a lower demand on computational resources and a higher numerical stability than commercial FEM packages. Comparing to the conventional boundary element formulation, a significant reduction in computational expenses has been achieved. In summary, the boundary spectral element formulation presented in this paper provides a highly efficient and stable mathematical tool for the development of SHM applications.
Aerothermal modeling program, phase 2. Element B: Flow interaction experiment
NASA Technical Reports Server (NTRS)
Nikjooy, M.; Mongia, H. C.; Murthy, S. N. B.; Sullivan, J. P.
1986-01-01
The design process was improved and the efficiency, life, and maintenance costs of the turbine engine hot section was enhanced. Recently, there has been much emphasis on the need for improved numerical codes for the design of efficient combustors. For the development of improved computational codes, there is a need for an experimentally obtained data base to be used at test cases for the accuracy of the computations. The purpose of Element-B is to establish a benchmark quality velocity and scalar measurements of the flow interaction of circular jets with swirling flow typical of that in the dome region of annular combustor. In addition to the detailed experimental effort, extensive computations of the swirling flows are to be compared with the measurements for the purpose of assessing the accuracy of current and advanced turbulence and scalar transport models.
Higgins, Eleanor L; Raskind, Marshall H
2004-12-01
This study was conducted to assess the effectiveness of two programs developed by the Frostig Center Research Department to improve the reading and spelling of students with learning disabilities (LD): a computer Speech Recognition-based Program (SRBP) and a computer and text-based Automaticity Program (AP). Twenty-eight LD students with reading and spelling difficulties (aged 8 to 18) received each program for 17 weeks and were compared with 16 students in a contrast group who did not receive either program. After adjusting for age and IQ, both the SRBP and AP groups showed significant differences over the contrast group in improving word recognition and reading comprehension. Neither program showed significant differences over contrasts in spelling. The SRBP also improved the performance of the target group when compared with the contrast group on phonological elision and nonword reading efficiency tasks. The AP showed significant differences in all process and reading efficiency measures.
A Parallel Multigrid Solver for Viscous Flows on Anisotropic Structured Grids
NASA Technical Reports Server (NTRS)
Prieto, Manuel; Montero, Ruben S.; Llorente, Ignacio M.; Bushnell, Dennis M. (Technical Monitor)
2001-01-01
This paper presents an efficient parallel multigrid solver for speeding up the computation of a 3-D model that treats the flow of a viscous fluid over a flat plate. The main interest of this simulation lies in exhibiting some basic difficulties that prevent optimal multigrid efficiencies from being achieved. As the computing platform, we have used Coral, a Beowulf-class system based on Intel Pentium processors and equipped with GigaNet cLAN and switched Fast Ethernet networks. Our study not only examines the scalability of the solver but also includes a performance evaluation of Coral where the investigated solver has been used to compare several of its design choices, namely, the interconnection network (GigaNet versus switched Fast-Ethernet) and the node configuration (dual nodes versus single nodes). As a reference, the performance results have been compared with those obtained with the NAS-MG benchmark.
TU-AB-303-08: GPU-Based Software Platform for Efficient Image-Guided Adaptive Radiation Therapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, S; Robinson, A; McNutt, T
2015-06-15
Purpose: In this study, we develop an integrated software platform for adaptive radiation therapy (ART) that combines fast and accurate image registration, segmentation, and dose computation/accumulation methods. Methods: The proposed system consists of three key components; 1) deformable image registration (DIR), 2) automatic segmentation, and 3) dose computation/accumulation. The computationally intensive modules including DIR and dose computation have been implemented on a graphics processing unit (GPU). All required patient-specific data including the planning CT (pCT) with contours, daily cone-beam CTs, and treatment plan are automatically queried and retrieved from their own databases. To improve the accuracy of DIR between pCTmore » and CBCTs, we use the double force demons DIR algorithm in combination with iterative CBCT intensity correction by local intensity histogram matching. Segmentation of daily CBCT is then obtained by propagating contours from the pCT. Daily dose delivered to the patient is computed on the registered pCT by a GPU-accelerated superposition/convolution algorithm. Finally, computed daily doses are accumulated to show the total delivered dose to date. Results: Since the accuracy of DIR critically affects the quality of the other processes, we first evaluated our DIR method on eight head-and-neck cancer cases and compared its performance. Normalized mutual-information (NMI) and normalized cross-correlation (NCC) computed as similarity measures, and our method produced overall NMI of 0.663 and NCC of 0.987, outperforming conventional methods by 3.8% and 1.9%, respectively. Experimental results show that our registration method is more consistent and roust than existing algorithms, and also computationally efficient. Computation time at each fraction took around one minute (30–50 seconds for registration and 15–25 seconds for dose computation). Conclusion: We developed an integrated GPU-accelerated software platform that enables accurate and efficient DIR, auto-segmentation, and dose computation, thus supporting an efficient ART workflow. This work was supported by NIH/NCI under grant R42CA137886.« less
Tsumori, Nobuhiro; Takahashi, Motoki; Sakuma, Yoshiki; Saiki, Toshiharu
2011-10-10
We examined the near-field collection efficiency of near-infrared radiation for an aperture probe. We used InAs quantum dots as ideal point light sources with emission wavelengths ranging from 1.1 to 1.6 μm. We experimentally investigated the wavelength dependence of the collection efficiency and compared the results with computational simulations that modeled the actual probe structure. The observed degradation in the collection efficiency is attributed to the cutoff characteristics of the gold-clad tapered waveguide, which approaches an ideal conductor at near-infrared wavelengths. © 2011 Optical Society of America
On Improving Efficiency of Differential Evolution for Aerodynamic Shape Optimization Applications
NASA Technical Reports Server (NTRS)
Madavan, Nateri K.
2004-01-01
Differential Evolution (DE) is a simple and robust evolutionary strategy that has been proven effective in determining the global optimum for several difficult optimization problems. Although DE offers several advantages over traditional optimization approaches, its use in applications such as aerodynamic shape optimization where the objective function evaluations are computationally expensive is limited by the large number of function evaluations often required. In this paper various approaches for improving the efficiency of DE are reviewed and discussed. These approaches are implemented in a DE-based aerodynamic shape optimization method that uses a Navier-Stokes solver for the objective function evaluations. Parallelization techniques on distributed computers are used to reduce turnaround times. Results are presented for the inverse design of a turbine airfoil. The efficiency improvements achieved by the different approaches are evaluated and compared.
A Secure and Efficient Audit Mechanism for Dynamic Shared Data in Cloud Storage
2014-01-01
With popularization of cloud services, multiple users easily share and update their data through cloud storage. For data integrity and consistency in the cloud storage, the audit mechanisms were proposed. However, existing approaches have some security vulnerabilities and require a lot of computational overheads. This paper proposes a secure and efficient audit mechanism for dynamic shared data in cloud storage. The proposed scheme prevents a malicious cloud service provider from deceiving an auditor. Moreover, it devises a new index table management method and reduces the auditing cost by employing less complex operations. We prove the resistance against some attacks and show less computation cost and shorter time for auditing when compared with conventional approaches. The results present that the proposed scheme is secure and efficient for cloud storage services managing dynamic shared data. PMID:24959630
Fast Steerable Principal Component Analysis
Zhao, Zhizhen; Shkolnisky, Yoel; Singer, Amit
2016-01-01
Cryo-electron microscopy nowadays often requires the analysis of hundreds of thousands of 2-D images as large as a few hundred pixels in each direction. Here, we introduce an algorithm that efficiently and accurately performs principal component analysis (PCA) for a large set of 2-D images, and, for each image, the set of its uniform rotations in the plane and their reflections. For a dataset consisting of n images of size L × L pixels, the computational complexity of our algorithm is O(nL3 + L4), while existing algorithms take O(nL4). The new algorithm computes the expansion coefficients of the images in a Fourier–Bessel basis efficiently using the nonuniform fast Fourier transform. We compare the accuracy and efficiency of the new algorithm with traditional PCA and existing algorithms for steerable PCA. PMID:27570801
A secure and efficient audit mechanism for dynamic shared data in cloud storage.
Kwon, Ohmin; Koo, Dongyoung; Shin, Yongjoo; Yoon, Hyunsoo
2014-01-01
With popularization of cloud services, multiple users easily share and update their data through cloud storage. For data integrity and consistency in the cloud storage, the audit mechanisms were proposed. However, existing approaches have some security vulnerabilities and require a lot of computational overheads. This paper proposes a secure and efficient audit mechanism for dynamic shared data in cloud storage. The proposed scheme prevents a malicious cloud service provider from deceiving an auditor. Moreover, it devises a new index table management method and reduces the auditing cost by employing less complex operations. We prove the resistance against some attacks and show less computation cost and shorter time for auditing when compared with conventional approaches. The results present that the proposed scheme is secure and efficient for cloud storage services managing dynamic shared data.
Parallel computing method for simulating hydrological processesof large rivers under climate change
NASA Astrophysics Data System (ADS)
Wang, H.; Chen, Y.
2016-12-01
Climate change is one of the proverbial global environmental problems in the world.Climate change has altered the watershed hydrological processes in time and space distribution, especially in worldlarge rivers.Watershed hydrological process simulation based on physically based distributed hydrological model can could have better results compared with the lumped models.However, watershed hydrological process simulation includes large amount of calculations, especially in large rivers, thus needing huge computing resources that may not be steadily available for the researchers or at high expense, this seriously restricted the research and application. To solve this problem, the current parallel method are mostly parallel computing in space and time dimensions.They calculate the natural features orderly thatbased on distributed hydrological model by grid (unit, a basin) from upstream to downstream.This articleproposes ahigh-performancecomputing method of hydrological process simulation with high speedratio and parallel efficiency.It combinedthe runoff characteristics of time and space of distributed hydrological model withthe methods adopting distributed data storage, memory database, distributed computing, parallel computing based on computing power unit.The method has strong adaptability and extensibility,which means it canmake full use of the computing and storage resources under the condition of limited computing resources, and the computing efficiency can be improved linearly with the increase of computing resources .This method can satisfy the parallel computing requirements ofhydrological process simulation in small, medium and large rivers.
Coalescent: an open-science framework for importance sampling in coalescent theory.
Tewari, Susanta; Spouge, John L
2015-01-01
Background. In coalescent theory, computer programs often use importance sampling to calculate likelihoods and other statistical quantities. An importance sampling scheme can exploit human intuition to improve statistical efficiency of computations, but unfortunately, in the absence of general computer frameworks on importance sampling, researchers often struggle to translate new sampling schemes computationally or benchmark against different schemes, in a manner that is reliable and maintainable. Moreover, most studies use computer programs lacking a convenient user interface or the flexibility to meet the current demands of open science. In particular, current computer frameworks can only evaluate the efficiency of a single importance sampling scheme or compare the efficiencies of different schemes in an ad hoc manner. Results. We have designed a general framework (http://coalescent.sourceforge.net; language: Java; License: GPLv3) for importance sampling that computes likelihoods under the standard neutral coalescent model of a single, well-mixed population of constant size over time following infinite sites model of mutation. The framework models the necessary core concepts, comes integrated with several data sets of varying size, implements the standard competing proposals, and integrates tightly with our previous framework for calculating exact probabilities. For a given dataset, it computes the likelihood and provides the maximum likelihood estimate of the mutation parameter. Well-known benchmarks in the coalescent literature validate the accuracy of the framework. The framework provides an intuitive user interface with minimal clutter. For performance, the framework switches automatically to modern multicore hardware, if available. It runs on three major platforms (Windows, Mac and Linux). Extensive tests and coverage make the framework reliable and maintainable. Conclusions. In coalescent theory, many studies of computational efficiency consider only effective sample size. Here, we evaluate proposals in the coalescent literature, to discover that the order of efficiency among the three importance sampling schemes changes when one considers running time as well as effective sample size. We also describe a computational technique called "just-in-time delegation" available to improve the trade-off between running time and precision by constructing improved importance sampling schemes from existing ones. Thus, our systems approach is a potential solution to the "2(8) programs problem" highlighted by Felsenstein, because it provides the flexibility to include or exclude various features of similar coalescent models or importance sampling schemes.
NASA Astrophysics Data System (ADS)
Liu, Hongcheng; Dong, Peng; Xing, Lei
2017-08-01
{{\\ell }2,1} -minimization-based sparse optimization was employed to solve the beam angle optimization (BAO) in intensity-modulated radiation therapy (IMRT) planning. The technique approximates the exact BAO formulation with efficiently computable convex surrogates, leading to plans that are inferior to those attainable with recently proposed gradient-based greedy schemes. In this paper, we alleviate/reduce the nontrivial inconsistencies between the {{\\ell }2,1} -based formulations and the exact BAO model by proposing a new sparse optimization framework based on the most recent developments in group variable selection. We propose the incorporation of the group-folded concave penalty (gFCP) as a substitution to the {{\\ell }2,1} -minimization framework. The new formulation is then solved by a variation of an existing gradient method. The performance of the proposed scheme is evaluated by both plan quality and the computational efficiency using three IMRT cases: a coplanar prostate case, a coplanar head-and-neck case, and a noncoplanar liver case. Involved in the evaluation are two alternative schemes: the {{\\ell }2,1} -minimization approach and the gradient norm method (GNM). The gFCP-based scheme outperforms both counterpart approaches. In particular, gFCP generates better plans than those obtained using the {{\\ell }2,1} -minimization for all three cases with a comparable computation time. As compared to the GNM, the gFCP improves both the plan quality and computational efficiency. The proposed gFCP-based scheme provides a promising framework for BAO and promises to improve both planning time and plan quality.
Effective orthorhombic anisotropic models for wavefield extrapolation
NASA Astrophysics Data System (ADS)
Ibanez-Jacome, Wilson; Alkhalifah, Tariq; Waheed, Umair bin
2014-09-01
Wavefield extrapolation in orthorhombic anisotropic media incorporates complicated but realistic models to reproduce wave propagation phenomena in the Earth's subsurface. Compared with the representations used for simpler symmetries, such as transversely isotropic or isotropic, orthorhombic models require an extended and more elaborated formulation that also involves more expensive computational processes. The acoustic assumption yields more efficient description of the orthorhombic wave equation that also provides a simplified representation for the orthorhombic dispersion relation. However, such representation is hampered by the sixth-order nature of the acoustic wave equation, as it also encompasses the contribution of shear waves. To reduce the computational cost of wavefield extrapolation in such media, we generate effective isotropic inhomogeneous models that are capable of reproducing the first-arrival kinematic aspects of the orthorhombic wavefield. First, in order to compute traveltimes in vertical orthorhombic media, we develop a stable, efficient and accurate algorithm based on the fast marching method. The derived orthorhombic acoustic dispersion relation, unlike the isotropic or transversely isotropic ones, is represented by a sixth order polynomial equation with the fastest solution corresponding to outgoing P waves in acoustic media. The effective velocity models are then computed by evaluating the traveltime gradients of the orthorhombic traveltime solution, and using them to explicitly evaluate the corresponding inhomogeneous isotropic velocity field. The inverted effective velocity fields are source dependent and produce equivalent first-arrival kinematic descriptions of wave propagation in orthorhombic media. We extrapolate wavefields in these isotropic effective velocity models using the more efficient isotropic operator, and the results compare well, especially kinematically, with those obtained from the more expensive anisotropic extrapolator.
Liu, Hongcheng; Dong, Peng; Xing, Lei
2017-07-20
[Formula: see text]-minimization-based sparse optimization was employed to solve the beam angle optimization (BAO) in intensity-modulated radiation therapy (IMRT) planning. The technique approximates the exact BAO formulation with efficiently computable convex surrogates, leading to plans that are inferior to those attainable with recently proposed gradient-based greedy schemes. In this paper, we alleviate/reduce the nontrivial inconsistencies between the [Formula: see text]-based formulations and the exact BAO model by proposing a new sparse optimization framework based on the most recent developments in group variable selection. We propose the incorporation of the group-folded concave penalty (gFCP) as a substitution to the [Formula: see text]-minimization framework. The new formulation is then solved by a variation of an existing gradient method. The performance of the proposed scheme is evaluated by both plan quality and the computational efficiency using three IMRT cases: a coplanar prostate case, a coplanar head-and-neck case, and a noncoplanar liver case. Involved in the evaluation are two alternative schemes: the [Formula: see text]-minimization approach and the gradient norm method (GNM). The gFCP-based scheme outperforms both counterpart approaches. In particular, gFCP generates better plans than those obtained using the [Formula: see text]-minimization for all three cases with a comparable computation time. As compared to the GNM, the gFCP improves both the plan quality and computational efficiency. The proposed gFCP-based scheme provides a promising framework for BAO and promises to improve both planning time and plan quality.
Teodoro, George; Kurc, Tahsin; Kong, Jun; Cooper, Lee; Saltz, Joel
2014-01-01
We study and characterize the performance of operations in an important class of applications on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high resolution sensors, such as image datasets obtained from whole slide tissue specimens using microscopy scanners. Common operations in these applications involve the detection and extraction of objects (object segmentation), the computation of features of each extracted object (feature computation), and characterization of objects based on these features (object classification). In this work, we have identify the data access and computation patterns of operations in the object segmentation and feature computation categories. We systematically implement and evaluate the performance of these operations on modern CPUs, GPUs, and MIC systems for a microscopy image analysis application. Our results show that the performance on a MIC of operations that perform regular data access is comparable or sometimes better than that on a GPU. On the other hand, GPUs are significantly more efficient than MICs for operations that access data irregularly. This is a result of the low performance of MICs when it comes to random data access. We also have examined the coordinated use of MICs and CPUs. Our experiments show that using a performance aware task strategy for scheduling application operations improves performance about 1.29× over a first-come-first-served strategy. This allows applications to obtain high performance efficiency on CPU-MIC systems - the example application attained an efficiency of 84% on 192 nodes (3072 CPU cores and 192 MICs). PMID:25419088
Cost-of-illness studies based on massive data: a prevalence-based, top-down regression approach.
Stollenwerk, Björn; Welchowski, Thomas; Vogl, Matthias; Stock, Stephanie
2016-04-01
Despite the increasing availability of routine data, no analysis method has yet been presented for cost-of-illness (COI) studies based on massive data. We aim, first, to present such a method and, second, to assess the relevance of the associated gain in numerical efficiency. We propose a prevalence-based, top-down regression approach consisting of five steps: aggregating the data; fitting a generalized additive model (GAM); predicting costs via the fitted GAM; comparing predicted costs between prevalent and non-prevalent subjects; and quantifying the stochastic uncertainty via error propagation. To demonstrate the method, it was applied to aggregated data in the context of chronic lung disease to German sickness funds data (from 1999), covering over 7.3 million insured. To assess the gain in numerical efficiency, the computational time of the innovative approach has been compared with corresponding GAMs applied to simulated individual-level data. Furthermore, the probability of model failure was modeled via logistic regression. Applying the innovative method was reasonably fast (19 min). In contrast, regarding patient-level data, computational time increased disproportionately by sample size. Furthermore, using patient-level data was accompanied by a substantial risk of model failure (about 80 % for 6 million subjects). The gain in computational efficiency of the innovative COI method seems to be of practical relevance. Furthermore, it may yield more precise cost estimates.
NASA Astrophysics Data System (ADS)
Khursheed, Khursheed; Imran, Muhammad; Ahmad, Naeem; O'Nils, Mattias
2012-06-01
Wireless Visual Sensor Network (WVSN) is an emerging field which combines image sensor, on board computation unit, communication component and energy source. Compared to the traditional wireless sensor network, which operates on one dimensional data, such as temperature, pressure values etc., WVSN operates on two dimensional data (images) which requires higher processing power and communication bandwidth. Normally, WVSNs are deployed in areas where installation of wired solutions is not feasible. The energy budget in these networks is limited to the batteries, because of the wireless nature of the application. Due to the limited availability of energy, the processing at Visual Sensor Nodes (VSN) and communication from VSN to server should consume as low energy as possible. Transmission of raw images wirelessly consumes a lot of energy and requires higher communication bandwidth. Data compression methods reduce data efficiently and hence will be effective in reducing communication cost in WVSN. In this paper, we have compared the compression efficiency and complexity of six well known bi-level image compression methods. The focus is to determine the compression algorithms which can efficiently compress bi-level images and their computational complexity is suitable for computational platform used in WVSNs. These results can be used as a road map for selection of compression methods for different sets of constraints in WVSN.
Bozkaya, Uğur
2018-03-15
Efficient implementations of analytic gradients for the orbital-optimized MP3 and MP2.5 and their standard versions with the density-fitting approximation, which are denoted as DF-MP3, DF-MP2.5, DF-OMP3, and DF-OMP2.5, are presented. The DF-MP3, DF-MP2.5, DF-OMP3, and DF-OMP2.5 methods are applied to a set of alkanes and noncovalent interaction complexes to compare the computational cost with the conventional MP3, MP2.5, OMP3, and OMP2.5. Our results demonstrate that density-fitted perturbation theory (DF-MP) methods considered substantially reduce the computational cost compared to conventional MP methods. The efficiency of our DF-MP methods arise from the reduced input/output (I/O) time and the acceleration of gradient related terms, such as computations of particle density and generalized Fock matrices (PDMs and GFM), solution of the Z-vector equation, back-transformations of PDMs and GFM, and evaluation of analytic gradients in the atomic orbital basis. Further, application results show that errors introduced by the DF approach are negligible. Mean absolute errors for bond lengths of a molecular set, with the cc-pCVQZ basis set, is 0.0001-0.0002 Å. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
ERIC Educational Resources Information Center
Kaplan, Abdullah; Özturk, Mesut; Ertör, Eren
2013-01-01
This study aims to compare computer-aided instruction, creative drama and traditional teaching methods in teaching of Integers to the seventh grade students. The study was conducted in a primary school with eighty-seven students (N=87) in a county of Agri, in spring term of academic year 2011-2012. A non equivalent control group quasi experimental…
Far-field radiation patterns of aperture antennas by the Winograd Fourier transform algorithm
NASA Technical Reports Server (NTRS)
Heisler, R.
1978-01-01
A more time-efficient algorithm for computing the discrete Fourier transform, the Winograd Fourier transform (WFT), is described. The WFT algorithm is compared with other transform algorithms. Results indicate that the WFT algorithm in antenna analysis appears to be a very successful application. Significant savings in cpu time will improve the computer turn around time and circumvent the need to resort to weekend runs.
Complex-energy approach to sum rules within nuclear density functional theory
Hinohara, Nobuo; Kortelainen, Markus; Nazarewicz, Witold; ...
2015-04-27
The linear response of the nucleus to an external field contains unique information about the effective interaction, correlations governing the behavior of the many-body system, and properties of its excited states. To characterize the response, it is useful to use its energy-weighted moments, or sum rules. By comparing computed sum rules with experimental values, the information content of the response can be utilized in the optimization process of the nuclear Hamiltonian or nuclear energy density functional (EDF). But the additional information comes at a price: compared to the ground state, computation of excited states is more demanding. To establish anmore » efficient framework to compute energy-weighted sum rules of the response that is adaptable to the optimization of the nuclear EDF and large-scale surveys of collective strength, we have developed a new technique within the complex-energy finite-amplitude method (FAM) based on the quasiparticle random- phase approximation. The proposed sum-rule technique based on the complex-energy FAM is a tool of choice when optimizing effective interactions or energy functionals. The method is very efficient and well-adaptable to parallel computing. As a result, the FAM formulation is especially useful when standard theorems based on commutation relations involving the nuclear Hamiltonian and external field cannot be used.« less
Fractional Steps methods for transient problems on commodity computer architectures
NASA Astrophysics Data System (ADS)
Krotkiewski, M.; Dabrowski, M.; Podladchikov, Y. Y.
2008-12-01
Fractional Steps methods are suitable for modeling transient processes that are central to many geological applications. Low memory requirements and modest computational complexity facilitates calculations on high-resolution three-dimensional models. An efficient implementation of Alternating Direction Implicit/Locally One-Dimensional schemes for an Opteron-based shared memory system is presented. The memory bandwidth usage, the main bottleneck on modern computer architectures, is specially addressed. High efficiency of above 2 GFlops per CPU is sustained for problems of 1 billion degrees of freedom. The optimized sequential implementation of all 1D sweeps is comparable in execution time to copying the used data in the memory. Scalability of the parallel implementation on up to 8 CPUs is close to perfect. Performing one timestep of the Locally One-Dimensional scheme on a system of 1000 3 unknowns on 8 CPUs takes only 11 s. We validate the LOD scheme using a computational model of an isolated inclusion subject to a constant far field flux. Next, we study numerically the evolution of a diffusion front and the effective thermal conductivity of composites consisting of multiple inclusions and compare the results with predictions based on the differential effective medium approach. Finally, application of the developed parabolic solver is suggested for a real-world problem of fluid transport and reactions inside a reservoir.
Multitasking a three-dimensional Navier-Stokes algorithm on the Cray-2
NASA Technical Reports Server (NTRS)
Swisshelm, Julie M.
1989-01-01
A three-dimensional computational aerodynamics algorithm has been multitasked for efficient parallel execution on the Cray-2. It provides a means for examining the multitasking performance of a complete CFD application code. An embedded zonal multigrid scheme is used to solve the Reynolds-averaged Navier-Stokes equations for an internal flow model problem. The explicit nature of each component of the method allows a spatial partitioning of the computational domain to achieve a well-balanced task load for MIMD computers with vector-processing capability. Experiments have been conducted with both two- and three-dimensional multitasked cases. The best speedup attained by an individual task group was 3.54 on four processors of the Cray-2, while the entire solver yielded a speedup of 2.67 on four processors for the three-dimensional case. The multiprocessing efficiency of various types of computational tasks is examined, performance on two Cray-2s with different memory access speeds is compared, and extrapolation to larger problems is discussed.
Jiang, Yuyi; Shao, Zhiqing; Guo, Yi
2014-01-01
A complex computing problem can be solved efficiently on a system with multiple computing nodes by dividing its implementation code into several parallel processing modules or tasks that can be formulated as directed acyclic graph (DAG) problems. The DAG jobs may be mapped to and scheduled on the computing nodes to minimize the total execution time. Searching an optimal DAG scheduling solution is considered to be NP-complete. This paper proposed a tuple molecular structure-based chemical reaction optimization (TMSCRO) method for DAG scheduling on heterogeneous computing systems, based on a very recently proposed metaheuristic method, chemical reaction optimization (CRO). Comparing with other CRO-based algorithms for DAG scheduling, the design of tuple reaction molecular structure and four elementary reaction operators of TMSCRO is more reasonable. TMSCRO also applies the concept of constrained critical paths (CCPs), constrained-critical-path directed acyclic graph (CCPDAG) and super molecule for accelerating convergence. In this paper, we have also conducted simulation experiments to verify the effectiveness and efficiency of TMSCRO upon a large set of randomly generated graphs and the graphs for real world problems. PMID:25143977
Jiang, Yuyi; Shao, Zhiqing; Guo, Yi
2014-01-01
A complex computing problem can be solved efficiently on a system with multiple computing nodes by dividing its implementation code into several parallel processing modules or tasks that can be formulated as directed acyclic graph (DAG) problems. The DAG jobs may be mapped to and scheduled on the computing nodes to minimize the total execution time. Searching an optimal DAG scheduling solution is considered to be NP-complete. This paper proposed a tuple molecular structure-based chemical reaction optimization (TMSCRO) method for DAG scheduling on heterogeneous computing systems, based on a very recently proposed metaheuristic method, chemical reaction optimization (CRO). Comparing with other CRO-based algorithms for DAG scheduling, the design of tuple reaction molecular structure and four elementary reaction operators of TMSCRO is more reasonable. TMSCRO also applies the concept of constrained critical paths (CCPs), constrained-critical-path directed acyclic graph (CCPDAG) and super molecule for accelerating convergence. In this paper, we have also conducted simulation experiments to verify the effectiveness and efficiency of TMSCRO upon a large set of randomly generated graphs and the graphs for real world problems.
Memristor-Based Analog Computation and Neural Network Classification with a Dot Product Engine.
Hu, Miao; Graves, Catherine E; Li, Can; Li, Yunning; Ge, Ning; Montgomery, Eric; Davila, Noraica; Jiang, Hao; Williams, R Stanley; Yang, J Joshua; Xia, Qiangfei; Strachan, John Paul
2018-03-01
Using memristor crossbar arrays to accelerate computations is a promising approach to efficiently implement algorithms in deep neural networks. Early demonstrations, however, are limited to simulations or small-scale problems primarily due to materials and device challenges that limit the size of the memristor crossbar arrays that can be reliably programmed to stable and analog values, which is the focus of the current work. High-precision analog tuning and control of memristor cells across a 128 × 64 array is demonstrated, and the resulting vector matrix multiplication (VMM) computing precision is evaluated. Single-layer neural network inference is performed in these arrays, and the performance compared to a digital approach is assessed. Memristor computing system used here reaches a VMM accuracy equivalent of 6 bits, and an 89.9% recognition accuracy is achieved for the 10k MNIST handwritten digit test set. Forecasts show that with integrated (on chip) and scaled memristors, a computational efficiency greater than 100 trillion operations per second per Watt is possible. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Automatic Blocking Of QR and LU Factorizations for Locality
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yi, Q; Kennedy, K; You, H
2004-03-26
QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To efficiently perform these computations on modern computers, the factorization algorithms need to be blocked when operating on large matrices to effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provides manually blocked implementations of these algorithms, by automatically generating blocked versions of the computations, moremore » benefit can be gained such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present different blocking strategies that can be generated by our optimizer and compare the performance of auto-blocked versions with manually tuned versions in LAPACK, both using reference BLAS, ATLAS BLAS and native BLAS specially tuned for the underlying machine architectures.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Raj, Abhijeet; Sander, Markus; Janardhanan, Vinod
2010-03-15
This paper presents a theoretical study on the physical interaction between polycyclic aromatic hydrocarbons (PAHs) and their clusters of different sizes in laminar premixed flames. Two models are employed for this study: a detailed PAH growth model, referred to as the kinetic Monte Carlo - aromatic site (KMC-ARS) model [Raj et al., Combust. Flame 156 (2009) 896-913]; and a multivariate PAH population balance model, referred to as the PAH - primary particle (PAH-PP) model. Both the models are solved by kinetic Monte Carlo methods. PAH mass spectra are generated using the PAH-PP model, and compared to the experimentally observed spectramore » for a laminar premixed ethylene flame. The position of the maxima of PAH dimers in the spectra and their concentrations are found to depend strongly on the collision efficiency of PAH coagulation. The variation in the collision efficiency with various flame and PAH parameters is studied to determine the factors on which it may depend. A correlation for the collision efficiency is proposed by comparing the computed and the observed spectra for an ethylene flame. With this correlation, a good agreement between the computed and the observed spectra for a number of laminar premixed ethylene flames is found. (author)« less
NASA Astrophysics Data System (ADS)
Zhang, H.-m.; Chen, X.-f.; Chang, S.
- It is difficult to compute synthetic seismograms for a layered half-space with sources and receivers at close to or the same depths using the generalized R/T coefficient method (Kennett, 1983; Luco and Apsel, 1983; Yao and Harkrider, 1983; Chen, 1993), because the wavenumber integration converges very slowly. A semi-analytic method for accelerating the convergence, in which part of the integration is implemented analytically, was adopted by some authors (Apsel and Luco, 1983; Hisada, 1994, 1995). In this study, based on the principle of the Repeated Averaging Method (Dahlquist and Björck, 1974; Chang, 1988), we propose an alternative, efficient, numerical method, the peak-trough averaging method (PTAM), to overcome the difficulty mentioned above. Compared with the semi-analytic method, PTAM is not only much simpler mathematically and easier to implement in practice, but also more efficient. Using numerical examples, we illustrate the validity, accuracy and efficiency of the new method.
Efficient Multiple Kernel Learning Algorithms Using Low-Rank Representation.
Niu, Wenjia; Xia, Kewen; Zu, Baokai; Bai, Jianchuan
2017-01-01
Unlike Support Vector Machine (SVM), Multiple Kernel Learning (MKL) allows datasets to be free to choose the useful kernels based on their distribution characteristics rather than a precise one. It has been shown in the literature that MKL holds superior recognition accuracy compared with SVM, however, at the expense of time consuming computations. This creates analytical and computational difficulties in solving MKL algorithms. To overcome this issue, we first develop a novel kernel approximation approach for MKL and then propose an efficient Low-Rank MKL (LR-MKL) algorithm by using the Low-Rank Representation (LRR). It is well-acknowledged that LRR can reduce dimension while retaining the data features under a global low-rank constraint. Furthermore, we redesign the binary-class MKL as the multiclass MKL based on pairwise strategy. Finally, the recognition effect and efficiency of LR-MKL are verified on the datasets Yale, ORL, LSVT, and Digit. Experimental results show that the proposed LR-MKL algorithm is an efficient kernel weights allocation method in MKL and boosts the performance of MKL largely.
Cost-Effective Cloud Computing: A Case Study Using the Comparative Genomics Tool, Roundup
Kudtarkar, Parul; DeLuca, Todd F.; Fusaro, Vincent A.; Tonellato, Peter J.; Wall, Dennis P.
2010-01-01
Background Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource—Roundup—using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Methods Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon’s Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. Results We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon’s computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure. PMID:21258651
NASA Astrophysics Data System (ADS)
Limmer, Steffen; Fey, Dietmar
2013-07-01
Thin-film computations are often a time-consuming task during optical design. An efficient way to accelerate these computations with the help of graphics processing units (GPUs) is described. It turned out that significant speed-ups can be achieved. We investigate the circumstances under which the best speed-up values can be expected. Therefore we compare different GPUs among themselves and with a modern CPU. Furthermore, the effect of thickness modulation on the speed-up and the runtime behavior depending on the input data is examined.
Validation of a Computational Fluid Dynamics (CFD) Code for Supersonic Axisymmetric Base Flow
NASA Technical Reports Server (NTRS)
Tucker, P. Kevin
1993-01-01
The ability to accurately and efficiently calculate the flow structure in the base region of bodies of revolution in supersonic flight is a significant step in CFD code validation for applications ranging from base heating for rockets to drag for protectives. The FDNS code is used to compute such a flow and the results are compared to benchmark quality experimental data. Flowfield calculations are presented for a cylindrical afterbody at M = 2.46 and angle of attack a = O. Grid independent solutions are compared to mean velocity profiles in the separated wake area and downstream of the reattachment point. Additionally, quantities such as turbulent kinetic energy and shear layer growth rates are compared to the data. Finally, the computed base pressures are compared to the measured values. An effort is made to elucidate the role of turbulence models in the flowfield predictions. The level of turbulent eddy viscosity, and its origin, are used to contrast the various turbulence models and compare the results to the experimental data.
A strategy for improved computational efficiency of the method of anchored distributions
NASA Astrophysics Data System (ADS)
Over, Matthew William; Yang, Yarong; Chen, Xingyuan; Rubin, Yoram
2013-06-01
This paper proposes a strategy for improving the computational efficiency of model inversion using the method of anchored distributions (MAD) by "bundling" similar model parametrizations in the likelihood function. Inferring the likelihood function typically requires a large number of forward model (FM) simulations for each possible model parametrization; as a result, the process is quite expensive. To ease this prohibitive cost, we present an approximation for the likelihood function called bundling that relaxes the requirement for high quantities of FM simulations. This approximation redefines the conditional statement of the likelihood function as the probability of a set of similar model parametrizations "bundle" replicating field measurements, which we show is neither a model reduction nor a sampling approach to improving the computational efficiency of model inversion. To evaluate the effectiveness of these modifications, we compare the quality of predictions and computational cost of bundling relative to a baseline MAD inversion of 3-D flow and transport model parameters. Additionally, to aid understanding of the implementation we provide a tutorial for bundling in the form of a sample data set and script for the R statistical computing language. For our synthetic experiment, bundling achieved a 35% reduction in overall computational cost and had a limited negative impact on predicted probability distributions of the model parameters. Strategies for minimizing error in the bundling approximation, for enforcing similarity among the sets of model parametrizations, and for identifying convergence of the likelihood function are also presented.
A Very High Order, Adaptable MESA Implementation for Aeroacoustic Computations
NASA Technical Reports Server (NTRS)
Dydson, Roger W.; Goodrich, John W.
2000-01-01
Since computational efficiency and wave resolution scale with accuracy, the ideal would be infinitely high accuracy for problems with widely varying wavelength scales. Currently, many of the computational aeroacoustics methods are limited to 4th order accurate Runge-Kutta methods in time which limits their resolution and efficiency. However, a new procedure for implementing the Modified Expansion Solution Approximation (MESA) schemes, based upon Hermitian divided differences, is presented which extends the effective accuracy of the MESA schemes to 57th order in space and time when using 128 bit floating point precision. This new approach has the advantages of reducing round-off error, being easy to program. and is more computationally efficient when compared to previous approaches. Its accuracy is limited only by the floating point hardware. The advantages of this new approach are demonstrated by solving the linearized Euler equations in an open bi-periodic domain. A 500th order MESA scheme can now be created in seconds, making these schemes ideally suited for the next generation of high performance 256-bit (double quadruple) or higher precision computers. This ease of creation makes it possible to adapt the algorithm to the mesh in time instead of its converse: this is ideal for resolving varying wavelength scales which occur in noise generation simulations. And finally, the sources of round-off error which effect the very high order methods are examined and remedies provided that effectively increase the accuracy of the MESA schemes while using current computer technology.
Two-dimensional nonsteady viscous flow simulation on the Navier-Stokes computer miniNode
NASA Technical Reports Server (NTRS)
Nosenchuck, Daniel M.; Littman, Michael G.; Flannery, William
1986-01-01
The needs of large-scale scientific computation are outpacing the growth in performance of mainframe supercomputers. In particular, problems in fluid mechanics involving complex flow simulations require far more speed and capacity than that provided by current and proposed Class VI supercomputers. To address this concern, the Navier-Stokes Computer (NSC) was developed. The NSC is a parallel-processing machine, comprised of individual Nodes, each comparable in performance to current supercomputers. The global architecture is that of a hypercube, and a 128-Node NSC has been designed. New architectural features, such as a reconfigurable many-function ALU pipeline and a multifunction memory-ALU switch, have provided the capability to efficiently implement a wide range of algorithms. Efficient algorithms typically involve numerically intensive tasks, which often include conditional operations. These operations may be efficiently implemented on the NSC without, in general, sacrificing vector-processing speed. To illustrate the architecture, programming, and several of the capabilities of the NSC, the simulation of two-dimensional, nonsteady viscous flows on a prototype Node, called the miniNode, is presented.
Paraskevopoulou, Sivylla E; Barsakcioglu, Deren Y; Saberi, Mohammed R; Eftekhar, Amir; Constandinou, Timothy G
2013-04-30
Next generation neural interfaces aspire to achieve real-time multi-channel systems by integrating spike sorting on chip to overcome limitations in communication channel capacity. The feasibility of this approach relies on developing highly efficient algorithms for feature extraction and clustering with the potential of low-power hardware implementation. We are proposing a feature extraction method, not requiring any calibration, based on first and second derivative features of the spike waveform. The accuracy and computational complexity of the proposed method are quantified and compared against commonly used feature extraction methods, through simulation across four datasets (with different single units) at multiple noise levels (ranging from 5 to 20% of the signal amplitude). The average classification error is shown to be below 7% with a computational complexity of 2N-3, where N is the number of sample points of each spike. Overall, this method presents a good trade-off between accuracy and computational complexity and is thus particularly well-suited for hardware-efficient implementation. Copyright © 2013 Elsevier B.V. All rights reserved.
Computational Approaches to Simulation and Optimization of Global Aircraft Trajectories
NASA Technical Reports Server (NTRS)
Ng, Hok Kwan; Sridhar, Banavar
2016-01-01
This study examines three possible approaches to improving the speed in generating wind-optimal routes for air traffic at the national or global level. They are: (a) using the resources of a supercomputer, (b) running the computations on multiple commercially available computers and (c) implementing those same algorithms into NASAs Future ATM Concepts Evaluation Tool (FACET) and compares those to a standard implementation run on a single CPU. Wind-optimal aircraft trajectories are computed using global air traffic schedules. The run time and wait time on the supercomputer for trajectory optimization using various numbers of CPUs ranging from 80 to 10,240 units are compared with the total computational time for running the same computation on a single desktop computer and on multiple commercially available computers for potential computational enhancement through parallel processing on the computer clusters. This study also re-implements the trajectory optimization algorithm for further reduction of computational time through algorithm modifications and integrates that with FACET to facilitate the use of the new features which calculate time-optimal routes between worldwide airport pairs in a wind field for use with existing FACET applications. The implementations of trajectory optimization algorithms use MATLAB, Python, and Java programming languages. The performance evaluations are done by comparing their computational efficiencies and based on the potential application of optimized trajectories. The paper shows that in the absence of special privileges on a supercomputer, a cluster of commercially available computers provides a feasible approach for national and global air traffic system studies.
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures
2017-10-04
Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures The views, opinions and/or findings contained in this...Chapel Hill Title: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures Report Term: 0-Other Email: dm...algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures . These
FAST: framework for heterogeneous medical image computing and visualization.
Smistad, Erik; Bozorgi, Mohammadmehdi; Lindseth, Frank
2015-11-01
Computer systems are becoming increasingly heterogeneous in the sense that they consist of different processors, such as multi-core CPUs and graphic processing units. As the amount of medical image data increases, it is crucial to exploit the computational power of these processors. However, this is currently difficult due to several factors, such as driver errors, processor differences, and the need for low-level memory handling. This paper presents a novel FrAmework for heterogeneouS medical image compuTing and visualization (FAST). The framework aims to make it easier to simultaneously process and visualize medical images efficiently on heterogeneous systems. FAST uses common image processing programming paradigms and hides the details of memory handling from the user, while enabling the use of all processors and cores on a system. The framework is open-source, cross-platform and available online. Code examples and performance measurements are presented to show the simplicity and efficiency of FAST. The results are compared to the insight toolkit (ITK) and the visualization toolkit (VTK) and show that the presented framework is faster with up to 20 times speedup on several common medical imaging algorithms. FAST enables efficient medical image computing and visualization on heterogeneous systems. Code examples and performance evaluations have demonstrated that the toolkit is both easy to use and performs better than existing frameworks, such as ITK and VTK.
Comparative study of minutiae selection algorithms for ISO fingerprint templates
NASA Astrophysics Data System (ADS)
Vibert, B.; Charrier, C.; Le Bars, J.-M.; Rosenberger, C.
2015-03-01
We address the selection of fingerprint minutiae given a fingerprint ISO template. Minutiae selection plays a very important role when a secure element (i.e. a smart-card) is used. Because of the limited capability of computation and memory, the number of minutiae of a stored reference in the secure element is limited. We propose in this paper a comparative study of 6 minutiae selection methods including 2 methods from the literature and 1 like reference (No Selection). Experimental results on 3 fingerprint databases from the Fingerprint Verification Competition show their relative efficiency in terms of performance and computation time.
Using 3D computer simulations to enhance ophthalmic training.
Glittenberg, C; Binder, S
2006-01-01
To develop more effective methods of demonstrating and teaching complex topics in ophthalmology with the use of computer aided three-dimensional (3D) animation and interactive multimedia technologies. We created 3D animations and interactive computer programmes demonstrating the neuroophthalmological nature of the oculomotor system, including the anatomy, physiology and pathophysiology of the extra-ocular eye muscles and the oculomotor cranial nerves, as well as pupillary symptoms of neurological diseases. At the University of Vienna we compared their teaching effectiveness to conventional teaching methods in a comparative study involving 100 medical students, a multiple choice exam and a survey. The comparative study showed that our students achieved significantly better test results (80%) than the control group (63%) (diff. = 17 +/- 5%, p = 0.004). The survey showed a positive reaction to the software and a strong preference to have more subjects and techniques demonstrated in this fashion. Three-dimensional computer animation technology can significantly increase the quality and efficiency of the education and demonstration of complex topics in ophthalmology.
SLIC superpixels compared to state-of-the-art superpixel methods.
Achanta, Radhakrishna; Shaji, Appu; Smith, Kevin; Lucchi, Aurelien; Fua, Pascal; Süsstrunk, Sabine
2012-11-01
Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms for their ability to adhere to image boundaries, speed, memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels. Despite its simplicity, SLIC adheres to boundaries as well as or better than previous methods. At the same time, it is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation.
A network of spiking neurons for computing sparse representations in an energy efficient way
Hu, Tao; Genkin, Alexander; Chklovskii, Dmitri B.
2013-01-01
Computing sparse redundant representations is an important problem both in applied mathematics and neuroscience. In many applications, this problem must be solved in an energy efficient way. Here, we propose a hybrid distributed algorithm (HDA), which solves this problem on a network of simple nodes communicating via low-bandwidth channels. HDA nodes perform both gradient-descent-like steps on analog internal variables and coordinate-descent-like steps via quantized external variables communicated to each other. Interestingly, such operation is equivalent to a network of integrate-and-fire neurons, suggesting that HDA may serve as a model of neural computation. We compare the numerical performance of HDA with existing algorithms and show that in the asymptotic regime the representation error of HDA decays with time, t, as 1/t. We show that HDA is stable against time-varying noise, specifically, the representation error decays as 1/t for Gaussian white noise. PMID:22920853
Probabilistic simulation of multi-scale composite behavior
NASA Technical Reports Server (NTRS)
Liaw, D. G.; Shiao, M. C.; Singhal, S. N.; Chamis, Christos C.
1993-01-01
A methodology is developed to computationally assess the probabilistic composite material properties at all composite scale levels due to the uncertainties in the constituent (fiber and matrix) properties and in the fabrication process variables. The methodology is computationally efficient for simulating the probability distributions of material properties. The sensitivity of the probabilistic composite material property to each random variable is determined. This information can be used to reduce undesirable uncertainties in material properties at the macro scale of the composite by reducing the uncertainties in the most influential random variables at the micro scale. This methodology was implemented into the computer code PICAN (Probabilistic Integrated Composite ANalyzer). The accuracy and efficiency of this methodology are demonstrated by simulating the uncertainties in the material properties of a typical laminate and comparing the results with the Monte Carlo simulation method. The experimental data of composite material properties at all scales fall within the scatters predicted by PICAN.
Computationally Efficient 2D DOA Estimation with Uniform Rectangular Array in Low-Grazing Angle.
Shi, Junpeng; Hu, Guoping; Zhang, Xiaofei; Sun, Fenggang; Xiao, Yu
2017-02-26
In this paper, we propose a computationally efficient spatial differencing matrix set (SDMS) method for two-dimensional direction of arrival (2D DOA) estimation with uniform rectangular arrays (URAs) in a low-grazing angle (LGA) condition. By rearranging the auto-correlation and cross-correlation matrices in turn among different subarrays, the SDMS method can estimate the two parameters independently with one-dimensional (1D) subspace-based estimation techniques, where we only perform difference for auto-correlation matrices and the cross-correlation matrices are kept completely. Then, the pair-matching of two parameters is achieved by extracting the diagonal elements of URA. Thus, the proposed method can decrease the computational complexity, suppress the effect of additive noise and also have little information loss. Simulation results show that, in LGA, compared to other methods, the proposed methods can achieve performance improvement in the white or colored noise conditions.
Computationally Efficient 2D DOA Estimation with Uniform Rectangular Array in Low-Grazing Angle
Shi, Junpeng; Hu, Guoping; Zhang, Xiaofei; Sun, Fenggang; Xiao, Yu
2017-01-01
In this paper, we propose a computationally efficient spatial differencing matrix set (SDMS) method for two-dimensional direction of arrival (2D DOA) estimation with uniform rectangular arrays (URAs) in a low-grazing angle (LGA) condition. By rearranging the auto-correlation and cross-correlation matrices in turn among different subarrays, the SDMS method can estimate the two parameters independently with one-dimensional (1D) subspace-based estimation techniques, where we only perform difference for auto-correlation matrices and the cross-correlation matrices are kept completely. Then, the pair-matching of two parameters is achieved by extracting the diagonal elements of URA. Thus, the proposed method can decrease the computational complexity, suppress the effect of additive noise and also have little information loss. Simulation results show that, in LGA, compared to other methods, the proposed methods can achieve performance improvement in the white or colored noise conditions. PMID:28245634
PRESAGE: PRivacy-preserving gEnetic testing via SoftwAre Guard Extension.
Chen, Feng; Wang, Chenghong; Dai, Wenrui; Jiang, Xiaoqian; Mohammed, Noman; Al Aziz, Md Momin; Sadat, Md Nazmus; Sahinalp, Cenk; Lauter, Kristin; Wang, Shuang
2017-07-26
Advances in DNA sequencing technologies have prompted a wide range of genomic applications to improve healthcare and facilitate biomedical research. However, privacy and security concerns have emerged as a challenge for utilizing cloud computing to handle sensitive genomic data. We present one of the first implementations of Software Guard Extension (SGX) based securely outsourced genetic testing framework, which leverages multiple cryptographic protocols and minimal perfect hash scheme to enable efficient and secure data storage and computation outsourcing. We compared the performance of the proposed PRESAGE framework with the state-of-the-art homomorphic encryption scheme, as well as the plaintext implementation. The experimental results demonstrated significant performance over the homomorphic encryption methods and a small computational overhead in comparison to plaintext implementation. The proposed PRESAGE provides an alternative solution for secure and efficient genomic data outsourcing in an untrusted cloud by using a hybrid framework that combines secure hardware and multiple crypto protocols.
Cascaded spintronic logic with low-dimensional carbon
NASA Astrophysics Data System (ADS)
Friedman, Joseph S.; Girdhar, Anuj; Gelfand, Ryan M.; Memik, Gokhan; Mohseni, Hooman; Taflove, Allen; Wessels, Bruce W.; Leburton, Jean-Pierre; Sahakian, Alan V.
2017-06-01
Remarkable breakthroughs have established the functionality of graphene and carbon nanotube transistors as replacements to silicon in conventional computing structures, and numerous spintronic logic gates have been presented. However, an efficient cascaded logic structure that exploits electron spin has not yet been demonstrated. In this work, we introduce and analyse a cascaded spintronic computing system composed solely of low-dimensional carbon materials. We propose a spintronic switch based on the recent discovery of negative magnetoresistance in graphene nanoribbons, and demonstrate its feasibility through tight-binding calculations of the band structure. Covalently connected carbon nanotubes create magnetic fields through graphene nanoribbons, cascading logic gates through incoherent spintronic switching. The exceptional material properties of carbon materials permit Terahertz operation and two orders of magnitude decrease in power-delay product compared to cutting-edge microprocessors. We hope to inspire the fabrication of these cascaded logic circuits to stimulate a transformative generation of energy-efficient computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.
Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less
Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.
2016-09-01
Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less
Optoelectronic Reservoir Computing
Paquot, Y.; Duport, F.; Smerieri, A.; Dambre, J.; Schrauwen, B.; Haelterman, M.; Massar, S.
2012-01-01
Reservoir computing is a recently introduced, highly efficient bio-inspired approach for processing time dependent data. The basic scheme of reservoir computing consists of a non linear recurrent dynamical system coupled to a single input layer and a single output layer. Within these constraints many implementations are possible. Here we report an optoelectronic implementation of reservoir computing based on a recently proposed architecture consisting of a single non linear node and a delay line. Our implementation is sufficiently fast for real time information processing. We illustrate its performance on tasks of practical importance such as nonlinear channel equalization and speech recognition, and obtain results comparable to state of the art digital implementations. PMID:22371825
NASA Astrophysics Data System (ADS)
Shcherbakov, V.; Ahlkrona, J.
2016-12-01
In this work we develop a highly efficient meshfree approach to ice sheet modeling. Traditionally mesh based methods such as finite element methods are employed to simulate glacier and ice sheet dynamics. These methods are mature and well developed. However, despite of numerous advantages these methods suffer from some drawbacks such as necessity to remesh the computational domain every time it changes its shape, which significantly complicates the implementation on moving domains, or a costly assembly procedure for nonlinear problems. We introduce a novel meshfree approach that frees us from all these issues. The approach is built upon a radial basis function (RBF) method that, thanks to its meshfree nature, allows for an efficient handling of moving margins and free ice surface. RBF methods are also accurate and easy to implement. Since the formulation is stated in strong form it allows for a substantial reduction of the computational cost associated with the linear system assembly inside the nonlinear solver. We implement a global RBF method that defines an approximation on the entire computational domain. This method exhibits high accuracy properties. However, it suffers from a disadvantage that the coefficient matrix is dense, and therefore the computational efficiency decreases. In order to overcome this issue we also implement a localized RBF method that rests upon a partition of unity approach to subdivide the domain into several smaller subdomains. The radial basis function partition of unity method (RBF-PUM) inherits high approximation characteristics form the global RBF method while resulting in a sparse system of equations, which essentially increases the computational efficiency. To demonstrate the usefulness of the RBF methods we model the velocity field of ice flow in the Haut Glacier d'Arolla. We assume that the flow is governed by the nonlinear Blatter-Pattyn equations. We test the methods for different basal conditions and for a free moving surface. Both RBF methods are compared with a classical finite element method in terms of accuracy and efficiency. We find that the RBF methods are more efficient than the finite element method and well suited for ice dynamics modeling, especially the partition of unity approach.
On Improving Efficiency of Differential Evolution for Aerodynamic Shape Optimization Applications
NASA Technical Reports Server (NTRS)
Madavan, Nateri K.
2004-01-01
Differential Evolution (DE) is a simple and robust evolutionary strategy that has been provEn effective in determining the global optimum for several difficult optimization problems. Although DE offers several advantages over traditional optimization approaches, its use in applications such as aerodynamic shape optimization where the objective function evaluations are computationally expensive is limited by the large number of function evaluations often required. In this paper various approaches for improving the efficiency of DE are reviewed and discussed. Several approaches that have proven effective for other evolutionary algorithms are modified and implemented in a DE-based aerodynamic shape optimization method that uses a Navier-Stokes solver for the objective function evaluations. Parallelization techniques on distributed computers are used to reduce turnaround times. Results are presented for standard test optimization problems and for the inverse design of a turbine airfoil. The efficiency improvements achieved by the different approaches are evaluated and compared.
NASA Astrophysics Data System (ADS)
Meng, Zeng; Yang, Dixiong; Zhou, Huanlin; Yu, Bo
2018-05-01
The first order reliability method has been extensively adopted for reliability-based design optimization (RBDO), but it shows inaccuracy in calculating the failure probability with highly nonlinear performance functions. Thus, the second order reliability method is required to evaluate the reliability accurately. However, its application for RBDO is quite challenge owing to the expensive computational cost incurred by the repeated reliability evaluation and Hessian calculation of probabilistic constraints. In this article, a new improved stability transformation method is proposed to search the most probable point efficiently, and the Hessian matrix is calculated by the symmetric rank-one update. The computational capability of the proposed method is illustrated and compared to the existing RBDO approaches through three mathematical and two engineering examples. The comparison results indicate that the proposed method is very efficient and accurate, providing an alternative tool for RBDO of engineering structures.
The design of sport and touring aircraft
NASA Technical Reports Server (NTRS)
Eppler, R.; Guenther, W.
1984-01-01
General considerations concerning the design of a new aircraft are discussed, taking into account the objective to develop an aircraft can satisfy economically a certain spectrum of tasks. Requirements related to the design of sport and touring aircraft included in the past mainly a high cruising speed and short take-off and landing runs. Additional requirements for new aircraft are now low fuel consumption and optimal efficiency. A computer program for the computation of flight performance makes it possible to vary automatically a number of parameters, such as flight altitude, wing area, and wing span. The appropriate design characteristics are to a large extent determined by the selection of the flight altitude. Three different wing profiles are compared. Potential improvements with respect to the performance of the aircraft and its efficiency are related to the use of fiber composites, the employment of better propeller profiles, more efficient engines, and the utilization of suitable instrumentation for optimal flight conduction.
Increasing thermal efficiency of solar flat plate collectors
NASA Astrophysics Data System (ADS)
Pona, J.
A study of methods to increase the efficiency of heat transfer in flat plate solar collectors is presented. In order to increase the heat transfer from the absorber plate to the working fluid inside the tubes, turbulent flow was induced by installing baffles within the tubes. The installation of the baffles resulted in a 7 to 12% increase in collector efficiency. Experiments were run on both 1 sq ft and 2 sq ft collectors each fitted with either slotted baffles or tubular baffles. A computer program was run comparing the baffled collector to the standard collector. The results obtained from the computer show that the baffled collectors have a 2.7% increase in life cycle cost (LCC) savings and a 3.6% increase in net cash flow for use in domestic hot water systems, and even greater increases when used in solar heating systems.
Crack Damage Detection Method via Multiple Visual Features and Efficient Multi-Task Learning Model.
Wang, Baoxian; Zhao, Weigang; Gao, Po; Zhang, Yufeng; Wang, Zhe
2018-06-02
This paper proposes an effective and efficient model for concrete crack detection. The presented work consists of two modules: multi-view image feature extraction and multi-task crack region detection. Specifically, multiple visual features (such as texture, edge, etc.) of image regions are calculated, which can suppress various background noises (such as illumination, pockmark, stripe, blurring, etc.). With the computed multiple visual features, a novel crack region detector is advocated using a multi-task learning framework, which involves restraining the variability for different crack region features and emphasizing the separability between crack region features and complex background ones. Furthermore, the extreme learning machine is utilized to construct this multi-task learning model, thereby leading to high computing efficiency and good generalization. Experimental results of the practical concrete images demonstrate that the developed algorithm can achieve favorable crack detection performance compared with traditional crack detectors.
DMRfinder: efficiently identifying differentially methylated regions from MethylC-seq data.
Gaspar, John M; Hart, Ronald P
2017-11-29
DNA methylation is an epigenetic modification that is studied at a single-base resolution with bisulfite treatment followed by high-throughput sequencing. After alignment of the sequence reads to a reference genome, methylation counts are analyzed to determine genomic regions that are differentially methylated between two or more biological conditions. Even though a variety of software packages is available for different aspects of the bioinformatics analysis, they often produce results that are biased or require excessive computational requirements. DMRfinder is a novel computational pipeline that identifies differentially methylated regions efficiently. Following alignment, DMRfinder extracts methylation counts and performs a modified single-linkage clustering of methylation sites into genomic regions. It then compares methylation levels using beta-binomial hierarchical modeling and Wald tests. Among its innovative attributes are the analyses of novel methylation sites and methylation linkage, as well as the simultaneous statistical analysis of multiple sample groups. To demonstrate its efficiency, DMRfinder is benchmarked against other computational approaches using a large published dataset. Contrasting two replicates of the same sample yielded minimal genomic regions with DMRfinder, whereas two alternative software packages reported a substantial number of false positives. Further analyses of biological samples revealed fundamental differences between DMRfinder and another software package, despite the fact that they utilize the same underlying statistical basis. For each step, DMRfinder completed the analysis in a fraction of the time required by other software. Among the computational approaches for identifying differentially methylated regions from high-throughput bisulfite sequencing datasets, DMRfinder is the first that integrates all the post-alignment steps in a single package. Compared to other software, DMRfinder is extremely efficient and unbiased in this process. DMRfinder is free and open-source software, available on GitHub ( github.com/jsh58/DMRfinder ); it is written in Python and R, and is supported on Linux.
Noise studies of communication systems using the SYSTID computer aided analysis program
NASA Technical Reports Server (NTRS)
Tranter, W. H.; Dawson, C. T.
1973-01-01
SYSTID computer aided design is a simple program for simulating data systems and communication links. A trial of the efficiency of the method was carried out by simulating a linear analog communication system to determine its noise performance and by comparing the SYSTID result with the result arrived at by theoretical calculation. It is shown that the SYSTID program is readily applicable to the analysis of these types of systems.
Resolution Study of a Hyperspectral Sensor using Computed Tomography in the Presence of Noise
2012-06-14
diffraction efficiency is dependent on wavelength. Compared to techniques developed by later work, simple algebraic reconstruction techniques were used...spectral di- mension, using computed tomography (CT) techniques with only a finite number of diverse images. CTHIS require a reconstruction algorithm in...many frames are needed to reconstruct the spectral cube of a simple object using a theoretical lower bound. In this research a new algorithm is derived
Vectorization of transport and diffusion computations on the CDC Cyber 205
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abu-Shumays, I.K.
1986-01-01
The development and testing of alternative numerical methods and computational algorithms specifically designed for the vectorization of transport and diffusion computations on a Control Data Corporation (CDC) Cyber 205 vector computer are described. Two solution methods for the discrete ordinates approximation to the transport equation are summarized and compared. Factors of 4 to 7 reduction in run times for certain large transport problems were achieved on a Cyber 205 as compared with run times on a CDC-7600. The solution of tridiagonal systems of linear equations, central to several efficient numerical methods for multidimensional diffusion computations and essential for fluid flowmore » and other physics and engineering problems, is also dealt with. Among the methods tested, a combined odd-even cyclic reduction and modified Cholesky factorization algorithm for solving linear symmetric positive definite tridiagonal systems is found to be the most effective for these systems on a Cyber 205. For large tridiagonal systems, computation with this algorithm is an order of magnitude faster on a Cyber 205 than computation with the best algorithm for tridiagonal systems on a CDC-7600.« less
Dugernier, Jonathan; Hesse, Michel; Jumetz, Thibaud; Bialais, Emilie; Roeseler, Jean; Depoortere, Virginie; Michotte, Jean-Bernard; Wittebole, Xavier; Ehrmann, Stephan; Laterre, Pierre-François; Jamar, François; Reychler, Gregory
2017-10-01
High-flow nasal cannula use is developing in ICUs. The aim of this study was to compare aerosol efficiency by using two nebulizers through a high-flow nasal cannula: the most commonly used jet nebulizer (JN) and a more efficient vibrating-mesh nebulizer (VN). Aerosol delivery of diethylenetriaminepentaacetic acid labeled with technetium-99m (4 mCi/4 mL) to the lungs by using a VN (Aerogen Solo ® ; Aerogen Ltd., Galway, Ireland) and a constant-output JN (Opti-Mist Plus Nebulizer ® ; ConvaTec, Bridgewater, NJ) through a high-flow nasal cannula (Optiflow ® ; Fisher & Paykel, New Zealand) was compared in six healthy subjects. Flow rate was set at 30 L/min through the heated humidified circuit. Pulmonary and extrapulmonary deposition was measured by single-photon emission computed tomography combined with a low-dose computed tomographic scan and by planar scintigraphy. Lung deposition was only 3.6 (2.1-4.4) and 1 (0.7-2)% of the nominal dose with the VN and the JN, respectively (p < 0.05). The JN showed higher retained doses than the VN. However, both nebulizers were associated with substantial deposition in the single limb circuit, the humidification chamber, and the nasal cannula [58.2 (51.6-61.6)% of the nominal dose with the VN versus 19.2 (15.8-22.9)% of the nominal dose with the JN, p < 0.05] and in the upper respiratory tract [17.6 (13.4-27.9)% of the nominal dose with the VN and 8.6 (6.0-11.0)% of the nominal dose with the JN, p < 0.05], especially in the nasal cavity. In the specific conditions of the study, pulmonary drug delivery through the high-flow nasal cannula is about 1%-4% of the initial amount of drugs placed in the nebulizer, despite the higher efficiency of the VN as compared with the JN.
1-RAAP: An Efficient 1-Round Anonymous Authentication Protocol for Wireless Body Area Networks
Liu, Jingwei; Zhang, Lihuan; Sun, Rong
2016-01-01
Thanks to the rapid technological convergence of wireless communications, medical sensors and cloud computing, Wireless Body Area Networks (WBANs) have emerged as a novel networking paradigm enabling ubiquitous Internet services, allowing people to receive medical care, monitor health status in real-time, analyze sports data and even enjoy online entertainment remotely. However, because of the mobility and openness of wireless communications, WBANs are inevitably exposed to a large set of potential attacks, significantly undermining their utility and impeding their widespread deployment. To prevent attackers from threatening legitimate WBAN users or abusing WBAN services, an efficient and secure authentication protocol termed 1-Round Anonymous Authentication Protocol (1-RAAP) is proposed in this paper. In particular, 1-RAAP preserves anonymity, mutual authentication, non-repudiation and some other desirable security properties, while only requiring users to perform several low cost computational operations. More importantly, 1-RAAP is provably secure thanks to its design basis, which is resistant to the anonymous in the random oracle model. To validate the computational efficiency of 1-RAAP, a set of comprehensive comparative studies between 1-RAAP and other authentication protocols is conducted, and the results clearly show that 1-RAAP achieves the best performance in terms of computational overhead. PMID:27213384
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling
NASA Astrophysics Data System (ADS)
Galelli, S.; Castelletti, A.
2013-07-01
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.
1-RAAP: An Efficient 1-Round Anonymous Authentication Protocol for Wireless Body Area Networks.
Liu, Jingwei; Zhang, Lihuan; Sun, Rong
2016-05-19
Thanks to the rapid technological convergence of wireless communications, medical sensors and cloud computing, Wireless Body Area Networks (WBANs) have emerged as a novel networking paradigm enabling ubiquitous Internet services, allowing people to receive medical care, monitor health status in real-time, analyze sports data and even enjoy online entertainment remotely. However, because of the mobility and openness of wireless communications, WBANs are inevitably exposed to a large set of potential attacks, significantly undermining their utility and impeding their widespread deployment. To prevent attackers from threatening legitimate WBAN users or abusing WBAN services, an efficient and secure authentication protocol termed 1-Round Anonymous Authentication Protocol (1-RAAP) is proposed in this paper. In particular, 1-RAAP preserves anonymity, mutual authentication, non-repudiation and some other desirable security properties, while only requiring users to perform several low cost computational operations. More importantly, 1-RAAP is provably secure thanks to its design basis, which is resistant to the anonymous in the random oracle model. To validate the computational efficiency of 1-RAAP, a set of comprehensive comparative studies between 1-RAAP and other authentication protocols is conducted, and the results clearly show that 1-RAAP achieves the best performance in terms of computational overhead.
a Novel Discrete Optimal Transport Method for Bayesian Inverse Problems
NASA Astrophysics Data System (ADS)
Bui-Thanh, T.; Myers, A.; Wang, K.; Thiery, A.
2017-12-01
We present the Augmented Ensemble Transform (AET) method for generating approximate samples from a high-dimensional posterior distribution as a solution to Bayesian inverse problems. Solving large-scale inverse problems is critical for some of the most relevant and impactful scientific endeavors of our time. Therefore, constructing novel methods for solving the Bayesian inverse problem in more computationally efficient ways can have a profound impact on the science community. This research derives the novel AET method for exploring a posterior by solving a sequence of linear programming problems, resulting in a series of transport maps which map prior samples to posterior samples, allowing for the computation of moments of the posterior. We show both theoretical and numerical results, indicating this method can offer superior computational efficiency when compared to other SMC methods. Most of this efficiency is derived from matrix scaling methods to solve the linear programming problem and derivative-free optimization for particle movement. We use this method to determine inter-well connectivity in a reservoir and the associated uncertainty related to certain parameters. The attached file shows the difference between the true parameter and the AET parameter in an example 3D reservoir problem. The error is within the Morozov discrepancy allowance with lower computational cost than other particle methods.
An efficient hybrid technique in RCS predictions of complex targets at high frequencies
NASA Astrophysics Data System (ADS)
Algar, María-Jesús; Lozano, Lorena; Moreno, Javier; González, Iván; Cátedra, Felipe
2017-09-01
Most computer codes in Radar Cross Section (RCS) prediction use Physical Optics (PO) and Physical theory of Diffraction (PTD) combined with Geometrical Optics (GO) and Geometrical Theory of Diffraction (GTD). The latter approaches are computationally cheaper and much more accurate for curved surfaces, but not applicable for the computation of the RCS of all surfaces of a complex object due to the presence of caustic problems in the analysis of concave surfaces or flat surfaces in the far field. The main contribution of this paper is the development of a hybrid method based on a new combination of two asymptotic techniques: GTD and PO, considering the advantages and avoiding the disadvantages of each of them. A very efficient and accurate method to analyze the RCS of complex structures at high frequencies is obtained with the new combination. The proposed new method has been validated comparing RCS results obtained for some simple cases using the proposed approach and RCS using the rigorous technique of Method of Moments (MoM). Some complex cases have been examined at high frequencies contrasting the results with PO. This study shows the accuracy and the efficiency of the hybrid method and its suitability for the computation of the RCS at really large and complex targets at high frequencies.
NASA Astrophysics Data System (ADS)
Ercan, İlke; Suyabatmaz, Enes
2018-06-01
The saturation in the efficiency and performance scaling of conventional electronic technologies brings about the development of novel computational paradigms. Brownian circuits are among the promising alternatives that can exploit fluctuations to increase the efficiency of information processing in nanocomputing. A Brownian cellular automaton, where signals propagate randomly and are driven by local transition rules, can be made computationally universal by embedding arbitrary asynchronous circuits on it. One of the potential realizations of such circuits is via single electron tunneling (SET) devices since SET technology enable simulation of noise and fluctuations in a fashion similar to Brownian search. In this paper, we perform a physical-information-theoretic analysis on the efficiency limitations in a Brownian NAND and half-adder circuits implemented using SET technology. The method we employed here establishes a solid ground that enables studying computational and physical features of this emerging technology on an equal footing, and yield fundamental lower bounds that provide valuable insights into how far its efficiency can be improved in principle. In order to provide a basis for comparison, we also analyze a NAND gate and half-adder circuit implemented in complementary metal oxide semiconductor technology to show how the fundamental bound of the Brownian circuit compares against a conventional paradigm.
NASA Astrophysics Data System (ADS)
Chen, Gang; Yang, Bing; Zhang, Xiaoyun; Gao, Zhiyong
2017-07-01
The latest high efficiency video coding (HEVC) standard significantly increases the encoding complexity for improving its coding efficiency. Due to the limited computational capability of handheld devices, complexity constrained video coding has drawn great attention in recent years. A complexity control algorithm based on adaptive mode selection is proposed for interframe coding in HEVC. Considering the direct proportionality between encoding time and computational complexity, the computational complexity is measured in terms of encoding time. First, complexity is mapped to a target in terms of prediction modes. Then, an adaptive mode selection algorithm is proposed for the mode decision process. Specifically, the optimal mode combination scheme that is chosen through offline statistics is developed at low complexity. If the complexity budget has not been used up, an adaptive mode sorting method is employed to further improve coding efficiency. The experimental results show that the proposed algorithm achieves a very large complexity control range (as low as 10%) for the HEVC encoder while maintaining good rate-distortion performance. For the lowdelayP condition, compared with the direct resource allocation method and the state-of-the-art method, an average gain of 0.63 and 0.17 dB in BDPSNR is observed for 18 sequences when the target complexity is around 40%.
The multifacet graphically contracted function method. I. Formulation and implementation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shepard, Ron; Brozell, Scott R.; Gidofalvi, Gergely
2014-08-14
The basic formulation for the multifacet generalization of the graphically contracted function (MFGCF) electronic structure method is presented. The analysis includes the discussion of linear dependency and redundancy of the arc factor parameters, the computation of reduced density matrices, Hamiltonian matrix construction, spin-density matrix construction, the computation of optimization gradients for single-state and state-averaged calculations, graphical wave function analysis, and the efficient computation of configuration state function and Slater determinant expansion coefficients. Timings are given for Hamiltonian matrix element and analytic optimization gradient computations for a range of model problems for full-CI Shavitt graphs, and it is observed that bothmore » the energy and the gradient computation scale as O(N{sup 2}n{sup 4}) for N electrons and n orbitals. The important arithmetic operations are within dense matrix-matrix product computational kernels, resulting in a computationally efficient procedure. An initial implementation of the method is used to present applications to several challenging chemical systems, including N{sub 2} dissociation, cubic H{sub 8} dissociation, the symmetric dissociation of H{sub 2}O, and the insertion of Be into H{sub 2}. The results are compared to the exact full-CI values and also to those of the previous single-facet GCF expansion form.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tong, Dudu; Yang, Sichun; Lu, Lanyuan
2016-06-20
Structure modellingviasmall-angle X-ray scattering (SAXS) data generally requires intensive computations of scattering intensity from any given biomolecular structure, where the accurate evaluation of SAXS profiles using coarse-grained (CG) methods is vital to improve computational efficiency. To date, most CG SAXS computing methods have been based on a single-bead-per-residue approximation but have neglected structural correlations between amino acids. To improve the accuracy of scattering calculations, accurate CG form factors of amino acids are now derived using a rigorous optimization strategy, termed electron-density matching (EDM), to best fit electron-density distributions of protein structures. This EDM method is compared with and tested againstmore » other CG SAXS computing methods, and the resulting CG SAXS profiles from EDM agree better with all-atom theoretical SAXS data. By including the protein hydration shell represented by explicit CG water molecules and the correction of protein excluded volume, the developed CG form factors also reproduce the selected experimental SAXS profiles with very small deviations. Taken together, these EDM-derived CG form factors present an accurate and efficient computational approach for SAXS computing, especially when higher molecular details (represented by theqrange of the SAXS data) become necessary for effective structure modelling.« less
Privacy-preserving search for chemical compound databases.
Shimizu, Kana; Nuida, Koji; Arai, Hiromi; Mitsunari, Shigeo; Attrapadung, Nuttapong; Hamada, Michiaki; Tsuda, Koji; Hirokawa, Takatsugu; Sakuma, Jun; Hanaoka, Goichiro; Asai, Kiyoshi
2015-01-01
Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation. We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information.
Privacy-preserving search for chemical compound databases
2015-01-01
Background Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. Results In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation. Conclusion We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information. PMID:26678650
The Effect of Computer Automation on Institutional Review Board (IRB) Office Efficiency
ERIC Educational Resources Information Center
Oder, Karl; Pittman, Stephanie
2015-01-01
Companies purchase computer systems to make their processes more efficient through automation. Some academic medical centers (AMC) have purchased computer systems for their institutional review boards (IRB) to increase efficiency and compliance with regulations. IRB computer systems are expensive to purchase, deploy, and maintain. An AMC should…
Target Identification Using Harmonic Wavelet Based ISAR Imaging
NASA Astrophysics Data System (ADS)
Shreyamsha Kumar, B. K.; Prabhakar, B.; Suryanarayana, K.; Thilagavathi, V.; Rajagopal, R.
2006-12-01
A new approach has been proposed to reduce the computations involved in the ISAR imaging, which uses harmonic wavelet-(HW) based time-frequency representation (TFR). Since the HW-based TFR falls into a category of nonparametric time-frequency (T-F) analysis tool, it is computationally efficient compared to parametric T-F analysis tools such as adaptive joint time-frequency transform (AJTFT), adaptive wavelet transform (AWT), and evolutionary AWT (EAWT). Further, the performance of the proposed method of ISAR imaging is compared with the ISAR imaging by other nonparametric T-F analysis tools such as short-time Fourier transform (STFT) and Choi-Williams distribution (CWD). In the ISAR imaging, the use of HW-based TFR provides similar/better results with significant (92%) computational advantage compared to that obtained by CWD. The ISAR images thus obtained are identified using a neural network-based classification scheme with feature set invariant to translation, rotation, and scaling.
NASA Astrophysics Data System (ADS)
Valasek, Lukas; Glasa, Jan
2017-12-01
Current fire simulation systems are capable to utilize advantages of high-performance computer (HPC) platforms available and to model fires efficiently in parallel. In this paper, efficiency of a corridor fire simulation on a HPC computer cluster is discussed. The parallel MPI version of Fire Dynamics Simulator is used for testing efficiency of selected strategies of allocation of computational resources of the cluster using a greater number of computational cores. Simulation results indicate that if the number of cores used is not equal to a multiple of the total number of cluster node cores there are allocation strategies which provide more efficient calculations.
Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures.
Kleftogiannis, Dimitrios; Kalnis, Panos; Bajic, Vladimir B
2013-01-01
A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.
NASA Astrophysics Data System (ADS)
Wang, N.; Shen, Y.; Yang, D.; Bao, X.; Li, J.; Zhang, W.
2017-12-01
Accurate and efficient forward modeling methods are important for high resolution full waveform inversion. Compared with the elastic case, solving anelastic wave equation requires more computational time, because of the need to compute additional material-independent anelastic functions. A numerical scheme with a large Courant-Friedrichs-Lewy (CFL) condition number enables us to use a large time step to simulate wave propagation, which improves computational efficiency. In this work, we apply the fourth-order strong stability preserving Runge-Kutta method with an optimal CFL coeffiecient to solve the anelastic wave equation. We use a fourth order DRP/opt MacCormack scheme for the spatial discretization, and we approximate the rheological behaviors of the Earth by using the generalized Maxwell body model. With a larger CFL condition number, we find that the computational efficient is significantly improved compared with the traditional fourth-order Runge-Kutta method. Then, we apply the scattering-integral method for calculating travel time and amplitude sensitivity kernels with respect to velocity and attenuation structures. For each source, we carry out one forward simulation and save the time-dependent strain tensor. For each station, we carry out three `backward' simulations for the three components and save the corresponding strain tensors. The sensitivity kernels at each point in the medium are the convolution of the two sets of the strain tensors. Finally, we show several synthetic tests to verify the effectiveness of the strong stability preserving Runge-Kutta method in generating accurate synthetics in full waveform modeling, and in generating accurate strain tensors for calculating sensitivity kernels at regional and global scales.
LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.
El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher
2016-11-01
The deluge of current sequenced data has exceeded Moore's Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error prone high throughput NGS reads and genomic repeats, the assembly graph contains massive amount of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine both the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in a computer memory. LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache oblivious Bloom filters, one holding a uniform sample of [Formula: see text]-spaced sequenced [Formula: see text]-mers and the other holding [Formula: see text]-mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves comparable assembly accuracy and contiguity to other competing tools. Our method reduces the memory usage by [Formula: see text] compared to the resource-efficient assemblers using benchmark datasets from GAGE and Assemblathon projects. While LightAssembler can be considered as a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage. https://github.com/SaraEl-Metwally/LightAssembler CONTACT: sarah_almetwally4@mans.edu.egSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Ellingwood, Nathan D; Yin, Youbing; Smith, Matthew; Lin, Ching-Long
2016-04-01
Faster and more accurate methods for registration of images are important for research involved in conducting population-based studies that utilize medical imaging, as well as improvements for use in clinical applications. We present a novel computation- and memory-efficient multi-level method on graphics processing units (GPU) for performing registration of two computed tomography (CT) volumetric lung images. We developed a computation- and memory-efficient Diffeomorphic Multi-level B-Spline Transform Composite (DMTC) method to implement nonrigid mass-preserving registration of two CT lung images on GPU. The framework consists of a hierarchy of B-Spline control grids of increasing resolution. A similarity criterion known as the sum of squared tissue volume difference (SSTVD) was adopted to preserve lung tissue mass. The use of SSTVD consists of the calculation of the tissue volume, the Jacobian, and their derivatives, which makes its implementation on GPU challenging due to memory constraints. The use of the DMTC method enabled reduced computation and memory storage of variables with minimal communication between GPU and Central Processing Unit (CPU) due to ability to pre-compute values. The method was assessed on six healthy human subjects. Resultant GPU-generated displacement fields were compared against the previously validated CPU counterpart fields, showing good agreement with an average normalized root mean square error (nRMS) of 0.044±0.015. Runtime and performance speedup are compared between single-threaded CPU, multi-threaded CPU, and GPU algorithms. Best performance speedup occurs at the highest resolution in the GPU implementation for the SSTVD cost and cost gradient computations, with a speedup of 112 times that of the single-threaded CPU version and 11 times over the twelve-threaded version when considering average time per iteration using a Nvidia Tesla K20X GPU. The proposed GPU-based DMTC method outperforms its multi-threaded CPU version in terms of runtime. Total registration time reduced runtime to 2.9min on the GPU version, compared to 12.8min on twelve-threaded CPU version and 112.5min on a single-threaded CPU. Furthermore, the GPU implementation discussed in this work can be adapted for use of other cost functions that require calculation of the first derivatives. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Discontinuous Galerkin Methods and High-Speed Turbulent Flows
NASA Astrophysics Data System (ADS)
Atak, Muhammed; Larsson, Johan; Munz, Claus-Dieter
2014-11-01
Discontinuous Galerkin methods gain increasing importance within the CFD community as they combine arbitrary high order of accuracy in complex geometries with parallel efficiency. Particularly the discontinuous Galerkin spectral element method (DGSEM) is a promising candidate for both the direct numerical simulation (DNS) and large eddy simulation (LES) of turbulent flows due to its excellent scaling attributes. In this talk, we present a DNS of a compressible turbulent boundary layer along a flat plate at a free-stream Mach number of M = 2.67 and assess the computational efficiency of the DGSEM at performing high-fidelity simulations of both transitional and turbulent boundary layers. We compare the accuracy of the results as well as the computational performance to results using a high order finite difference method.
NASA Astrophysics Data System (ADS)
Gorthi, Sai Siva; Rajshekhar, Gannavarpu; Rastogi, Pramod
2010-06-01
Recently, a high-order instantaneous moments (HIM)-operator-based method was proposed for accurate phase estimation in digital holographic interferometry. The method relies on piece-wise polynomial approximation of phase and subsequent evaluation of the polynomial coefficients from the HIM operator using single-tone frequency estimation. The work presents a comparative analysis of the performance of different single-tone frequency estimation techniques, like Fourier transform followed by optimization, estimation of signal parameters by rotational invariance technique (ESPRIT), multiple signal classification (MUSIC), and iterative frequency estimation by interpolation on Fourier coefficients (IFEIF) in HIM-operator-based methods for phase estimation. Simulation and experimental results demonstrate the potential of the IFEIF technique with respect to computational efficiency and estimation accuracy.
Luminescence properties of Eu3+-doped SiO2-LiYF4 glass-ceramic microrods
NASA Astrophysics Data System (ADS)
Secu, C. E.; Secu, M.
2015-09-01
Photoluminescence properties of the glass-ceramics microrods containing Eu3+-doped LiYF4 nanocrystals have been studied and characterized. Judd-Ofelt parameters and quantum efficiency has been computed from luminescence spectra and discussed by comparison to the glass ceramic bulk and pellet. The radiative decay rate Arad is higher in the glass ceramic rods (221 s-1) than in the glass ceramic bulk (130 s-1) but the quantum efficiency computed is very low (21%) compared to the glass-ceramic bulk (97%). There are effective non-radiative decay channels that might be related to an influence of the dimensional constraints imposed by the membrane pores during xerogel formation and subsequent glass ceramization.
Design of a reversible single precision floating point subtractor.
Anantha Lakshmi, Av; Sudha, Gf
2014-01-04
In recent years, Reversible logic has emerged as a major area of research due to its ability to reduce the power dissipation which is the main requirement in the low power digital circuit design. It has wide applications like low power CMOS design, Nano-technology, Digital signal processing, Communication, DNA computing and Optical computing. Floating-point operations are needed very frequently in nearly all computing disciplines, and studies have shown floating-point addition/subtraction to be the most used floating-point operation. However, few designs exist on efficient reversible BCD subtractors but no work on reversible floating point subtractor. In this paper, it is proposed to present an efficient reversible single precision floating-point subtractor. The proposed design requires reversible designs of an 8-bit and a 24-bit comparator unit, an 8-bit and a 24-bit subtractor, and a normalization unit. For normalization, a 24-bit Reversible Leading Zero Detector and a 24-bit reversible shift register is implemented to shift the mantissas. To realize a reversible 1-bit comparator, in this paper, two new 3x3 reversible gates are proposed The proposed reversible 1-bit comparator is better and optimized in terms of the number of reversible gates used, the number of transistor count and the number of garbage outputs. The proposed work is analysed in terms of number of reversible gates, garbage outputs, constant inputs and quantum costs. Using these modules, an efficient design of a reversible single precision floating point subtractor is proposed. Proposed circuits have been simulated using Modelsim and synthesized using Xilinx Virtex5vlx30tff665-3. The total on-chip power consumed by the proposed 32-bit reversible floating point subtractor is 0.410 W.
NASA Astrophysics Data System (ADS)
Kim, Euiyoung; Cho, Maenghyo
2017-11-01
In most non-linear analyses, the construction of a system matrix uses a large amount of computation time, comparable to the computation time required by the solving process. If the process for computing non-linear internal force matrices is substituted with an effective equivalent model that enables the bypass of numerical integrations and assembly processes used in matrix construction, efficiency can be greatly enhanced. A stiffness evaluation procedure (STEP) establishes non-linear internal force models using polynomial formulations of displacements. To efficiently identify an equivalent model, the method has evolved such that it is based on a reduced-order system. The reduction process, however, makes the equivalent model difficult to parameterize, which significantly affects the efficiency of the optimization process. In this paper, therefore, a new STEP, E-STEP, is proposed. Based on the element-wise nature of the finite element model, the stiffness evaluation is carried out element-by-element in the full domain. Since the unit of computation for the stiffness evaluation is restricted by element size, and since the computation is independent, the equivalent model can be constructed efficiently in parallel, even in the full domain. Due to the element-wise nature of the construction procedure, the equivalent E-STEP model is easily characterized by design parameters. Various reduced-order modeling techniques can be applied to the equivalent system in a manner similar to how they are applied in the original system. The reduced-order model based on E-STEP is successfully demonstrated for the dynamic analyses of non-linear structural finite element systems under varying design parameters.
NASA Astrophysics Data System (ADS)
Kokkinaki, A.; Sleep, B. E.; Chambers, J. E.; Cirpka, O. A.; Nowak, W.
2010-12-01
Electrical Resistance Tomography (ERT) is a popular method for investigating subsurface heterogeneity. The method relies on measuring electrical potential differences and obtaining, through inverse modeling, the underlying electrical conductivity field, which can be related to hydraulic conductivities. The quality of site characterization strongly depends on the utilized inversion technique. Standard ERT inversion methods, though highly computationally efficient, do not consider spatial correlation of soil properties; as a result, they often underestimate the spatial variability observed in earth materials, thereby producing unrealistic subsurface models. Also, these methods do not quantify the uncertainty of the estimated properties, thus limiting their use in subsequent investigations. Geostatistical inverse methods can be used to overcome both these limitations; however, they are computationally expensive, which has hindered their wide use in practice. In this work, we compare a standard Gauss-Newton smoothness constrained least squares inversion method against the quasi-linear geostatistical approach using the three-dimensional ERT dataset of the SABRe (Source Area Bioremediation) project. The two methods are evaluated for their ability to: a) produce physically realistic electrical conductivity fields that agree with the wide range of data available for the SABRe site while being computationally efficient, and b) provide information on the spatial statistics of other parameters of interest, such as hydraulic conductivity. To explore the trade-off between inversion quality and computational efficiency, we also employ a 2.5-D forward model with corrections for boundary conditions and source singularities. The 2.5-D model accelerates the 3-D geostatistical inversion method. New adjoint equations are developed for the 2.5-D forward model for the efficient calculation of sensitivities. Our work shows that spatial statistics can be incorporated in large-scale ERT inversions to improve the inversion results without making them computationally prohibitive.
Determination of aerodynamic sensitivity coefficients in the transonic and supersonic regimes
NASA Technical Reports Server (NTRS)
Elbanna, Hesham M.; Carlson, Leland A.
1989-01-01
The quasi-analytical approach is developed to compute airfoil aerodynamic sensitivity coefficients in the transonic and supersonic flight regimes. Initial investigation verifies the feasibility of this approach as applied to the transonic small perturbation residual expression. Results are compared to those obtained by the direct (finite difference) approach and both methods are evaluated to determine their computational accuracies and efficiencies. The quasi-analytical approach is shown to be superior and worth further investigation.
Nonperturbative methods in HZE ion transport
NASA Technical Reports Server (NTRS)
Wilson, John W.; Badavi, Francis F.; Costen, Robert C.; Shinn, Judy L.
1993-01-01
A nonperturbative analytic solution of the high charge and energy (HZE) Green's function is used to implement a computer code for laboratory ion beam transport. The code is established to operate on the Langley Research Center nuclear fragmentation model used in engineering applications. Computational procedures are established to generate linear energy transfer (LET) distributions for a specified ion beam and target for comparison with experimental measurements. The code is highly efficient and compares well with the perturbation approximations.
Hu, Shengshan; Wang, Qian; Wang, Jingjun; Qin, Zhan; Ren, Kui
2016-05-13
Advances in cloud computing have greatly motivated data owners to outsource their huge amount of personal multimedia data and/or computationally expensive tasks onto the cloud by leveraging its abundant resources for cost saving and flexibility. Despite the tremendous benefits, the outsourced multimedia data and its originated applications may reveal the data owner's private information, such as the personal identity, locations or even financial profiles. This observation has recently aroused new research interest on privacy-preserving computations over outsourced multimedia data. In this paper, we propose an effective and practical privacy-preserving computation outsourcing protocol for the prevailing scale-invariant feature transform (SIFT) over massive encrypted image data. We first show that previous solutions to this problem have either efficiency/security or practicality issues, and none can well preserve the important characteristics of the original SIFT in terms of distinctiveness and robustness. We then present a new scheme design that achieves efficiency and security requirements simultaneously with the preservation of its key characteristics, by randomly splitting the original image data, designing two novel efficient protocols for secure multiplication and comparison, and carefully distributing the feature extraction computations onto two independent cloud servers. We both carefully analyze and extensively evaluate the security and effectiveness of our design. The results show that our solution is practically secure, outperforms the state-of-theart, and performs comparably to the original SIFT in terms of various characteristics, including rotation invariance, image scale invariance, robust matching across affine distortion, addition of noise and change in 3D viewpoint and illumination.
Hu, Shengshan; Wang, Qian; Wang, Jingjun; Qin, Zhan; Ren, Kui
2016-05-13
Advances in cloud computing have greatly motivated data owners to outsource their huge amount of personal multimedia data and/or computationally expensive tasks onto the cloud by leveraging its abundant resources for cost saving and flexibility. Despite the tremendous benefits, the outsourced multimedia data and its originated applications may reveal the data owner's private information, such as the personal identity, locations or even financial profiles. This observation has recently aroused new research interest on privacy-preserving computations over outsourced multimedia data. In this paper, we propose an effective and practical privacy-preserving computation outsourcing protocol for the prevailing scale-invariant feature transform (SIFT) over massive encrypted image data. We first show that previous solutions to this problem have either efficiency/security or practicality issues, and none can well preserve the important characteristics of the original SIFT in terms of distinctiveness and robustness. We then present a new scheme design that achieves efficiency and security requirements simultaneously with the preservation of its key characteristics, by randomly splitting the original image data, designing two novel efficient protocols for secure multiplication and comparison, and carefully distributing the feature extraction computations onto two independent cloud servers. We both carefully analyze and extensively evaluate the security and effectiveness of our design. The results show that our solution is practically secure, outperforms the state-of-theart, and performs comparably to the original SIFT in terms of various characteristics, including rotation invariance, image scale invariance, robust matching across affine distortion, addition of noise and change in 3D viewpoint and illumination.
A comparative study of serial and parallel aeroelastic computations of wings
NASA Technical Reports Server (NTRS)
Byun, Chansup; Guruswamy, Guru P.
1994-01-01
A procedure for computing the aeroelasticity of wings on parallel multiple-instruction, multiple-data (MIMD) computers is presented. In this procedure, fluids are modeled using Euler equations, and structures are modeled using modal or finite element equations. The procedure is designed in such a way that each discipline can be developed and maintained independently by using a domain decomposition approach. In the present parallel procedure, each computational domain is scalable. A parallel integration scheme is used to compute aeroelastic responses by solving fluid and structural equations concurrently. The computational efficiency issues of parallel integration of both fluid and structural equations are investigated in detail. This approach, which reduces the total computational time by a factor of almost 2, is demonstrated for a typical aeroelastic wing by using various numbers of processors on the Intel iPSC/860.
Experimental, Theoretical, and Computational Investigation of Separated Nozzle Flows
NASA Technical Reports Server (NTRS)
Hunter, Craig A.
2004-01-01
A detailed experimental, theoretical, and computational study of separated nozzle flows has been conducted. Experimental testing was performed at the NASA Langley 16-Foot Transonic Tunnel Complex. As part of a comprehensive static performance investigation, force, moment, and pressure measurements were made and schlieren flow visualization was obtained for a sub-scale, non-axisymmetric, two-dimensional, convergent- divergent nozzle. In addition, two-dimensional numerical simulations were run using the computational fluid dynamics code PAB3D with two-equation turbulence closure and algebraic Reynolds stress modeling. For reference, experimental and computational results were compared with theoretical predictions based on one-dimensional gas dynamics and an approximate integral momentum boundary layer method. Experimental results from this study indicate that off-design overexpanded nozzle flow was dominated by shock induced boundary layer separation, which was divided into two distinct flow regimes; three- dimensional separation with partial reattachment, and fully detached two-dimensional separation. The test nozzle was observed to go through a marked transition in passing from one regime to the other. In all cases, separation provided a significant increase in static thrust efficiency compared to the ideal prediction. Results indicate that with controlled separation, the entire overexpanded range of nozzle performance would be within 10% of the peak thrust efficiency. By offering savings in weight and complexity over a conventional mechanical exhaust system, this may allow a fixed geometry nozzle to cover an entire flight envelope. The computational simulation was in excellent agreement with experimental data over most of the test range, and did a good job of modeling internal flow and thrust performance. An exception occurred at low nozzle pressure ratios, where the two-dimensional computational model was inconsistent with the three-dimensional separation observed in the experiment. In general, the computation captured the physics of the shock boundary layer interaction and shock induced boundary layer separation in the nozzle, though there were some differences in shock structure compared to experiment. Though minor, these differences could be important for studies involving flow control or thrust vectoring of separated nozzles. Combined with other observations, this indicates that more detailed, three-dimensional computational modeling needs to be conducted to more realistically simulate shock-separated nozzle flows.
An adaptive grid to improve the efficiency and accuracy of modelling underwater noise from shipping
NASA Astrophysics Data System (ADS)
Trigg, Leah; Chen, Feng; Shapiro, Georgy; Ingram, Simon; Embling, Clare
2017-04-01
Underwater noise from shipping is becoming a significant concern and has been listed as a pollutant under Descriptor 11 of the Marine Strategy Framework Directive. Underwater noise models are an essential tool to assess and predict noise levels for regulatory procedures such as environmental impact assessments and ship noise monitoring. There are generally two approaches to noise modelling. The first is based on simplified energy flux models, assuming either spherical or cylindrical propagation of sound energy. These models are very quick but they ignore important water column and seabed properties, and produce significant errors in the areas subject to temperature stratification (Shapiro et al., 2014). The second type of model (e.g. ray-tracing and parabolic equation) is based on an advanced physical representation of sound propagation. However, these acoustic propagation models are computationally expensive to execute. Shipping noise modelling requires spatial discretization in order to group noise sources together using a grid. A uniform grid size is often selected to achieve either the greatest efficiency (i.e. speed of computations) or the greatest accuracy. In contrast, this work aims to produce efficient and accurate noise level predictions by presenting an adaptive grid where cell size varies with distance from the receiver. The spatial range over which a certain cell size is suitable was determined by calculating the distance from the receiver at which propagation loss becomes uniform across a grid cell. The computational efficiency and accuracy of the resulting adaptive grid was tested by comparing it to uniform 1 km and 5 km grids. These represent an accurate and computationally efficient grid respectively. For a case study of the Celtic Sea, an application of the adaptive grid over an area of 160×160 km reduced the number of model executions required from 25600 for a 1 km grid to 5356 in December and to between 5056 and 13132 in August, which represents a 2 to 5-fold increase in efficiency. The 5 km grid reduces the number of model executions further to 1024. However, over the first 25 km the 5 km grid produces errors of up to 13.8 dB when compared to the highly accurate but inefficient 1 km grid. The newly developed adaptive grid generates much smaller errors of less than 0.5 dB while demonstrating high computational efficiency. Our results show that the adaptive grid provides the ability to retain the accuracy of noise level predictions and improve the efficiency of the modelling process. This can help safeguard sensitive marine ecosystems from noise pollution by improving the underwater noise predictions that inform management activities. References Shapiro, G., Chen, F., Thain, R., 2014. The Effect of Ocean Fronts on Acoustic Wave Propagation in a Shallow Sea, Journal of Marine System, 139: 217 - 226. http://dx.doi.org/10.1016/j.jmarsys.2014.06.007.
OASIS: A GEOGRAPHICAL DECISION SUPPORT SYSTEM FOR GROUND-WATER CONTAMINANT MODELING
Three new software technologies were applied to develop an efficient and easy to use decision support system for ground-water contaminant modeling. Graphical interfaces create a more intuitive and effective form of communication with the computer compared to text-based interfaces...
Inference of epidemiological parameters from household stratified data
Walker, James N.; Ross, Joshua V.
2017-01-01
We consider a continuous-time Markov chain model of SIR disease dynamics with two levels of mixing. For this so-called stochastic households model, we provide two methods for inferring the model parameters—governing within-household transmission, recovery, and between-household transmission—from data of the day upon which each individual became infectious and the household in which each infection occurred, as might be available from First Few Hundred studies. Each method is a form of Bayesian Markov Chain Monte Carlo that allows us to calculate a joint posterior distribution for all parameters and hence the household reproduction number and the early growth rate of the epidemic. The first method performs exact Bayesian inference using a standard data-augmentation approach; the second performs approximate Bayesian inference based on a likelihood approximation derived from branching processes. These methods are compared for computational efficiency and posteriors from each are compared. The branching process is shown to be a good approximation and remains computationally efficient as the amount of data is increased. PMID:29045456
DOE Office of Scientific and Technical Information (OSTI.GOV)
Suryanarayana, Phanish, E-mail: phanish.suryanarayana@ce.gatech.edu; Phanish, Deepa
We present an Augmented Lagrangian formulation and its real-space implementation for non-periodic Orbital-Free Density Functional Theory (OF-DFT) calculations. In particular, we rewrite the constrained minimization problem of OF-DFT as a sequence of minimization problems without any constraint, thereby making it amenable to powerful unconstrained optimization algorithms. Further, we develop a parallel implementation of this approach for the Thomas–Fermi–von Weizsacker (TFW) kinetic energy functional in the framework of higher-order finite-differences and the conjugate gradient method. With this implementation, we establish that the Augmented Lagrangian approach is highly competitive compared to the penalty and Lagrange multiplier methods. Additionally, we show that higher-ordermore » finite-differences represent a computationally efficient discretization for performing OF-DFT simulations. Overall, we demonstrate that the proposed formulation and implementation are both efficient and robust by studying selected examples, including systems consisting of thousands of atoms. We validate the accuracy of the computed energies and forces by comparing them with those obtained by existing plane-wave methods.« less
Compressive Spectral Method for the Simulation of the Nonlinear Gravity Waves
Bayındır, Cihan
2016-01-01
In this paper an approach for decreasing the computational effort required for the spectral simulations of the fully nonlinear ocean waves is introduced. The proposed approach utilizes the compressive sampling algorithm and depends on the idea of using a smaller number of spectral components compared to the classical spectral method. After performing the time integration with a smaller number of spectral components and using the compressive sampling technique, it is shown that the ocean wave field can be reconstructed with a significantly better efficiency compared to the classical spectral method. For the sparse ocean wave model in the frequency domain the fully nonlinear ocean waves with Jonswap spectrum is considered. By implementation of a high-order spectral method it is shown that the proposed methodology can simulate the linear and the fully nonlinear ocean waves with negligible difference in the accuracy and with a great efficiency by reducing the computation time significantly especially for large time evolutions. PMID:26911357
Using hybrid implicit Monte Carlo diffusion to simulate gray radiation hydrodynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cleveland, Mathew A., E-mail: cleveland7@llnl.gov; Gentile, Nick
This work describes how to couple a hybrid Implicit Monte Carlo Diffusion (HIMCD) method with a Lagrangian hydrodynamics code to evaluate the coupled radiation hydrodynamics equations. This HIMCD method dynamically applies Implicit Monte Carlo Diffusion (IMD) [1] to regions of a problem that are opaque and diffusive while applying standard Implicit Monte Carlo (IMC) [2] to regions where the diffusion approximation is invalid. We show that this method significantly improves the computational efficiency as compared to a standard IMC/Hydrodynamics solver, when optically thick diffusive material is present, while maintaining accuracy. Two test cases are used to demonstrate the accuracy andmore » performance of HIMCD as compared to IMC and IMD. The first is the Lowrie semi-analytic diffusive shock [3]. The second is a simple test case where the source radiation streams through optically thin material and heats a thick diffusive region of material causing it to rapidly expand. We found that HIMCD proves to be accurate, robust, and computationally efficient for these test problems.« less
Secure multiparty computation of a comparison problem.
Liu, Xin; Li, Shundong; Liu, Jian; Chen, Xiubo; Xu, Gang
2016-01-01
Private comparison is fundamental to secure multiparty computation. In this study, we propose novel protocols to privately determine [Formula: see text], or [Formula: see text] in one execution. First, a 0-1-vector encoding method is introduced to encode a number into a vector, and the Goldwasser-Micali encryption scheme is used to compare integers privately. Then, we propose a protocol by using a geometric method to compare rational numbers privately, and the protocol is information-theoretical secure. Using the simulation paradigm, we prove the privacy-preserving property of our protocols in the semi-honest model. The complexity analysis shows that our protocols are more efficient than previous solutions.
NASA Astrophysics Data System (ADS)
Du, X.; Landecker, T. L.; Robishaw, T.; Gray, A. D.; Douglas, K. A.; Wolleben, M.
2016-11-01
Measurement of the brightness temperature of extended radio emission demands knowledge of the gain (or aperture efficiency) of the telescope and measurement of the polarized component of the emission requires correction for the conversion of unpolarized emission from sky and ground to apparently polarized signal. Radiation properties of the John A. Galt Telescope at the Dominion Radio Astrophysical Observatory were studied through analysis and measurement in order to provide absolute calibration of a survey of polarized emission from the entire northern sky from 1280 to 1750 MHz, and to understand the polarization performance of the telescope. Electromagnetic simulation packages CST and GRASP-10 were used to compute complete radiation patterns of the telescope in all Stokes parameters, and thereby to establish gain and aperture efficiency. Aperture efficiency was also evaluated using geometrical optics and ray tracing analysis and was measured based on the known flux density of Cyg A. Measured aperture efficiency varied smoothly with frequency between values of 0.49 and 0.54; GRASP-10 yielded values 6.5% higher but with closely similar variation with frequency. Overall error across the frequency band is 3%, but values at any two frequencies are relatively correct to ˜1%. Dominant influences on aperture efficiency are the illumination taper of the feed radiation pattern and the shadowing of the reflector from the feed by the feed-support struts. A model of emission from the ground was developed based on measurements and on empirical data obtained from remote sensing of the Earth from satellite-borne telescopes. This model was convolved with the computed antenna response to estimate conversion of ground emission into spurious polarized signal. The computed spurious signal is comparable to measured values, but is not accurate enough to be used to correct observations. A simpler model, in which the ground is considered as an unpolarized emitter with a brightness temperature of ˜240 K, is shown to have useful accuracy when compared to measurements.
Study of a rare-gas transverse fast discharge
NASA Technical Reports Server (NTRS)
Chubb, D. L.; Michels, C. J.
1979-01-01
An experimental and analytical study of a Blumlein-type transverse fast discharge operating with He and Xe are presented. An electro-optical voltage probe was used to measure the discharge voltage, and the measured voltages were in agreement with the computed voltages. The analytical model was used to predict the dependence of the discharge efficiency for producing metastables and ions on the important plasma and external circuit parameters. In He the ion efficiency is greater than the metastable efficiency, while in Xe it is the opposite; the He ion efficiencies are much larger than in Xe, while Xe metastable efficiencies are much larger than in He. These differences between Xe and He are accounted by the large dissociative recombination rate of Xe compared with He.
Analytical Cost Metrics : Days of Future Past
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prajapati, Nirmal; Rajopadhye, Sanjay; Djidjev, Hristo Nikolov
As we move towards the exascale era, the new architectures must be capable of running the massive computational problems efficiently. Scientists and researchers are continuously investing in tuning the performance of extreme-scale computational problems. These problems arise in almost all areas of computing, ranging from big data analytics, artificial intelligence, search, machine learning, virtual/augmented reality, computer vision, image/signal processing to computational science and bioinformatics. With Moore’s law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has now expanded to also incorporate power/energy efficiency. Therefore the major challenge that we face in computing systems researchmore » is: “how to solve massive-scale computational problems in the most time/power/energy efficient manner?”« less
Large-scale inverse model analyses employing fast randomized data reduction
NASA Astrophysics Data System (ADS)
Lin, Youzuo; Le, Ellen B.; O'Malley, Daniel; Vesselinov, Velimir V.; Bui-Thanh, Tan
2017-08-01
When the number of observations is large, it is computationally challenging to apply classical inverse modeling techniques. We have developed a new computationally efficient technique for solving inverse problems with a large number of observations (e.g., on the order of 107 or greater). Our method, which we call the randomized geostatistical approach (RGA), is built upon the principal component geostatistical approach (PCGA). We employ a data reduction technique combined with the PCGA to improve the computational efficiency and reduce the memory usage. Specifically, we employ a randomized numerical linear algebra technique based on a so-called "sketching" matrix to effectively reduce the dimension of the observations without losing the information content needed for the inverse analysis. In this way, the computational and memory costs for RGA scale with the information content rather than the size of the calibration data. Our algorithm is coded in Julia and implemented in the MADS open-source high-performance computational framework (http://mads.lanl.gov). We apply our new inverse modeling method to invert for a synthetic transmissivity field. Compared to a standard geostatistical approach (GA), our method is more efficient when the number of observations is large. Most importantly, our method is capable of solving larger inverse problems than the standard GA and PCGA approaches. Therefore, our new model inversion method is a powerful tool for solving large-scale inverse problems. The method can be applied in any field and is not limited to hydrogeological applications such as the characterization of aquifer heterogeneity.
A-VCI: A flexible method to efficiently compute vibrational spectra
NASA Astrophysics Data System (ADS)
Odunlami, Marc; Le Bris, Vincent; Bégué, Didier; Baraille, Isabelle; Coulaud, Olivier
2017-06-01
The adaptive vibrational configuration interaction algorithm has been introduced as a new method to efficiently reduce the dimension of the set of basis functions used in a vibrational configuration interaction process. It is based on the construction of nested bases for the discretization of the Hamiltonian operator according to a theoretical criterion that ensures the convergence of the method. In the present work, the Hamiltonian is written as a sum of products of operators. The purpose of this paper is to study the properties and outline the performance details of the main steps of the algorithm. New parameters have been incorporated to increase flexibility, and their influence has been thoroughly investigated. The robustness and reliability of the method are demonstrated for the computation of the vibrational spectrum up to 3000 cm-1 of a widely studied 6-atom molecule (acetonitrile). Our results are compared to the most accurate up to date computation; we also give a new reference calculation for future work on this system. The algorithm has also been applied to a more challenging 7-atom molecule (ethylene oxide). The computed spectrum up to 3200 cm-1 is the most accurate computation that exists today on such systems.
A-VCI: A flexible method to efficiently compute vibrational spectra.
Odunlami, Marc; Le Bris, Vincent; Bégué, Didier; Baraille, Isabelle; Coulaud, Olivier
2017-06-07
The adaptive vibrational configuration interaction algorithm has been introduced as a new method to efficiently reduce the dimension of the set of basis functions used in a vibrational configuration interaction process. It is based on the construction of nested bases for the discretization of the Hamiltonian operator according to a theoretical criterion that ensures the convergence of the method. In the present work, the Hamiltonian is written as a sum of products of operators. The purpose of this paper is to study the properties and outline the performance details of the main steps of the algorithm. New parameters have been incorporated to increase flexibility, and their influence has been thoroughly investigated. The robustness and reliability of the method are demonstrated for the computation of the vibrational spectrum up to 3000 cm -1 of a widely studied 6-atom molecule (acetonitrile). Our results are compared to the most accurate up to date computation; we also give a new reference calculation for future work on this system. The algorithm has also been applied to a more challenging 7-atom molecule (ethylene oxide). The computed spectrum up to 3200 cm -1 is the most accurate computation that exists today on such systems.
Computational efficiency improvements for image colorization
NASA Astrophysics Data System (ADS)
Yu, Chao; Sharma, Gaurav; Aly, Hussein
2013-03-01
We propose an efficient algorithm for colorization of greyscale images. As in prior work, colorization is posed as an optimization problem: a user specifies the color for a few scribbles drawn on the greyscale image and the color image is obtained by propagating color information from the scribbles to surrounding regions, while maximizing the local smoothness of colors. In this formulation, colorization is obtained by solving a large sparse linear system, which normally requires substantial computation and memory resources. Our algorithm improves the computational performance through three innovations over prior colorization implementations. First, the linear system is solved iteratively without explicitly constructing the sparse matrix, which significantly reduces the required memory. Second, we formulate each iteration in terms of integral images obtained by dynamic programming, reducing repetitive computation. Third, we use a coarseto- fine framework, where a lower resolution subsampled image is first colorized and this low resolution color image is upsampled to initialize the colorization process for the fine level. The improvements we develop provide significant speedup and memory savings compared to the conventional approach of solving the linear system directly using off-the-shelf sparse solvers, and allow us to colorize images with typical sizes encountered in realistic applications on typical commodity computing platforms.
Computationally-Efficient Minimum-Time Aircraft Routes in the Presence of Winds
NASA Technical Reports Server (NTRS)
Jardin, Matthew R.
2004-01-01
A computationally efficient algorithm for minimizing the flight time of an aircraft in a variable wind field has been invented. The algorithm, referred to as Neighboring Optimal Wind Routing (NOWR), is based upon neighboring-optimal-control (NOC) concepts and achieves minimum-time paths by adjusting aircraft heading according to wind conditions at an arbitrary number of wind measurement points along the flight route. The NOWR algorithm may either be used in a fast-time mode to compute minimum- time routes prior to flight, or may be used in a feedback mode to adjust aircraft heading in real-time. By traveling minimum-time routes instead of direct great-circle (direct) routes, flights across the United States can save an average of about 7 minutes, and as much as one hour of flight time during periods of strong jet-stream winds. The neighboring optimal routes computed via the NOWR technique have been shown to be within 1.5 percent of the absolute minimum-time routes for flights across the continental United States. On a typical 450-MHz Sun Ultra workstation, the NOWR algorithm produces complete minimum-time routes in less than 40 milliseconds. This corresponds to a rate of 25 optimal routes per second. The closest comparable optimization technique runs approximately 10 times slower. Airlines currently use various trial-and-error search techniques to determine which of a set of commonly traveled routes will minimize flight time. These algorithms are too computationally expensive for use in real-time systems, or in systems where many optimal routes need to be computed in a short amount of time. Instead of operating in real-time, airlines will typically plan a trajectory several hours in advance using wind forecasts. If winds change significantly from forecasts, the resulting flights will no longer be minimum-time. The need for a computationally efficient wind-optimal routing algorithm is even greater in the case of new air-traffic-control automation concepts. For air-traffic-control automation, thousands of wind-optimal routes may need to be computed and checked for conflicts in just a few minutes. These factors motivated the need for a more efficient wind-optimal routing algorithm.
The application of dynamic programming in production planning
NASA Astrophysics Data System (ADS)
Wu, Run
2017-05-01
Nowadays, with the popularity of the computers, various industries and fields are widely applying computer information technology, which brings about huge demand for a variety of application software. In order to develop software meeting various needs with most economical cost and best quality, programmers must design efficient algorithms. A superior algorithm can not only soul up one thing, but also maximize the benefits and generate the smallest overhead. As one of the common algorithms, dynamic programming algorithms are used to solving problems with some sort of optimal properties. When solving problems with a large amount of sub-problems that needs repetitive calculations, the ordinary sub-recursive method requires to consume exponential time, and dynamic programming algorithm can reduce the time complexity of the algorithm to the polynomial level, according to which we can conclude that dynamic programming algorithm is a very efficient compared to other algorithms reducing the computational complexity and enriching the computational results. In this paper, we expound the concept, basic elements, properties, core, solving steps and difficulties of the dynamic programming algorithm besides, establish the dynamic programming model of the production planning problem.
Vecharynski, Eugene; Yang, Chao; Pask, John E.
2015-02-25
Here, we present an iterative algorithm for computing an invariant subspace associated with the algebraically smallest eigenvalues of a large sparse or structured Hermitian matrix A. We are interested in the case in which the dimension of the invariant subspace is large (e.g., over several hundreds or thousands) even though it may still be small relative to the dimension of A. These problems arise from, for example, density functional theory (DFT) based electronic structure calculations for complex materials. The key feature of our algorithm is that it performs fewer Rayleigh–Ritz calculations compared to existing algorithms such as the locally optimalmore » block preconditioned conjugate gradient or the Davidson algorithm. It is a block algorithm, and hence can take advantage of efficient BLAS3 operations and be implemented with multiple levels of concurrency. We discuss a number of practical issues that must be addressed in order to implement the algorithm efficiently on a high performance computer.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Greenough, Jeffrey A.; de Supinski, Bronis R.; Yates, Robert K.
2005-04-25
We describe the performance of the block-structured Adaptive Mesh Refinement (AMR) code Raptor on the 32k node IBM BlueGene/L computer. This machine represents a significant step forward towards petascale computing. As such, it presents Raptor with many challenges for utilizing the hardware efficiently. In terms of performance, Raptor shows excellent weak and strong scaling when running in single level mode (no adaptivity). Hardware performance monitors show Raptor achieves an aggregate performance of 3:0 Tflops in the main integration kernel on the 32k system. Results from preliminary AMR runs on a prototype astrophysical problem demonstrate the efficiency of the current softwaremore » when running at large scale. The BG/L system is enabling a physics problem to be considered that represents a factor of 64 increase in overall size compared to the largest ones of this type computed to date. Finally, we provide a description of the development work currently underway to address our inefficiencies.« less
Electronic Structure at Electrode/Electrolyte Interfaces in Magnesium based Batteries
NASA Astrophysics Data System (ADS)
Balachandran, Janakiraman; Siegel, Donald
2015-03-01
Magnesium is a promising multivalent element for use in next generation electrochemical energy storage systems. However, a wide range of challenges such as low coulombic efficiency, low/varying capacity and cyclability need to be resolved in order to realize Mg based batteries. Many of these issues can be related to interfacial phenomena between the Mg anode and common electrolytes. Ab-initio based computational models of these interfaces can provide insights on the interfacial interactions that can be difficult to probe experimentally. In this work we present ab-initio computations of common electrolyte solvents (THF, DME) in contact with two model electrode surfaces namely -- (i) an ``SEI-free'' electrode based on Mg metal and, (ii) a ``passivated'' electrode consisting of MgO. We perform GW calculations to predict the reorganization of the molecular orbitals (HOMO/LUMO) upon contact with the these surfaces and their alignment with respect to the Fermi energy of the electrodes. These computations are in turn compared with more efficient GGA (PBE) & Hybrid (HSE) functional calculations. The results obtained from these computations enable us to qualitatively describe the stability of these solvent molecules at electrode-electrolyte interfaces
Adaptive CFD schemes for aerospace propulsion
NASA Astrophysics Data System (ADS)
Ferrero, A.; Larocca, F.
2017-05-01
The flow fields which can be observed inside several components of aerospace propulsion systems are characterised by the presence of very localised phenomena (boundary layers, shock waves,...) which can deeply influence the performances of the system. In order to accurately evaluate these effects by means of Computational Fluid Dynamics (CFD) simulations, it is necessary to locally refine the computational mesh. In this way the degrees of freedom related to the discretisation are focused in the most interesting regions and the computational cost of the simulation remains acceptable. In the present work, a discontinuous Galerkin (DG) discretisation is used to numerically solve the equations which describe the flow field. The local nature of the DG reconstruction makes it possible to efficiently exploit several adaptive schemes in which the size of the elements (h-adaptivity) and the order of reconstruction (p-adaptivity) are locally changed. After a review of the main adaptation criteria, some examples related to compressible flows in turbomachinery are presented. An hybrid hp-adaptive algorithm is also proposed and compared with a standard h-adaptive scheme in terms of computational efficiency.
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang
2017-10-01
The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration, for many years. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on Intel Xeon multi-core CPU (by using OpenMP and OpenCL) and NVIDIA Tesla many-core GPU (by using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested based on the Griewank benchmark function. Comparison results indicate the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration results however, with the most complex source code. The parallel SCE-UA has bright prospects to be applied in real-world applications.
Yan, Zai You; Hung, Kin Chew; Zheng, Hui
2003-05-01
Regularization of the hypersingular integral in the normal derivative of the conventional Helmholtz integral equation through a double surface integral method or regularization relationship has been studied. By introducing the new concept of discretized operator matrix, evaluation of the double surface integrals is reduced to calculate the product of two discretized operator matrices. Such a treatment greatly improves the computational efficiency. As the number of frequencies to be computed increases, the computational cost of solving the composite Helmholtz integral equation is comparable to that of solving the conventional Helmholtz integral equation. In this paper, the detailed formulation of the proposed regularization method is presented. The computational efficiency and accuracy of the regularization method are demonstrated for a general class of acoustic radiation and scattering problems. The radiation of a pulsating sphere, an oscillating sphere, and a rigid sphere insonified by a plane acoustic wave are solved using the new method with curvilinear quadrilateral isoparametric elements. It is found that the numerical results rapidly converge to the corresponding analytical solutions as finer meshes are applied.
Improving the Efficiency of Free Energy Calculations in the Amber Molecular Dynamics Package.
Kaus, Joseph W; Pierce, Levi T; Walker, Ross C; McCammont, J Andrew
2013-09-10
Alchemical transformations are widely used methods to calculate free energies. Amber has traditionally included support for alchemical transformations as part of the sander molecular dynamics (MD) engine. Here we describe the implementation of a more efficient approach to alchemical transformations in the Amber MD package. Specifically we have implemented this new approach within the more computational efficient and scalable pmemd MD engine that is included with the Amber MD package. The majority of the gain in efficiency comes from the improved design of the calculation, which includes better parallel scaling and reduction in the calculation of redundant terms. This new implementation is able to reproduce results from equivalent simulations run with the existing functionality, but at 2.5 times greater computational efficiency. This new implementation is also able to run softcore simulations at the λ end states making direct calculation of free energies more accurate, compared to the extrapolation required in the existing implementation. The updated alchemical transformation functionality will be included in the next major release of Amber (scheduled for release in Q1 2014) and will be available at http://ambermd.org, under the Amber license.
Improving the Efficiency of Free Energy Calculations in the Amber Molecular Dynamics Package
Pierce, Levi T.; Walker, Ross C.; McCammont, J. Andrew
2013-01-01
Alchemical transformations are widely used methods to calculate free energies. Amber has traditionally included support for alchemical transformations as part of the sander molecular dynamics (MD) engine. Here we describe the implementation of a more efficient approach to alchemical transformations in the Amber MD package. Specifically we have implemented this new approach within the more computational efficient and scalable pmemd MD engine that is included with the Amber MD package. The majority of the gain in efficiency comes from the improved design of the calculation, which includes better parallel scaling and reduction in the calculation of redundant terms. This new implementation is able to reproduce results from equivalent simulations run with the existing functionality, but at 2.5 times greater computational efficiency. This new implementation is also able to run softcore simulations at the λ end states making direct calculation of free energies more accurate, compared to the extrapolation required in the existing implementation. The updated alchemical transformation functionality will be included in the next major release of Amber (scheduled for release in Q1 2014) and will be available at http://ambermd.org, under the Amber license. PMID:24185531
Singh, Dadabhai T; Trehan, Rahul; Schmidt, Bertil; Bretschneider, Timo
2008-01-01
Preparedness for a possible global pandemic caused by viruses such as the highly pathogenic influenza A subtype H5N1 has become a global priority. In particular, it is critical to monitor the appearance of any new emerging subtypes. Comparative phyloinformatics can be used to monitor, analyze, and possibly predict the evolution of viruses. However, in order to utilize the full functionality of available analysis packages for large-scale phyloinformatics studies, a team of computer scientists, biostatisticians and virologists is needed--a requirement which cannot be fulfilled in many cases. Furthermore, the time complexities of many algorithms involved leads to prohibitive runtimes on sequential computer platforms. This has so far hindered the use of comparative phyloinformatics as a commonly applied tool in this area. In this paper the graphical-oriented workflow design system called Quascade and its efficient usage for comparative phyloinformatics are presented. In particular, we focus on how this task can be effectively performed in a distributed computing environment. As a proof of concept, the designed workflows are used for the phylogenetic analysis of neuraminidase of H5N1 isolates (micro level) and influenza viruses (macro level). The results of this paper are hence twofold. Firstly, this paper demonstrates the usefulness of a graphical user interface system to design and execute complex distributed workflows for large-scale phyloinformatics studies of virus genes. Secondly, the analysis of neuraminidase on different levels of complexity provides valuable insights of this virus's tendency for geographical based clustering in the phylogenetic tree and also shows the importance of glycan sites in its molecular evolution. The current study demonstrates the efficiency and utility of workflow systems providing a biologist friendly approach to complex biological dataset analysis using high performance computing. In particular, the utility of the platform Quascade for deploying distributed and parallelized versions of a variety of computationally intensive phylogenetic algorithms has been shown. Secondly, the analysis of the utilized H5N1 neuraminidase datasets at macro and micro levels has clearly indicated a pattern of spatial clustering of the H5N1 viral isolates based on geographical distribution rather than temporal or host range based clustering.
NASA Technical Reports Server (NTRS)
Radhakrishnan, K.
1984-01-01
The efficiency and accuracy of several algorithms recently developed for the efficient numerical integration of stiff ordinary differential equations are compared. The methods examined include two general-purpose codes, EPISODE and LSODE, and three codes (CHEMEQ, CREK1D, and GCKP84) developed specifically to integrate chemical kinetic rate equations. The codes are applied to two test problems drawn from combustion kinetics. The comparisons show that LSODE is the fastest code currently available for the integration of combustion kinetic rate equations. An important finding is that an interactive solution of the algebraic energy conservation equation to compute the temperature does not result in significant errors. In addition, this method is more efficient than evaluating the temperature by integrating its time derivative. Significant reductions in computational work are realized by updating the rate constants (k = at(supra N) N exp(-E/RT) only when the temperature change exceeds an amount delta T that is problem dependent. An approximate expression for the automatic evaluation of delta T is derived and is shown to result in increased efficiency.
TLD efficiency calculations for heavy ions: an analytical approach
Boscolo, Daria; Scifoni, Emanuele; Carlino, Antonio; ...
2015-12-18
The use of thermoluminescent dosimeters (TLDs) in heavy charged particles’ dosimetry is limited by their non-linear dose response curve and by their response dependence on the radiation quality. Thus, in order to use TLDs with particle beams, a model that can reproduce the behavior of these detectors under different conditions is needed. Here a new, simple and completely analytical algorithm for the calculation of the relative TL-efficiency depending on the ion charge Z and energy E is presented. In addition, the detector response is evaluated starting from the single ion case, where the computed effectiveness values have been compared withmore » experimental data as well as with predictions from a different method. The main advantage of this approach is that, being fully analytical, it is computationally fast and can be efficiently integrated into treatment planning verification tools. In conclusion, the calculated efficiency values have been then implemented in the treatment planning code TRiP98 and dose calculations on a macroscopic target irradiated with an extended carbon ion field have been performed and verified against experimental data.« less
Li, Chuan; Li, Lin; Zhang, Jie; Alexov, Emil
2012-01-01
The Gauss-Seidel method is a standard iterative numerical method widely used to solve a system of equations and, in general, is more efficient comparing to other iterative methods, such as the Jacobi method. However, standard implementation of the Gauss-Seidel method restricts its utilization in parallel computing due to its requirement of using updated neighboring values (i.e., in current iteration) as soon as they are available. Here we report an efficient and exact (not requiring assumptions) method to parallelize iterations and to reduce the computational time as a linear/nearly linear function of the number of CPUs. In contrast to other existing solutions, our method does not require any assumptions and is equally applicable for solving linear and nonlinear equations. This approach is implemented in the DelPhi program, which is a finite difference Poisson-Boltzmann equation solver to model electrostatics in molecular biology. This development makes the iterative procedure on obtaining the electrostatic potential distribution in the parallelized DelPhi several folds faster than that in the serial code. Further we demonstrate the advantages of the new parallelized DelPhi by computing the electrostatic potential and the corresponding energies of large supramolecular structures. PMID:22674480
Experimental Realization of High-Efficiency Counterfactual Computation.
Kong, Fei; Ju, Chenyong; Huang, Pu; Wang, Pengfei; Kong, Xi; Shi, Fazhan; Jiang, Liang; Du, Jiangfeng
2015-08-21
Counterfactual computation (CFC) exemplifies the fascinating quantum process by which the result of a computation may be learned without actually running the computer. In previous experimental studies, the counterfactual efficiency is limited to below 50%. Here we report an experimental realization of the generalized CFC protocol, in which the counterfactual efficiency can break the 50% limit and even approach unity in principle. The experiment is performed with the spins of a negatively charged nitrogen-vacancy color center in diamond. Taking advantage of the quantum Zeno effect, the computer can remain in the not-running subspace due to the frequent projection by the environment, while the computation result can be revealed by final detection. The counterfactual efficiency up to 85% has been demonstrated in our experiment, which opens the possibility of many exciting applications of CFC, such as high-efficiency quantum integration and imaging.
Experimental Realization of High-Efficiency Counterfactual Computation
NASA Astrophysics Data System (ADS)
Kong, Fei; Ju, Chenyong; Huang, Pu; Wang, Pengfei; Kong, Xi; Shi, Fazhan; Jiang, Liang; Du, Jiangfeng
2015-08-01
Counterfactual computation (CFC) exemplifies the fascinating quantum process by which the result of a computation may be learned without actually running the computer. In previous experimental studies, the counterfactual efficiency is limited to below 50%. Here we report an experimental realization of the generalized CFC protocol, in which the counterfactual efficiency can break the 50% limit and even approach unity in principle. The experiment is performed with the spins of a negatively charged nitrogen-vacancy color center in diamond. Taking advantage of the quantum Zeno effect, the computer can remain in the not-running subspace due to the frequent projection by the environment, while the computation result can be revealed by final detection. The counterfactual efficiency up to 85% has been demonstrated in our experiment, which opens the possibility of many exciting applications of CFC, such as high-efficiency quantum integration and imaging.
Efficient Implementation of the Pairing on Mobilephones Using BREW
NASA Astrophysics Data System (ADS)
Yoshitomi, Motoi; Takagi, Tsuyoshi; Kiyomoto, Shinsaku; Tanaka, Toshiaki
Pairing based cryptosystems can accomplish novel security applications such as ID-based cryptosystems, which have not been constructed efficiently without the pairing. The processing speed of the pairing based cryptosystems is relatively slow compared with the other conventional public key cryptosystems. However, several efficient algorithms for computing the pairing have been proposed, namely Duursma-Lee algorithm and its variant ηT pairing. In this paper, we present an efficient implementation of the pairing over some mobilephones. Moreover, we compare the processing speed of the pairing with that of the other standard public key cryptosystems, i. e. RSA cryptosystem and elliptic curve cryptosystem. Indeed the processing speed of our implementation in ARM9 processors on BREW achieves under 100 milliseconds using the supersingular curve over F397. In addition, the pairing is more efficient than the other public key cryptosystems, and the pairing can be achieved enough also on BREW mobilephones. It has become efficient enough to implement security applications, such as short signature, ID-based cryptosystems or broadcast encryption, using the pairing on BREW mobilephones.
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
Proteinortho: detection of (co-)orthologs in large-scale analysis.
Lechner, Marcus; Findeiss, Sven; Steiner, Lydia; Marz, Manja; Stadler, Peter F; Prohaska, Sonja J
2011-04-28
Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practise. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.
Efficient volumetric estimation from plenoptic data
NASA Astrophysics Data System (ADS)
Anglin, Paul; Reeves, Stanley J.; Thurow, Brian S.
2013-03-01
The commercial release of the Lytro camera, and greater availability of plenoptic imaging systems in general, have given the image processing community cost-effective tools for light-field imaging. While this data is most commonly used to generate planar images at arbitrary focal depths, reconstruction of volumetric fields is also possible. Similarly, deconvolution is a technique that is conventionally used in planar image reconstruction, or deblurring, algorithms. However, when leveraged with the ability of a light-field camera to quickly reproduce multiple focal planes within an imaged volume, deconvolution offers a computationally efficient method of volumetric reconstruction. Related research has shown than light-field imaging systems in conjunction with tomographic reconstruction techniques are also capable of estimating the imaged volume and have been successfully applied to particle image velocimetry (PIV). However, while tomographic volumetric estimation through algorithms such as multiplicative algebraic reconstruction techniques (MART) have proven to be highly accurate, they are computationally intensive. In this paper, the reconstruction problem is shown to be solvable by deconvolution. Deconvolution offers significant improvement in computational efficiency through the use of fast Fourier transforms (FFTs) when compared to other tomographic methods. This work describes a deconvolution algorithm designed to reconstruct a 3-D particle field from simulated plenoptic data. A 3-D extension of existing 2-D FFT-based refocusing techniques is presented to further improve efficiency when computing object focal stacks and system point spread functions (PSF). Reconstruction artifacts are identified; their underlying source and methods of mitigation are explored where possible, and reconstructions of simulated particle fields are provided.
Diamond, Alan; Nowotny, Thomas; Schmuker, Michael
2016-01-01
Neuromorphic computing employs models of neuronal circuits to solve computing problems. Neuromorphic hardware systems are now becoming more widely available and “neuromorphic algorithms” are being developed. As they are maturing toward deployment in general research environments, it becomes important to assess and compare them in the context of the applications they are meant to solve. This should encompass not just task performance, but also ease of implementation, speed of processing, scalability, and power efficiency. Here, we report our practical experience of implementing a bio-inspired, spiking network for multivariate classification on three different platforms: the hybrid digital/analog Spikey system, the digital spike-based SpiNNaker system, and GeNN, a meta-compiler for parallel GPU hardware. We assess performance using a standard hand-written digit classification task. We found that whilst a different implementation approach was required for each platform, classification performances remained in line. This suggests that all three implementations were able to exercise the model's ability to solve the task rather than exposing inherent platform limits, although differences emerged when capacity was approached. With respect to execution speed and power consumption, we found that for each platform a large fraction of the computing time was spent outside of the neuromorphic device, on the host machine. Time was spent in a range of combinations of preparing the model, encoding suitable input spiking data, shifting data, and decoding spike-encoded results. This is also where a large proportion of the total power was consumed, most markedly for the SpiNNaker and Spikey systems. We conclude that the simulation efficiency advantage of the assessed specialized hardware systems is easily lost in excessive host-device communication, or non-neuronal parts of the computation. These results emphasize the need to optimize the host-device communication architecture for scalability, maximum throughput, and minimum latency. Moreover, our results indicate that special attention should be paid to minimize host-device communication when designing and implementing networks for efficient neuromorphic computing. PMID:26778950
NASA Astrophysics Data System (ADS)
Popov, Igor; Sukov, Sergey
2018-02-01
A modification of the adaptive artificial viscosity (AAV) method is considered. This modification is based on one stage time approximation and is adopted to calculation of gasdynamics problems on unstructured grids with an arbitrary type of grid elements. The proposed numerical method has simplified logic, better performance and parallel efficiency compared to the implementation of the original AAV method. Computer experiments evidence the robustness and convergence of the method to difference solution.
Implications of a quadratic stream definition in radiative transfer theory.
NASA Technical Reports Server (NTRS)
Whitney, C.
1972-01-01
An explicit definition of the radiation-stream concept is stated and applied to approximate the integro-differential equation of radiative transfer with a set of twelve coupled differential equations. Computational efficiency is enhanced by distributing the corresponding streams in three-dimensional space in a totally symmetric way. Polarization is then incorporated in this model. A computer program based on the model is briefly compared with a Monte Carlo program for simulation of horizon scans of the earth's atmosphere. It is found to be considerably faster.
Organization and use of a Software/Hardware Avionics Research Program (SHARP)
NASA Technical Reports Server (NTRS)
Karmarkar, J. S.; Kareemi, M. N.
1975-01-01
The organization and use is described of the software/hardware avionics research program (SHARP) developed to duplicate the automatic portion of the STOLAND simulator system, on a general-purpose computer system (i.e., IBM 360). The program's uses are: (1) to conduct comparative evaluation studies of current and proposed airborne and ground system concepts via single run or Monte Carlo simulation techniques, and (2) to provide a software tool for efficient algorithm evaluation and development for the STOLAND avionics computer.
Implementation of a block Lanczos algorithm for Eigenproblem solution of gyroscopic systems
NASA Technical Reports Server (NTRS)
Gupta, Kajal K.; Lawson, Charles L.
1987-01-01
The details of implementation of a general numerical procedure developed for the accurate and economical computation of natural frequencies and associated modes of any elastic structure rotating along an arbitrary axis are described. A block version of the Lanczos algorithm is derived for the solution that fully exploits associated matrix sparsity and employs only real numbers in all relevant computations. It is also capable of determining multiple roots and proves to be most efficient when compared to other, similar, exisiting techniques.
Approximate Green's function methods for HZE transport in multilayered materials
NASA Technical Reports Server (NTRS)
Wilson, John W.; Badavi, Francis F.; Shinn, Judy L.; Costen, Robert C.
1993-01-01
A nonperturbative analytic solution of the high charge and energy (HZE) Green's function is used to implement a computer code for laboratory ion beam transport in multilayered materials. The code is established to operate on the Langley nuclear fragmentation model used in engineering applications. Computational procedures are established to generate linear energy transfer (LET) distributions for a specified ion beam and target for comparison with experimental measurements. The code was found to be highly efficient and compared well with the perturbation approximation.
Validations of CFD against detailed velocity and pressure measurements in water turbine runner flow
NASA Astrophysics Data System (ADS)
Nilsson, H.; Davidson, L.
2003-03-01
This work compares CFD results with experimental results of the flow in two different kinds of water turbine runners. The runners studied are the GAMM Francis runner and the Hölleforsen Kaplan runner. The GAMM Francis runner was used as a test case in the 1989 GAMM Workshop on 3D Computation of Incompressible Internal Flows where the geometry and detailed best efficiency measurements were made available. In addition to the best efficiency measurements, four off-design operating condition measurements are used for the comparisons in this work. The Hölleforsen Kaplan runner was used at the 1999 Turbine 99 and 2001 Turbine 99 - II workshops on draft tube flow, where detailed measurements made after the runner were used as inlet boundary conditions for the draft tube computations. The measurements are used here to validate computations of the flow in the runner.The computations are made in a single runner blade passage where the inlet boundary conditions are obtained from an extrapolation of detailed measurements (GAMM) or from separate guide vane computations (Hölleforsen). The steady flow in a rotating co-ordinate system is computed. The effects of turbulence are modelled by a low-Reynolds number k- turbulence model, which removes some of the assumptions of the commonly used wall function approach and brings the computations one step further.
The Metadata Cloud: The Last Piece of a Distributed Data System Model
NASA Astrophysics Data System (ADS)
King, T. A.; Cecconi, B.; Hughes, J. S.; Walker, R. J.; Roberts, D.; Thieman, J. R.; Joy, S. P.; Mafi, J. N.; Gangloff, M.
2012-12-01
Distributed data systems have existed ever since systems were networked together. Over the years the model for distributed data systems have evolved from basic file transfer to client-server to multi-tiered to grid and finally to cloud based systems. Initially metadata was tightly coupled to the data either by embedding the metadata in the same file containing the data or by co-locating the metadata in commonly named files. As the sources of data multiplied, data volumes have increased and services have specialized to improve efficiency; a cloud system model has emerged. In a cloud system computing and storage are provided as services with accessibility emphasized over physical location. Computation and data clouds are common implementations. Effectively using the data and computation capabilities requires metadata. When metadata is stored separately from the data; a metadata cloud is formed. With a metadata cloud information and knowledge about data resources can migrate efficiently from system to system, enabling services and allowing the data to remain efficiently stored until used. This is especially important with "Big Data" where movement of the data is limited by bandwidth. We examine how the metadata cloud completes a general distributed data system model, how standards play a role and relate this to the existing types of cloud computing. We also look at the major science data systems in existence and compare each to the generalized cloud system model.
Efficient algorithms and implementations of entropy-based moment closures for rarefied gases
NASA Astrophysics Data System (ADS)
Schaerer, Roman Pascal; Bansal, Pratyuksh; Torrilhon, Manuel
2017-07-01
We present efficient algorithms and implementations of the 35-moment system equipped with the maximum-entropy closure in the context of rarefied gases. While closures based on the principle of entropy maximization have been shown to yield very promising results for moderately rarefied gas flows, the computational cost of these closures is in general much higher than for closure theories with explicit closed-form expressions of the closing fluxes, such as Grad's classical closure. Following a similar approach as Garrett et al. (2015) [13], we investigate efficient implementations of the computationally expensive numerical quadrature method used for the moment evaluations of the maximum-entropy distribution by exploiting its inherent fine-grained parallelism with the parallelism offered by multi-core processors and graphics cards. We show that using a single graphics card as an accelerator allows speed-ups of two orders of magnitude when compared to a serial CPU implementation. To accelerate the time-to-solution for steady-state problems, we propose a new semi-implicit time discretization scheme. The resulting nonlinear system of equations is solved with a Newton type method in the Lagrange multipliers of the dual optimization problem in order to reduce the computational cost. Additionally, fully explicit time-stepping schemes of first and second order accuracy are presented. We investigate the accuracy and efficiency of the numerical schemes for several numerical test cases, including a steady-state shock-structure problem.
Karamintziou, Sofia D; Custódio, Ana Luísa; Piallat, Brigitte; Polosan, Mircea; Chabardès, Stéphan; Stathis, Pantelis G; Tagaris, George A; Sakas, Damianos E; Polychronaki, Georgia E; Tsirogiannis, George L; David, Olivier; Nikita, Konstantina S
2017-01-01
Advances in the field of closed-loop neuromodulation call for analysis and modeling approaches capable of confronting challenges related to the complex neuronal response to stimulation and the presence of strong internal and measurement noise in neural recordings. Here we elaborate on the algorithmic aspects of a noise-resistant closed-loop subthalamic nucleus deep brain stimulation system for advanced Parkinson's disease and treatment-refractory obsessive-compulsive disorder, ensuring remarkable performance in terms of both efficiency and selectivity of stimulation, as well as in terms of computational speed. First, we propose an efficient method drawn from dynamical systems theory, for the reliable assessment of significant nonlinear coupling between beta and high-frequency subthalamic neuronal activity, as a biomarker for feedback control. Further, we present a model-based strategy through which optimal parameters of stimulation for minimum energy desynchronizing control of neuronal activity are being identified. The strategy integrates stochastic modeling and derivative-free optimization of neural dynamics based on quadratic modeling. On the basis of numerical simulations, we demonstrate the potential of the presented modeling approach to identify, at a relatively low computational cost, stimulation settings potentially associated with a significantly higher degree of efficiency and selectivity compared with stimulation settings determined post-operatively. Our data reinforce the hypothesis that model-based control strategies are crucial for the design of novel stimulation protocols at the backstage of clinical applications.
Parallel Calculation of Sensitivity Derivatives for Aircraft Design using Automatic Differentiation
NASA Technical Reports Server (NTRS)
Bischof, c. H.; Green, L. L.; Haigler, K. J.; Knauff, T. L., Jr.
1994-01-01
Sensitivity derivative (SD) calculation via automatic differentiation (AD) typical of that required for the aerodynamic design of a transport-type aircraft is considered. Two ways of computing SD via code generated by the ADIFOR automatic differentiation tool are compared for efficiency and applicability to problems involving large numbers of design variables. A vector implementation on a Cray Y-MP computer is compared with a coarse-grained parallel implementation on an IBM SP1 computer, employing a Fortran M wrapper. The SD are computed for a swept transport wing in turbulent, transonic flow; the number of geometric design variables varies from 1 to 60 with coupling between a wing grid generation program and a state-of-the-art, 3-D computational fluid dynamics program, both augmented for derivative computation via AD. For a small number of design variables, the Cray Y-MP implementation is much faster. As the number of design variables grows, however, the IBM SP1 becomes an attractive alternative in terms of compute speed, job turnaround time, and total memory available for solutions with large numbers of design variables. The coarse-grained parallel implementation also can be moved easily to a network of workstations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S.; Yoginath, Srikanth B.
Problems such as fault tolerance and scalable synchronization can be efficiently solved using reversibility of applications. Making applications reversible by relying on computation rather than on memory is ideal for large scale parallel computing, especially for the next generation of supercomputers in which memory is expensive in terms of latency, energy, and price. In this direction, a case study is presented here in reversing a computational core, namely, Basic Linear Algebra Subprograms, which is widely used in scientific applications. A new Reversible BLAS (RBLAS) library interface has been designed, and a prototype has been implemented with two modes: (1) amore » memory-mode in which reversibility is obtained by checkpointing to memory in forward and restoring from memory in reverse, and (2) a computational-mode in which nothing is saved in the forward, but restoration is done entirely via inverse computation in reverse. The article is focused on detailed performance benchmarking to evaluate the runtime dynamics and performance effects, comparing reversible computation with checkpointing on both traditional CPU platforms and recent GPU accelerator platforms. For BLAS Level-1 subprograms, data indicates over an order of magnitude better speed of reversible computation compared to checkpointing. For BLAS Level-2 and Level-3, a more complex tradeoff is observed between reversible computation and checkpointing, depending on computational and memory complexities of the subprograms.« less
Diffused junction p(+)-n solar cells in bulk GaAs. II - Device characterization and modelling
NASA Technical Reports Server (NTRS)
Keeney, R.; Sundaram, L. M. G.; Rode, H.; Bhat, I.; Ghandhi, S. K.; Borrego, J. M.
1984-01-01
The photovoltaic characteristics of p(+)-n junction solar cells fabricated on bulk GaAs by an open tube diffusion technique are presented in detail. Quantum efficiency measurements were analyzed and compared to computer simulations of the cell structure in order to determine material parameters such as diffusion length, surface recombination velocity and junction depth. From the results obtained it is projected that proper optimization of the cell parameters can increase the efficiency of the cells to close to 20 percent.
A hybrid framework for coupling arbitrary summation-by-parts schemes on general meshes
NASA Astrophysics Data System (ADS)
Lundquist, Tomas; Malan, Arnaud; Nordström, Jan
2018-06-01
We develop a general interface procedure to couple both structured and unstructured parts of a hybrid mesh in a non-collocated, multi-block fashion. The target is to gain optimal computational efficiency in fluid dynamics simulations involving complex geometries. While guaranteeing stability, the proposed procedure is optimized for accuracy and requires minimal algorithmic modifications to already existing schemes. Initial numerical investigations confirm considerable efficiency gains compared to non-hybrid calculations of up to an order of magnitude.
Amisaki, Takashi; Toyoda, Shinjiro; Miyagawa, Hiroh; Kitamura, Kunihiro
2003-04-15
Evaluation of long-range Coulombic interactions still represents a bottleneck in the molecular dynamics (MD) simulations of biological macromolecules. Despite the advent of sophisticated fast algorithms, such as the fast multipole method (FMM), accurate simulations still demand a great amount of computation time due to the accuracy/speed trade-off inherently involved in these algorithms. Unless higher order multipole expansions, which are extremely expensive to evaluate, are employed, a large amount of the execution time is still spent in directly calculating particle-particle interactions within the nearby region of each particle. To reduce this execution time for pair interactions, we developed a computation unit (board), called MD-Engine II, that calculates nonbonded pairwise interactions using a specially designed hardware. Four custom arithmetic-processors and a processor for memory manipulation ("particle processor") are mounted on the computation board. The arithmetic processors are responsible for calculation of the pair interactions. The particle processor plays a central role in realizing efficient cooperation with the FMM. The results of a series of 50-ps MD simulations of a protein-water system (50,764 atoms) indicated that a more stringent setting of accuracy in FMM computation, compared with those previously reported, was required for accurate simulations over long time periods. Such a level of accuracy was efficiently achieved using the cooperative calculations of the FMM and MD-Engine II. On an Alpha 21264 PC, the FMM computation at a moderate but tolerable level of accuracy was accelerated by a factor of 16.0 using three boards. At a high level of accuracy, the cooperative calculation achieved a 22.7-fold acceleration over the corresponding conventional FMM calculation. In the cooperative calculations of the FMM and MD-Engine II, it was possible to achieve more accurate computation at a comparable execution time by incorporating larger nearby regions. Copyright 2003 Wiley Periodicals, Inc. J Comput Chem 24: 582-592, 2003
Model Comparison for Electron Thermal Transport
NASA Astrophysics Data System (ADS)
Moses, Gregory; Chenhall, Jeffrey; Cao, Duc; Delettrez, Jacques
2015-11-01
Four electron thermal transport models are compared for their ability to accurately and efficiently model non-local behavior in ICF simulations. Goncharov's transport model has accurately predicted shock timing in implosion simulations but is computationally slow and limited to 1D. The iSNB (implicit Schurtz Nicolai Busquet electron thermal transport method of Cao et al. uses multigroup diffusion to speed up the calculation. Chenhall has expanded upon the iSNB diffusion model to a higher order simplified P3 approximation and a Monte Carlo transport model, to bridge the gap between the iSNB and Goncharov models while maintaining computational efficiency. Comparisons of the above models for several test problems will be presented. This work was supported by Sandia National Laboratory - Albuquerque and the University of Rochester Laboratory for Laser Energetics.
Alternative Modal Basis Selection Procedures for Nonlinear Random Response Simulation
NASA Technical Reports Server (NTRS)
Przekop, Adam; Guo, Xinyun; Rizzi, Stephen A.
2010-01-01
Three procedures to guide selection of an efficient modal basis in a nonlinear random response analysis are examined. One method is based only on proper orthogonal decomposition, while the other two additionally involve smooth orthogonal decomposition. Acoustic random response problems are employed to assess the performance of the three modal basis selection approaches. A thermally post-buckled beam exhibiting snap-through behavior, a shallowly curved arch in the auto-parametric response regime and a plate structure are used as numerical test articles. The results of the three reduced-order analyses are compared with the results of the computationally taxing simulation in the physical degrees of freedom. For the cases considered, all three methods are shown to produce modal bases resulting in accurate and computationally efficient reduced-order nonlinear simulations.
NASA Technical Reports Server (NTRS)
Muellerschoen, R. J.
1988-01-01
A unified method to permute vector-stored upper-triangular diagonal factorized covariance (UD) and vector stored upper-triangular square-root information filter (SRIF) arrays is presented. The method involves cyclical permutation of the rows and columns of the arrays and retriangularization with appropriate square-root-free fast Givens rotations or elementary slow Givens reflections. A minimal amount of computation is performed and only one scratch vector of size N is required, where N is the column dimension of the arrays. To make the method efficient for large SRIF arrays on a virtual memory machine, three additional scratch vectors each of size N are used to avoid expensive paging faults. The method discussed is compared with the methods and routines of Bierman's Estimation Subroutine Library (ESL).
NASA Astrophysics Data System (ADS)
Peng, Xuefeng; Wu, Pinghui; Han, Yinxia; Hu, Guoqiang
2014-11-01
The properties of amplified spontaneous emission (ASE) in CdSe/ZnS quantum dot (QD) doped step-index polymer optical fibers (POFs) were computationally analyzed in this paper. A theoretical model based on the rate equations between two main energy levels of CdSe/ZnS QD was built in terms of time (t), distance traveled by light (z) and wavelength (λ), which can describe the ASE successfully. Through analyzing the spectral evolution with distance of the pulses propagating along the CdSe/ZnS QD doped POFs, dependences of the ASE threshold and the slope efficiency on the numerical aperture were obtained. Compared to the ASE in common dye-doped POFs, the pump threshold was just about 1/1000, but the slope efficiency was much higher.
Toward an efficient Photometric Supernova Classifier
NASA Astrophysics Data System (ADS)
McClain, Bradley
2018-01-01
The Sloan Digital Sky Survey Supernova Survey (SDSS) discovered more than 1,000 Type Ia Supernovae, yet less than half of these have spectroscopic measurements. As wide-field imaging telescopes such as The Dark Energy Survey (DES) and the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) discover more supernovae, the need for accurate and computationally cheap photometric classifiers increases. My goal is to use a photometric classification algorithm based on Sncosmo, a python library for supernova cosmology analysis, to reclassify previously identified Hubble SN and other non-spectroscopically confirmed surveys. My results will be compared to other photometric classifiers such as PSNID and STARDUST. In the near future, I expect to have the algorithm validated with simulated data, optimized for efficiency, and applied with high performance computing to real data.
Kruskal-Wallis-based computationally efficient feature selection for face recognition.
Ali Khan, Sajid; Hussain, Ayyaz; Basit, Abdul; Akram, Sheeraz
2014-01-01
Face recognition in today's technological world, and face recognition applications attain much more importance. Most of the existing work used frontal face images to classify face image. However these techniques fail when applied on real world face images. The proposed technique effectively extracts the prominent facial features. Most of the features are redundant and do not contribute to representing face. In order to eliminate those redundant features, computationally efficient algorithm is used to select the more discriminative face features. Extracted features are then passed to classification step. In the classification step, different classifiers are ensemble to enhance the recognition accuracy rate as single classifier is unable to achieve the high accuracy. Experiments are performed on standard face database images and results are compared with existing techniques.
Computational Investigation of Fluidic Counterflow Thrust Vectoring
NASA Technical Reports Server (NTRS)
Hunter, Craig A.; Deere, Karen A.
1999-01-01
A computational study of fluidic counterflow thrust vectoring has been conducted. Two-dimensional numerical simulations were run using the computational fluid dynamics code PAB3D with two-equation turbulence closure and linear Reynolds stress modeling. For validation, computational results were compared to experimental data obtained at the NASA Langley Jet Exit Test Facility. In general, computational results were in good agreement with experimental performance data, indicating that efficient thrust vectoring can be obtained with low secondary flow requirements (less than 1% of the primary flow). An examination of the computational flowfield has revealed new details about the generation of a countercurrent shear layer, its relation to secondary suction, and its role in thrust vectoring. In addition to providing new information about the physics of counterflow thrust vectoring, this work appears to be the first documented attempt to simulate the counterflow thrust vectoring problem using computational fluid dynamics.
A resource management architecture based on complex network theory in cloud computing federation
NASA Astrophysics Data System (ADS)
Zhang, Zehua; Zhang, Xuejie
2011-10-01
Cloud Computing Federation is a main trend of Cloud Computing. Resource Management has significant effect on the design, realization, and efficiency of Cloud Computing Federation. Cloud Computing Federation has the typical characteristic of the Complex System, therefore, we propose a resource management architecture based on complex network theory for Cloud Computing Federation (abbreviated as RMABC) in this paper, with the detailed design of the resource discovery and resource announcement mechanisms. Compare with the existing resource management mechanisms in distributed computing systems, a Task Manager in RMABC can use the historical information and current state data get from other Task Managers for the evolution of the complex network which is composed of Task Managers, thus has the advantages in resource discovery speed, fault tolerance and adaptive ability. The result of the model experiment confirmed the advantage of RMABC in resource discovery performance.
Adaptive [theta]-methods for pricing American options
NASA Astrophysics Data System (ADS)
Khaliq, Abdul Q. M.; Voss, David A.; Kazmi, Kamran
2008-12-01
We develop adaptive [theta]-methods for solving the Black-Scholes PDE for American options. By adding a small, continuous term, the Black-Scholes PDE becomes an advection-diffusion-reaction equation on a fixed spatial domain. Standard implementation of [theta]-methods would require a Newton-type iterative procedure at each time step thereby increasing the computational complexity of the methods. Our linearly implicit approach avoids such complications. We establish a general framework under which [theta]-methods satisfy a discrete version of the positivity constraint characteristic of American options, and numerically demonstrate the sensitivity of the constraint. The positivity results are established for the single-asset and independent two-asset models. In addition, we have incorporated and analyzed an adaptive time-step control strategy to increase the computational efficiency. Numerical experiments are presented for one- and two-asset American options, using adaptive exponential splitting for two-asset problems. The approach is compared with an iterative solution of the two-asset problem in terms of computational efficiency.
Solution of nonlinear time-dependent PDEs through componentwise approximation of matrix functions
NASA Astrophysics Data System (ADS)
Cibotarica, Alexandru; Lambers, James V.; Palchak, Elisabeth M.
2016-09-01
Exponential propagation iterative (EPI) methods provide an efficient approach to the solution of large stiff systems of ODEs, compared to standard integrators. However, the bulk of the computational effort in these methods is due to products of matrix functions and vectors, which can become very costly at high resolution due to an increase in the number of Krylov projection steps needed to maintain accuracy. In this paper, it is proposed to modify EPI methods by using Krylov subspace spectral (KSS) methods, instead of standard Krylov projection methods, to compute products of matrix functions and vectors. Numerical experiments demonstrate that this modification causes the number of Krylov projection steps to become bounded independently of the grid size, thus dramatically improving efficiency and scalability. As a result, for each test problem featured, as the total number of grid points increases, the growth in computation time is just below linear, while other methods achieved this only on selected test problems or not at all.
A Comparison of Solver Performance for Complex Gastric Electrophysiology Models
Sathar, Shameer; Cheng, Leo K.; Trew, Mark L.
2016-01-01
Computational techniques for solving systems of equations arising in gastric electrophysiology have not been studied for efficient solution process. We present a computationally challenging problem of simulating gastric electrophysiology in anatomically realistic stomach geometries with multiple intracellular and extracellular domains. The multiscale nature of the problem and mesh resolution required to capture geometric and functional features necessitates efficient solution methods if the problem is to be tractable. In this study, we investigated and compared several parallel preconditioners for the linear systems arising from tetrahedral discretisation of electrically isotropic and anisotropic problems, with and without stimuli. The results showed that the isotropic problem was computationally less challenging than the anisotropic problem and that the application of extracellular stimuli increased workload considerably. Preconditioning based on block Jacobi and algebraic multigrid solvers were found to have the best overall solution times and least iteration counts, respectively. The algebraic multigrid preconditioner would be expected to perform better on large problems. PMID:26736543
NASA Technical Reports Server (NTRS)
Rogers, S. E.; Kwak, D.; Chang, J. L. C.
1986-01-01
The method of pseudocompressibility has been shown to be an efficient method for obtaining a steady-state solution to the incompressible Navier-Stokes equations. Recent improvements to this method include the use of a diagonal scheme for the inversion of the equations at each iteration. The necessary transformations have been derived for the pseudocompressibility equations in generalized coordinates. The diagonal algorithm reduces the computing time necessary to obtain a steady-state solution by a factor of nearly three. Implicit viscous terms are maintained in the equations, and it has become possible to use fourth-order implicit dissipation. The steady-state solution is unchanged by the approximations resulting from the diagonalization of the equations. Computed results for flow over a two-dimensional backward-facing step and a three-dimensional cylinder mounted normal to a flat plate are presented for both the old and new algorithms. The accuracy and computing efficiency of these algorithms are compared.
Łazarski, Roman; Burow, Asbjörn Manfred; Grajciar, Lukáš; Sierka, Marek
2016-10-30
A full implementation of analytical energy gradients for molecular and periodic systems is reported in the TURBOMOLE program package within the framework of Kohn-Sham density functional theory using Gaussian-type orbitals as basis functions. Its key component is a combination of density fitting (DF) approximation and continuous fast multipole method (CFMM) that allows for an efficient calculation of the Coulomb energy gradient. For exchange-correlation part the hierarchical numerical integration scheme (Burow and Sierka, Journal of Chemical Theory and Computation 2011, 7, 3097) is extended to energy gradients. Computational efficiency and asymptotic O(N) scaling behavior of the implementation is demonstrated for various molecular and periodic model systems, with the largest unit cell of hematite containing 640 atoms and 19,072 basis functions. The overall computational effort of energy gradient is comparable to that of the Kohn-Sham matrix formation. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
Newman, M. B.; Pipano, A.
1973-01-01
A new eigensolution routine, FEER (Fast Eigensolution Extraction Routine), used in conjunction with NASTRAN at Israel Aircraft Industries is described. The FEER program is based on an automatic matrix reduction scheme whereby the lower modes of structures with many degrees of freedom can be accurately extracted from a tridiagonal eigenvalue problem whose size is of the same order of magnitude as the number of required modes. The process is effected without arbitrary lumping of masses at selected node points or selection of nodes to be retained in the analysis set. The results of computational efficiency studies are presented, showing major arithmetic operation counts and actual computer run times of FEER as compared to other methods of eigenvalue extraction, including those available in the NASTRAN READ module. It is concluded that the tridiagonal reduction method used in FEER would serve as a valuable addition to NASTRAN for highly increased efficiency in obtaining structural vibration modes.
Ong, Eng Teo; Lee, Heow Pueh; Lim, Kian Meng
2004-09-01
This article presents a fast algorithm for the efficient solution of the Helmholtz equation. The method is based on the translation theory of the multipole expansions. Here, the speedup comes from the convolution nature of the translation operators, which can be evaluated rapidly using fast Fourier transform algorithms. Also, the computations of the translation operators are accelerated by using the recursive formulas developed recently by Gumerov and Duraiswami [SIAM J. Sci. Comput. 25, 1344-1381(2003)]. It is demonstrated that the algorithm can produce good accuracy with a relatively low order of expansion. Efficiency analyses of the algorithm reveal that it has computational complexities of O(Na), where a ranges from 1.05 to 1.24. However, this method requires substantially more memory to store the translation operators as compared to the fast multipole method. Hence, despite its simplicity in implementation, this memory requirement issue may limit the application of this algorithm to solving very large-scale problems.
A computable expression of closure to efficient causation.
Mossio, Matteo; Longo, Giuseppe; Stewart, John
2009-04-07
In this paper, we propose a mathematical expression of closure to efficient causation in terms of lambda-calculus; we argue that this opens up the perspective of developing principled computer simulations of systems closed to efficient causation in an appropriate programming language. An important implication of our formulation is that, by exhibiting an expression in lambda-calculus, which is a paradigmatic formalism for computability and programming, we show that there are no conceptual or principled problems in realizing a computer simulation or model of closure to efficient causation. We conclude with a brief discussion of the question whether closure to efficient causation captures all relevant properties of living systems. We suggest that it might not be the case, and that more complex definitions could indeed create crucial some obstacles to computability.
A power-efficient ZF precoding scheme for multi-user indoor visible light communication systems
NASA Astrophysics Data System (ADS)
Zhao, Qiong; Fan, Yangyu; Deng, Lijun; Kang, Bochao
2017-02-01
In this study, we propose a power-efficient ZF precoding scheme for visible light communication (VLC) downlink multi-user multiple-input-single-output (MU-MISO) systems, which incorporates the zero-forcing (ZF) and the characteristics of VLC systems. The main idea of this scheme is that the channel matrix used to perform pseudoinverse comes from the set of optical Access Points (APs) shared by more than one user, instead of the set of all involved serving APs as the existing ZF precoding schemes often used. By doing this, the waste of power, which is caused by the transmission of one user's data in the un-serving APs, can be avoided. In addition, the size of the channel matrix needs to perform pseudoinverse becomes smaller, which helps to reduce the computation complexity. Simulation results in two scenarios show that the proposed ZF precoding scheme has higher power efficiency, better bit error rate (BER) performance and lower computation complexity compared with traditional ZF precoding schemes.
Initial comparison of single cylinder Stirling engine computer model predictions with test results
NASA Technical Reports Server (NTRS)
Tew, R. C., Jr.; Thieme, L. G.; Miao, D.
1979-01-01
A NASA developed digital computer code for a Stirling engine, modelling the performance of a single cylinder rhombic drive ground performance unit (GPU), is presented and its predictions are compared to test results. The GPU engine incorporates eight regenerator/cooler units and the engine working space is modelled by thirteen control volumes. The model calculates indicated power and efficiency for a given engine speed, mean pressure, heater and expansion space metal temperatures and cooler water inlet temperature and flow rate. Comparison of predicted and observed powers implies that the reference pressure drop calculations underestimate actual pressure drop, possibly due to oil contamination in the regenerator/cooler units, methane contamination in the working gas or the underestimation of mechanical loss. For a working gas of hydrogen, the predicted values of brake power are from 0 to 6% higher than experimental values, and brake efficiency is 6 to 16% higher, while for helium the predicted brake power and efficiency are 2 to 15% higher than the experimental.
Improved Ant Colony Clustering Algorithm and Its Performance Study
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
An efficient algorithm using matrix methods to solve wind tunnel force-balance equations
NASA Technical Reports Server (NTRS)
Smith, D. L.
1972-01-01
An iterative procedure applying matrix methods to accomplish an efficient algorithm for automatic computer reduction of wind-tunnel force-balance data has been developed. Balance equations are expressed in a matrix form that is convenient for storing balance sensitivities and interaction coefficient values for online or offline batch data reduction. The convergence of the iterative values to a unique solution of this system of equations is investigated, and it is shown that for balances which satisfy the criteria discussed, this type of solution does occur. Methods for making sensitivity adjustments and initial load effect considerations in wind-tunnel applications are also discussed, and the logic for determining the convergence accuracy limits for the iterative solution is given. This more efficient data reduction program is compared with the technique presently in use at the NASA Langley Research Center, and computational times on the order of one-third or less are demonstrated by use of this new program.
Efficient detection of a CW signal with a linear frequency drift
NASA Technical Reports Server (NTRS)
Swarztrauber, Paul N.; Bailey, David H.
1989-01-01
An efficient method is presented for the detection of a continuous wave (CW) signal with a frequency drift that is linear in time. Signals of this type occur in transmissions between any two locations that are accelerating relative to one another, e.g., transmissions from the Voyager spacecraft. We assume that both the frequency and the drift are unknown. We also assume that the signal is weak compared to the Gaussian noise. The signal is partitioned into subsequences whose discrete Fourier transforms provide a sequence of instantaneous spectra at equal time intervals. These spectra are then accumulated with a shift that is proportional to time. When the shift is equal to the frequency drift, the signal to noise ratio increases and detection occurs. Here, we show how to compute these accumulations for many shifts in an efficient manner using a variety of Fast Fourier Transformations (FFT). Computing time is proportional to L log L where L is the length of the time series.
Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3
NASA Technical Reports Server (NTRS)
Lin, Shu
1998-01-01
Decoding algorithms based on the trellis representation of a code (block or convolutional) drastically reduce decoding complexity. The best known and most commonly used trellis-based decoding algorithm is the Viterbi algorithm. It is a maximum likelihood decoding algorithm. Convolutional codes with the Viterbi decoding have been widely used for error control in digital communications over the last two decades. This chapter is concerned with the application of the Viterbi decoding algorithm to linear block codes. First, the Viterbi algorithm is presented. Then, optimum sectionalization of a trellis to minimize the computational complexity of a Viterbi decoder is discussed and an algorithm is presented. Some design issues for IC (integrated circuit) implementation of a Viterbi decoder are considered and discussed. Finally, a new decoding algorithm based on the principle of compare-select-add is presented. This new algorithm can be applied to both block and convolutional codes and is more efficient than the conventional Viterbi algorithm based on the add-compare-select principle. This algorithm is particularly efficient for rate 1/n antipodal convolutional codes and their high-rate punctured codes. It reduces computational complexity by one-third compared with the Viterbi algorithm.
Direct numerical simulation of particulate flows with an overset grid method
NASA Astrophysics Data System (ADS)
Koblitz, A. R.; Lovett, S.; Nikiforakis, N.; Henshaw, W. D.
2017-08-01
We evaluate an efficient overset grid method for two-dimensional and three-dimensional particulate flows for small numbers of particles at finite Reynolds number. The rigid particles are discretised using moving overset grids overlaid on a Cartesian background grid. This allows for strongly-enforced boundary conditions and local grid refinement at particle surfaces, thereby accurately capturing the viscous boundary layer at modest computational cost. The incompressible Navier-Stokes equations are solved with a fractional-step scheme which is second-order-accurate in space and time, while the fluid-solid coupling is achieved with a partitioned approach including multiple sub-iterations to increase stability for light, rigid bodies. Through a series of benchmark studies we demonstrate the accuracy and efficiency of this approach compared to other boundary conformal and static grid methods in the literature. In particular, we find that fully resolving boundary layers at particle surfaces is crucial to obtain accurate solutions to many common test cases. With our approach we are able to compute accurate solutions using as little as one third the number of grid points as uniform grid computations in the literature. A detailed convergence study shows a 13-fold decrease in CPU time over a uniform grid test case whilst maintaining comparable solution accuracy.
Progress of High Efficiency Centrifugal Compressor Simulations Using TURBO
NASA Technical Reports Server (NTRS)
Kulkarni, Sameer; Beach, Timothy A.
2017-01-01
Three-dimensional, time-accurate, and phase-lagged computational fluid dynamics (CFD) simulations of the High Efficiency Centrifugal Compressor (HECC) stage were generated using the TURBO solver. Changes to the TURBO Parallel Version 4 source code were made in order to properly model the no-slip boundary condition along the spinning hub region for centrifugal impellers. A startup procedure was developed to generate a converged flow field in TURBO. This procedure initialized computations on a coarsened mesh generated by the Turbomachinery Gridding System (TGS) and relied on a method of systematically increasing wheel speed and backpressure. Baseline design-speed TURBO results generally overpredicted total pressure ratio, adiabatic efficiency, and the choking flow rate of the HECC stage as compared with the design-intent CFD results of Code Leo. Including diffuser fillet geometry in the TURBO computation resulted in a 0.6 percent reduction in the choking flow rate and led to a better match with design-intent CFD. Diffuser fillets reduced annulus cross-sectional area but also reduced corner separation, and thus blockage, in the diffuser passage. It was found that the TURBO computations are somewhat insensitive to inlet total pressure changing from the TURBO default inlet pressure of 14.7 pounds per square inch (101.35 kilopascals) down to 11.0 pounds per square inch (75.83 kilopascals), the inlet pressure of the component test. Off-design tip clearance was modeled in TURBO in two computations: one in which the blade tip geometry was trimmed by 12 mils (0.3048 millimeters), and another in which the hub flow path was moved to reflect a 12-mil axial shift in the impeller hub, creating a step at the hub. The one-dimensional results of these two computations indicate non-negligible differences between the two modeling approaches.
CREKID: A computer code for transient, gas-phase combustion of kinetics
NASA Technical Reports Server (NTRS)
Pratt, D. T.; Radhakrishnan, K.
1984-01-01
A new algorithm was developed for fast, automatic integration of chemical kinetic rate equations describing homogeneous, gas-phase combustion at constant pressure. Particular attention is paid to the distinguishing physical and computational characteristics of the induction, heat-release and equilibration regimes. The two-part predictor-corrector algorithm, based on an exponentially-fitted trapezoidal rule, includes filtering of ill-posed initial conditions, automatic selection of Newton-Jacobi or Newton iteration for convergence to achieve maximum computational efficiency while observing a prescribed error tolerance. The new algorithm was found to compare favorably with LSODE on two representative test problems drawn from combustion kinetics.
Cloud-based large-scale air traffic flow optimization
NASA Astrophysics Data System (ADS)
Cao, Yi
The ever-increasing traffic demand makes the efficient use of airspace an imperative mission, and this paper presents an effort in response to this call. Firstly, a new aggregate model, called Link Transmission Model (LTM), is proposed, which models the nationwide traffic as a network of flight routes identified by origin-destination pairs. The traversal time of a flight route is assumed to be the mode of distribution of historical flight records, and the mode is estimated by using Kernel Density Estimation. As this simplification abstracts away physical trajectory details, the complexity of modeling is drastically decreased, resulting in efficient traffic forecasting. The predicative capability of LTM is validated against recorded traffic data. Secondly, a nationwide traffic flow optimization problem with airport and en route capacity constraints is formulated based on LTM. The optimization problem aims at alleviating traffic congestions with minimal global delays. This problem is intractable due to millions of variables. A dual decomposition method is applied to decompose the large-scale problem such that the subproblems are solvable. However, the whole problem is still computational expensive to solve since each subproblem is an smaller integer programming problem that pursues integer solutions. Solving an integer programing problem is known to be far more time-consuming than solving its linear relaxation. In addition, sequential execution on a standalone computer leads to linear runtime increase when the problem size increases. To address the computational efficiency problem, a parallel computing framework is designed which accommodates concurrent executions via multithreading programming. The multithreaded version is compared with its monolithic version to show decreased runtime. Finally, an open-source cloud computing framework, Hadoop MapReduce, is employed for better scalability and reliability. This framework is an "off-the-shelf" parallel computing model that can be used for both offline historical traffic data analysis and online traffic flow optimization. It provides an efficient and robust platform for easy deployment and implementation. A small cloud consisting of five workstations was configured and used to demonstrate the advantages of cloud computing in dealing with large-scale parallelizable traffic problems.
Hierarchical layered and semantic-based image segmentation using ergodicity map
NASA Astrophysics Data System (ADS)
Yadegar, Jacob; Liu, Xiaoqing
2010-04-01
Image segmentation plays a foundational role in image understanding and computer vision. Although great strides have been made and progress achieved on automatic/semi-automatic image segmentation algorithms, designing a generic, robust, and efficient image segmentation algorithm is still challenging. Human vision is still far superior compared to computer vision, especially in interpreting semantic meanings/objects in images. We present a hierarchical/layered semantic image segmentation algorithm that can automatically and efficiently segment images into hierarchical layered/multi-scaled semantic regions/objects with contextual topological relationships. The proposed algorithm bridges the gap between high-level semantics and low-level visual features/cues (such as color, intensity, edge, etc.) through utilizing a layered/hierarchical ergodicity map, where ergodicity is computed based on a space filling fractal concept and used as a region dissimilarity measurement. The algorithm applies a highly scalable, efficient, and adaptive Peano- Cesaro triangulation/tiling technique to decompose the given image into a set of similar/homogenous regions based on low-level visual cues in a top-down manner. The layered/hierarchical ergodicity map is built through a bottom-up region dissimilarity analysis. The recursive fractal sweep associated with the Peano-Cesaro triangulation provides efficient local multi-resolution refinement to any level of detail. The generated binary decomposition tree also provides efficient neighbor retrieval mechanisms for contextual topological object/region relationship generation. Experiments have been conducted within the maritime image environment where the segmented layered semantic objects include the basic level objects (i.e. sky/land/water) and deeper level objects in the sky/land/water surfaces. Experimental results demonstrate the proposed algorithm has the capability to robustly and efficiently segment images into layered semantic objects/regions with contextual topological relationships.
Scaling vectors of attoJoule per bit modulators
NASA Astrophysics Data System (ADS)
Sorger, Volker J.; Amin, Rubab; Khurgin, Jacob B.; Ma, Zhizhen; Dalir, Hamed; Khan, Sikandar
2018-01-01
Electro-optic modulation performs the conversion between the electrical and optical domain with applications in data communication for optical interconnects, but also for novel optical computing algorithms such as providing nonlinearity at the output stage of optical perceptrons in neuromorphic analog optical computing. While resembling an optical transistor, the weak light-matter-interaction makes modulators 105 times larger compared to their electronic counterparts. Since the clock frequency for photonics on-chip has a power-overhead sweet-spot around tens of GHz, ultrafast modulation may only be required in long-distance communication, not for short on-chip links. Hence, the search is open for power-efficient on-chip modulators beyond the solutions offered by foundries to date. Here, we show scaling vectors towards atto-Joule per bit efficient modulators on-chip as well as some experimental demonstrations of novel plasmonic modulators with sub-fJ/bit efficiencies. Our parametric study of placing different actively modulated materials into plasmonic versus photonic optical modes shows that 2D materials overcompensate their miniscule modal overlap by their unity-high index change. Furthermore, we reveal that the metal used in plasmonic-based modulators not only serves as an electrical contact, but also enables low electrical series resistances leading to near-ideal capacitors. We then discuss the first experimental demonstration of a photon-plasmon-hybrid graphene-based electro-absorption modulator on silicon. The device shows a sub-1 V steep switching enabled by near-ideal electrostatics delivering a high 0.05 dB V-1 μm-1 performance requiring only 110 aJ/bit. Improving on this demonstration, we discuss a plasmonic slot-based graphene modulator design, where the polarization of the plasmonic mode aligns with graphene’s in-plane dimension; where a push-pull dual-gating scheme enables 2 dB V-1 μm-1 efficient modulation allowing the device to be just 770 nm short for 3 dB small signal modulation. Lastly, comparing the switching energy of transistors to modulators shows that modulators based on emerging materials and plasmonic-silicon hybrid integration perform on-par relative to their electronic counter parts. This in turn allows for a device-enabled two orders-of-magnitude improvement of electrical-optical co-integrated network-on-chips over electronic-only architectures. The latter opens technological opportunities in cognitive computing, dynamic data-driven applications systems, and optical analog computer engines including neuromorphic photonic computing.
Valuation of exotic options in the framework of Levy processes
NASA Astrophysics Data System (ADS)
Milev, Mariyan; Georgieva, Svetla; Markovska, Veneta
2013-12-01
In this paper we explore a straightforward procedure to price derivatives by using the Monte Carlo approach when the underlying process is a jump-diffusion. We have compared the Black-Scholes model with one of its extensions that is the Merton model. The latter model is better in capturing the market's phenomena and is comparative to stochastic volatility models in terms of pricing accuracy. We have presented simulations of asset paths and pricing of barrier options for both Geometric Brownian motion and exponential Levy processes as it is the concrete case of the Merton model. A desired level of accuracy is obtained with simple computer operations in MATLAB for efficient computational time.
CBESW: sequence alignment on the Playstation 3.
Wirawan, Adrianto; Kwoh, Chee Keong; Hieu, Nim Tri; Schmidt, Bertil
2008-09-17
The exponential growth of available biological data has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. For large datasets, our implementation on the PlayStation 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. The results from our experiments demonstrate that the PlayStation 3 console can be used as an efficient low cost computational platform for high performance sequence alignment applications.
CBESW: Sequence Alignment on the Playstation 3
Wirawan, Adrianto; Kwoh, Chee Keong; Hieu, Nim Tri; Schmidt, Bertil
2008-01-01
Background The exponential growth of available biological data has caused bioinformatics to be rapidly moving towards a data-intensive, computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation® 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. Results For large datasets, our implementation on the PlayStation® 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. Conclusion The results from our experiments demonstrate that the PlayStation® 3 console can be used as an efficient low cost computational platform for high performance sequence alignment applications. PMID:18798993
Neural Network Training by Integration of Adjoint Systems of Equations Forward in Time
NASA Technical Reports Server (NTRS)
Toomarian, Nikzad (Inventor); Barhen, Jacob (Inventor)
1999-01-01
A method and apparatus for supervised neural learning of time dependent trajectories exploits the concepts of adjoint operators to enable computation of the gradient of an objective functional with respect to the various parameters of the network architecture in a highly efficient manner. Specifically. it combines the advantage of dramatic reductions in computational complexity inherent in adjoint methods with the ability to solve two adjoint systems of equations together forward in time. Not only is a large amount of computation and storage saved. but the handling of real-time applications becomes also possible. The invention has been applied it to two examples of representative complexity which have recently been analyzed in the open literature and demonstrated that a circular trajectory can be learned in approximately 200 iterations compared to the 12000 reported in the literature. A figure eight trajectory was achieved in under 500 iterations compared to 20000 previously required. Tbc trajectories computed using our new method are much closer to the target trajectories than was reported in previous studies.
Neural network training by integration of adjoint systems of equations forward in time
NASA Technical Reports Server (NTRS)
Toomarian, Nikzad (Inventor); Barhen, Jacob (Inventor)
1992-01-01
A method and apparatus for supervised neural learning of time dependent trajectories exploits the concepts of adjoint operators to enable computation of the gradient of an objective functional with respect to the various parameters of the network architecture in a highly efficient manner. Specifically, it combines the advantage of dramatic reductions in computational complexity inherent in adjoint methods with the ability to solve two adjoint systems of equations together forward in time. Not only is a large amount of computation and storage saved, but the handling of real-time applications becomes also possible. The invention has been applied it to two examples of representative complexity which have recently been analyzed in the open literature and demonstrated that a circular trajectory can be learned in approximately 200 iterations compared to the 12000 reported in the literature. A figure eight trajectory was achieved in under 500 iterations compared to 20000 previously required. The trajectories computed using our new method are much closer to the target trajectories than was reported in previous studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chiang, Patrick
2014-01-31
The research goal of this CAREER proposal is to develop energy-efficient, VLSI interconnect circuits and systems that will facilitate future massively-parallel, high-performance computing. Extreme-scale computing will exhibit massive parallelism on multiple vertical levels, from thou sands of computational units on a single processor to thousands of processors in a single data center. Unfortunately, the energy required to communicate between these units at every level (on chip, off-chip, off-rack) will be the critical limitation to energy efficiency. Therefore, the PI's career goal is to become a leading researcher in the design of energy-efficient VLSI interconnect for future computing systems.
Wong, William W L; Feng, Zeny Z; Thein, Hla-Hla
2016-11-01
Agent-based models (ABMs) are computer simulation models that define interactions among agents and simulate emergent behaviors that arise from the ensemble of local decisions. ABMs have been increasingly used to examine trends in infectious disease epidemiology. However, the main limitation of ABMs is the high computational cost for a large-scale simulation. To improve the computational efficiency for large-scale ABM simulations, we built a parallelizable sliding region algorithm (SRA) for ABM and compared it to a nonparallelizable ABM. We developed a complex agent network and performed two simulations to model hepatitis C epidemics based on the real demographic data from Saskatchewan, Canada. The first simulation used the SRA that processed on each postal code subregion subsequently. The second simulation processed the entire population simultaneously. It was concluded that the parallelizable SRA showed computational time saving with comparable results in a province-wide simulation. Using the same method, SRA can be generalized for performing a country-wide simulation. Thus, this parallel algorithm enables the possibility of using ABM for large-scale simulation with limited computational resources.
Amat-ur-Rasool, Hafsa; Ahmed, Mehboob
2015-01-01
Alzheimer's disease (AD), a big cause of memory loss, is a progressive neurodegenerative disorder. The disease leads to irreversible loss of neurons that result in reduced level of acetylcholine neurotransmitter (ACh). The reduction of ACh level impairs brain functioning. One aspect of AD therapy is to maintain ACh level up to a safe limit, by blocking acetylcholinesterase (AChE), an enzyme that is naturally responsible for its degradation. This research presents an in-silico screening and designing of hAChE inhibitors as potential anti-Alzheimer drugs. Molecular docking results of the database retrieved (synthetic chemicals and dietary phytochemicals) and self-drawn ligands were compared with Food and Drug Administration (FDA) approved drugs against AD as controls. Furthermore, computational ADME studies were performed on the hits to assess their safety. Human AChE was found to be most approptiate target site as compared to commonly used Torpedo AChE. Among the tested dietry phytochemicals, berberastine, berberine, yohimbine, sanguinarine, elemol and naringenin are the worth mentioning phytochemicals as potential anti-Alzheimer drugs The synthetic leads were mostly dual binding site inhibitors with two binding subunits linked by a carbon chain i.e. second generation AD drugs. Fifteen new heterodimers were designed that were computationally more efficient inhibitors than previously reported compounds. Using computational methods, compounds present in online chemical databases can be screened to design more efficient and safer drugs against cognitive symptoms of AD. PMID:26325402
Amat-Ur-Rasool, Hafsa; Ahmed, Mehboob
2015-01-01
Alzheimer's disease (AD), a big cause of memory loss, is a progressive neurodegenerative disorder. The disease leads to irreversible loss of neurons that result in reduced level of acetylcholine neurotransmitter (ACh). The reduction of ACh level impairs brain functioning. One aspect of AD therapy is to maintain ACh level up to a safe limit, by blocking acetylcholinesterase (AChE), an enzyme that is naturally responsible for its degradation. This research presents an in-silico screening and designing of hAChE inhibitors as potential anti-Alzheimer drugs. Molecular docking results of the database retrieved (synthetic chemicals and dietary phytochemicals) and self-drawn ligands were compared with Food and Drug Administration (FDA) approved drugs against AD as controls. Furthermore, computational ADME studies were performed on the hits to assess their safety. Human AChE was found to be most approptiate target site as compared to commonly used Torpedo AChE. Among the tested dietry phytochemicals, berberastine, berberine, yohimbine, sanguinarine, elemol and naringenin are the worth mentioning phytochemicals as potential anti-Alzheimer drugs The synthetic leads were mostly dual binding site inhibitors with two binding subunits linked by a carbon chain i.e. second generation AD drugs. Fifteen new heterodimers were designed that were computationally more efficient inhibitors than previously reported compounds. Using computational methods, compounds present in online chemical databases can be screened to design more efficient and safer drugs against cognitive symptoms of AD.
Dahlö, Martin; Scofield, Douglas G; Schaal, Wesley; Spjuth, Ola
2018-05-01
Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.
2018-01-01
Abstract Background Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. Results The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Conclusions Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases. PMID:29659792
DOE Office of Scientific and Technical Information (OSTI.GOV)
McDowell, Rocklan; Dutech, Guy
2014-09-01
Ordered mesoporous carbon (OMC) nanoscaffolds are engineered agglomerates of carbon nanotubes held together by small carbon nanofibers with uniform pore sizes, high pore volume, and high channel permeability. These materials exhibit very high affinity for the adsorption of gold from aqueous acidic mixtures. The efficiency of gold recovery is comparable to those typically accomplished using biopolymer-based adsorbents. The adsorption efficiency for other precious metals such as palladium and platinum is lower. Studies on the precious metal (Au, Pd) adsorption on OMC materials from actual liquors of leached electronics will be presented. Adsorption properties will be compared for several different sorbentsmore » used for the recovery of precious metals. The leach liquor compositions for three different types of electronic scrap materials (personal computer board, cell phone and tv input/output board) will be presented. The sorption efficiencies for Au, Pd, together with a spectrum of competing and non-competing metals, from such leach mixtures will be compared.« less
The thermodynamic efficiency of computations made in cells across the range of life
NASA Astrophysics Data System (ADS)
Kempes, Christopher P.; Wolpert, David; Cohen, Zachary; Pérez-Mercader, Juan
2017-11-01
Biological organisms must perform computation as they grow, reproduce and evolve. Moreover, ever since Landauer's bound was proposed, it has been known that all computation has some thermodynamic cost-and that the same computation can be achieved with greater or smaller thermodynamic cost depending on how it is implemented. Accordingly an important issue concerning the evolution of life is assessing the thermodynamic efficiency of the computations performed by organisms. This issue is interesting both from the perspective of how close life has come to maximally efficient computation (presumably under the pressure of natural selection), and from the practical perspective of what efficiencies we might hope that engineered biological computers might achieve, especially in comparison with current computational systems. Here we show that the computational efficiency of translation, defined as free energy expended per amino acid operation, outperforms the best supercomputers by several orders of magnitude, and is only about an order of magnitude worse than the Landauer bound. However, this efficiency depends strongly on the size and architecture of the cell in question. In particular, we show that the useful efficiency of an amino acid operation, defined as the bulk energy per amino acid polymerization, decreases for increasing bacterial size and converges to the polymerization cost of the ribosome. This cost of the largest bacteria does not change in cells as we progress through the major evolutionary shifts to both single- and multicellular eukaryotes. However, the rates of total computation per unit mass are non-monotonic in bacteria with increasing cell size, and also change across different biological architectures, including the shift from unicellular to multicellular eukaryotes. This article is part of the themed issue 'Reconceptualizing the origins of life'.
Air-gas exchange reevaluated: clinically important results of a computer simulation.
Shunmugam, Manoharan; Shunmugam, Sudhakaran; Williamson, Tom H; Laidlaw, D Alistair
2011-10-21
The primary aim of this study was to evaluate the efficiency of air-gas exchange techniques and the factors that influence the final concentration of an intraocular gas tamponade. Parameters were varied to find the optimum method of performing an air-gas exchange in ideal circumstances. A computer model of the eye was designed using 3D software with fluid flow analysis capabilities. Factors such as angular distance between ports, gas infusion gauge, exhaust vent gauge and depth were varied in the model. Flow rate and axial length were also modulated to simulate faster injections and more myopic eyes, respectively. The flush volume of gas required to achieve a 97% intraocular gas fraction concentration were compared. Modulating individual factors did not reveal any clinically significant difference in the angular distance between ports, exhaust vent size, and depth or rate of gas injection. In combination, however, there was a 28% increase in air-gas exchange efficiency comparing the most efficient with the least efficient studied parameters in this model. The gas flush volume required to achieve a 97% gas fill also increased proportionately at a ratio of 5.5 to 6.2 times the volume of the eye. A 35-mL flush is adequate for eyes up to 25 mm in axial length; however, eyes longer than this would require a much greater flush volume, and surgeons should consider using two separate 50-mL gas syringes to ensure optimal gas concentration for eyes greater than 25 mm in axial length.
Search for Directed Networks by Different Random Walk Strategies
NASA Astrophysics Data System (ADS)
Zhu, Zi-Qi; Jin, Xiao-Ling; Huang, Zhi-Long
2012-03-01
A comparative study is carried out on the efficiency of five different random walk strategies searching on directed networks constructed based on several typical complex networks. Due to the difference in search efficiency of the strategies rooted in network clustering, the clustering coefficient in a random walker's eye on directed networks is defined and computed to be half of the corresponding undirected networks. The search processes are performed on the directed networks based on Erdös—Rényi model, Watts—Strogatz model, Barabási—Albert model and clustered scale-free network model. It is found that self-avoiding random walk strategy is the best search strategy for such directed networks. Compared to unrestricted random walk strategy, path-iteration-avoiding random walks can also make the search process much more efficient. However, no-triangle-loop and no-quadrangle-loop random walks do not improve the search efficiency as expected, which is different from those on undirected networks since the clustering coefficient of directed networks are smaller than that of undirected networks.
NASA Technical Reports Server (NTRS)
1979-01-01
The current program had the objective to modify a discrete vortex wake method to efficiently compute the aerodynamic forces and moments on high fineness ratio bodies (f approximately 10.0). The approach is to increase computational efficiency by structuring the program to take advantage of new computer vector software and by developing new algorithms when vector software can not efficiently be used. An efficient program was written and substantial savings achieved. Several test cases were run for fineness ratios up to f = 16.0 and angles of attack up to 50 degrees.
Protecting genomic data analytics in the cloud: state of the art and opportunities.
Tang, Haixu; Jiang, Xiaoqian; Wang, Xiaofeng; Wang, Shuang; Sofia, Heidi; Fox, Dov; Lauter, Kristin; Malin, Bradley; Telenti, Amalio; Xiong, Li; Ohno-Machado, Lucila
2016-10-13
The outsourcing of genomic data into public cloud computing settings raises concerns over privacy and security. Significant advancements in secure computation methods have emerged over the past several years, but such techniques need to be rigorously evaluated for their ability to support the analysis of human genomic data in an efficient and cost-effective manner. With respect to public cloud environments, there are concerns about the inadvertent exposure of human genomic data to unauthorized users. In analyses involving multiple institutions, there is additional concern about data being used beyond agreed research scope and being prcoessed in untrused computational environments, which may not satisfy institutional policies. To systematically investigate these issues, the NIH-funded National Center for Biomedical Computing iDASH (integrating Data for Analysis, 'anonymization' and SHaring) hosted the second Critical Assessment of Data Privacy and Protection competition to assess the capacity of cryptographic technologies for protecting computation over human genomes in the cloud and promoting cross-institutional collaboration. Data scientists were challenged to design and engineer practical algorithms for secure outsourcing of genome computation tasks in working software, whereby analyses are performed only on encrypted data. They were also challenged to develop approaches to enable secure collaboration on data from genomic studies generated by multiple organizations (e.g., medical centers) to jointly compute aggregate statistics without sharing individual-level records. The results of the competition indicated that secure computation techniques can enable comparative analysis of human genomes, but greater efficiency (in terms of compute time and memory utilization) are needed before they are sufficiently practical for real world environments.
NASA Astrophysics Data System (ADS)
Wang, Jinting; Lu, Liqiao; Zhu, Fei
2018-01-01
Finite element (FE) is a powerful tool and has been applied by investigators to real-time hybrid simulations (RTHSs). This study focuses on the computational efficiency, including the computational time and accuracy, of numerical integrations in solving FE numerical substructure in RTHSs. First, sparse matrix storage schemes are adopted to decrease the computational time of FE numerical substructure. In this way, the task execution time (TET) decreases such that the scale of the numerical substructure model increases. Subsequently, several commonly used explicit numerical integration algorithms, including the central difference method (CDM), the Newmark explicit method, the Chang method and the Gui-λ method, are comprehensively compared to evaluate their computational time in solving FE numerical substructure. CDM is better than the other explicit integration algorithms when the damping matrix is diagonal, while the Gui-λ (λ = 4) method is advantageous when the damping matrix is non-diagonal. Finally, the effect of time delay on the computational accuracy of RTHSs is investigated by simulating structure-foundation systems. Simulation results show that the influences of time delay on the displacement response become obvious with the mass ratio increasing, and delay compensation methods may reduce the relative error of the displacement peak value to less than 5% even under the large time-step and large time delay.
NASA Astrophysics Data System (ADS)
Jaber, Khalid Mohammad; Alia, Osama Moh'd.; Shuaib, Mohammed Mahmod
2018-03-01
Finding the optimal parameters that can reproduce experimental data (such as the velocity-density relation and the specific flow rate) is a very important component of the validation and calibration of microscopic crowd dynamic models. Heavy computational demand during parameter search is a known limitation that exists in a previously developed model known as the Harmony Search-Based Social Force Model (HS-SFM). In this paper, a parallel-based mechanism is proposed to reduce the computational time and memory resource utilisation required to find these parameters. More specifically, two MATLAB-based multicore techniques (parfor and create independent jobs) using shared memory are developed by taking advantage of the multithreading capabilities of parallel computing, resulting in a new framework called the Parallel Harmony Search-Based Social Force Model (P-HS-SFM). The experimental results show that the parfor-based P-HS-SFM achieved a better computational time of about 26 h, an efficiency improvement of ? 54% and a speedup factor of 2.196 times in comparison with the HS-SFM sequential processor. The performance of the P-HS-SFM using the create independent jobs approach is also comparable to parfor with a computational time of 26.8 h, an efficiency improvement of about 30% and a speedup of 2.137 times.
Solving Constraint Satisfaction Problems with Networks of Spiking Neurons
Jonke, Zeno; Habenschuss, Stefan; Maass, Wolfgang
2016-01-01
Network of neurons in the brain apply—unlike processors in our current generation of computer hardware—an event-based processing strategy, where short pulses (spikes) are emitted sparsely by neurons to signal the occurrence of an event at a particular point in time. Such spike-based computations promise to be substantially more power-efficient than traditional clocked processing schemes. However, it turns out to be surprisingly difficult to design networks of spiking neurons that can solve difficult computational problems on the level of single spikes, rather than rates of spikes. We present here a new method for designing networks of spiking neurons via an energy function. Furthermore, we show how the energy function of a network of stochastically firing neurons can be shaped in a transparent manner by composing the networks of simple stereotypical network motifs. We show that this design approach enables networks of spiking neurons to produce approximate solutions to difficult (NP-hard) constraint satisfaction problems from the domains of planning/optimization and verification/logical inference. The resulting networks employ noise as a computational resource. Nevertheless, the timing of spikes plays an essential role in their computations. Furthermore, networks of spiking neurons carry out for the Traveling Salesman Problem a more efficient stochastic search for good solutions compared with stochastic artificial neural networks (Boltzmann machines) and Gibbs sampling. PMID:27065785
NASA Astrophysics Data System (ADS)
Guo, Jie; Zhu, Chang`an
2016-01-01
The development of optics and computer technologies enables the application of the vision-based technique that uses digital cameras to the displacement measurement of large-scale structures. Compared with traditional contact measurements, vision-based technique allows for remote measurement, has a non-intrusive characteristic, and does not necessitate mass introduction. In this study, a high-speed camera system is developed to complete the displacement measurement in real time. The system consists of a high-speed camera and a notebook computer. The high-speed camera can capture images at a speed of hundreds of frames per second. To process the captured images in computer, the Lucas-Kanade template tracking algorithm in the field of computer vision is introduced. Additionally, a modified inverse compositional algorithm is proposed to reduce the computing time of the original algorithm and improve the efficiency further. The modified algorithm can rapidly accomplish one displacement extraction within 1 ms without having to install any pre-designed target panel onto the structures in advance. The accuracy and the efficiency of the system in the remote measurement of dynamic displacement are demonstrated in the experiments on motion platform and sound barrier on suspension viaduct. Experimental results show that the proposed algorithm can extract accurate displacement signal and accomplish the vibration measurement of large-scale structures.
Quantum computing on encrypted data
NASA Astrophysics Data System (ADS)
Fisher, K. A. G.; Broadbent, A.; Shalm, L. K.; Yan, Z.; Lavoie, J.; Prevedel, R.; Jennewein, T.; Resch, K. J.
2014-01-01
The ability to perform computations on encrypted data is a powerful tool for protecting privacy. Recently, protocols to achieve this on classical computing systems have been found. Here, we present an efficient solution to the quantum analogue of this problem that enables arbitrary quantum computations to be carried out on encrypted quantum data. We prove that an untrusted server can implement a universal set of quantum gates on encrypted quantum bits (qubits) without learning any information about the inputs, while the client, knowing the decryption key, can easily decrypt the results of the computation. We experimentally demonstrate, using single photons and linear optics, the encryption and decryption scheme on a set of gates sufficient for arbitrary quantum computations. As our protocol requires few extra resources compared with other schemes it can be easily incorporated into the design of future quantum servers. These results will play a key role in enabling the development of secure distributed quantum systems.
Computational Fragment-Based Drug Design: Current Trends, Strategies, and Applications.
Bian, Yuemin; Xie, Xiang-Qun Sean
2018-04-09
Fragment-based drug design (FBDD) has become an effective methodology for drug development for decades. Successful applications of this strategy brought both opportunities and challenges to the field of Pharmaceutical Science. Recent progress in the computational fragment-based drug design provide an additional approach for future research in a time- and labor-efficient manner. Combining multiple in silico methodologies, computational FBDD possesses flexibilities on fragment library selection, protein model generation, and fragments/compounds docking mode prediction. These characteristics provide computational FBDD superiority in designing novel and potential compounds for a certain target. The purpose of this review is to discuss the latest advances, ranging from commonly used strategies to novel concepts and technologies in computational fragment-based drug design. Particularly, in this review, specifications and advantages are compared between experimental and computational FBDD, and additionally, limitations and future prospective are discussed and emphasized.
Quantum computing on encrypted data.
Fisher, K A G; Broadbent, A; Shalm, L K; Yan, Z; Lavoie, J; Prevedel, R; Jennewein, T; Resch, K J
2014-01-01
The ability to perform computations on encrypted data is a powerful tool for protecting privacy. Recently, protocols to achieve this on classical computing systems have been found. Here, we present an efficient solution to the quantum analogue of this problem that enables arbitrary quantum computations to be carried out on encrypted quantum data. We prove that an untrusted server can implement a universal set of quantum gates on encrypted quantum bits (qubits) without learning any information about the inputs, while the client, knowing the decryption key, can easily decrypt the results of the computation. We experimentally demonstrate, using single photons and linear optics, the encryption and decryption scheme on a set of gates sufficient for arbitrary quantum computations. As our protocol requires few extra resources compared with other schemes it can be easily incorporated into the design of future quantum servers. These results will play a key role in enabling the development of secure distributed quantum systems.
On the Computation of Comprehensive Boolean Gröbner Bases
NASA Astrophysics Data System (ADS)
Inoue, Shutaro
We show that a comprehensive Boolean Gröbner basis of an ideal I in a Boolean polynomial ring B (bar A,bar X) with main variables bar X and parameters bar A can be obtained by simply computing a usual Boolean Gröbner basis of I regarding both bar X and bar A as variables with a certain block term order such that bar X ≫ bar A. The result together with a fact that a finite Boolean ring is isomorphic to a direct product of the Galois field mathbb{GF}_2 enables us to compute a comprehensive Boolean Gröbner basis by only computing corresponding Gröbner bases in a polynomial ring over mathbb{GF}_2. Our implementation in a computer algebra system Risa/Asir shows that our method is extremely efficient comparing with existing computation algorithms of comprehensive Boolean Gröbner bases.
The set of commercially available chemical substances in commerce that may have significant global warming potential (GWP) is not well defined. Although there are currently over 200 chemicals with high GWP reported by the Intergovernmental Panel on Climate Change, World Meteorological Organization, or Environmental Protection Agency, there may be hundreds of additional chemicals that may also have significant GWP. Evaluation of various approaches to estimate radiative efficiency (RE) and atmospheric lifetime will help to refine GWP estimates for compounds where no measured IR spectrum is available. This study compares values of RE calculated using computational chemistry techniques for 235 chemical compounds against the best available values. It is important to assess the reliability of the underlying computational methods for computing RE to understand the sources of deviations from the best available values. Computed vibrational frequency data is used to estimate RE values using several Pinnock-type models. The values derived using these models are found to be in reasonable agreement with reported RE values (though significant improvement is obtained through scaling). The effect of varying the computational method and basis set used to calculate the frequency data is also discussed. It is found that the vibrational intensities have a strong dependence on basis set and are largely responsible for differences in computed values of RE in this study. Deviations of
CCOMP: An efficient algorithm for complex roots computation of determinantal equations
NASA Astrophysics Data System (ADS)
Zouros, Grigorios P.
2018-01-01
In this paper a free Python algorithm, entitled CCOMP (Complex roots COMPutation), is developed for the efficient computation of complex roots of determinantal equations inside a prescribed complex domain. The key to the method presented is the efficient determination of the candidate points inside the domain which, in their close neighborhood, a complex root may lie. Once these points are detected, the algorithm proceeds to a two-dimensional minimization problem with respect to the minimum modulus eigenvalue of the system matrix. In the core of CCOMP exist three sub-algorithms whose tasks are the efficient estimation of the minimum modulus eigenvalues of the system matrix inside the prescribed domain, the efficient computation of candidate points which guarantee the existence of minima, and finally, the computation of minima via bound constrained minimization algorithms. Theoretical results and heuristics support the development and the performance of the algorithm, which is discussed in detail. CCOMP supports general complex matrices, and its efficiency, applicability and validity is demonstrated to a variety of microwave applications.
Calvello, Simone; Piccardo, Matteo; Rao, Shashank Vittal; Soncini, Alessandro
2018-03-05
We have developed and implemented a new ab initio code, Ceres (Computational Emulator of Rare Earth Systems), completely written in C++11, which is dedicated to the efficient calculation of the electronic structure and magnetic properties of the crystal field states arising from the splitting of the ground state spin-orbit multiplet in lanthanide complexes. The new code gains efficiency via an optimized implementation of a direct configurational averaged Hartree-Fock (CAHF) algorithm for the determination of 4f quasi-atomic active orbitals common to all multi-electron spin manifolds contributing to the ground spin-orbit multiplet of the lanthanide ion. The new CAHF implementation is based on quasi-Newton convergence acceleration techniques coupled to an efficient library for the direct evaluation of molecular integrals, and problem-specific density matrix guess strategies. After describing the main features of the new code, we compare its efficiency with the current state-of-the-art ab initio strategy to determine crystal field levels and properties, and show that our methodology, as implemented in Ceres, represents a more time-efficient computational strategy for the evaluation of the magnetic properties of lanthanide complexes, also allowing a full representation of non-perturbative spin-orbit coupling effects. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Lattice Boltzmann and Navier-Stokes Cartesian CFD Approaches for Airframe Noise Predictions
NASA Technical Reports Server (NTRS)
Barad, Michael F.; Kocheemoolayil, Joseph G.; Kiris, Cetin C.
2017-01-01
Lattice Boltzmann (LB) and compressible Navier-Stokes (NS) equations based computational fluid dynamics (CFD) approaches are compared for simulating airframe noise. Both LB and NS CFD approaches are implemented within the Launch Ascent and Vehicle Aerodynamics (LAVA) framework. Both schemes utilize the same underlying Cartesian structured mesh paradigm with provision for local adaptive grid refinement and sub-cycling in time. We choose a prototypical massively separated, wake-dominated flow ideally suited for Cartesian-grid based approaches in this study - The partially-dressed, cavity-closed nose landing gear (PDCC-NLG) noise problem from AIAA's Benchmark problems for Airframe Noise Computations (BANC) series of workshops. The relative accuracy and computational efficiency of the two approaches are systematically compared. Detailed comments are made on the potential held by LB to significantly reduce time-to-solution for a desired level of accuracy within the context of modeling airframes noise from first principles.
Analysis of Material Sample Heated by Impinging Hot Hydrogen Jet in a Non-Nuclear Tester
NASA Technical Reports Server (NTRS)
Wang, Ten-See; Foote, John; Litchford, Ron
2006-01-01
A computational conjugate heat transfer methodology was developed and anchored with data obtained from a hot-hydrogen jet heated, non-nuclear materials tester, as a first step towards developing an efficient and accurate multiphysics, thermo-fluid computational methodology to predict environments for hypothetical solid-core, nuclear thermal engine thrust chamber. The computational methodology is based on a multidimensional, finite-volume, turbulent, chemically reacting, thermally radiating, unstructured-grid, and pressure-based formulation. The multiphysics invoked in this study include hydrogen dissociation kinetics and thermodynamics, turbulent flow, convective and thermal radiative, and conjugate heat transfers. Predicted hot hydrogen jet and material surface temperatures were compared with those of measurement. Predicted solid temperatures were compared with those obtained with a standard heat transfer code. The interrogation of physics revealed that reactions of hydrogen dissociation and recombination are highly correlated with local temperature and are necessary for accurate prediction of the hot-hydrogen jet temperature.
Matrix-vector multiplication using digital partitioning for more accurate optical computing
NASA Technical Reports Server (NTRS)
Gary, C. K.
1992-01-01
Digital partitioning offers a flexible means of increasing the accuracy of an optical matrix-vector processor. This algorithm can be implemented with the same architecture required for a purely analog processor, which gives optical matrix-vector processors the ability to perform high-accuracy calculations at speeds comparable with or greater than electronic computers as well as the ability to perform analog operations at a much greater speed. Digital partitioning is compared with digital multiplication by analog convolution, residue number systems, and redundant number representation in terms of the size and the speed required for an equivalent throughput as well as in terms of the hardware requirements. Digital partitioning and digital multiplication by analog convolution are found to be the most efficient alogrithms if coding time and hardware are considered, and the architecture for digital partitioning permits the use of analog computations to provide the greatest throughput for a single processor.
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Kwak, Dochan
2001-01-01
Two numerical procedures, one based on artificial compressibility method and the other pressure projection method, are outlined for obtaining time-accurate solutions of the incompressible Navier-Stokes equations. The performance of the two method are compared by obtaining unsteady solutions for the evolution of twin vortices behind a at plate. Calculated results are compared with experimental and other numerical results. For an un- steady ow which requires small physical time step, pressure projection method was found to be computationally efficient since it does not require any subiterations procedure. It was observed that the artificial compressibility method requires a fast convergence scheme at each physical time step in order to satisfy incompressibility condition. This was obtained by using a GMRES-ILU(0) solver in our computations. When a line-relaxation scheme was used, the time accuracy was degraded and time-accurate computations became very expensive.
Turbulent heat transfer performance of single stage turbine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amano, R.S.; Song, B.
1999-07-01
To increase the efficiency and the power of modern power plant gas turbines, designers are continually trying to raise the maximum turbine inlet temperature. Here, a numerical study based on the Navier-Stokes equations on a three-dimensional turbulent flow in a single stage turbine stator/rotor passage has been conducted and reported in this paper. The full Reynolds-stress closure model (RSM) was used for the computations and the results were also compared with the computations made by using the Launder-Sharma low-Reynolds-number {kappa}-{epsilon} model. The computational results obtained using these models were compared in order to investigate the turbulence effect in the near-wallmore » region. The set of the governing equations in a generalized curvilinear coordinate system was discretized by using the finite volume method with non-staggered grids. The numerical modeling was performed to interact between the stator and rotor blades.« less
Song, Chao; Zheng, Shi-Biao; Zhang, Pengfei; Xu, Kai; Zhang, Libo; Guo, Qiujiang; Liu, Wuxin; Xu, Da; Deng, Hui; Huang, Keqiang; Zheng, Dongning; Zhu, Xiaobo; Wang, H
2017-10-20
Geometric phase, associated with holonomy transformation in quantum state space, is an important quantum-mechanical effect. Besides fundamental interest, this effect has practical applications, among which geometric quantum computation is a paradigm, where quantum logic operations are realized through geometric phase manipulation that has some intrinsic noise-resilient advantages and may enable simplified implementation of multi-qubit gates compared to the dynamical approach. Here we report observation of a continuous-variable geometric phase and demonstrate a quantum gate protocol based on this phase in a superconducting circuit, where five qubits are controllably coupled to a resonator. Our geometric approach allows for one-step implementation of n-qubit controlled-phase gates, which represents a remarkable advantage compared to gate decomposition methods, where the number of required steps dramatically increases with n. Following this approach, we realize these gates with n up to 4, verifying the high efficiency of this geometric manipulation for quantum computation.
Hybrid soft computing systems for electromyographic signals analysis: a review.
Xie, Hong-Bo; Guo, Tianruo; Bai, Siwei; Dokos, Socrates
2014-02-03
Electromyographic (EMG) is a bio-signal collected on human skeletal muscle. Analysis of EMG signals has been widely used to detect human movement intent, control various human-machine interfaces, diagnose neuromuscular diseases, and model neuromusculoskeletal system. With the advances of artificial intelligence and soft computing, many sophisticated techniques have been proposed for such purpose. Hybrid soft computing system (HSCS), the integration of these different techniques, aims to further improve the effectiveness, efficiency, and accuracy of EMG analysis. This paper reviews and compares key combinations of neural network, support vector machine, fuzzy logic, evolutionary computing, and swarm intelligence for EMG analysis. Our suggestions on the possible future development of HSCS in EMG analysis are also given in terms of basic soft computing techniques, further combination of these techniques, and their other applications in EMG analysis.
Hybrid soft computing systems for electromyographic signals analysis: a review
2014-01-01
Electromyographic (EMG) is a bio-signal collected on human skeletal muscle. Analysis of EMG signals has been widely used to detect human movement intent, control various human-machine interfaces, diagnose neuromuscular diseases, and model neuromusculoskeletal system. With the advances of artificial intelligence and soft computing, many sophisticated techniques have been proposed for such purpose. Hybrid soft computing system (HSCS), the integration of these different techniques, aims to further improve the effectiveness, efficiency, and accuracy of EMG analysis. This paper reviews and compares key combinations of neural network, support vector machine, fuzzy logic, evolutionary computing, and swarm intelligence for EMG analysis. Our suggestions on the possible future development of HSCS in EMG analysis are also given in terms of basic soft computing techniques, further combination of these techniques, and their other applications in EMG analysis. PMID:24490979
NASA Astrophysics Data System (ADS)
Wei, Hui; Gong, Guanghong; Li, Ni
2017-10-01
Computer-generated hologram (CGH) is a promising 3D display technology while it is challenged by heavy computation load and vast memory requirement. To solve these problems, a depth compensating CGH calculation method based on symmetry and similarity of zone plates is proposed and implemented on graphics processing unit (GPU). An improved LUT method is put forward to compute the distances between object points and hologram pixels in the XY direction. The concept of depth compensating factor is defined and used for calculating the holograms of points with different depth positions instead of layer-based methods. The proposed method is suitable for arbitrary sampling objects with lower memory usage and higher computational efficiency compared to other CGH methods. The effectiveness of the proposed method is validated by numerical and optical experiments.
Ship Trim Optimization: Assessment of Influence of Trim on Resistance of MOERI Container Ship
Duan, Wenyang
2014-01-01
Environmental issues and rising fuel prices necessitate better energy efficiency in all sectors. Shipping industry is a stakeholder in environmental issues. Shipping industry is responsible for approximately 3% of global CO2 emissions, 14-15% of global NOX emissions, and 16% of global SOX emissions. Ship trim optimization has gained enormous momentum in recent years being an effective operational measure for better energy efficiency to reduce emissions. Ship trim optimization analysis has traditionally been done through tow-tank testing for a specific hullform. Computational techniques are increasingly popular in ship hydrodynamics applications. The purpose of this study is to present MOERI container ship (KCS) hull trim optimization by employing computational methods. KCS hull total resistances and trim and sinkage computed values, in even keel condition, are compared with experimental values and found in reasonable agreement. The agreement validates that mesh, boundary conditions, and solution techniques are correct. The same mesh, boundary conditions, and solution techniques are used to obtain resistance values in different trim conditions at Fn = 0.2274. Based on attained results, optimum trim is suggested. This research serves as foundation for employing computational techniques for ship trim optimization. PMID:24578649
NASA Astrophysics Data System (ADS)
Beck, Joakim; Dia, Ben Mansour; Espath, Luis F. R.; Long, Quan; Tempone, Raúl
2018-06-01
In calculating expected information gain in optimal Bayesian experimental design, the computation of the inner loop in the classical double-loop Monte Carlo requires a large number of samples and suffers from underflow if the number of samples is small. These drawbacks can be avoided by using an importance sampling approach. We present a computationally efficient method for optimal Bayesian experimental design that introduces importance sampling based on the Laplace method to the inner loop. We derive the optimal values for the method parameters in which the average computational cost is minimized according to the desired error tolerance. We use three numerical examples to demonstrate the computational efficiency of our method compared with the classical double-loop Monte Carlo, and a more recent single-loop Monte Carlo method that uses the Laplace method as an approximation of the return value of the inner loop. The first example is a scalar problem that is linear in the uncertain parameter. The second example is a nonlinear scalar problem. The third example deals with the optimal sensor placement for an electrical impedance tomography experiment to recover the fiber orientation in laminate composites.
Efficient simulation of incompressible viscous flow over multi-element airfoils
NASA Technical Reports Server (NTRS)
Rogers, Stuart E.; Wiltberger, N. Lyn; Kwak, Dochan
1992-01-01
The incompressible, viscous, turbulent flow over single and multi-element airfoils is numerically simulated in an efficient manner by solving the incompressible Navier-Stokes equations. The computer code uses the method of pseudo-compressibility with an upwind-differencing scheme for the convective fluxes and an implicit line-relaxation solution algorithm. The motivation for this work includes interest in studying the high-lift take-off and landing configurations of various aircraft. In particular, accurate computation of lift and drag at various angles of attack, up to stall, is desired. Two different turbulence models are tested in computing the flow over an NACA 4412 airfoil; an accurate prediction of stall is obtained. The approach used for multi-element airfoils involves the use of multiple zones of structured grids fitted to each element. Two different approaches are compared: a patched system of grids, and an overlaid Chimera system of grids. Computational results are presented for two-element, three-element, and four-element airfoil configurations. Excellent agreement with experimental surface pressure coefficients is seen. The code converges in less than 200 iterations, requiring on the order of one minute of CPU time (on a CRAY YMP) per element in the airfoil configuration.
Aerodynamic design optimization using sensitivity analysis and computational fluid dynamics
NASA Technical Reports Server (NTRS)
Baysal, Oktay; Eleshaky, Mohamed E.
1991-01-01
A new and efficient method is presented for aerodynamic design optimization, which is based on a computational fluid dynamics (CFD)-sensitivity analysis algorithm. The method is applied to design a scramjet-afterbody configuration for an optimized axial thrust. The Euler equations are solved for the inviscid analysis of the flow, which in turn provides the objective function and the constraints. The CFD analysis is then coupled with the optimization procedure that uses a constrained minimization method. The sensitivity coefficients, i.e. gradients of the objective function and the constraints, needed for the optimization are obtained using a quasi-analytical method rather than the traditional brute force method of finite difference approximations. During the one-dimensional search of the optimization procedure, an approximate flow analysis (predicted flow) based on a first-order Taylor series expansion is used to reduce the computational cost. Finally, the sensitivity of the optimum objective function to various design parameters, which are kept constant during the optimization, is computed to predict new optimum solutions. The flow analysis of the demonstrative example are compared with the experimental data. It is shown that the method is more efficient than the traditional methods.
Message Passing vs. Shared Address Space on a Cluster of SMPs
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak
2000-01-01
The convergence of scalable computer architectures using clusters of PCs (or PC-SMPs) with commodity networking has become an attractive platform for high end scientific computing. Currently, message-passing and shared address space (SAS) are the two leading programming paradigms for these systems. Message-passing has been standardized with MPI, and is the most common and mature programming approach. However message-passing code development can be extremely difficult, especially for irregular structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality, and high protocol overhead. In this paper, we compare the performance of and programming effort, required for six applications under both programming models on a 32 CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit high efficiency under shared memory programming. due to their high communication to computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications: however, on certain classes of problems SAS performance is competitive with MPI. We also present new algorithms for improving the PC cluster performance of MPI collective operations.
Fast algorithm for bilinear transforms in optics
NASA Astrophysics Data System (ADS)
Ostrovsky, Andrey S.; Martinez-Niconoff, Gabriel C.; Ramos Romero, Obdulio; Cortes, Liliana
2000-10-01
The fast algorithm for calculating the bilinear transform in the optical system is proposed. This algorithm is based on the coherent-mode representation of the cross-spectral density function of the illumination. The algorithm is computationally efficient when the illumination is partially coherent. Numerical examples are studied and compared with the theoretical results.
Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items
ERIC Educational Resources Information Center
Penfield, Randall D.
2006-01-01
This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…
Financial feasibility of marker-aided selection in Douglas-fir.
G.R. Johnson; N.C. Wheeler; S.H. Strauss
2000-01-01
The land area required for a marker-aided selection (MAS) program to break-even (i.e., have equal costs and benefits) was estimated using computer simulation for coastal Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco) in the Pacific Northwestern United States. We compared the selection efficiency obtained when using an index that included the...
Federal Register 2010, 2011, 2012, 2013, 2014
2011-03-02
... removal of the narrowband comparably efficient interconnection (CEI) and open network architecture (ONA... and copying during normal business hours in the FCC Reference Information Center, Portals II, 445 12th... addition, pursuant to the Small Business Paperwork Relief Act of 2002, we will seek specific comment on how...
Exploring Neutrino Oscillation Parameter Space with a Monte Carlo Algorithm
NASA Astrophysics Data System (ADS)
Espejel, Hugo; Ernst, David; Cogswell, Bernadette; Latimer, David
2015-04-01
The χ2 (or likelihood) function for a global analysis of neutrino oscillation data is first calculated as a function of the neutrino mixing parameters. A computational challenge is to obtain the minima or the allowed regions for the mixing parameters. The conventional approach is to calculate the χ2 (or likelihood) function on a grid for a large number of points, and then marginalize over the likelihood function. As the number of parameters increases with the number of neutrinos, making the calculation numerically efficient becomes necessary. We implement a new Monte Carlo algorithm (D. Foreman-Mackey, D. W. Hogg, D. Lang and J. Goodman, Publications of the Astronomical Society of the Pacific, 125 306 (2013)) to determine its computational efficiency at finding the minima and allowed regions. We examine a realistic example to compare the historical and the new methods.
Cellular automatons applied to gas dynamic problems
NASA Technical Reports Server (NTRS)
Long, Lyle N.; Coopersmith, Robert M.; Mclachlan, B. G.
1987-01-01
This paper compares the results of a relatively new computational fluid dynamics method, cellular automatons, with experimental data and analytical results. This technique has been shown to qualitatively predict fluidlike behavior; however, there have been few published comparisons with experiment or other theories. Comparisons are made for a one-dimensional supersonic piston problem, Stokes first problem, and the flow past a normal flat plate. These comparisons are used to assess the ability of the method to accurately model fluid dynamic behavior and to point out its limitations. Reasonable results were obtained for all three test cases, but the fundamental limitations of cellular automatons are numerous. It may be misleading, at this time, to say that cellular automatons are a computationally efficient technique. Other methods, based on continuum or kinetic theory, would also be very efficient if as little of the physics were included.
Evaluation of a Multigrid Scheme for the Incompressible Navier-Stokes Equations
NASA Technical Reports Server (NTRS)
Swanson, R. C.
2004-01-01
A fast multigrid solver for the steady, incompressible Navier-Stokes equations is presented. The multigrid solver is based upon a factorizable discrete scheme for the velocity-pressure form of the Navier-Stokes equations. This scheme correctly distinguishes between the advection-diffusion and elliptic parts of the operator, allowing efficient smoothers to be constructed. To evaluate the multigrid algorithm, solutions are computed for flow over a flat plate, parabola, and a Karman-Trefftz airfoil. Both nonlifting and lifting airfoil flows are considered, with a Reynolds number range of 200 to 800. Convergence and accuracy of the algorithm are discussed. Using Gauss-Seidel line relaxation in alternating directions, multigrid convergence behavior approaching that of O(N) methods is achieved. The computational efficiency of the numerical scheme is compared with that of Runge-Kutta and implicit upwind based multigrid methods.
Evaluation of Preduster in Cement Industry Based on Computational Fluid Dynamic
NASA Astrophysics Data System (ADS)
Septiani, E. L.; Widiyastuti, W.; Djafaar, A.; Ghozali, I.; Pribadi, H. M.
2017-10-01
Ash-laden hot air from clinker in cement industry is being used to reduce water contain in coal, however it may contain large amount of ash even though it was treated by a preduster. This study investigated preduster performance as a cyclone separator in the cement industry by Computational Fluid Dynamic method. In general, the best performance of cyclone is it have relatively high efficiency with the low pressure drop. The most accurate and simple turbulence model, Reynold Average Navier Stokes (RANS), standard k-ε, and combination with Lagrangian model as particles tracking model were used to solve the problem. The measurement in simulation result are flow pattern in the cyclone, pressure outlet and collection efficiency of preduster. The applied model well predicted by comparing with the most accurate empirical model and pressure outlet in experimental measurement.
NASA Astrophysics Data System (ADS)
Kang, Zhizhong
2013-10-01
This paper presents a new approach to automatic registration of terrestrial laser scanning (TLS) point clouds utilizing a novel robust estimation method by an efficient BaySAC (BAYes SAmpling Consensus). The proposed method directly generates reflectance images from 3D point clouds, and then using SIFT algorithm extracts keypoints to identify corresponding image points. The 3D corresponding points, from which transformation parameters between point clouds are computed, are acquired by mapping the 2D ones onto the point cloud. To remove false accepted correspondences, we implement a conditional sampling method to select the n data points with the highest inlier probabilities as a hypothesis set and update the inlier probabilities of each data point using simplified Bayes' rule for the purpose of improving the computation efficiency. The prior probability is estimated by the verification of the distance invariance between correspondences. The proposed approach is tested on four data sets acquired by three different scanners. The results show that, comparing with the performance of RANSAC, BaySAC leads to less iterations and cheaper computation cost when the hypothesis set is contaminated with more outliers. The registration results also indicate that, the proposed algorithm can achieve high registration accuracy on all experimental datasets.
Ren, Qiang; Nagar, Jogender; Kang, Lei; Bian, Yusheng; Werner, Ping; Werner, Douglas H
2017-05-18
A highly efficient numerical approach for simulating the wideband optical response of nano-architectures comprised of Drude-Critical Points (DCP) media (e.g., gold and silver) is proposed and validated through comparing with commercial computational software. The kernel of this algorithm is the subdomain level discontinuous Galerkin time domain (DGTD) method, which can be viewed as a hybrid of the spectral-element time-domain method (SETD) and the finite-element time-domain (FETD) method. An hp-refinement technique is applied to decrease the Degrees-of-Freedom (DoFs) and computational requirements. The collocated E-J scheme facilitates solving the auxiliary equations by converting the inversions of matrices to simpler vector manipulations. A new hybrid time stepping approach, which couples the Runge-Kutta and Newmark methods, is proposed to solve the temporal auxiliary differential equations (ADEs) with a high degree of efficiency. The advantages of this new approach, in terms of computational resource overhead and accuracy, are validated through comparison with well-known commercial software for three diverse cases, which cover both near-field and far-field properties with plane wave and lumped port sources. The presented work provides the missing link between DCP dispersive models and FETD and/or SETD based algorithms. It is a competitive candidate for numerically studying the wideband plasmonic properties of DCP media.
Experimental and Computational Analysis of Unidirectional Flow Through Stirling Engine Heater Head
NASA Technical Reports Server (NTRS)
Wilson, Scott D.; Dyson, Rodger W.; Tew, Roy C.; Demko, Rikako
2006-01-01
A high efficiency Stirling Radioisotope Generator (SRG) is being developed for possible use in long-duration space science missions. NASA s advanced technology goals for next generation Stirling convertors include increasing the Carnot efficiency and percent of Carnot efficiency. To help achieve these goals, a multi-dimensional Computational Fluid Dynamics (CFD) code is being developed to numerically model unsteady fluid flow and heat transfer phenomena of the oscillating working gas inside Stirling convertors. In the absence of transient pressure drop data for the zero mean oscillating multi-dimensional flows present in the Technology Demonstration Convertors on test at NASA Glenn Research Center, unidirectional flow pressure drop test data is used to compare against 2D and 3D computational solutions. This study focuses on tracking pressure drop and mass flow rate data for unidirectional flow though a Stirling heater head using a commercial CFD code (CFD-ACE). The commercial CFD code uses a porous-media model which is dependent on permeability and the inertial coefficient present in the linear and nonlinear terms of the Darcy-Forchheimer equation. Permeability and inertial coefficient were calculated from unidirectional flow test data. CFD simulations of the unidirectional flow test were validated using the porous-media model input parameters which increased simulation accuracy by 14 percent on average.
Increasing the computational efficient of digital cross correlation by a vectorization method
NASA Astrophysics Data System (ADS)
Chang, Ching-Yuan; Ma, Chien-Ching
2017-08-01
This study presents a vectorization method for use in MATLAB programming aimed at increasing the computational efficiency of digital cross correlation in sound and images, resulting in a speedup of 6.387 and 36.044 times compared with performance values obtained from looped expression. This work bridges the gap between matrix operations and loop iteration, preserving flexibility and efficiency in program testing. This paper uses numerical simulation to verify the speedup of the proposed vectorization method as well as experiments to measure the quantitative transient displacement response subjected to dynamic impact loading. The experiment involved the use of a high speed camera as well as a fiber optic system to measure the transient displacement in a cantilever beam under impact from a steel ball. Experimental measurement data obtained from the two methods are in excellent agreement in both the time and frequency domain, with discrepancies of only 0.68%. Numerical and experiment results demonstrate the efficacy of the proposed vectorization method with regard to computational speed in signal processing and high precision in the correlation algorithm. We also present the source code with which to build MATLAB-executable functions on Windows as well as Linux platforms, and provide a series of examples to demonstrate the application of the proposed vectorization method.
NASA Astrophysics Data System (ADS)
Dash, Rajashree
2017-11-01
Forecasting purchasing power of one currency with respect to another currency is always an interesting topic in the field of financial time series prediction. Despite the existence of several traditional and computational models for currency exchange rate forecasting, there is always a need for developing simpler and more efficient model, which will produce better prediction capability. In this paper, an evolutionary framework is proposed by using an improved shuffled frog leaping (ISFL) algorithm with a computationally efficient functional link artificial neural network (CEFLANN) for prediction of currency exchange rate. The model is validated by observing the monthly prediction measures obtained for three currency exchange data sets such as USD/CAD, USD/CHF, and USD/JPY accumulated within same period of time. The model performance is also compared with two other evolutionary learning techniques such as Shuffled frog leaping algorithm and Particle Swarm optimization algorithm. Practical analysis of results suggest that, the proposed model developed using the ISFL algorithm with CEFLANN network is a promising predictor model for currency exchange rate prediction compared to other models included in the study.
Liu, Ping; Li, Guodong; Liu, Xinggao; Xiao, Long; Wang, Yalin; Yang, Chunhua; Gui, Weihua
2018-02-01
High quality control method is essential for the implementation of aircraft autopilot system. An optimal control problem model considering the safe aerodynamic envelop is therefore established to improve the control quality of aircraft flight level tracking. A novel non-uniform control vector parameterization (CVP) method with time grid refinement is then proposed for solving the optimal control problem. By introducing the Hilbert-Huang transform (HHT) analysis, an efficient time grid refinement approach is presented and an adaptive time grid is automatically obtained. With this refinement, the proposed method needs fewer optimization parameters to achieve better control quality when compared with uniform refinement CVP method, whereas the computational cost is lower. Two well-known flight level altitude tracking problems and one minimum time cost problem are tested as illustrations and the uniform refinement control vector parameterization method is adopted as the comparative base. Numerical results show that the proposed method achieves better performances in terms of optimization accuracy and computation cost; meanwhile, the control quality is efficiently improved. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Exploiting the chaotic behaviour of atmospheric models with reconfigurable architectures
NASA Astrophysics Data System (ADS)
Russell, Francis P.; Düben, Peter D.; Niu, Xinyu; Luk, Wayne; Palmer, T. N.
2017-12-01
Reconfigurable architectures are becoming mainstream: Amazon, Microsoft and IBM are supporting such architectures in their data centres. The computationally intensive nature of atmospheric modelling is an attractive target for hardware acceleration using reconfigurable computing. Performance of hardware designs can be improved through the use of reduced-precision arithmetic, but maintaining appropriate accuracy is essential. We explore reduced-precision optimisation for simulating chaotic systems, targeting atmospheric modelling, in which even minor changes in arithmetic behaviour will cause simulations to diverge quickly. The possibility of equally valid simulations having differing outcomes means that standard techniques for comparing numerical accuracy are inappropriate. We use the Hellinger distance to compare statistical behaviour between reduced-precision CPU implementations to guide reconfigurable designs of a chaotic system, then analyse accuracy, performance and power efficiency of the resulting implementations. Our results show that with only a limited loss in accuracy corresponding to less than 10% uncertainty in input parameters, the throughput and energy efficiency of a single-precision chaotic system implemented on a Xilinx Virtex-6 SX475T Field Programmable Gate Array (FPGA) can be more than doubled.
Memristive Mixed-Signal Neuromorphic Systems: Energy-Efficient Learning at the Circuit-Level
Chakma, Gangotree; Adnan, Md Musabbir; Wyer, Austin R.; ...
2017-11-23
Neuromorphic computing is non-von Neumann computer architecture for the post Moore’s law era of computing. Since a main focus of the post Moore’s law era is energy-efficient computing with fewer resources and less area, neuromorphic computing contributes effectively in this research. Here in this paper, we present a memristive neuromorphic system for improved power and area efficiency. Our particular mixed-signal approach implements neural networks with spiking events in a synchronous way. Moreover, the use of nano-scale memristive devices saves both area and power in the system. We also provide device-level considerations that make the system more energy-efficient. The proposed systemmore » additionally includes synchronous digital long term plasticity, an online learning methodology that helps the system train the neural networks during the operation phase and improves the efficiency in learning considering the power consumption and area overhead.« less
Memristive Mixed-Signal Neuromorphic Systems: Energy-Efficient Learning at the Circuit-Level
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chakma, Gangotree; Adnan, Md Musabbir; Wyer, Austin R.
Neuromorphic computing is non-von Neumann computer architecture for the post Moore’s law era of computing. Since a main focus of the post Moore’s law era is energy-efficient computing with fewer resources and less area, neuromorphic computing contributes effectively in this research. Here in this paper, we present a memristive neuromorphic system for improved power and area efficiency. Our particular mixed-signal approach implements neural networks with spiking events in a synchronous way. Moreover, the use of nano-scale memristive devices saves both area and power in the system. We also provide device-level considerations that make the system more energy-efficient. The proposed systemmore » additionally includes synchronous digital long term plasticity, an online learning methodology that helps the system train the neural networks during the operation phase and improves the efficiency in learning considering the power consumption and area overhead.« less
The comparative hydrodynamics of rapid rotation by predatory appendages.
McHenry, M J; Anderson, P S L; Van Wassenbergh, S; Matthews, D G; Summers, A P; Patek, S N
2016-11-01
Countless aquatic animals rotate appendages through the water, yet fluid forces are typically modeled with translational motion. To elucidate the hydrodynamics of rotation, we analyzed the raptorial appendages of mantis shrimp (Stomatopoda) using a combination of flume experiments, mathematical modeling and phylogenetic comparative analyses. We found that computationally efficient blade-element models offered an accurate first-order approximation of drag, when compared with a more elaborate computational fluid-dynamic model. Taking advantage of this efficiency, we compared the hydrodynamics of the raptorial appendage in different species, including a newly measured spearing species, Coronis scolopendra The ultrafast appendages of a smasher species (Odontodactylus scyllarus) were an order of magnitude smaller, yet experienced values of drag-induced torque similar to those of a spearing species (Lysiosquillina maculata). The dactyl, a stabbing segment that can be opened at the distal end of the appendage, generated substantial additional drag in the smasher, but not in the spearer, which uses the segment to capture evasive prey. Phylogenetic comparative analyses revealed that larger mantis shrimp species strike more slowly, regardless of whether they smash or spear their prey. In summary, drag was minimally affected by shape, whereas size, speed and dactyl orientation dominated and differentiated the hydrodynamic forces across species and sizes. This study demonstrates the utility of simple mathematical modeling for comparative analyses and illustrates the multi-faceted consequences of drag during the evolutionary diversification of rotating appendages. © 2016. Published by The Company of Biologists Ltd.
NASA Astrophysics Data System (ADS)
Nardi, Albert; Idiart, Andrés; Trinchero, Paolo; de Vries, Luis Manuel; Molinero, Jorge
2014-08-01
This paper presents the development, verification and application of an efficient interface, denoted as iCP, which couples two standalone simulation programs: the general purpose Finite Element framework COMSOL Multiphysics® and the geochemical simulator PHREEQC. The main goal of the interface is to maximize the synergies between the aforementioned codes, providing a numerical platform that can efficiently simulate a wide number of multiphysics problems coupled with geochemistry. iCP is written in Java and uses the IPhreeqc C++ dynamic library and the COMSOL Java-API. Given the large computational requirements of the aforementioned coupled models, special emphasis has been placed on numerical robustness and efficiency. To this end, the geochemical reactions are solved in parallel by balancing the computational load over multiple threads. First, a benchmark exercise is used to test the reliability of iCP regarding flow and reactive transport. Then, a large scale thermo-hydro-chemical (THC) problem is solved to show the code capabilities. The results of the verification exercise are successfully compared with those obtained using PHREEQC and the application case demonstrates the scalability of a large scale model, at least up to 32 threads.
Extracting Communities from Complex Networks by the k-Dense Method
NASA Astrophysics Data System (ADS)
Saito, Kazumi; Yamada, Takeshi; Kazama, Kazuhiro
To understand the structural and functional properties of large-scale complex networks, it is crucial to efficiently extract a set of cohesive subnetworks as communities. There have been proposed several such community extraction methods in the literature, including the classical k-core decomposition method and, more recently, the k-clique based community extraction method. The k-core method, although computationally efficient, is often not powerful enough for uncovering a detailed community structure and it produces only coarse-grained and loosely connected communities. The k-clique method, on the other hand, can extract fine-grained and tightly connected communities but requires a substantial amount of computational load for large-scale complex networks. In this paper, we present a new notion of a subnetwork called k-dense, and propose an efficient algorithm for extracting k-dense communities. We applied our method to the three different types of networks assembled from real data, namely, from blog trackbacks, word associations and Wikipedia references, and demonstrated that the k-dense method could extract communities almost as efficiently as the k-core method, while the qualities of the extracted communities are comparable to those obtained by the k-clique method.
Efficient iterative method for solving the Dirac-Kohn-Sham density functional theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Lin; Shao, Sihong; E, Weinan
2012-11-06
We present for the first time an efficient iterative method to directly solve the four-component Dirac-Kohn-Sham (DKS) density functional theory. Due to the existence of the negative energy continuum in the DKS operator, the existing iterative techniques for solving the Kohn-Sham systems cannot be efficiently applied to solve the DKS systems. The key component of our method is a novel filtering step (F) which acts as a preconditioner in the framework of the locally optimal block preconditioned conjugate gradient (LOBPCG) method. The resulting method, dubbed the LOBPCG-F method, is able to compute the desired eigenvalues and eigenvectors in the positive energy band without computing any state in the negative energy band. The LOBPCG-F method introduces mild extra cost compared to the standard LOBPCG method and can be easily implemented. We demonstrate our method in the pseudopotential framework with a planewave basis set which naturally satisfies the kinetic balance prescription. Numerical results for Ptmore » $$_{2}$$, Au$$_{2}$$, TlF, and Bi$$_{2}$$Se$$_{3}$$ indicate that the LOBPCG-F method is a robust and efficient method for investigating the relativistic effect in systems containing heavy elements.« less
Towards reversible basic linear algebra subprograms: A performance study
Perumalla, Kalyan S.; Yoginath, Srikanth B.
2014-12-06
Problems such as fault tolerance and scalable synchronization can be efficiently solved using reversibility of applications. Making applications reversible by relying on computation rather than on memory is ideal for large scale parallel computing, especially for the next generation of supercomputers in which memory is expensive in terms of latency, energy, and price. In this direction, a case study is presented here in reversing a computational core, namely, Basic Linear Algebra Subprograms, which is widely used in scientific applications. A new Reversible BLAS (RBLAS) library interface has been designed, and a prototype has been implemented with two modes: (1) amore » memory-mode in which reversibility is obtained by checkpointing to memory in forward and restoring from memory in reverse, and (2) a computational-mode in which nothing is saved in the forward, but restoration is done entirely via inverse computation in reverse. The article is focused on detailed performance benchmarking to evaluate the runtime dynamics and performance effects, comparing reversible computation with checkpointing on both traditional CPU platforms and recent GPU accelerator platforms. For BLAS Level-1 subprograms, data indicates over an order of magnitude better speed of reversible computation compared to checkpointing. For BLAS Level-2 and Level-3, a more complex tradeoff is observed between reversible computation and checkpointing, depending on computational and memory complexities of the subprograms.« less
Neic, Aurel; Campos, Fernando O; Prassl, Anton J; Niederer, Steven A; Bishop, Martin J; Vigmond, Edward J; Plank, Gernot
2017-10-01
Anatomically accurate and biophysically detailed bidomain models of the human heart have proven a powerful tool for gaining quantitative insight into the links between electrical sources in the myocardium and the concomitant current flow in the surrounding medium as they represent their relationship mechanistically based on first principles. Such models are increasingly considered as a clinical research tool with the perspective of being used, ultimately, as a complementary diagnostic modality. An important prerequisite in many clinical modeling applications is the ability of models to faithfully replicate potential maps and electrograms recorded from a given patient. However, while the personalization of electrophysiology models based on the gold standard bidomain formulation is in principle feasible, the associated computational expenses are significant, rendering their use incompatible with clinical time frames. In this study we report on the development of a novel computationally efficient reaction-eikonal (R-E) model for modeling extracellular potential maps and electrograms. Using a biventricular human electrophysiology model, which incorporates a topologically realistic His-Purkinje system (HPS), we demonstrate by comparing against a high-resolution reaction-diffusion (R-D) bidomain model that the R-E model predicts extracellular potential fields, electrograms as well as ECGs at the body surface with high fidelity and offers vast computational savings greater than three orders of magnitude. Due to their efficiency R-E models are ideally suitable for forward simulations in clinical modeling studies which attempt to personalize electrophysiological model features.
NASA Astrophysics Data System (ADS)
Neic, Aurel; Campos, Fernando O.; Prassl, Anton J.; Niederer, Steven A.; Bishop, Martin J.; Vigmond, Edward J.; Plank, Gernot
2017-10-01
Anatomically accurate and biophysically detailed bidomain models of the human heart have proven a powerful tool for gaining quantitative insight into the links between electrical sources in the myocardium and the concomitant current flow in the surrounding medium as they represent their relationship mechanistically based on first principles. Such models are increasingly considered as a clinical research tool with the perspective of being used, ultimately, as a complementary diagnostic modality. An important prerequisite in many clinical modeling applications is the ability of models to faithfully replicate potential maps and electrograms recorded from a given patient. However, while the personalization of electrophysiology models based on the gold standard bidomain formulation is in principle feasible, the associated computational expenses are significant, rendering their use incompatible with clinical time frames. In this study we report on the development of a novel computationally efficient reaction-eikonal (R-E) model for modeling extracellular potential maps and electrograms. Using a biventricular human electrophysiology model, which incorporates a topologically realistic His-Purkinje system (HPS), we demonstrate by comparing against a high-resolution reaction-diffusion (R-D) bidomain model that the R-E model predicts extracellular potential fields, electrograms as well as ECGs at the body surface with high fidelity and offers vast computational savings greater than three orders of magnitude. Due to their efficiency R-E models are ideally suitable for forward simulations in clinical modeling studies which attempt to personalize electrophysiological model features.
NASA Technical Reports Server (NTRS)
Navon, I. M.
1984-01-01
A Lagrange multiplier method using techniques developed by Bertsekas (1982) was applied to solving the problem of enforcing simultaneous conservation of the nonlinear integral invariants of the shallow water equations on a limited area domain. This application of nonlinear constrained optimization is of the large dimensional type and the conjugate gradient method was found to be the only computationally viable method for the unconstrained minimization. Several conjugate-gradient codes were tested and compared for increasing accuracy requirements. Robustness and computational efficiency were the principal criteria.
Discrete square root filtering - A survey of current techniques.
NASA Technical Reports Server (NTRS)
Kaminskii, P. G.; Bryson, A. E., Jr.; Schmidt, S. F.
1971-01-01
Current techniques in square root filtering are surveyed and related by applying a duality association. Four efficient square root implementations are suggested, and compared with three common conventional implementations in terms of computational complexity and precision. It is shown that the square root computational burden should not exceed the conventional by more than 50% in most practical problems. An examination of numerical conditioning predicts that the square root approach can yield twice the effective precision of the conventional filter in ill-conditioned problems. This prediction is verified in two examples.
An efficient quantum scheme for Private Set Intersection
NASA Astrophysics Data System (ADS)
Shi, Run-hua; Mu, Yi; Zhong, Hong; Cui, Jie; Zhang, Shun
2016-01-01
Private Set Intersection allows a client to privately compute set intersection with the collaboration of the server, which is one of the most fundamental and key problems within the multiparty collaborative computation of protecting the privacy of the parties. In this paper, we first present a cheat-sensitive quantum scheme for Private Set Intersection. Compared with classical schemes, our scheme has lower communication complexity, which is independent of the size of the server's set. Therefore, it is very suitable for big data services in Cloud or large-scale client-server networks.
Data compression strategies for ptychographic diffraction imaging
NASA Astrophysics Data System (ADS)
Loetgering, Lars; Rose, Max; Treffer, David; Vartanyants, Ivan A.; Rosenhahn, Axel; Wilhein, Thomas
2017-12-01
Ptychography is a computational imaging method for solving inverse scattering problems. To date, the high amount of redundancy present in ptychographic data sets requires computer memory that is orders of magnitude larger than the retrieved information. Here, we propose and compare data compression strategies that significantly reduce the amount of data required for wavefield inversion. Information metrics are used to measure the amount of data redundancy present in ptychographic data. Experimental results demonstrate the technique to be memory efficient and stable in the presence of systematic errors such as partial coherence and noise.
Assaraf, Roland
2014-12-01
We show that the recently proposed correlated sampling without reweighting procedure extends the locality (asymptotic independence of the system size) of a physical property to the statistical fluctuations of its estimator. This makes the approach potentially vastly more efficient for computing space-localized properties in large systems compared with standard correlated methods. A proof is given for a large collection of noninteracting fragments. Calculations on hydrogen chains suggest that this behavior holds not only for systems displaying short-range correlations, but also for systems with long-range correlations.
Efficient algorithms and implementations of entropy-based moment closures for rarefied gases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schaerer, Roman Pascal, E-mail: schaerer@mathcces.rwth-aachen.de; Bansal, Pratyuksh; Torrilhon, Manuel
We present efficient algorithms and implementations of the 35-moment system equipped with the maximum-entropy closure in the context of rarefied gases. While closures based on the principle of entropy maximization have been shown to yield very promising results for moderately rarefied gas flows, the computational cost of these closures is in general much higher than for closure theories with explicit closed-form expressions of the closing fluxes, such as Grad's classical closure. Following a similar approach as Garrett et al. (2015) , we investigate efficient implementations of the computationally expensive numerical quadrature method used for the moment evaluations of the maximum-entropymore » distribution by exploiting its inherent fine-grained parallelism with the parallelism offered by multi-core processors and graphics cards. We show that using a single graphics card as an accelerator allows speed-ups of two orders of magnitude when compared to a serial CPU implementation. To accelerate the time-to-solution for steady-state problems, we propose a new semi-implicit time discretization scheme. The resulting nonlinear system of equations is solved with a Newton type method in the Lagrange multipliers of the dual optimization problem in order to reduce the computational cost. Additionally, fully explicit time-stepping schemes of first and second order accuracy are presented. We investigate the accuracy and efficiency of the numerical schemes for several numerical test cases, including a steady-state shock-structure problem.« less
Karamintziou, Sofia D.; Custódio, Ana Luísa; Piallat, Brigitte; Polosan, Mircea; Chabardès, Stéphan; Stathis, Pantelis G.; Tagaris, George A.; Sakas, Damianos E.; Polychronaki, Georgia E.; Tsirogiannis, George L.; David, Olivier; Nikita, Konstantina S.
2017-01-01
Advances in the field of closed-loop neuromodulation call for analysis and modeling approaches capable of confronting challenges related to the complex neuronal response to stimulation and the presence of strong internal and measurement noise in neural recordings. Here we elaborate on the algorithmic aspects of a noise-resistant closed-loop subthalamic nucleus deep brain stimulation system for advanced Parkinson’s disease and treatment-refractory obsessive-compulsive disorder, ensuring remarkable performance in terms of both efficiency and selectivity of stimulation, as well as in terms of computational speed. First, we propose an efficient method drawn from dynamical systems theory, for the reliable assessment of significant nonlinear coupling between beta and high-frequency subthalamic neuronal activity, as a biomarker for feedback control. Further, we present a model-based strategy through which optimal parameters of stimulation for minimum energy desynchronizing control of neuronal activity are being identified. The strategy integrates stochastic modeling and derivative-free optimization of neural dynamics based on quadratic modeling. On the basis of numerical simulations, we demonstrate the potential of the presented modeling approach to identify, at a relatively low computational cost, stimulation settings potentially associated with a significantly higher degree of efficiency and selectivity compared with stimulation settings determined post-operatively. Our data reinforce the hypothesis that model-based control strategies are crucial for the design of novel stimulation protocols at the backstage of clinical applications. PMID:28222198
Merritt, M.L.
1993-01-01
The simulation of the transport of injected freshwater in a thin brackish aquifer, overlain and underlain by confining layers containing more saline water, is shown to be influenced by the choice of the finite-difference approximation method, the algorithm for representing vertical advective and dispersive fluxes, and the values assigned to parametric coefficients that specify the degree of vertical dispersion and molecular diffusion that occurs. Computed potable water recovery efficiencies will differ depending upon the choice of algorithm and approximation method, as will dispersion coefficients estimated based on the calibration of simulations to match measured data. A comparison of centered and backward finite-difference approximation methods shows that substantially different transition zones between injected and native waters are depicted by the different methods, and computed recovery efficiencies vary greatly. Standard and experimental algorithms and a variety of values for molecular diffusivity, transverse dispersivity, and vertical scaling factor were compared in simulations of freshwater storage in a thin brackish aquifer. Computed recovery efficiencies vary considerably, and appreciable differences are observed in the distribution of injected freshwater in the various cases tested. The results demonstrate both a qualitatively different description of transport using the experimental algorithms and the interrelated influences of molecular diffusion and transverse dispersion on simulated recovery efficiency. When simulating natural aquifer flow in cross-section, flushing of the aquifer occurred for all tested coefficient choices using both standard and experimental algorithms. ?? 1993.
Proteinortho: Detection of (Co-)orthologs in large-scale analysis
2011-01-01
Background Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practise. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. Results The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Conclusions Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware. PMID:21526987
Joda, Tim; Brägger, Urs
2015-01-01
To compare time-efficiency in the production of implant crowns using a digital workflow versus the conventional pathway. This prospective clinical study used a crossover design that included 20 study participants receiving single-tooth replacements in posterior sites. Each patient received a customized titanium abutment plus a computer-aided design/computer-assisted manufacture (CAD/CAM) zirconia suprastructure (for those in the test group, using digital workflow) and a standardized titanium abutment plus a porcelain-fused-to-metal crown (for those in the control group, using a conventional pathway). The start of the implant prosthetic treatment was established as the baseline. Time-efficiency analysis was defined as the primary outcome, and was measured for every single clinical and laboratory work step in minutes. Statistical analysis was calculated with the Wilcoxon rank sum test. All crowns could be provided within two clinical appointments, independent of the manufacturing process. The mean total production time, as the sum of clinical plus laboratory work steps, was significantly different. The mean ± standard deviation (SD) time was 185.4 ± 17.9 minutes for the digital workflow process and 223.0 ± 26.2 minutes for the conventional pathway (P = .0001). Therefore, digital processing for overall treatment was 16% faster. Detailed analysis for the clinical treatment revealed a significantly reduced mean ± SD chair time of 27.3 ± 3.4 minutes for the test group compared with 33.2 ± 4.9 minutes for the control group (P = .0001). Similar results were found for the mean laboratory work time, with a significant decrease of 158.1 ± 17.2 minutes for the test group vs 189.8 ± 25.3 minutes for the control group (P = .0001). Only a few studies have investigated efficiency parameters of digital workflows compared with conventional pathways in implant dental medicine. This investigation shows that the digital workflow seems to be more time-efficient than the established conventional production pathway for fixed implant-supported crowns. Both clinical chair time and laboratory manufacturing steps could be effectively shortened with the digital process of intraoral scanning plus CAD/CAM technology.
NASA Astrophysics Data System (ADS)
Shi, Yu; Liang, Long; Ge, Hai-Wen; Reitz, Rolf D.
2010-03-01
Acceleration of the chemistry solver for engine combustion is of much interest due to the fact that in practical engine simulations extensive computational time is spent solving the fuel oxidation and emission formation chemistry. A dynamic adaptive chemistry (DAC) scheme based on a directed relation graph error propagation (DRGEP) method has been applied to study homogeneous charge compression ignition (HCCI) engine combustion with detailed chemistry (over 500 species) previously using an R-value-based breadth-first search (RBFS) algorithm, which significantly reduced computational times (by as much as 30-fold). The present paper extends the use of this on-the-fly kinetic mechanism reduction scheme to model combustion in direct-injection (DI) engines. It was found that the DAC scheme becomes less efficient when applied to DI engine simulations using a kinetic mechanism of relatively small size and the accuracy of the original DAC scheme decreases for conventional non-premixed combustion engine. The present study also focuses on determination of search-initiating species, involvement of the NOx chemistry, selection of a proper error tolerance, as well as treatment of the interaction of chemical heat release and the fuel spray. Both the DAC schemes were integrated into the ERC KIVA-3v2 code, and simulations were conducted to compare the two schemes. In general, the present DAC scheme has better efficiency and similar accuracy compared to the previous DAC scheme. The efficiency depends on the size of the chemical kinetics mechanism used and the engine operating conditions. For cases using a small n-heptane kinetic mechanism of 34 species, 30% of the computational time is saved, and 50% for a larger n-heptane kinetic mechanism of 61 species. The paper also demonstrates that by combining the present DAC scheme with an adaptive multi-grid chemistry (AMC) solver, it is feasible to simulate a direct-injection engine using a detailed n-heptane mechanism with 543 species with practical computer time.
An adaptive multi-level simulation algorithm for stochastic biological systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lester, C., E-mail: lesterc@maths.ox.ac.uk; Giles, M. B.; Baker, R. E.
2015-01-14
Discrete-state, continuous-time Markov models are widely used in the modeling of biochemical reaction networks. Their complexity often precludes analytic solution, and we rely on stochastic simulation algorithms (SSA) to estimate system statistics. The Gillespie algorithm is exact, but computationally costly as it simulates every single reaction. As such, approximate stochastic simulation algorithms such as the tau-leap algorithm are often used. Potentially computationally more efficient, the system statistics generated suffer from significant bias unless tau is relatively small, in which case the computational time can be comparable to that of the Gillespie algorithm. The multi-level method [Anderson and Higham, “Multi-level Montemore » Carlo for continuous time Markov chains, with applications in biochemical kinetics,” SIAM Multiscale Model. Simul. 10(1), 146–179 (2012)] tackles this problem. A base estimator is computed using many (cheap) sample paths at low accuracy. The bias inherent in this estimator is then reduced using a number of corrections. Each correction term is estimated using a collection of paired sample paths where one path of each pair is generated at a higher accuracy compared to the other (and so more expensive). By sharing random variables between these paired paths, the variance of each correction estimator can be reduced. This renders the multi-level method very efficient as only a relatively small number of paired paths are required to calculate each correction term. In the original multi-level method, each sample path is simulated using the tau-leap algorithm with a fixed value of τ. This approach can result in poor performance when the reaction activity of a system changes substantially over the timescale of interest. By introducing a novel adaptive time-stepping approach where τ is chosen according to the stochastic behaviour of each sample path, we extend the applicability of the multi-level method to such cases. We demonstrate the efficiency of our method using a number of examples.« less
CoCoNUT: an efficient system for the comparison and analysis of genomes
2008-01-01
Background Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system CoCoNUT (Computational Comparative geNomics Utility Toolkit) that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics. PMID:19014477
Implementation of tetrahedral-mesh geometry in Monte Carlo radiation transport code PHITS
NASA Astrophysics Data System (ADS)
Furuta, Takuya; Sato, Tatsuhiko; Han, Min Cheol; Yeom, Yeon Soo; Kim, Chan Hyeong; Brown, Justin L.; Bolch, Wesley E.
2017-06-01
A new function to treat tetrahedral-mesh geometry was implemented in the particle and heavy ion transport code systems. To accelerate the computational speed in the transport process, an original algorithm was introduced to initially prepare decomposition maps for the container box of the tetrahedral-mesh geometry. The computational performance was tested by conducting radiation transport simulations of 100 MeV protons and 1 MeV photons in a water phantom represented by tetrahedral mesh. The simulation was repeated with varying number of meshes and the required computational times were then compared with those of the conventional voxel representation. Our results show that the computational costs for each boundary crossing of the region mesh are essentially equivalent for both representations. This study suggests that the tetrahedral-mesh representation offers not only a flexible description of the transport geometry but also improvement of computational efficiency for the radiation transport. Due to the adaptability of tetrahedrons in both size and shape, dosimetrically equivalent objects can be represented by tetrahedrons with a much fewer number of meshes as compared its voxelized representation. Our study additionally included dosimetric calculations using a computational human phantom. A significant acceleration of the computational speed, about 4 times, was confirmed by the adoption of a tetrahedral mesh over the traditional voxel mesh geometry.
Implementation of tetrahedral-mesh geometry in Monte Carlo radiation transport code PHITS.
Furuta, Takuya; Sato, Tatsuhiko; Han, Min Cheol; Yeom, Yeon Soo; Kim, Chan Hyeong; Brown, Justin L; Bolch, Wesley E
2017-06-21
A new function to treat tetrahedral-mesh geometry was implemented in the particle and heavy ion transport code systems. To accelerate the computational speed in the transport process, an original algorithm was introduced to initially prepare decomposition maps for the container box of the tetrahedral-mesh geometry. The computational performance was tested by conducting radiation transport simulations of 100 MeV protons and 1 MeV photons in a water phantom represented by tetrahedral mesh. The simulation was repeated with varying number of meshes and the required computational times were then compared with those of the conventional voxel representation. Our results show that the computational costs for each boundary crossing of the region mesh are essentially equivalent for both representations. This study suggests that the tetrahedral-mesh representation offers not only a flexible description of the transport geometry but also improvement of computational efficiency for the radiation transport. Due to the adaptability of tetrahedrons in both size and shape, dosimetrically equivalent objects can be represented by tetrahedrons with a much fewer number of meshes as compared its voxelized representation. Our study additionally included dosimetric calculations using a computational human phantom. A significant acceleration of the computational speed, about 4 times, was confirmed by the adoption of a tetrahedral mesh over the traditional voxel mesh geometry.
Kato expansion in quantum canonical perturbation theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nikolaev, Andrey, E-mail: Andrey.Nikolaev@rdtex.ru
2016-06-15
This work establishes a connection between canonical perturbation series in quantum mechanics and a Kato expansion for the resolvent of the Liouville superoperator. Our approach leads to an explicit expression for a generator of a block-diagonalizing Dyson’s ordered exponential in arbitrary perturbation order. Unitary intertwining of perturbed and unperturbed averaging superprojectors allows for a description of ambiguities in the generator and block-diagonalized Hamiltonian. We compare the efficiency of the corresponding computational algorithm with the efficiencies of the Van Vleck and Magnus methods for high perturbative orders.
Valente, Marco A G; Teixeira, Deiver A; Azevedo, David L; Feliciano, Gustavo T; Benedetti, Assis V; Fugivara, Cecílio S
2017-01-01
The interaction of volatile corrosion inhibitors (VCI), caprylate salt derivatives from amines, with zinc metallic surfaces is assessed by density functional theory (DFT) computer simulations, electrochemical impedance (EIS) measurements and humid chamber tests. The results obtained by the different methods were compared, and linear correlations were obtained between theoretical and experimental data. The correlations between experimental and theoretical results showed that the molecular size is the determining factor in the inhibition efficiency. The models used and experimental results indicated that dicyclohexylamine caprylate is the most efficient inhibitor.
Geospatial Representation, Analysis and Computing Using Bandlimited Functions
2010-02-19
navigation of aircraft and missiles require detailed representations of gravity and efficient methods for determining orbits and trajectories. However, many...efficient on today’s computers. Under this grant new, computationally efficient, localized representations of gravity have been developed and tested. As a...step in developing a new approach to estimating gravitational potentials, a multiresolution representation for gravity estimation has been proposed
Efficient Computational Prototyping of Mixed Technology Microfluidic Components and Systems
2002-08-01
AFRL-IF-RS-TR-2002-190 Final Technical Report August 2002 EFFICIENT COMPUTATIONAL PROTOTYPING OF MIXED TECHNOLOGY MICROFLUIDIC...SUBTITLE EFFICIENT COMPUTATIONAL PROTOTYPING OF MIXED TECHNOLOGY MICROFLUIDIC COMPONENTS AND SYSTEMS 6. AUTHOR(S) Narayan R. Aluru, Jacob White...Aided Design (CAD) tools for microfluidic components and systems were developed in this effort. Innovative numerical methods and algorithms for mixed
Anukam, Anthony; Mamphweli, Sampson; Okoh, Omobola; Reddy, Prashant
2017-01-01
Sugarcane bagasse was torrefied to improve its quality in terms of properties prior to gasification. Torrefaction was undertaken at 300 °C in an inert atmosphere of N2 at 10 °C·min−1 heating rate. A residence time of 5 min allowed for rapid reaction of the material during torrefaction. Torrefied and untorrefied bagasse were characterized to compare their suitability as feedstocks for gasification. The results showed that torrefied bagasse had lower O–C and H–C atomic ratios of about 0.5 and 0.84 as compared to that of untorrefied bagasse with 0.82 and 1.55, respectively. A calorific value of about 20.29 MJ·kg−1 was also measured for torrefied bagasse, which is around 13% higher than that for untorrefied bagasse with a value of ca. 17.9 MJ·kg−1. This confirms the former as a much more suitable feedstock for gasification than the latter since efficiency of gasification is a function of feedstock calorific value. SEM results also revealed a fibrous structure and pith in the micrographs of both torrefied and untorrefied bagasse, indicating the carbonaceous nature of both materials, with torrefied bagasse exhibiting a more permeable structure with larger surface area, which are among the features that favour gasification. The gasification process of torrefied bagasse relied on computer simulation to establish the impact of torrefaction on gasification efficiency. Optimum efficiency was achieved with torrefied bagasse because of its slightly modified properties. Conversion efficiency of the gasification process of torrefied bagasse increased from 50% to approximately 60% after computer simulation, whereas that of untorrefied bagasse remained constant at 50%, even as the gasification time increased. PMID:28952501
Efficient visibility-driven medical image visualisation via adaptive binned visibility histogram.
Jung, Younhyun; Kim, Jinman; Kumar, Ashnil; Feng, David Dagan; Fulham, Michael
2016-07-01
'Visibility' is a fundamental optical property that represents the observable, by users, proportion of the voxels in a volume during interactive volume rendering. The manipulation of this 'visibility' improves the volume rendering processes; for instance by ensuring the visibility of regions of interest (ROIs) or by guiding the identification of an optimal rendering view-point. The construction of visibility histograms (VHs), which represent the distribution of all the visibility of all voxels in the rendered volume, enables users to explore the volume with real-time feedback about occlusion patterns among spatially related structures during volume rendering manipulations. Volume rendered medical images have been a primary beneficiary of VH given the need to ensure that specific ROIs are visible relative to the surrounding structures, e.g. the visualisation of tumours that may otherwise be occluded by neighbouring structures. VH construction and its subsequent manipulations, however, are computationally expensive due to the histogram binning of the visibilities. This limits the real-time application of VH to medical images that have large intensity ranges and volume dimensions and require a large number of histogram bins. In this study, we introduce an efficient adaptive binned visibility histogram (AB-VH) in which a smaller number of histogram bins are used to represent the visibility distribution of the full VH. We adaptively bin medical images by using a cluster analysis algorithm that groups the voxels according to their intensity similarities into a smaller subset of bins while preserving the distribution of the intensity range of the original images. We increase efficiency by exploiting the parallel computation and multiple render targets (MRT) extension of the modern graphical processing units (GPUs) and this enables efficient computation of the histogram. We show the application of our method to single-modality computed tomography (CT), magnetic resonance (MR) imaging and multi-modality positron emission tomography-CT (PET-CT). In our experiments, the AB-VH markedly improved the computational efficiency for the VH construction and thus improved the subsequent VH-driven volume manipulations. This efficiency was achieved without major degradation in the VH visually and numerical differences between the AB-VH and its full-bin counterpart. We applied several variants of the K-means clustering algorithm with varying Ks (the number of clusters) and found that higher values of K resulted in better performance at a lower computational gain. The AB-VH also had an improved performance when compared to the conventional method of down-sampling of the histogram bins (equal binning) for volume rendering visualisation. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Chen, C. L.
1989-01-01
Two efficient mapping algorithms for scheduling the robot inverse dynamics computation consisting of m computational modules with precedence relationship to be executed on a multiprocessor system consisting of p identical homogeneous processors with processor and communication costs to achieve minimum computation time are presented. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time. The minimax optimization is performed on the objective function to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning and the scheduling problems; both have been known to be NP-complete. Thus, to speed up the searching for a solution, two heuristic algorithms were proposed to obtain fast but suboptimal mapping solutions. The first algorithm utilizes the level and the communication intensity of the task modules to construct an ordered priority list of ready modules and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by the heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and the validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
Exact and efficient simulation of concordant computation
NASA Astrophysics Data System (ADS)
Cable, Hugo; Browne, Daniel E.
2015-11-01
Concordant computation is a circuit-based model of quantum computation for mixed states, that assumes that all correlations within the register are discord-free (i.e. the correlations are essentially classical) at every step of the computation. The question of whether concordant computation always admits efficient simulation by a classical computer was first considered by Eastin in arXiv:quant-ph/1006.4402v1, where an answer in the affirmative was given for circuits consisting only of one- and two-qubit gates. Building on this work, we develop the theory of classical simulation of concordant computation. We present a new framework for understanding such computations, argue that a larger class of concordant computations admit efficient simulation, and provide alternative proofs for the main results of arXiv:quant-ph/1006.4402v1 with an emphasis on the exactness of simulation which is crucial for this model. We include detailed analysis of the arithmetic complexity for solving equations in the simulation, as well as extensions to larger gates and qudits. We explore the limitations of our approach, and discuss the challenges faced in developing efficient classical simulation algorithms for all concordant computations.
Fast methods to numerically integrate the Reynolds equation for gas fluid films
NASA Technical Reports Server (NTRS)
Dimofte, Florin
1992-01-01
The alternating direction implicit (ADI) method is adopted, modified, and applied to the Reynolds equation for thin, gas fluid films. An efficient code is developed to predict both the steady-state and dynamic performance of an aerodynamic journal bearing. An alternative approach is shown for hybrid journal gas bearings by using Liebmann's iterative solution (LIS) for elliptic partial differential equations. The results are compared with known design criteria from experimental data. The developed methods show good accuracy and very short computer running time in comparison with methods based on an inverting of a matrix. The computer codes need a small amount of memory and can be run on either personal computers or on mainframe systems.
Convolution of large 3D images on GPU and its decomposition
NASA Astrophysics Data System (ADS)
Karas, Pavel; Svoboda, David
2011-12-01
In this article, we propose a method for computing convolution of large 3D images. The convolution is performed in a frequency domain using a convolution theorem. The algorithm is accelerated on a graphic card by means of the CUDA parallel computing model. Convolution is decomposed in a frequency domain using the decimation in frequency algorithm. We pay attention to keeping our approach efficient in terms of both time and memory consumption and also in terms of memory transfers between CPU and GPU which have a significant inuence on overall computational time. We also study the implementation on multiple GPUs and compare the results between the multi-GPU and multi-CPU implementations.
NASA Technical Reports Server (NTRS)
Chima, R. V.; Strazisar, A. J.
1982-01-01
Two and three dimensional inviscid solutions for the flow in a transonic axial compressor rotor at design speed are compared with probe and laser anemometers measurements at near-stall and maximum-flow operating points. Experimental details of the laser anemometer system and computational details of the two dimensional axisymmetric code and three dimensional Euler code are described. Comparisons are made between relative Mach number and flow angle contours, shock location, and shock strength. A procedure for using an efficient axisymmetric code to generate downstream pressure input for computationally expensive Euler codes is discussed. A film supplement shows the calculations of the two operating points with the time-marching Euler code.
NASA Astrophysics Data System (ADS)
MacDonald, Christopher L.; Bhattacharya, Nirupama; Sprouse, Brian P.; Silva, Gabriel A.
2015-09-01
Computing numerical solutions to fractional differential equations can be computationally intensive due to the effect of non-local derivatives in which all previous time points contribute to the current iteration. In general, numerical approaches that depend on truncating part of the system history while efficient, can suffer from high degrees of error and inaccuracy. Here we present an adaptive time step memory method for smooth functions applied to the Grünwald-Letnikov fractional diffusion derivative. This method is computationally efficient and results in smaller errors during numerical simulations. Sampled points along the system's history at progressively longer intervals are assumed to reflect the values of neighboring time points. By including progressively fewer points backward in time, a temporally 'weighted' history is computed that includes contributions from the entire past of the system, maintaining accuracy, but with fewer points actually calculated, greatly improving computational efficiency.
Selected computations of transonic cavity flows
NASA Technical Reports Server (NTRS)
Atwood, Christopher A.
1993-01-01
An efficient diagonal scheme implemented in an overset mesh framework has permitted the analysis of geometrically complex cavity flows via the Reynolds averaged Navier-Stokes equations. Use of rapid hyperbolic and algebraic grid methods has allowed simple specification of critical turbulent regions with an algebraic turbulence model. Comparisons between numerical and experimental results are made in two dimensions for the following problems: a backward-facing step; a resonating cavity; and two quieted cavity configurations. In three-dimensions the flow about three early concepts of the stratospheric Observatory For Infrared Astronomy (SOFIA) are compared to wind-tunnel data. Shedding frequencies of resolved shear layer structures are compared against experiment for the quieted cavities. The results demonstrate the progress of computational assessment of configuration safety and performance.
2009-01-01
Background Marginal posterior genotype probabilities need to be computed for genetic analyses such as geneticcounseling in humans and selective breeding in animal and plant species. Methods In this paper, we describe a peeling based, deterministic, exact algorithm to compute efficiently genotype probabilities for every member of a pedigree with loops without recourse to junction-tree methods from graph theory. The efficiency in computing the likelihood by peeling comes from storing intermediate results in multidimensional tables called cutsets. Computing marginal genotype probabilities for individual i requires recomputing the likelihood for each of the possible genotypes of individual i. This can be done efficiently by storing intermediate results in two types of cutsets called anterior and posterior cutsets and reusing these intermediate results to compute the likelihood. Examples A small example is used to illustrate the theoretical concepts discussed in this paper, and marginal genotype probabilities are computed at a monogenic disease locus for every member in a real cattle pedigree. PMID:19958551
Methods for Computationally Efficient Structured CFD Simulations of Complex Turbomachinery Flows
NASA Technical Reports Server (NTRS)
Herrick, Gregory P.; Chen, Jen-Ping
2012-01-01
This research presents more efficient computational methods by which to perform multi-block structured Computational Fluid Dynamics (CFD) simulations of turbomachinery, thus facilitating higher-fidelity solutions of complicated geometries and their associated flows. This computational framework offers flexibility in allocating resources to balance process count and wall-clock computation time, while facilitating research interests of simulating axial compressor stall inception with more complete gridding of the flow passages and rotor tip clearance regions than is typically practiced with structured codes. The paradigm presented herein facilitates CFD simulation of previously impractical geometries and flows. These methods are validated and demonstrate improved computational efficiency when applied to complicated geometries and flows.
Murphy, Ryan J; Liacouras, Peter C; Grant, Gerald T; Wolfe, Kevin C; Armand, Mehran; Gordon, Chad R
2016-11-01
Craniomaxillofacial reconstruction with patient-specific, customized craniofacial implants (CCIs) is ideal for skeletal defects involving areas of aesthetic concern-the non-weight-bearing facial skeleton, temporal skull, and/or frontal-forehead region. Results to date are superior to a variety of "off-the-shelf" materials, but require a protocol computed tomography scan and preexisting defect for computer-assisted design/computer-assisted manufacturing of the CCI. The authors developed a craniomaxillofacial surgical assistance workstation to address these challenges and intraoperatively guide CCI modification for an unknown defect size/shape. First, the surgeon designed an oversized CCI based on his/her surgical plan. Intraoperatively, the surgeon resected the bone and digitized the resection using a navigation pointer. Next, a projector displayed the limits of the craniofacial bone defect onto the prefabricated, oversized CCI for the size modification process; the surgeon followed the projected trace to modify the implant. A cadaveric study compared the standard technique (n = 1) to the experimental technique (n = 5) using surgical time and implant fit. The technology reduced the time and effort needed to resize the oversized CCI by an order of magnitude as compared with the standard manual resizing process. Implant fit was consistently better for the computer-assisted case compared with the control by at least 30%, requiring only 5.17 minutes in the computer-assisted cases compared with 35 minutes for the control. This approach demonstrated improvement in surgical time and accuracy of CCI-based craniomaxillofacial reconstruction compared with previously reported methods. The craniomaxillofacial surgical assistance workstation will provide craniofacial surgeons a computer-assisted technology for effective and efficient single-stage reconstruction when exact craniofacial bone defect sizes are unknown.
A polynomial chaos approach to the analysis of vehicle dynamics under uncertainty
NASA Astrophysics Data System (ADS)
Kewlani, Gaurav; Crawford, Justin; Iagnemma, Karl
2012-05-01
The ability of ground vehicles to quickly and accurately analyse their dynamic response to a given input is critical to their safety and efficient autonomous operation. In field conditions, significant uncertainty is associated with terrain and/or vehicle parameter estimates, and this uncertainty must be considered in the analysis of vehicle motion dynamics. Here, polynomial chaos approaches that explicitly consider parametric uncertainty during modelling of vehicle dynamics are presented. They are shown to be computationally more efficient than the standard Monte Carlo scheme, and experimental results compared with the simulation results performed on ANVEL (a vehicle simulator) indicate that the method can be utilised for efficient and accurate prediction of vehicle motion in realistic scenarios.
A community detection algorithm based on structural similarity
NASA Astrophysics Data System (ADS)
Guo, Xuchao; Hao, Xia; Liu, Yaqiong; Zhang, Li; Wang, Lu
2017-09-01
In order to further improve the efficiency and accuracy of community detection algorithm, a new algorithm named SSTCA (the community detection algorithm based on structural similarity with threshold) is proposed. In this algorithm, the structural similarities are taken as the weights of edges, and the threshold k is considered to remove multiple edges whose weights are less than the threshold, and improve the computational efficiency. Tests were done on the Zachary’s network, Dolphins’ social network and Football dataset by the proposed algorithm, and compared with GN and SSNCA algorithm. The results show that the new algorithm is superior to other algorithms in accuracy for the dense networks and the operating efficiency is improved obviously.
Aycock, Kenneth I; Campbell, Robert L; Manning, Keefe B; Craven, Brent A
2017-06-01
Inferior vena cava (IVC) filters are medical devices designed to provide a mechanical barrier to the passage of emboli from the deep veins of the legs to the heart and lungs. Despite decades of development and clinical use, IVC filters still fail to prevent the passage of all hazardous emboli. The objective of this study is to (1) develop a resolved two-way computational model of embolus transport, (2) provide verification and validation evidence for the model, and (3) demonstrate the ability of the model to predict the embolus-trapping efficiency of an IVC filter. Our model couples computational fluid dynamics simulations of blood flow to six-degree-of-freedom simulations of embolus transport and resolves the interactions between rigid, spherical emboli and the blood flow using an immersed boundary method. Following model development and numerical verification and validation of the computational approach against benchmark data from the literature, embolus transport simulations are performed in an idealized IVC geometry. Centered and tilted filter orientations are considered using a nonlinear finite element-based virtual filter placement procedure. A total of 2048 coupled CFD/6-DOF simulations are performed to predict the embolus-trapping statistics of the filter. The simulations predict that the embolus-trapping efficiency of the IVC filter increases with increasing embolus diameter and increasing embolus-to-blood density ratio. Tilted filter placement is found to decrease the embolus-trapping efficiency compared with centered filter placement. Multiple embolus-trapping locations are predicted for the IVC filter, and the trapping locations are predicted to shift upstream and toward the vessel wall with increasing embolus diameter. Simulations of the injection of successive emboli into the IVC are also performed and reveal that the embolus-trapping efficiency decreases with increasing thrombus load in the IVC filter. In future work, the computational tool could be used to investigate IVC filter design improvements, the effect of patient anatomy on embolus transport and IVC filter embolus-trapping efficiency, and, with further development and validation, optimal filter selection and placement on a patient-specific basis.
Reliability based design optimization: Formulations and methodologies
NASA Astrophysics Data System (ADS)
Agarwal, Harish
Modern products ranging from simple components to complex systems should be designed to be optimal and reliable. The challenge of modern engineering is to ensure that manufacturing costs are reduced and design cycle times are minimized while achieving requirements for performance and reliability. If the market for the product is competitive, improved quality and reliability can generate very strong competitive advantages. Simulation based design plays an important role in designing almost any kind of automotive, aerospace, and consumer products under these competitive conditions. Single discipline simulations used for analysis are being coupled together to create complex coupled simulation tools. This investigation focuses on the development of efficient and robust methodologies for reliability based design optimization in a simulation based design environment. Original contributions of this research are the development of a novel efficient and robust unilevel methodology for reliability based design optimization, the development of an innovative decoupled reliability based design optimization methodology, the application of homotopy techniques in unilevel reliability based design optimization methodology, and the development of a new framework for reliability based design optimization under epistemic uncertainty. The unilevel methodology for reliability based design optimization is shown to be mathematically equivalent to the traditional nested formulation. Numerical test problems show that the unilevel methodology can reduce computational cost by at least 50% as compared to the nested approach. The decoupled reliability based design optimization methodology is an approximate technique to obtain consistent reliable designs at lesser computational expense. Test problems show that the methodology is computationally efficient compared to the nested approach. A framework for performing reliability based design optimization under epistemic uncertainty is also developed. A trust region managed sequential approximate optimization methodology is employed for this purpose. Results from numerical test studies indicate that the methodology can be used for performing design optimization under severe uncertainty.
A novel scene management technology for complex virtual battlefield environment
NASA Astrophysics Data System (ADS)
Sheng, Changchong; Jiang, Libing; Tang, Bo; Tang, Xiaoan
2018-04-01
The efficient scene management of virtual environment is an important research content of computer real-time visualization, which has a decisive influence on the efficiency of drawing. However, Traditional scene management methods do not suitable for complex virtual battlefield environments, this paper combines the advantages of traditional scene graph technology and spatial data structure method, using the idea of management and rendering separation, a loose object-oriented scene graph structure is established to manage the entity model data in the scene, and the performance-based quad-tree structure is created for traversing and rendering. In addition, the collaborative update relationship between the above two structural trees is designed to achieve efficient scene management. Compared with the previous scene management method, this method is more efficient and meets the needs of real-time visualization.
NASA Astrophysics Data System (ADS)
Lanzagorta, Marco O.; Gomez, Richard B.; Uhlmann, Jeffrey K.
2003-08-01
In recent years, computer graphics has emerged as a critical component of the scientific and engineering process, and it is recognized as an important computer science research area. Computer graphics are extensively used for a variety of aerospace and defense training systems and by Hollywood's special effects companies. All these applications require the computer graphics systems to produce high quality renderings of extremely large data sets in short periods of time. Much research has been done in "classical computing" toward the development of efficient methods and techniques to reduce the rendering time required for large datasets. Quantum Computing's unique algorithmic features offer the possibility of speeding up some of the known rendering algorithms currently used in computer graphics. In this paper we discuss possible implementations of quantum rendering algorithms. In particular, we concentrate on the implementation of Grover's quantum search algorithm for Z-buffering, ray-tracing, radiosity, and scene management techniques. We also compare the theoretical performance between the classical and quantum versions of the algorithms.
Computation of multi-dimensional viscous supersonic jet flow
NASA Technical Reports Server (NTRS)
Kim, Y. N.; Buggeln, R. C.; Mcdonald, H.
1986-01-01
A new method has been developed for two- and three-dimensional computations of viscous supersonic flows with embedded subsonic regions adjacent to solid boundaries. The approach employs a reduced form of the Navier-Stokes equations which allows solution as an initial-boundary value problem in space, using an efficient noniterative forward marching algorithm. Numerical instability associated with forward marching algorithms for flows with embedded subsonic regions is avoided by approximation of the reduced form of the Navier-Stokes equations in the subsonic regions of the boundary layers. Supersonic and subsonic portions of the flow field are simultaneously calculated by a consistently split linearized block implicit computational algorithm. The results of computations for a series of test cases relevant to internal supersonic flow is presented and compared with data. Comparison between data and computation are in general excellent thus indicating that the computational technique has great promise as a tool for calculating supersonic flow with embedded subsonic regions. Finally, a User's Manual is presented for the computer code used to perform the calculations.
Control mechanism of double-rotator-structure ternary optical computer
NASA Astrophysics Data System (ADS)
Kai, SONG; Liping, YAN
2017-03-01
Double-rotator-structure ternary optical processor (DRSTOP) has two characteristics, namely, giant data-bits parallel computing and reconfigurable processor, which can handle thousands of data bits in parallel, and can run much faster than computers and other optical computer systems so far. In order to put DRSTOP into practical application, this paper established a series of methods, namely, task classification method, data-bits allocation method, control information generation method, control information formatting and sending method, and decoded results obtaining method and so on. These methods form the control mechanism of DRSTOP. This control mechanism makes DRSTOP become an automated computing platform. Compared with the traditional calculation tools, DRSTOP computing platform can ease the contradiction between high energy consumption and big data computing due to greatly reducing the cost of communications and I/O. Finally, the paper designed a set of experiments for DRSTOP control mechanism to verify its feasibility and correctness. Experimental results showed that the control mechanism is correct, feasible and efficient.
High Performance Computing Meets Energy Efficiency - Continuum Magazine |
NREL High Performance Computing Meets Energy Efficiency High Performance Computing Meets Energy turbines. Simulation by Patrick J. Moriarty and Matthew J. Churchfield, NREL The new High Performance Computing Data Center at the National Renewable Energy Laboratory (NREL) hosts high-speed, high-volume data
Sensitivity analysis of dynamic biological systems with time-delays.
Wu, Wu Hsiung; Wang, Feng Sheng; Chang, Maw Shang
2010-10-15
Mathematical modeling has been applied to the study and analysis of complex biological systems for a long time. Some processes in biological systems, such as the gene expression and feedback control in signal transduction networks, involve a time delay. These systems are represented as delay differential equation (DDE) models. Numerical sensitivity analysis of a DDE model by the direct method requires the solutions of model and sensitivity equations with time-delays. The major effort is the computation of Jacobian matrix when computing the solution of sensitivity equations. The computation of partial derivatives of complex equations either by the analytic method or by symbolic manipulation is time consuming, inconvenient, and prone to introduce human errors. To address this problem, an automatic approach to obtain the derivatives of complex functions efficiently and accurately is necessary. We have proposed an efficient algorithm with an adaptive step size control to compute the solution and dynamic sensitivities of biological systems described by ordinal differential equations (ODEs). The adaptive direct-decoupled algorithm is extended to solve the solution and dynamic sensitivities of time-delay systems describing by DDEs. To save the human effort and avoid the human errors in the computation of partial derivatives, an automatic differentiation technique is embedded in the extended algorithm to evaluate the Jacobian matrix. The extended algorithm is implemented and applied to two realistic models with time-delays: the cardiovascular control system and the TNF-α signal transduction network. The results show that the extended algorithm is a good tool for dynamic sensitivity analysis on DDE models with less user intervention. By comparing with direct-coupled methods in theory, the extended algorithm is efficient, accurate, and easy to use for end users without programming background to do dynamic sensitivity analysis on complex biological systems with time-delays.
A critical analysis of computational protein design with sparse residue interaction graphs
Georgiev, Ivelin S.
2017-01-01
Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. PMID:28358804
Efficient Use of Distributed Systems for Scientific Applications
NASA Technical Reports Server (NTRS)
Taylor, Valerie; Chen, Jian; Canfield, Thomas; Richard, Jacques
2000-01-01
Distributed computing has been regarded as the future of high performance computing. Nationwide high speed networks such as vBNS are becoming widely available to interconnect high-speed computers, virtual environments, scientific instruments and large data sets. One of the major issues to be addressed with distributed systems is the development of computational tools that facilitate the efficient execution of parallel applications on such systems. These tools must exploit the heterogeneous resources (networks and compute nodes) in distributed systems. This paper presents a tool, called PART, which addresses this issue for mesh partitioning. PART takes advantage of the following heterogeneous system features: (1) processor speed; (2) number of processors; (3) local network performance; and (4) wide area network performance. Further, different finite element applications under consideration may have different computational complexities, different communication patterns, and different element types, which also must be taken into consideration when partitioning. PART uses parallel simulated annealing to partition the domain, taking into consideration network and processor heterogeneity. The results of using PART for an explicit finite element application executing on two IBM SPs (located at Argonne National Laboratory and the San Diego Supercomputer Center) indicate an increase in efficiency by up to 36% as compared to METIS, a widely used mesh partitioning tool. The input to METIS was modified to take into consideration heterogeneous processor performance; METIS does not take into consideration heterogeneous networks. The execution times for these applications were reduced by up to 30% as compared to METIS. These results are given in Figure 1 for four irregular meshes with number of elements ranging from 30,269 elements for the Barth5 mesh to 11,451 elements for the Barth4 mesh. Future work with PART entails using the tool with an integrated application requiring distributed systems. In particular this application, illustrated in the document entails an integration of finite element and fluid dynamic simulations to address the cooling of turbine blades of a gas turbine engine design. It is not uncommon to encounter high-temperature, film-cooled turbine airfoils with 1,000,000s of degrees of freedom. This results because of the complexity of the various components of the airfoils, requiring fine-grain meshing for accuracy. Additional information is contained in the original.
LCC-Demons: a robust and accurate symmetric diffeomorphic registration algorithm.
Lorenzi, M; Ayache, N; Frisoni, G B; Pennec, X
2013-11-01
Non-linear registration is a key instrument for computational anatomy to study the morphology of organs and tissues. However, in order to be an effective instrument for the clinical practice, registration algorithms must be computationally efficient, accurate and most importantly robust to the multiple biases affecting medical images. In this work we propose a fast and robust registration framework based on the log-Demons diffeomorphic registration algorithm. The transformation is parameterized by stationary velocity fields (SVFs), and the similarity metric implements a symmetric local correlation coefficient (LCC). Moreover, we show how the SVF setting provides a stable and consistent numerical scheme for the computation of the Jacobian determinant and the flux of the deformation across the boundaries of a given region. Thus, it provides a robust evaluation of spatial changes. We tested the LCC-Demons in the inter-subject registration setting, by comparing with state-of-the-art registration algorithms on public available datasets, and in the intra-subject longitudinal registration problem, for the statistically powered measurements of the longitudinal atrophy in Alzheimer's disease. Experimental results show that LCC-Demons is a generic, flexible, efficient and robust algorithm for the accurate non-linear registration of images, which can find several applications in the field of medical imaging. Without any additional optimization, it solves equally well intra & inter-subject registration problems, and compares favorably to state-of-the-art methods. Copyright © 2013 Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Bardina, J. E.
1994-01-01
A new computational efficient 3-D compressible Reynolds-averaged implicit Navier-Stokes method with advanced two equation turbulence models for high speed flows is presented. All convective terms are modeled using an entropy satisfying higher-order Total Variation Diminishing (TVD) scheme based on implicit upwind flux-difference split approximations and arithmetic averaging procedure of primitive variables. This method combines the best features of data management and computational efficiency of space marching procedures with the generality and stability of time dependent Navier-Stokes procedures to solve flows with mixed supersonic and subsonic zones, including streamwise separated flows. Its robust stability derives from a combination of conservative implicit upwind flux-difference splitting with Roe's property U to provide accurate shock capturing capability that non-conservative schemes do not guarantee, alternating symmetric Gauss-Seidel 'method of planes' relaxation procedure coupled with a three-dimensional two-factor diagonal-dominant approximate factorization scheme, TVD flux limiters of higher-order flux differences satisfying realizability, and well-posed characteristic-based implicit boundary-point a'pproximations consistent with the local characteristics domain of dependence. The efficiency of the method is highly increased with Newton Raphson acceleration which allows convergence in essentially one forward sweep for supersonic flows. The method is verified by comparing with experiment and other Navier-Stokes methods. Here, results of adiabatic and cooled flat plate flows, compression corner flow, and 3-D hypersonic shock-wave/turbulent boundary layer interaction flows are presented. The robust 3-D method achieves a better computational efficiency of at least one order of magnitude over the CNS Navier-Stokes code. It provides cost-effective aerodynamic predictions in agreement with experiment, and the capability of predicting complex flow structures in complex geometries with good accuracy.
Propulsive efficiency of frog swimming with different feet and swimming patterns
Jizhuang, Fan; Wei, Zhang; Bowen, Yuan; Gangfeng, Liu
2017-01-01
ABSTRACT Aquatic and terrestrial animals have different swimming performances and mechanical efficiencies based on their different swimming methods. To explore propulsion in swimming frogs, this study calculated mechanical efficiencies based on data describing aquatic and terrestrial webbed-foot shapes and swimming patterns. First, a simplified frog model and dynamic equation were established, and hydrodynamic forces on the foot were computed according to computational fluid dynamic calculations. Then, a two-link mechanism was used to stand in for the diverse and complicated hind legs found in different frog species, in order to simplify the input work calculation. Joint torques were derived based on the virtual work principle to compute the efficiency of foot propulsion. Finally, two feet and swimming patterns were combined to compute propulsive efficiency. The aquatic frog demonstrated a propulsive efficiency (43.11%) between those of drag-based and lift-based propulsions, while the terrestrial frog efficiency (29.58%) fell within the range of drag-based propulsion. The results illustrate the main factor of swimming patterns for swimming performance and efficiency. PMID:28302669
NASA Astrophysics Data System (ADS)
Alzubaidi, Mohammad; Balasubramanian, Vineeth; Patel, Ameet; Panchanathan, Sethuraman; Black, John A., Jr.
2012-03-01
Inductive learning refers to machine learning algorithms that learn a model from a set of training data instances. Any test instance is then classified by comparing it to the learned model. When the set of training instances lend themselves well to modeling, the use of a model substantially reduces the computation cost of classification. However, some training data sets are complex, and do not lend themselves well to modeling. Transductive learning refers to machine learning algorithms that classify test instances by comparing them to all of the training instances, without creating an explicit model. This can produce better classification performance, but at a much higher computational cost. Medical images vary greatly across human populations, constituting a data set that does not lend itself well to modeling. Our previous work showed that the wide variations seen across training sets of "normal" chest radiographs make it difficult to successfully classify test radiographs with an inductive (modeling) approach, and that a transductive approach leads to much better performance in detecting atypical regions. The problem with the transductive approach is its high computational cost. This paper develops and demonstrates a novel semi-transductive framework that can address the unique challenges of atypicality detection in chest radiographs. The proposed framework combines the superior performance of transductive methods with the reduced computational cost of inductive methods. Our results show that the proposed semitransductive approach provides both effective and efficient detection of atypical regions within a set of chest radiographs previously labeled by Mayo Clinic expert thoracic radiologists.
NASA Astrophysics Data System (ADS)
Lu, D.; Ricciuto, D. M.; Evans, K. J.
2017-12-01
Data-worth analysis plays an essential role in improving the understanding of the subsurface system, in developing and refining subsurface models, and in supporting rational water resources management. However, data-worth analysis is computationally expensive as it requires quantifying parameter uncertainty, prediction uncertainty, and both current and potential data uncertainties. Assessment of these uncertainties in large-scale stochastic subsurface simulations using standard Monte Carlo (MC) sampling or advanced surrogate modeling is extremely computationally intensive, sometimes even infeasible. In this work, we propose efficient Bayesian analysis of data-worth using a multilevel Monte Carlo (MLMC) method. Compared to the standard MC that requires a significantly large number of high-fidelity model executions to achieve a prescribed accuracy in estimating expectations, the MLMC can substantially reduce the computational cost with the use of multifidelity approximations. As the data-worth analysis involves a great deal of expectation estimations, the cost savings from MLMC in the assessment can be very outstanding. While the proposed MLMC-based data-worth analysis is broadly applicable, we use it to a highly heterogeneous oil reservoir simulation to select an optimal candidate data set that gives the largest uncertainty reduction in predicting mass flow rates at four production wells. The choices made by the MLMC estimation are validated by the actual measurements of the potential data, and consistent with the estimation obtained from the standard MC. But compared to the standard MC, the MLMC greatly reduces the computational costs in the uncertainty reduction estimation, with up to 600 days cost savings when one processor is used.
Arterial waveguide model for shear wave elastography: implementation and in vitro validation
NASA Astrophysics Data System (ADS)
Vaziri Astaneh, Ali; Urban, Matthew W.; Aquino, Wilkins; Greenleaf, James F.; Guddati, Murthy N.
2017-07-01
Arterial stiffness is found to be an early indicator of many cardiovascular diseases. Among various techniques, shear wave elastography has emerged as a promising tool for estimating local arterial stiffness through the observed dispersion of guided waves. In this paper, we develop efficient models for the computational simulation of guided wave dispersion in arterial walls. The models are capable of considering fluid-loaded tubes, immersed in fluid or embedded in a solid, which are encountered in in vitro/ex vivo, and in vivo experiments. The proposed methods are based on judiciously combining Fourier transformation and finite element discretization, leading to a significant reduction in computational cost while fully capturing complex 3D wave propagation. The developed methods are implemented in open-source code, and verified by comparing them with significantly more expensive, fully 3D finite element models. We also validate the models using the shear wave elastography of tissue-mimicking phantoms. The computational efficiency of the developed methods indicates the possibility of being able to estimate arterial stiffness in real time, which would be beneficial in clinical settings.
Miller, Robert T.; Delin, G.N.
1994-01-01
A three-dimensional, anisotropic, nonisothermal, ground-water-flow, and thermal-energy-transport model was constructed to simulate the four short-term test cycles. The model was used to simulate the entire short-term testing period of approximately 400 days. The only model properties varied during model calibration were longitudinal and transverse thermal dispersivities, which, for final calibration, were simulated as 3.3 and 0.33 meters, respectively. The model was calibrated by comparing model-computed results to (1) measured temperatures at selected altitudes in four observation wells, (2) measured temperatures at the production well, and (3) calculated thermal efficiencies of the aquifer. Model-computed withdrawal-water temperatures were within an average of about 3 percent of measured values and model-computed aquifer-thermal efficiencies were within an average of about 5 percent of calculated values for the short-term test cycles. These data indicate that the model accurately simulated thermal-energy storage within the Franconia-Ironton-Galesville aquifer.
Comparative Evaluation of Different Optimization Algorithms for Structural Design Applications
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Coroneos, Rula M.; Guptill, James D.; Hopkins, Dale A.
1996-01-01
Non-linear programming algorithms play an important role in structural design optimization. Fortunately, several algorithms with computer codes are available. At NASA Lewis Research Centre, a project was initiated to assess the performance of eight different optimizers through the development of a computer code CometBoards. This paper summarizes the conclusions of that research. CometBoards was employed to solve sets of small, medium and large structural problems, using the eight different optimizers on a Cray-YMP8E/8128 computer. The reliability and efficiency of the optimizers were determined from the performance of these problems. For small problems, the performance of most of the optimizers could be considered adequate. For large problems, however, three optimizers (two sequential quadratic programming routines, DNCONG of IMSL and SQP of IDESIGN, along with Sequential Unconstrained Minimizations Technique SUMT) outperformed others. At optimum, most optimizers captured an identical number of active displacement and frequency constraints but the number of active stress constraints differed among the optimizers. This discrepancy can be attributed to singularity conditions in the optimization and the alleviation of this discrepancy can improve the efficiency of optimizers.
NASA Astrophysics Data System (ADS)
Plata, Jose J.; Nath, Pinku; Usanmaz, Demet; Carrete, Jesús; Toher, Cormac; de Jong, Maarten; Asta, Mark; Fornari, Marco; Nardelli, Marco Buongiorno; Curtarolo, Stefano
2017-10-01
One of the most accurate approaches for calculating lattice thermal conductivity, , is solving the Boltzmann transport equation starting from third-order anharmonic force constants. In addition to the underlying approximations of ab-initio parameterization, two main challenges are associated with this path: high computational costs and lack of automation in the frameworks using this methodology, which affect the discovery rate of novel materials with ad-hoc properties. Here, the Automatic Anharmonic Phonon Library (AAPL) is presented. It efficiently computes interatomic force constants by making effective use of crystal symmetry analysis, it solves the Boltzmann transport equation to obtain , and allows a fully integrated operation with minimum user intervention, a rational addition to the current high-throughput accelerated materials development framework AFLOW. An "experiment vs. theory" study of the approach is shown, comparing accuracy and speed with respect to other available packages, and for materials characterized by strong electron localization and correlation. Combining AAPL with the pseudo-hybrid functional ACBN0 is possible to improve accuracy without increasing computational requirements.
Modeling bioluminescent photon transport in tissue based on Radiosity-diffusion model
NASA Astrophysics Data System (ADS)
Sun, Li; Wang, Pu; Tian, Jie; Zhang, Bo; Han, Dong; Yang, Xin
2010-03-01
Bioluminescence tomography (BLT) is one of the most important non-invasive optical molecular imaging modalities. The model for the bioluminescent photon propagation plays a significant role in the bioluminescence tomography study. Due to the high computational efficiency, diffusion approximation (DA) is generally applied in the bioluminescence tomography. But the diffusion equation is valid only in highly scattering and weakly absorbing regions and fails in non-scattering or low-scattering tissues, such as a cyst in the breast, the cerebrospinal fluid (CSF) layer of the brain and synovial fluid layer in the joints. A hybrid Radiosity-diffusion model is proposed for dealing with the non-scattering regions within diffusing domains in this paper. This hybrid method incorporates a priori information of the geometry of non-scattering regions, which can be acquired by magnetic resonance imaging (MRI) or x-ray computed tomography (CT). Then the model is implemented using a finite element method (FEM) to ensure the high computational efficiency. Finally, we demonstrate that the method is comparable with Mont Carlo (MC) method which is regarded as a 'gold standard' for photon transportation simulation.
NASA Astrophysics Data System (ADS)
Senegačnik, Jure; Tavčar, Gregor; Katrašnik, Tomaž
2015-03-01
The paper presents a computationally efficient method for solving the time dependent diffusion equation in a granule of the Li-ion battery's granular solid electrode. The method, called Discrete Temporal Convolution method (DTC), is based on a discrete temporal convolution of the analytical solution of the step function boundary value problem. This approach enables modelling concentration distribution in the granular particles for arbitrary time dependent exchange fluxes that do not need to be known a priori. It is demonstrated in the paper that the proposed method features faster computational times than finite volume/difference methods and Padé approximation at the same accuracy of the results. It is also demonstrated that all three addressed methods feature higher accuracy compared to the quasi-steady polynomial approaches when applied to simulate the current densities variations typical for mobile/automotive applications. The proposed approach can thus be considered as one of the key innovative methods enabling real-time capability of the multi particle electrochemical battery models featuring spatial and temporal resolved particle concentration profiles.
NASA Technical Reports Server (NTRS)
Yamauchi, G.; Johnson, W.
1984-01-01
A computationally efficient body analysis designed to couple with a comprehensive helicopter analysis is developed in order to calculate the body-induced aerodynamic effects on rotor performance and loads. A modified slender body theory is used as the body model. With the objective of demonstrating the accuracy, efficiency, and application of the method, the analysis at this stage is restricted to axisymmetric bodies at zero angle of attack. By comparing with results from an exact analysis for simple body shapes, it is found that the modified slender body theory provides an accurate potential flow solution for moderately thick bodies, with only a 10%-20% increase in computational effort over that of an isolated rotor analysis. The computational ease of this method provides a means for routine assessment of body-induced effects on a rotor. Results are given for several configurations that typify those being used in the Ames 40- by 80-Foot Wind Tunnel and in the rotor-body aerodynamic interference tests being conducted at Ames. A rotor-hybrid airship configuration is also analyzed.
Dong, Jianwu; Chen, Feng; Zhou, Dong; Liu, Tian; Yu, Zhaofei; Wang, Yi
2017-03-01
Existence of low SNR regions and rapid-phase variations pose challenges to spatial phase unwrapping algorithms. Global optimization-based phase unwrapping methods are widely used, but are significantly slower than greedy methods. In this paper, dual decomposition acceleration is introduced to speed up a three-dimensional graph cut-based phase unwrapping algorithm. The phase unwrapping problem is formulated as a global discrete energy minimization problem, whereas the technique of dual decomposition is used to increase the computational efficiency by splitting the full problem into overlapping subproblems and enforcing the congruence of overlapping variables. Using three dimensional (3D) multiecho gradient echo images from an agarose phantom and five brain hemorrhage patients, we compared this proposed method with an unaccelerated graph cut-based method. Experimental results show up to 18-fold acceleration in computation time. Dual decomposition significantly improves the computational efficiency of 3D graph cut-based phase unwrapping algorithms. Magn Reson Med 77:1353-1358, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
Design and Analysis of a Neuromemristive Reservoir Computing Architecture for Biosignal Processing
Kudithipudi, Dhireesha; Saleh, Qutaiba; Merkel, Cory; Thesing, James; Wysocki, Bryant
2016-01-01
Reservoir computing (RC) is gaining traction in several signal processing domains, owing to its non-linear stateful computation, spatiotemporal encoding, and reduced training complexity over recurrent neural networks (RNNs). Previous studies have shown the effectiveness of software-based RCs for a wide spectrum of applications. A parallel body of work indicates that realizing RNN architectures using custom integrated circuits and reconfigurable hardware platforms yields significant improvements in power and latency. In this research, we propose a neuromemristive RC architecture, with doubly twisted toroidal structure, that is validated for biosignal processing applications. We exploit the device mismatch to implement the random weight distributions within the reservoir and propose mixed-signal subthreshold circuits for energy efficiency. A comprehensive analysis is performed to compare the efficiency of the neuromemristive RC architecture in both digital(reconfigurable) and subthreshold mixed-signal realizations. Both Electroencephalogram (EEG) and Electromyogram (EMG) biosignal benchmarks are used for validating the RC designs. The proposed RC architecture demonstrated an accuracy of 90 and 84% for epileptic seizure detection and EMG prosthetic finger control, respectively. PMID:26869876
On the Efficacy of a Computer-Based Program to Teach Visual Braille Reading
ERIC Educational Resources Information Center
Scheithauer, Mindy C.; Tiger, Jeffrey H.; Miller, Sarah J.
2013-01-01
Scheithauer and Tiger (2012) created an efficient computerized program that taught 4 sighted college students to select text letters when presented with visual depictions of braille alphabetic characters and resulted in the emergence of some braille reading. The current study extended these results to a larger sample (n?=?81) and compared the…
ERIC Educational Resources Information Center
Mullany, Britta; Barlow, Allison; Neault, Nicole; Billy, Trudy; Hastings, Ranelda; Coho-Mescal, Valerie; Lorenzo, Sherilyn; Walkup, John T.
2013-01-01
Computer-assisted interviewing techniques have increasingly been used in program and research settings to improve data collection quality and efficiency. Little is known, however, regarding the use of such techniques with American Indian (AI) adolescents in collecting sensitive information. This brief compares the consistency of AI adolescent…
Snore related signals processing in a private cloud computing system.
Qian, Kun; Guo, Jian; Xu, Huijie; Zhu, Zhaomeng; Zhang, Gongxuan
2014-09-01
Snore related signals (SRS) have been demonstrated to carry important information about the obstruction site and degree in the upper airway of Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) patients in recent years. To make this acoustic signal analysis method more accurate and robust, big SRS data processing is inevitable. As an emerging concept and technology, cloud computing has motivated numerous researchers and engineers to exploit applications both in academic and industry field, which could have an ability to implement a huge blue print in biomedical engineering. Considering the security and transferring requirement of biomedical data, we designed a system based on private cloud computing to process SRS. Then we set the comparable experiments of processing a 5-hour audio recording of an OSAHS patient by a personal computer, a server and a private cloud computing system to demonstrate the efficiency of the infrastructure we proposed.
Computing Finite-Time Lyapunov Exponents with Optimally Time Dependent Reduction
NASA Astrophysics Data System (ADS)
Babaee, Hessam; Farazmand, Mohammad; Sapsis, Themis; Haller, George
2016-11-01
We present a method to compute Finite-Time Lyapunov Exponents (FTLE) of a dynamical system using Optimally Time-Dependent (OTD) reduction recently introduced by H. Babaee and T. P. Sapsis. The OTD modes are a set of finite-dimensional, time-dependent, orthonormal basis {ui (x , t) } |i=1N that capture the directions associated with transient instabilities. The evolution equation of the OTD modes is derived from a minimization principle that optimally approximates the most unstable directions over finite times. To compute the FTLE, we evolve a single OTD mode along with the nonlinear dynamics. We approximate the FTLE from the reduced system obtained from projecting the instantaneous linearized dynamics onto the OTD mode. This results in a significant reduction in the computational cost compared to conventional methods for computing FTLE. We demonstrate the efficiency of our method for double Gyre and ABC flows. ARO project 66710-EG-YIP.
Development of a Computer Writing System Based on EOG
López, Alberto; Ferrero, Francisco; Yangüela, David; Álvarez, Constantina; Postolache, Octavian
2017-01-01
The development of a novel computer writing system based on eye movements is introduced herein. A system of these characteristics requires the consideration of three subsystems: (1) A hardware device for the acquisition and transmission of the signals generated by eye movement to the computer; (2) A software application that allows, among other functions, data processing in order to minimize noise and classify signals; and (3) A graphical interface that allows the user to write text easily on the computer screen using eye movements only. This work analyzes these three subsystems and proposes innovative and low cost solutions for each one of them. This computer writing system was tested with 20 users and its efficiency was compared to a traditional virtual keyboard. The results have shown an important reduction in the time spent on writing, which can be very useful, especially for people with severe motor disorders. PMID:28672863
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm
Chen, Jui-Le; Yang, Chu-Sing
2013-01-01
The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although the computer science technologies can be used to reduce the costs of the pharmaceutical research, the computation time of the structure-based protein-ligand docking prediction is still unsatisfied until now. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate the docking prediction algorithm. The proposed algorithm works by leveraging two high-performance operators: (1) the novel migration (information exchange) operator is designed specially for cloud-based environments to reduce the computation time; (2) the efficient operator is aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864
NASA Astrophysics Data System (ADS)
Grujicic, M.; Arakere, G.; Hariharan, A.; Pandurangan, B.
2012-06-01
The introduction of newer joining technologies like the so-called friction-stir welding (FSW) into automotive engineering entails the knowledge of the joint-material microstructure and properties. Since, the development of vehicles (including military vehicles capable of surviving blast and ballistic impacts) nowadays involves extensive use of the computational engineering analyses (CEA), robust high-fidelity material models are needed for the FSW joints. A two-level material-homogenization procedure is proposed and utilized in this study to help manage computational cost and computer storage requirements for such CEAs. The method utilizes experimental (microstructure, microhardness, tensile testing, and x-ray diffraction) data to construct: (a) the material model for each weld zone and (b) the material model for the entire weld. The procedure is validated by comparing its predictions with the predictions of more detailed but more costly computational analyses.
NASA Technical Reports Server (NTRS)
Green, Lawrence L.; Newman, Perry A.; Haigler, Kara J.
1993-01-01
The computational technique of automatic differentiation (AD) is applied to a three-dimensional thin-layer Navier-Stokes multigrid flow solver to assess the feasibility and computational impact of obtaining exact sensitivity derivatives typical of those needed for sensitivity analyses. Calculations are performed for an ONERA M6 wing in transonic flow with both the Baldwin-Lomax and Johnson-King turbulence models. The wing lift, drag, and pitching moment coefficients are differentiated with respect to two different groups of input parameters. The first group consists of the second- and fourth-order damping coefficients of the computational algorithm, whereas the second group consists of two parameters in the viscous turbulent flow physics modelling. Results obtained via AD are compared, for both accuracy and computational efficiency with the results obtained with divided differences (DD). The AD results are accurate, extremely simple to obtain, and show significant computational advantage over those obtained by DD for some cases.
Development of a Computer Writing System Based on EOG.
López, Alberto; Ferrero, Francisco; Yangüela, David; Álvarez, Constantina; Postolache, Octavian
2017-06-26
The development of a novel computer writing system based on eye movements is introduced herein. A system of these characteristics requires the consideration of three subsystems: (1) A hardware device for the acquisition and transmission of the signals generated by eye movement to the computer; (2) A software application that allows, among other functions, data processing in order to minimize noise and classify signals; and (3) A graphical interface that allows the user to write text easily on the computer screen using eye movements only. This work analyzes these three subsystems and proposes innovative and low cost solutions for each one of them. This computer writing system was tested with 20 users and its efficiency was compared to a traditional virtual keyboard. The results have shown an important reduction in the time spent on writing, which can be very useful, especially for people with severe motor disorders.
Gleeson, Fergus V.; Brady, Michael; Schnabel, Julia A.
2018-01-01
Abstract. Deformable image registration, a key component of motion correction in medical imaging, needs to be efficient and provides plausible spatial transformations that reliably approximate biological aspects of complex human organ motion. Standard approaches, such as Demons registration, mostly use Gaussian regularization for organ motion, which, though computationally efficient, rule out their application to intrinsically more complex organ motions, such as sliding interfaces. We propose regularization of motion based on supervoxels, which provides an integrated discontinuity preserving prior for motions, such as sliding. More precisely, we replace Gaussian smoothing by fast, structure-preserving, guided filtering to provide efficient, locally adaptive regularization of the estimated displacement field. We illustrate the approach by applying it to estimate sliding motions at lung and liver interfaces on challenging four-dimensional computed tomography (CT) and dynamic contrast-enhanced magnetic resonance imaging datasets. The results show that guided filter-based regularization improves the accuracy of lung and liver motion correction as compared to Gaussian smoothing. Furthermore, our framework achieves state-of-the-art results on a publicly available CT liver dataset. PMID:29662918
Papież, Bartłomiej W; Franklin, James M; Heinrich, Mattias P; Gleeson, Fergus V; Brady, Michael; Schnabel, Julia A
2018-04-01
Deformable image registration, a key component of motion correction in medical imaging, needs to be efficient and provides plausible spatial transformations that reliably approximate biological aspects of complex human organ motion. Standard approaches, such as Demons registration, mostly use Gaussian regularization for organ motion, which, though computationally efficient, rule out their application to intrinsically more complex organ motions, such as sliding interfaces. We propose regularization of motion based on supervoxels, which provides an integrated discontinuity preserving prior for motions, such as sliding. More precisely, we replace Gaussian smoothing by fast, structure-preserving, guided filtering to provide efficient, locally adaptive regularization of the estimated displacement field. We illustrate the approach by applying it to estimate sliding motions at lung and liver interfaces on challenging four-dimensional computed tomography (CT) and dynamic contrast-enhanced magnetic resonance imaging datasets. The results show that guided filter-based regularization improves the accuracy of lung and liver motion correction as compared to Gaussian smoothing. Furthermore, our framework achieves state-of-the-art results on a publicly available CT liver dataset.
A Component-Based FPGA Design Framework for Neuronal Ion Channel Dynamics Simulations
Mak, Terrence S. T.; Rachmuth, Guy; Lam, Kai-Pui; Poon, Chi-Sang
2008-01-01
Neuron-machine interfaces such as dynamic clamp and brain-implantable neuroprosthetic devices require real-time simulations of neuronal ion channel dynamics. Field Programmable Gate Array (FPGA) has emerged as a high-speed digital platform ideal for such application-specific computations. We propose an efficient and flexible component-based FPGA design framework for neuronal ion channel dynamics simulations, which overcomes certain limitations of the recently proposed memory-based approach. A parallel processing strategy is used to minimize computational delay, and a hardware-efficient factoring approach for calculating exponential and division functions in neuronal ion channel models is used to conserve resource consumption. Performances of the various FPGA design approaches are compared theoretically and experimentally in corresponding implementations of the AMPA and NMDA synaptic ion channel models. Our results suggest that the component-based design framework provides a more memory economic solution as well as more efficient logic utilization for large word lengths, whereas the memory-based approach may be suitable for time-critical applications where a higher throughput rate is desired. PMID:17190033
Pirbhulal, Sandeep; Zhang, Heye; Mukhopadhyay, Subhas Chandra; Li, Chunyue; Wang, Yumei; Li, Guanglin; Wu, Wanqing; Zhang, Yuan-Ting
2015-01-01
Body Sensor Network (BSN) is a network of several associated sensor nodes on, inside or around the human body to monitor vital signals, such as, Electroencephalogram (EEG), Photoplethysmography (PPG), Electrocardiogram (ECG), etc. Each sensor node in BSN delivers major information; therefore, it is very significant to provide data confidentiality and security. All existing approaches to secure BSN are based on complex cryptographic key generation procedures, which not only demands high resource utilization and computation time, but also consumes large amount of energy, power and memory during data transmission. However, it is indispensable to put forward energy efficient and computationally less complex authentication technique for BSN. In this paper, a novel biometric-based algorithm is proposed, which utilizes Heart Rate Variability (HRV) for simple key generation process to secure BSN. Our proposed algorithm is compared with three data authentication techniques, namely Physiological Signal based Key Agreement (PSKA), Data Encryption Standard (DES) and Rivest Shamir Adleman (RSA). Simulation is performed in Matlab and results suggest that proposed algorithm is quite efficient in terms of transmission time utilization, average remaining energy and total power consumption. PMID:26131666
Pirbhulal, Sandeep; Zhang, Heye; Mukhopadhyay, Subhas Chandra; Li, Chunyue; Wang, Yumei; Li, Guanglin; Wu, Wanqing; Zhang, Yuan-Ting
2015-06-26
Body Sensor Network (BSN) is a network of several associated sensor nodes on, inside or around the human body to monitor vital signals, such as, Electroencephalogram (EEG), Photoplethysmography (PPG), Electrocardiogram (ECG), etc. Each sensor node in BSN delivers major information; therefore, it is very significant to provide data confidentiality and security. All existing approaches to secure BSN are based on complex cryptographic key generation procedures, which not only demands high resource utilization and computation time, but also consumes large amount of energy, power and memory during data transmission. However, it is indispensable to put forward energy efficient and computationally less complex authentication technique for BSN. In this paper, a novel biometric-based algorithm is proposed, which utilizes Heart Rate Variability (HRV) for simple key generation process to secure BSN. Our proposed algorithm is compared with three data authentication techniques, namely Physiological Signal based Key Agreement (PSKA), Data Encryption Standard (DES) and Rivest Shamir Adleman (RSA). Simulation is performed in Matlab and results suggest that proposed algorithm is quite efficient in terms of transmission time utilization, average remaining energy and total power consumption.
An efficient formulation of robot arm dynamics for control and computer simulation
NASA Astrophysics Data System (ADS)
Lee, C. S. G.; Nigam, R.
This paper describes an efficient formulation of the dynamic equations of motion of industrial robots based on the Lagrange formulation of d'Alembert's principle. This formulation, as applied to a PUMA robot arm, results in a set of closed form second order differential equations with cross product terms. They are not as efficient in computation as those formulated by the Newton-Euler method, but provide a better analytical model for control analysis and computer simulation. Computational complexities of this dynamic model together with other models are tabulated for discussion.