Sample records for fast running computer

  1. Comparison of scientific computing platforms for MCNP4A Monte Carlo calculations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hendricks, J.S.; Brockhoff, R.C.

    1994-04-01

    The performance of seven computer platforms is evaluated with the widely used and internationally available MCNP4A Monte Carlo radiation transport code. All results are reproducible and are presented in such a way as to enable comparison with computer platforms not in the study. The authors observed that the HP/9000-735 workstation runs MCNP 50% faster than the Cray YMP 8/64. Compared with the Cray YMP 8/64, the IBM RS/6000-560 is 68% as fast, the Sun Sparc10 is 66% as fast, the Silicon Graphics ONYX is 90% as fast, the Gateway 2000 model 4DX2-66V personal computer is 27% as fast, and the Sun Sparc2 is 24% as fast. In addition to comparing the timing performance of the seven platforms, the authors observe that changes in compilers and software over the past 2 yr have resulted in only modest performance improvements, hardware improvements have enhanced performance by less than a factor of approximately 3, timing studies are very problem dependent, and MCNP4A runs about as fast as MCNP4.

  2. Concepts and Plans towards fast large scale Monte Carlo production for the ATLAS Experiment

    NASA Astrophysics Data System (ADS)

    Ritsch, E.; Atlas Collaboration

    2014-06-01

    The huge success of the physics program of the ATLAS experiment at the Large Hadron Collider (LHC) during Run 1 relies upon a great number of simulated Monte Carlo events. This Monte Carlo production takes the biggest part of the computing resources being in use by ATLAS as of now. In this document we describe the plans to overcome the computing resource limitations for large scale Monte Carlo production in the ATLAS Experiment for Run 2, and beyond. A number of fast detector simulation, digitization and reconstruction techniques are being discussed, based upon a new flexible detector simulation framework. To optimally benefit from these developments, a redesigned ATLAS MC production chain is presented at the end of this document.

  3. Fast All-Sky Radiation Model for Solar Applications (FARMS): A Brief Overview of Mechanisms, Performance, and Applications: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xie, Yu; Sengupta, Manajit

    Solar radiation can be computed using radiative transfer models, such as the Rapid Radiation Transfer Model (RRTM) and its general circulation model applications, and used for various energy applications. Due to the complexity of computing radiation fields in aerosol and cloudy atmospheres, simulating solar radiation can be extremely time-consuming, but many approximations--e.g., the two-stream approach and the delta-M truncation scheme--can be utilized. To provide a new fast option for computing solar radiation, we developed the Fast All-sky Radiation Model for Solar applications (FARMS) by parameterizing the simulated diffuse horizontal irradiance and direct normal irradiance for cloudy conditions from the RRTM runs using a 16-stream discrete ordinates radiative transfer method. The solar irradiance at the surface was simulated by combining the cloud irradiance parameterizations with a fast clear-sky model, REST2. To understand the accuracy and efficiency of the newly developed fast model, we analyzed FARMS runs using cloud optical and microphysical properties retrieved using GOES data from 2009-2012. The global horizontal irradiance for cloudy conditions was simulated using FARMS and RRTM for global circulation modeling with a two-stream approximation and compared to measurements taken from the U.S. Department of Energy's Atmospheric Radiation Measurement Climate Research Facility Southern Great Plains site. Our results indicate that the accuracy of FARMS is comparable to or better than the two-stream approach; however, FARMS is approximately 400 times more efficient because it does not explicitly solve the radiative transfer equation for each individual cloud condition. Radiative transfer model runs are computationally expensive, but this model is promising for broad applications in solar resource assessment and forecasting. It is currently being used in the National Solar Radiation Database, which is publicly available from the National Renewable Energy Laboratory at http://nsrdb.nrel.gov.

  4. Program Processes Thermocouple Readings

    NASA Technical Reports Server (NTRS)

    Quave, Christine A.; Nail, William, III

    1995-01-01

    Digital Signal Processor for Thermocouples (DART) computer program implements precise and fast method of converting voltage to temperature for large-temperature-range thermocouple applications. Written using LabVIEW software. DART available only as object code for use on Macintosh II FX or higher-series computers running System 7.0 or later and IBM PC-series and compatible computers running Microsoft Windows 3.1. Macintosh version of DART (SSC-00032) requires LabVIEW 2.2.1 or 3.0 for execution. IBM PC version (SSC-00031) requires LabVIEW 3.0 for Windows 3.1. LabVIEW software product of National Instruments and not included with program.

  5. Scalable Technology for a New Generation of Collaborative Applications

    DTIC Science & Technology

    2007-04-01

    of the International Symposium on Distributed Computing (DISC), Cracow, Poland, September 2005. Classic Paxos vs. Fast Paxos: Caveat Emptor, Flavio... grou or able and fast multicast primitive to layer under high-level latency across dimensions as varied as group size [10, 17], abstractions such as... servers, networked via fast, dedicated interconnects. The system to subscribe to a fraction of the equities on the software stack running on a single

  6. Analyzing Spacecraft Telecommunication Systems

    NASA Technical Reports Server (NTRS)

    Kordon, Mark; Hanks, David; Gladden, Roy; Wood, Eric

    2004-01-01

    Multi-Mission Telecom Analysis Tool (MMTAT) is a C-language computer program for analyzing proposed spacecraft telecommunication systems. MMTAT utilizes parameterized input and computational models that can be run on standard desktop computers to perform fast and accurate analyses of telecommunication links. MMTAT is easy to use and can easily be integrated with other software applications and run as part of almost any computational simulation. It is distributed as either a stand-alone application program with a graphical user interface or a linkable library with a well-defined set of application programming interface (API) calls. As a stand-alone program, MMTAT provides both textual and graphical output. The graphs make it possible to understand, quickly and easily, how telecommunication performance varies with variations in input parameters. A delimited text file that can be read by any spreadsheet program is generated at the end of each run. The API in the linkable-library form of MMTAT enables the user to control simulation software and to change parameters during a simulation run. Results can be retrieved either at the end of a run or by use of a function call at any time step.

  7. Los Alamos National Laboratory Economic Analysis Capability Overview

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boero, Riccardo; Edwards, Brian Keith; Pasqualini, Donatella

    Los Alamos National Laboratory has developed two types of models to compute the economic impact of infrastructure disruptions. FastEcon is a fast running model that estimates first-order economic impacts of large scale events such as hurricanes and floods and can be used to identify the amount of economic activity that occurs in a specific area. LANL’s Computable General Equilibrium (CGE) model estimates more comprehensive static and dynamic economic impacts of a broader array of events and captures the interactions between sectors and industries when estimating economic impacts.

  8. Fast methods to numerically integrate the Reynolds equation for gas fluid films

    NASA Technical Reports Server (NTRS)

    Dimofte, Florin

    1992-01-01

    The alternating direction implicit (ADI) method is adopted, modified, and applied to the Reynolds equation for thin, gas fluid films. An efficient code is developed to predict both the steady-state and dynamic performance of an aerodynamic journal bearing. An alternative approach is shown for hybrid journal gas bearings by using Liebmann's iterative solution (LIS) for elliptic partial differential equations. The results are compared with known design criteria from experimental data. The developed methods show good accuracy and very short computer running time in comparison with methods based on inverting a matrix. The computer codes need a small amount of memory and can be run on either personal computers or mainframe systems.
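
    As a rough illustration of the ADI structure described above, the sketch below applies Peaceman-Rachford ADI to a model 2-D diffusion problem rather than to the compressible Reynolds equation itself; the grid size, time step, and zero Dirichlet boundaries are illustrative assumptions, not values from the paper. The point is the method's economy: each half step is implicit in one coordinate direction only, so every solve reduces to cheap tridiagonal systems, which is why such codes need little memory.

        import numpy as np
        from scipy.linalg import solve_banded

        # Peaceman-Rachford ADI on u_t = u_xx + u_yy, zero Dirichlet boundaries.
        # A model problem, not the Reynolds equation of the paper.
        n, h, dt = 64, 1.0 / 65, 1e-3
        r = dt / (2.0 * h * h)
        u = np.random.rand(n, n)            # interior grid values

        ab = np.zeros((3, n))               # (I - r*D2) in banded storage
        ab[0, 1:] = -r
        ab[1, :] = 1.0 + 2.0 * r
        ab[2, :-1] = -r

        def d2(v):
            """Second difference along axis 0 with zero boundaries."""
            vp = np.pad(v, ((1, 1), (0, 0)))
            return vp[2:] - 2.0 * v + vp[:-2]

        for _ in range(200):
            rhs = u + r * d2(u.T).T                  # explicit part along y
            u = solve_banded((1, 1), ab, rhs)        # implicit sweep along x
            rhs = u + r * d2(u)                      # explicit part along x
            u = solve_banded((1, 1), ab, rhs.T).T    # implicit sweep along y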

  9. Using the Fast Fourier Transform to Accelerate the Computational Search for RNA Conformational Switches

    PubMed Central

    Senter, Evan; Sheikh, Saad; Dotu, Ivan; Ponty, Yann; Clote, Peter

    2012-01-01

    Using complex roots of unity and the Fast Fourier Transform, we design a new thermodynamics-based algorithm, FFTbor, that computes the Boltzmann probability that secondary structures differ by k base pairs from an arbitrary initial structure of a given RNA sequence. The algorithm, which runs in quartic time and quadratic space, is used to determine the correlation between kinetic folding speed and the ruggedness of the energy landscape, and to predict the location of riboswitch expression platform candidates. A web server is available at http://bioinformatics.bc.edu/clotelab/FFTbor/. PMID:23284639
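
    The numerical trick named in the abstract - evaluating a polynomial at complex roots of unity and recovering its coefficients with the FFT - can be sketched in a few lines. The polynomial below is a made-up stand-in for FFTbor's structure-counting polynomial, whose actual construction requires the paper's full dynamic program.

        import numpy as np

        # If p(x) = sum_k c_k x^k (in FFTbor, c_k encodes the Boltzmann weight of
        # structures k base pairs from the initial structure), evaluating p at the
        # n-th roots of unity and applying a DFT recovers every coefficient at once.
        c_true = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])  # invented values
        n = len(c_true)
        roots = np.exp(2j * np.pi * np.arange(n) / n)            # n-th roots of unity
        evals = np.array([np.polyval(c_true[::-1], w) for w in roots])
        c_rec = np.fft.fft(evals) / n          # inverts the root-of-unity evaluation
        print(np.round(c_rec.real, 10))        # matches c_true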

  10. Fast computation of hologram patterns of a 3D object using run-length encoding and novel look-up table methods.

    PubMed

    Kim, Seung-Cheol; Kim, Eun-Soo

    2009-02-20

    In this paper we propose a new approach for fast generation of computer-generated holograms (CGHs) of a 3D object by using the run-length encoding (RLE) and the novel look-up table (N-LUT) methods. With the RLE method, spatially redundant data of a 3D object are extracted and regrouped into the N-point redundancy map according to the number of the adjacent object points having the same 3D value. Based on this redundancy map, N-point principle fringe patterns (PFPs) are newly calculated by using the 1-point PFP of the N-LUT, and the CGH pattern for the 3D object is generated with these N-point PFPs. In this approach, object points to be involved in calculation of the CGH pattern can be dramatically reduced and, as a result, an increase of computational speed can be obtained. Some experiments with a test 3D object are carried out and the results are compared to those of the conventional methods.
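
    A minimal sketch of the run-length encoding step, assuming a 1-D scan of object points in which runs of adjacent points share the same value; the data layout is a simplifying assumption rather than the paper's exact N-point redundancy map, and the N-LUT fringe synthesis itself is omitted.

        from itertools import groupby

        def run_length_encode(values):
            """Group adjacent equal values into (value, run_length) pairs."""
            return [(v, len(list(g))) for v, g in groupby(values)]

        # Each run of N adjacent points with equal value can then be rendered with
        # a single N-point principal fringe pattern instead of N one-point patterns.
        scanline = [5, 5, 5, 2, 2, 7, 7, 7, 7]
        print(run_length_encode(scanline))   # [(5, 3), (2, 2), (7, 4)]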

  11. Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters

    PubMed Central

    Torres-Huitzil, Cesar

    2013-01-01

    Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a k × k kernel requires of k 2 − 1 comparisons per sample for a direct implementation; thus, performance scales expensively with the kernel size k. Faster computations can be achieved by kernel decomposition and using constant time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture design uses less computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters, on 1024 × 1024 images with up to 255 × 255 kernels, in around 8.4 milliseconds, 120 frames per second, at a clock frequency of 250 MHz. The implementation is highly scalable for the kernel size with good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding. PMID:24288456
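
    The van Herk/Gil-Werman algorithm the architecture implements computes a running max in roughly three comparisons per sample regardless of kernel size, by merging per-block prefix and suffix maxima. A 1-D software sketch follows (the hardware pipelines the same recurrence); a k × k 2-D filter then decomposes into a row pass followed by a column pass.

        import numpy as np

        def running_max_hgw(x, k):
            """1-D van Herk/Gil-Werman running max over windows x[i..i+k-1].
            Cost per sample is constant in k: one prefix-max pass and one
            suffix-max pass over k-sized blocks, plus one final merge."""
            n = len(x)
            pad = (-n) % k                          # pad to a multiple of k
            xp = np.concatenate([np.asarray(x, float), np.full(pad, -np.inf)])
            blocks = xp.reshape(-1, k)
            s = np.maximum.accumulate(blocks, axis=1).ravel()    # prefix maxima
            r = np.maximum.accumulate(blocks[:, ::-1],
                                      axis=1)[:, ::-1].ravel()   # suffix maxima
            return np.maximum(r[:n - k + 1], s[k - 1:n])

        x = np.array([3, 1, 4, 1, 5, 9, 2, 6])
        print(running_max_hgw(x, 3))   # [4. 4. 5. 9. 9. 9.]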

  12. Fast neural network surrogates for very high dimensional physics-based models in computational oceanography.

    PubMed

    van der Merwe, Rudolph; Leen, Todd K; Lu, Zhengdong; Frolov, Sergey; Baptista, Antonio M

    2007-05-01

    We present neural network surrogates that provide extremely fast and accurate emulation of a large-scale circulation model for the coupled Columbia River, its estuary and near ocean regions. The circulation model has O(10^7) degrees of freedom, is highly nonlinear and is driven by ocean, atmospheric and river influences at its boundaries. The surrogates provide accurate emulation of the full circulation code and run over 1000 times faster. Such fast dynamic surrogates will enable significant advances in ensemble forecasts in oceanography and weather.

  13. An Accurate and Efficient Algorithm for Detection of Radio Bursts with an Unknown Dispersion Measure, for Single-dish Telescopes and Interferometers

    NASA Astrophysics Data System (ADS)

    Zackay, Barak; Ofek, Eran O.

    2017-01-01

    Astronomical radio signals are subjected to phase dispersion while traveling through the interstellar medium. To optimally detect a short-duration signal within a frequency band, we have to precisely compensate for the unknown pulse dispersion, which is a computationally demanding task. We present the “fast dispersion measure transform” algorithm for optimal detection of such signals. Our algorithm has a low theoretical complexity of 2 N_f N_t + N_t N_Δ log_2(N_f), where N_f, N_t, and N_Δ are the numbers of frequency bins, time bins, and dispersion measure bins, respectively. Unlike previously suggested fast algorithms, our algorithm conserves the sensitivity of brute-force dedispersion. Our tests indicate that this algorithm, running on a standard desktop computer and implemented in a high-level programming language, is already faster than the state-of-the-art dedispersion codes running on graphical processing units (GPUs). We also present a variant of the algorithm that can be efficiently implemented on GPUs. The latter algorithm’s computation and data-transport requirements are similar to those of a two-dimensional fast Fourier transform, indicating that incoherent dedispersion can now be considered a nonissue while planning future surveys. We further present a fast algorithm for sensitive detection of pulses shorter than the dispersive smearing limits of incoherent dedispersion. In typical cases, this algorithm is orders of magnitude faster than enumerating dispersion measures and coherently dedispersing by convolution. We analyze the computational complexity of pulsed signal searches by radio interferometers. We conclude that, using our suggested algorithms, maximally sensitive blind searches for dispersed pulses are feasible using existing facilities. We provide an implementation of these algorithms in Python and MATLAB.
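
    For orientation, the brute-force shift-and-sum dedispersion whose O(N_f N_t N_dm) cost the fast dispersion measure transform reduces can be written directly. The dispersion constant is standard; the dense-loop layout below is an illustrative baseline, not the FDMT recursion itself.

        import numpy as np

        K_DM = 4.148808e3   # dispersion delay constant, s MHz^2 / (pc cm^-3)

        def brute_force_dedisperse(spec, freqs_mhz, dms, dt):
            """Shift-and-sum dedispersion over trial DMs: O(N_f*N_t*N_dm) work.
            spec is an (N_f, N_t) dynamic spectrum; returns an (N_dm, N_t) array."""
            out = np.zeros((len(dms), spec.shape[1]))
            f_ref = freqs_mhz.max()
            for d, dm in enumerate(dms):
                delays = K_DM * dm * (freqs_mhz ** -2 - f_ref ** -2)
                shifts = np.round(delays / dt).astype(int)
                for chan, shift in enumerate(shifts):
                    out[d] += np.roll(spec[chan], -shift)  # advance delayed channels
            return out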

  14. Digital SAR processing using a fast polynomial transform

    NASA Technical Reports Server (NTRS)

    Butman, S.; Lipes, R.; Rubin, A.; Truong, T. K.

    1981-01-01

    A new digital processing algorithm based on the fast polynomial transform is developed for producing images from Synthetic Aperture Radar data. This algorithm enables the computation of the two dimensional cyclic correlation of the raw echo data with the impulse response of a point target, thereby reducing distortions inherent in one dimensional transforms. This SAR processing technique was evaluated on a general-purpose computer and an actual Seasat SAR image was produced. However, regular production runs will require a dedicated facility. It is expected that such a new SAR processing algorithm could provide the basis for a real-time SAR correlator implementation in the Deep Space Network.
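
    The quantity at the heart of this processor - the two-dimensional cyclic correlation of the raw echo data with a point-target impulse response - can be written compactly with ordinary FFTs; the fast polynomial transform evaluates the same correlation through a different, multiplication-saving factorization. A sketch with invented stand-in arrays:

        import numpy as np

        def cyclic_correlation_2d(echo, impulse):
            """2-D cyclic (circular) correlation via the convolution theorem:
            corr = IFFT2( FFT2(echo) * conj(FFT2(impulse)) )."""
            spectrum = np.fft.fft2(echo) * np.conj(np.fft.fft2(impulse))
            return np.real(np.fft.ifft2(spectrum))

        echo = np.random.rand(256, 256)      # stand-in for raw SAR echo data
        impulse = np.random.rand(256, 256)   # stand-in for point-target response
        image = cyclic_correlation_2d(echo, impulse)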

  15. Parallel computing for automated model calibration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burke, John S.; Danielson, Gary R.; Schulz, Douglas A.

    2002-07-29

    Natural resources model calibration is a significant burden on computing and staff resources in modeling efforts. Most assessments must consider multiple calibration objectives (for example, magnitude and timing of stream flow peak). An automated calibration process that allows real-time updating of data/models, allowing scientists to focus effort on improving models, is needed. We are in the process of building a fully featured multi-objective calibration tool capable of processing multiple models cheaply and efficiently using null cycle computing. Our parallel processing and calibration software routines have been written generically, but our focus has been on natural resources model calibration. So far, the natural resources models have been friendly to parallel calibration efforts in that they require no inter-process communication, need only a small amount of input data, and output only a small amount of statistical information for each calibration run. A typical auto calibration run might involve running a model 10,000 times with a variety of input parameters and summary statistical output. In the past, model calibration has been done against individual models for each data set. The individual model runs are relatively fast, ranging from seconds to minutes. The process was run on a single computer using a simple iterative process. We have completed two auto calibration prototypes and are currently designing a more feature-rich tool. Our prototypes have focused on running the calibration in a distributed computing cross-platform environment. They allow incorporation of "smart" calibration parameter generation (using artificial intelligence processing techniques). Null cycle computing similar to SETI@home has also been a focus of our efforts. This paper details the design of the latest prototype and discusses our plans for the next revision of the software.
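
    A skeletal version of the embarrassingly parallel sweep described above - no inter-process communication, small inputs, small statistical outputs - might look like the following; the model function, its parameters, and the objective are placeholders invented for illustration, not the paper's models.

        import numpy as np
        from multiprocessing import Pool

        def run_model(params):
            """Placeholder for one calibration run: simulate, then return only a
            small summary statistic (here, a made-up peak-flow error)."""
            k, theta = params
            simulated_peak = k * np.exp(-theta)        # stand-in for the real model
            return params, abs(simulated_peak - 42.0)  # error vs. an observed peak

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            trials = list(zip(rng.uniform(10, 100, 10000),
                              rng.uniform(0, 2, 10000)))
            with Pool() as pool:                       # one model run per task
                results = pool.map(run_model, trials)
            best_params, best_err = min(results, key=lambda r: r[1])
            print(best_params, best_err)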

  16. Modified Laser and Thermos cell calculations on microcomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shapiro, A.; Huria, H.C.

    1987-01-01

    In the course of designing and operating nuclear reactors, many fuel pin cell calculations are required to obtain homogenized cell cross sections as a function of burnup. In the interest of convenience and cost, it would be very desirable to be able to make such calculations on microcomputers. In addition, such a microcomputer code would be very helpful for educational course work in reactor computations. To establish the feasibility of making detailed cell calculations on a microcomputer, a mainframe cell code was compiled and run on a microcomputer. The computer code Laser, originally written in Fortran IV for the IBM-7090 class of mainframe computers, is a cylindrical, one-dimensional, multigroup lattice cell program that includes burnup. It is based on the MUFT code for epithermal and fast group calculations, and Thermos for the thermal calculations. There are 50 fast and epithermal groups and 35 thermal groups. Resonances are calculated assuming a homogeneous system and then corrected for self-shielding, Dancoff, and Doppler by self-shielding factors. The Laser code was converted to run on a microcomputer. In addition, the Thermos portion of Laser was extracted and compiled separately to have available a stand-alone thermal code.

  17. Fast simulation tool for ultraviolet radiation at the earth's surface

    NASA Astrophysics Data System (ADS)

    Engelsen, Ola; Kylling, Arve

    2005-04-01

    FastRT is a fast, yet accurate, UV simulation tool that computes downward surface UV doses, UV indices, and irradiances in the spectral range 290 to 400 nm with a resolution as small as 0.05 nm. It computes a full UV spectrum within a few milliseconds on a standard PC, and enables the user to convolve the spectrum with user-defined and built-in spectral response functions including the International Commission on Illumination (CIE) erythemal response function used for UV index calculations. The program accounts for the main radiative input parameters, i.e., instrumental characteristics, solar zenith angle, ozone column, aerosol loading, clouds, surface albedo, and surface altitude. FastRT is based on look-up tables of carefully selected entries of atmospheric transmittances and spherical albedos, and exploits the smoothness of these quantities with respect to atmospheric, surface, geometrical, and spectral parameters. An interactive site, http://nadir.nilu.no/~olaeng/fastrt/fastrt.html, enables the public to run the FastRT program with most input options. This page also contains updated information about FastRT and links to freely downloadable source codes and binaries.

  18. RenderMan design principles

    NASA Technical Reports Server (NTRS)

    Apodaca, Tony; Porter, Tom

    1989-01-01

    The two worlds of interactive graphics and realistic graphics have remained separate. Fast graphics hardware runs simple algorithms and generates simple looking images. Photorealistic image synthesis software runs slowly on large expensive computers. The time has come for these two branches of computer graphics to merge. The speed and expense of graphics hardware is no longer the barrier to the wide acceptance of photorealism. There is every reason to believe that high quality image synthesis will become a standard capability of every graphics machine, from superworkstation to personal computer. The significant barrier has been the lack of a common language, an agreed-upon set of terms and conditions, for 3-D modeling systems to talk to 3-D rendering systems when computing an accurate rendition of a scene. Pixar has introduced RenderMan to serve as that common language. RenderMan, specifically the extensibility it offers in shading calculations, is discussed.

  19. Response of the first wetted wall of an IFE reactor chamber to the energy release from a direct-drive DT capsule

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Medin, Stanislav A.; Basko, Mikhail M.; Orlov, Yurii N.

    2012-07-11

    Radiation hydrodynamics 1D simulations were performed with two concurrent codes, DEIRA and RAMPHY. The DEIRA code was used for DT capsule implosion and burn, and the RAMPHY code was used for computation of X-ray and fast ion deposition in the first-wall liquid film of the reactor chamber. The simulations were run for a 740 MJ direct-drive DT capsule and a Pb thin-liquid-wall reactor chamber of 10 m diameter. Temporal profiles of the DT capsule leaking power in X-rays, neutrons, and fast ⁴He ions were obtained, and spatial profiles of the liquid film flow parameters were computed and analyzed.

  20. Digital SAR processing using a fast polynomial transform

    NASA Technical Reports Server (NTRS)

    Truong, T. K.; Lipes, R. G.; Butman, S. A.; Reed, I. S.; Rubin, A. L.

    1984-01-01

    A new digital processing algorithm based on the fast polynomial transform is developed for producing images from Synthetic Aperture Radar data. This algorithm enables the computation of the two dimensional cyclic correlation of the raw echo data with the impulse response of a point target, thereby reducing distortions inherent in one dimensional transforms. This SAR processing technique was evaluated on a general-purpose computer and an actual Seasat SAR image was produced. However, regular production runs will require a dedicated facility. It is expected that such a new SAR processing algorithm could provide the basis for a real-time SAR correlator implementation in the Deep Space Network. Previously announced in STAR as N82-11295

  1. Covariance Analysis Tool (G-CAT) for Computing Ascent, Descent, and Landing Errors

    NASA Technical Reports Server (NTRS)

    Boussalis, Dhemetrios; Bayard, David S.

    2013-01-01

    G-CAT is a covariance analysis tool that enables fast and accurate computation of error ellipses for descent, landing, ascent, and rendezvous scenarios, and quantifies knowledge error contributions needed for error budgeting purposes. Because G-CAT supports hardware/system trade studies in spacecraft and mission design, it is useful in both early and late mission/proposal phases where Monte Carlo simulation capability is not mature, Monte Carlo simulation takes too long to run, and/or there is a need to perform multiple parametric system design trades that would require an unwieldy number of Monte Carlo runs. G-CAT is formulated as a variable-order square-root linearized Kalman filter (LKF), typically using over 120 filter states. An important property of G-CAT is that it is based on a 6-DOF (degrees of freedom) formulation that completely captures the combined effects of both attitude and translation errors on the propagated trajectories. This ensures its accuracy for guidance, navigation, and control (GN&C) analysis. G-CAT provides the desired fast turnaround analysis needed for error budgeting in support of mission concept formulations, design trade studies, and proposal development efforts. The main usefulness of a covariance analysis tool such as G-CAT is its ability to calculate the performance envelope directly from a single run. This is in sharp contrast to running thousands of simulations to obtain similar information using Monte Carlo methods. It does this by propagating the "statistics" of the overall design, rather than simulating individual trajectories. G-CAT supports applications to lunar, planetary, and small body missions. It characterizes onboard knowledge propagation errors associated with inertial measurement unit (IMU) errors (gyro and accelerometer), gravity errors/dispersions (spherical harmonics, mascons), and radar errors (multiple altimeter beams, multiple Doppler velocimeter beams). G-CAT is a standalone MATLAB-based tool intended to run on any engineer's desktop computer.
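
    The key claim above - one covariance run replaces thousands of Monte Carlo trajectories - rests on propagating the statistics directly through the linearized dynamics, P ← F P Fᵀ + Q. A toy two-state sketch of that idea follows; the matrices are invented for illustration, and G-CAT's 120-plus-state square-root filter is far richer.

        import numpy as np

        F = np.array([[1.0, 0.1],      # hypothetical transition (position, velocity)
                      [0.0, 1.0]])
        Q = np.diag([1e-6, 1e-4])      # hypothetical process noise
        P0 = np.diag([1e-2, 1e-2])     # initial knowledge covariance

        # One covariance "run": propagate the statistics themselves.
        P = P0.copy()
        for _ in range(100):
            P = F @ P @ F.T + Q

        # The Monte Carlo alternative: propagate thousands of sampled trajectories.
        rng = np.random.default_rng(0)
        x = rng.multivariate_normal([0.0, 0.0], P0, size=20000)
        for _ in range(100):
            x = x @ F.T + rng.multivariate_normal([0.0, 0.0], Q, size=20000)

        print(P)            # error ellipse from a single covariance run
        print(np.cov(x.T))  # agrees to sampling error, at 20000x the trajectory cost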

  2. Potential Application of a Graphical Processing Unit to Parallel Computations in the NUBEAM Code

    NASA Astrophysics Data System (ADS)

    Payne, J.; McCune, D.; Prater, R.

    2010-11-01

    NUBEAM is a comprehensive computational Monte Carlo based model for neutral beam injection (NBI) in tokamaks. NUBEAM computes NBI-relevant profiles in tokamak plasmas by tracking the deposition and the slowing of fast ions. At the core of NUBEAM are vector calculations used to track fast ions. These calculations have recently been parallelized to run on MPI clusters. However, cost and interlink bandwidth limit the ability to fully parallelize NUBEAM on an MPI cluster. Recent implementation of double precision capabilities for Graphical Processing Units (GPUs) presents a cost effective and high performance alternative or complement to MPI computation. Commercially available graphics cards can achieve up to 672 GFLOPS double precision and can handle hundreds of thousands of threads. The ability to execute at least one thread per particle simultaneously could significantly reduce the execution time and the statistical noise of NUBEAM. Progress on implementation on a GPU will be presented.

  3. The relationship between gamma frequency and running speed differs for slow and fast gamma rhythms in freely behaving rats

    PubMed Central

    Zheng, Chenguang; Bieri, Kevin Wood; Trettel, Sean Gregory; Colgin, Laura Lee

    2015-01-01

    In hippocampal area CA1 of rats, the frequency of gamma activity has been shown to increase with running speed (Ahmed and Mehta, 2012). This finding suggests that different gamma frequencies simply allow for different timings of transitions across cell assemblies at varying running speeds, rather than serving unique functions. However, accumulating evidence supports the conclusion that slow (~25–55 Hz) and fast (~60–100 Hz) gamma are distinct network states with different functions. If slow and fast gamma constitute distinct network states, then it is possible that slow and fast gamma frequencies are differentially affected by running speed. In this study, we tested this hypothesis and found that slow and fast gamma frequencies change differently as a function of running speed in hippocampal areas CA1 and CA3, and in the superficial layers of the medial entorhinal cortex (MEC). Fast gamma frequencies increased with increasing running speed in all three areas. Slow gamma frequencies changed significantly less across different speeds. Furthermore, at high running speeds, CA3 firing rates were low, and MEC firing rates were high, suggesting that CA1 transitions from CA3 inputs to MEC inputs as running speed increases. These results support the hypothesis that slow and fast gamma reflect functionally distinct states in the hippocampal network, with fast gamma driven by MEC at high running speeds and slow gamma driven by CA3 at low running speeds. PMID:25601003

  4. A Hierarchical Algorithm for Fast Debye Summation with Applications to Small Angle Scattering

    PubMed Central

    Gumerov, Nail A.; Berlin, Konstantin; Fushman, David; Duraiswami, Ramani

    2012-01-01

    Debye summation, which involves the summation of sinc functions of distances between all pairs of atoms in three-dimensional space, arises in computations performed in crystallography, small/wide angle X-ray scattering (SAXS/WAXS) and small angle neutron scattering (SANS). Direct evaluation of the Debye summation has quadratic complexity, which results in a computational bottleneck when determining crystal properties or running structure refinement protocols that involve SAXS or SANS, even for moderately sized molecules. We present a fast approximation algorithm that efficiently computes the summation to any prescribed accuracy ε in linear time. The algorithm is similar to the fast multipole method (FMM), and is based on a hierarchical spatial decomposition of the molecule coupled with local harmonic expansions and translation of these expansions. An even more efficient implementation is possible when the scattering profile is all that is required, as in small angle scattering (SAS) reconstruction of macromolecules. We examine the relationship of the proposed algorithm to existing approximate methods for profile computations, and show that these methods may result in inaccurate profile computations unless an error bound derived in this paper is used. Our theoretical and computational results show orders of magnitude improvement in computational complexity over existing methods, while maintaining prescribed accuracy. PMID:22707386
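
    For reference, the direct quadratic-cost Debye sum that the hierarchical algorithm replaces is simple to state and code; equal unit form factors are assumed below for brevity, and the toy coordinates are invented.

        import numpy as np
        from scipy.spatial.distance import pdist

        def debye_profile(coords, q):
            """Direct O(N^2) Debye sum for identical unit scatterers:
            I(q) = N + 2 * sum_{i<j} sin(q r_ij) / (q r_ij).
            This quadratic baseline is what the hierarchical method accelerates."""
            r = pdist(coords)                  # all pairwise distances
            # np.sinc(x) = sin(pi x)/(pi x), so pass q*r/pi to get sin(qr)/(qr)
            return np.array([len(coords) + 2.0 * np.sinc(qk * r / np.pi).sum()
                             for qk in q])

        coords = np.random.rand(500, 3) * 50.0   # toy "molecule" coordinates
        q = np.linspace(0.01, 0.5, 100)          # scattering vector magnitudes
        profile = debye_profile(coords, q)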

  5. Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data.

    PubMed

    Li, Wenyuan; Gong, Ke; Li, Qingjiao; Alber, Frank; Zhou, Xianghong Jasmine

    2015-03-15

    Genome-wide proximity ligation assays, e.g. Hi-C and its variant TCC, have recently become important tools to study spatial genome organization. Removing biases from chromatin contact matrices generated by such techniques is a critical preprocessing step of subsequent analyses. The continuing decline of sequencing costs has led to an ever-improving resolution of Hi-C data, resulting in very large matrices of chromatin contacts. Such large matrices, however, pose a great challenge to the memory usage and speed of normalization. Therefore, there is an urgent need for fast and memory-efficient methods for normalization of Hi-C data. We developed Hi-Corrector, an easy-to-use, open source implementation of the Hi-C data normalization algorithm. Its salient features are (i) scalability - the software is capable of normalizing Hi-C data of any size in reasonable times; (ii) memory efficiency - the sequential version can run on any single computer with very limited memory, no matter how little; (iii) fast speed - the parallel version can run very fast on multiple computing nodes with limited local memory. The sequential version is implemented in ANSI C and can be easily compiled on any system; the parallel version is implemented in ANSI C with the MPI library (a standardized and portable parallel environment designed for solving large-scale scientific problems). The package is freely available at http://zhoulab.usc.edu/Hi-Corrector/.
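
    A single-machine sketch of the matrix-balancing idea behind such normalization follows; Hi-Corrector implements a parallel, memory-bounded version of this kind of algorithm, so the dense in-memory loop below is only a conceptual stand-in, assuming a symmetric contact matrix small enough to fit in RAM.

        import numpy as np

        def iterative_correction(m, n_iter=50):
            """Iteratively rescale rows/columns of a symmetric contact matrix
            so every non-empty row has equal total coverage."""
            m = m.astype(float).copy()
            bias = np.ones(m.shape[0])
            for _ in range(n_iter):
                s = m.sum(axis=1)
                s /= s[s > 0].mean()      # relative coverage of each row
                s[s == 0] = 1.0           # leave empty rows untouched
                m /= np.outer(s, s)
                bias *= s
            return m, bias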

  6. Surveillance of Space - Optimal Use of Complementary Sensors for Maximum Efficiency

    DTIC Science & Technology

    2006-04-01

    as track-before-detect [4] have been shown to allow improved sensitivity. This technique employs fast running algorithms and computing power to pre...Multifunction Radar” IEEE Signal Processing Magazine, January 2006. [4] Wallace W R “The Use of Track-Before-Detect in Pulse-Doppler Radar” IEE 490, Radar

  7. Computer simulation results of attitude estimation of earth orbiting satellites

    NASA Technical Reports Server (NTRS)

    Kou, S. R.

    1976-01-01

    Computer simulation results of attitude estimation of Earth-orbiting satellites (including Space Telescope) subjected to environmental disturbances and noises are presented. A decomposed linear recursive filter and a Kalman filter were used as estimation tools. Six programs were developed for this simulation; all were written in the BASIC language and were run on HP 9830A and HP 9866A computers. Simulation results show that a decomposed linear recursive filter is accurate in estimation and fast in response time. Furthermore, for higher order systems, this filter has computational advantages (i.e., fewer integration and roundoff errors) over a Kalman filter.

  8. Fast neural net simulation with a DSP processor array.

    PubMed

    Muller, U A; Gunzinger, A; Guggenbuhl, W

    1995-01-01

    This paper describes the implementation of a fast neural net simulator on a novel parallel distributed-memory computer. A 60-processor system, named MUSIC (multiprocessor system with intelligent communication), is operational and runs the backpropagation algorithm at a speed of 330 million connection updates per second (continuous weight update) using 32-bit floating-point precision. This is equal to 1.4 Gflops sustained performance. The complete system with 3.8 Gflops peak performance consumes less than 800 W of electrical power and fits into a 19-in rack. While reaching the speed of modern supercomputers, MUSIC can still be used as a personal desktop computer at a researcher's own disposal. In neural net simulation, this gives a computing performance to a single user which was unthinkable before. The system's real-time interfaces make it especially useful for embedded applications.

  9. High resolution frequency analysis techniques with application to the redshift experiment

    NASA Technical Reports Server (NTRS)

    Decher, R.; Teuber, D.

    1975-01-01

    High resolution frequency analysis methods, with application to the gravitational probe redshift experiment, are discussed. For this experiment a resolution of .00001 Hz is required to measure a slowly varying, low frequency signal of approximately 1 Hz. Major building blocks include fast Fourier transform, discrete Fourier transform, Lagrange interpolation, golden section search, and adaptive matched filter technique. Accuracy, resolution, and computer effort of these methods are investigated, including test runs on an IBM 360/65 computer.
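
    The resolution requirement above is set by record length - a DFT's Rayleigh resolution is roughly 1/T, so resolving 0.00001 Hz implies on the order of 10^5 s of data - and interpolation around a spectral peak can refine the estimate well below the bin spacing. A toy sketch in the spirit of the techniques listed (FFT plus three-point interpolation); the signal and record length are invented, not the experiment's parameters.

        import numpy as np

        fs, T = 8.0, 4096.0                       # sample rate (Hz), record (s)
        t = np.arange(0, T, 1 / fs)
        x = np.sin(2 * np.pi * 1.000123 * t)      # slowly varying ~1 Hz tone
        spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
        k = spec.argmax()
        # quadratic (3-point Lagrange) interpolation around the peak bin
        a, b, c = np.log(spec[k - 1 : k + 2])
        delta = 0.5 * (a - c) / (a - 2 * b + c)
        f_hat = (k + delta) * fs / len(x)
        print(f_hat)   # recovers the tone well below the 1/T ~ 2.4e-4 Hz spacing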

  10. Fast-response free-running dc-to-dc converter employing a state-trajectory control law

    NASA Technical Reports Server (NTRS)

    Huffman, S. D.; Burns, W. W., III; Wilson, T. G.; Owen, H. A., Jr.

    1977-01-01

    A recently proposed state-trajectory control law for a family of energy-storage dc-to-dc converters has been implemented for the voltage step-up configuration. Two methods of realization are discussed; one employs a digital processor and the other uses analog computational circuits. Performance characteristics of experimental voltage step-up converters operating under the control of each of these implementations are reported and compared to theoretical predictions and computer simulations.

  11. Fast Inference with Min-Sum Matrix Product.

    PubMed

    Felzenszwalb, Pedro F; McAuley, Julian J

    2011-12-01

    The MAP inference problem in many graphical models can be solved efficiently using a fast algorithm for computing min-sum products of n × n matrices. The class of models in question includes cyclic and skip-chain models that arise in many applications. Although the worst-case complexity of the min-sum product operation is not known to be much better than O(n^3), an O(n^2.5) expected-time algorithm was recently given, subject to some constraints on the input matrices. In this paper, we give an algorithm that runs in O(n^2 log n) expected time, assuming that the entries in the input matrices are independent samples from a uniform distribution. We also show that two variants of our algorithm are quite fast for inputs that arise in several applications. This leads to significant performance gains over previous methods in applications within computer vision and natural language processing.
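
    The min-sum (tropical) matrix product itself is straightforward to define; the paper's contribution is beating the cubic cost in expectation, but the direct version below fixes notation. The argmin is retained because MAP inference needs the optimizing index, not just the value.

        import numpy as np

        def min_sum_product(a, b):
            """Direct O(n^3)-work min-plus product C[i,j] = min_k (A[i,k] + B[k,j]),
            returning both the values and the minimizing indices."""
            n, m = a.shape[0], b.shape[1]
            c = np.empty((n, m))
            arg = np.empty((n, m), dtype=int)
            for i in range(n):
                sums = a[i][:, None] + b     # (k, j) array of A[i,k] + B[k,j]
                arg[i] = sums.argmin(axis=0)
                c[i] = sums.min(axis=0)
            return c, arg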

  12. On Computing Breakpoint Distances for Genomes with Duplicate Genes.

    PubMed

    Shao, Mingfu; Moret, Bernard M E

    2017-06-01

    A fundamental problem in comparative genomics is to compute the distance between two genomes in terms of its higher level organization (given by genes or syntenic blocks). For two genomes without duplicate genes, we can easily define (and almost always efficiently compute) a variety of distance measures, but the problem is NP-hard under most models when genomes contain duplicate genes. To tackle duplicate genes, three formulations (exemplar, maximum matching, and any matching) have been proposed, all of which aim to build a matching between homologous genes so as to minimize some distance measure. Of the many distance measures, the breakpoint distance (the number of nonconserved adjacencies) was the first one to be studied and remains of significant interest because of its simplicity and model-free property. The three breakpoint distance problems corresponding to the three formulations have been widely studied. Although we provided last year a solution for the exemplar problem that runs very fast on full genomes, computing optimal solutions for the other two problems has remained challenging. In this article, we describe very fast, exact algorithms for these two problems. Our algorithms rely on a compact integer-linear program that we further simplify by developing an algorithm to remove variables, based on new results on the structure of adjacencies and matchings. Through extensive experiments using both simulations and biological data sets, we show that our algorithms run very fast (in seconds) on mammalian genomes and scale well beyond. We also apply these algorithms (as well as the classic orthology tool MSOAR) to create orthology assignment, then compare their quality in terms of both accuracy and coverage. We find that our algorithm for the "any matching" formulation significantly outperforms other methods in terms of accuracy while achieving nearly maximum coverage.
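
    For genomes without duplicate genes, the breakpoint distance is exactly the model-free count described above and is computable in linear time; a sketch for unsigned linear genomes follows (the duplicate-gene formulations the paper solves require a matching step before this count applies).

        def breakpoint_distance(g1, g2):
            """Breakpoint distance between duplicate-free, unsigned linear genomes:
            the number of adjacencies of g1 that are not conserved in g2."""
            adj2 = {frozenset(p) for p in zip(g2, g2[1:])}
            return sum(frozenset(p) not in adj2 for p in zip(g1, g1[1:]))

        print(breakpoint_distance([1, 2, 3, 4, 5], [1, 3, 2, 4, 5]))   # 2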

  13. User's Guide for a Modular Flutter Analysis Software System (Fast Version 1.0)

    NASA Technical Reports Server (NTRS)

    Desmarais, R. N.; Bennett, R. M.

    1978-01-01

    The use and operation of a group of computer programs to perform a flutter analysis of a single planar wing are described. This system of programs is called FAST for Flutter Analysis System, and consists of five programs. Each program performs certain portions of a flutter analysis and can be run sequentially as a job step or individually. FAST uses natural vibration modes as input data and performs a conventional V-g type of solution. The unsteady aerodynamics programs in FAST are based on the subsonic kernel function lifting-surface theory although other aerodynamic programs can be used. Application of the programs is illustrated by a sample case of a complete flutter calculation that exercises each program.

  14. PACER -- A fast running computer code for the calculation of short-term containment/confinement loads following coolant boundary failure. Volume 2: User information

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sienicki, J.J.

    A fast running and simple computer code has been developed to calculate pressure loadings inside light water reactor containments/confinements under loss-of-coolant accident conditions. PACER was originally developed to calculate containment/confinement pressure and temperature time histories for loss-of-coolant accidents in Soviet-designed VVER reactors and is relevant to the activities of the US International Nuclear Safety Center. The code employs a multicompartment representation of the containment volume and is focused upon application to early time containment phenomena during and immediately following blowdown. PACER has been developed for FORTRAN 77 and earlier versions of FORTRAN. The code has been successfully compiled and executed on SUN SPARC and Hewlett-Packard HP-735 workstations provided that appropriate compiler options are specified. The code incorporates both capabilities built around a hardwired default generic VVER-440 Model V230 design as well as fairly general user-defined input. However, array dimensions are hardwired and must be changed by modifying the source code if the number of compartments/cells differs from the default number of nine. Detailed input instructions are provided as well as a description of outputs. Input files and selected output are presented for two sample problems run on both HP-735 and SUN SPARC workstations.

  15. General purpose molecular dynamics simulations fully implemented on graphics processing units

    NASA Astrophysics Data System (ADS)

    Anderson, Joshua A.; Lorenz, Chris D.; Travesset, A.

    2008-05-01

    Graphics processing units (GPUs), originally developed for rendering real-time effects in computer games, now provide unprecedented computational power for scientific applications. In this paper, we develop a general purpose molecular dynamics code that runs entirely on a single GPU. It is shown that our GPU implementation provides a performance equivalent to that of a fast 30-processor-core distributed-memory cluster. Our results show that GPUs already provide an inexpensive alternative to such clusters, and we discuss implications for the future.

  16. Operating System For Numerically Controlled Milling Machine

    NASA Technical Reports Server (NTRS)

    Ray, R. B.

    1992-01-01

    OPMILL program is operating system for Kearney and Trecker milling machine providing fast easy way to program manufacture of machine parts with IBM-compatible personal computer. Gives machinist "equation plotter" feature, which plots equations that define movements and converts equations to milling-machine-controlling program moving cutter along defined path. System includes tool-manager software handling up to 25 tools and automatically adjusts to account for each tool. Developed on IBM PS/2 computer running DOS 3.3 with 1 MB of random-access memory.

  17. Fast non-Abelian geometric gates via transitionless quantum driving.

    PubMed

    Zhang, J; Kyaw, Thi Ha; Tong, D M; Sjöqvist, Erik; Kwek, Leong-Chuan

    2015-12-21

    A practical quantum computer must be capable of performing high fidelity quantum gates on a set of quantum bits (qubits). In the presence of noise, the realization of such gates poses daunting challenges. Geometric phases, which possess intrinsic noise-tolerant features, hold the promise for performing robust quantum computation. In particular, quantum holonomies, i.e., non-Abelian geometric phases, naturally lead to universal quantum computation due to their non-commutativity. Although quantum gates based on adiabatic holonomies have already been proposed, the slow evolution eventually compromises qubit coherence and computational power. Here, we propose a general approach to speed up an implementation of adiabatic holonomic gates by using transitionless driving techniques and show how such a universal set of fast geometric quantum gates in a superconducting circuit architecture can be obtained in an all-geometric approach. Compared with standard non-adiabatic holonomic quantum computation, the holonomies obtained in our approach tend asymptotically to those of the adiabatic approach in the long run-time limit and thus might open up a new horizon for realizing a practical quantum computer.

  18. Fast non-Abelian geometric gates via transitionless quantum driving

    PubMed Central

    Zhang, J.; Kyaw, Thi Ha; Tong, D. M.; Sjöqvist, Erik; Kwek, Leong-Chuan

    2015-01-01

    A practical quantum computer must be capable of performing high fidelity quantum gates on a set of quantum bits (qubits). In the presence of noise, the realization of such gates poses daunting challenges. Geometric phases, which possess intrinsic noise-tolerant features, hold the promise for performing robust quantum computation. In particular, quantum holonomies, i.e., non-Abelian geometric phases, naturally lead to universal quantum computation due to their non-commutativity. Although quantum gates based on adiabatic holonomies have already been proposed, the slow evolution eventually compromises qubit coherence and computational power. Here, we propose a general approach to speed up an implementation of adiabatic holonomic gates by using transitionless driving techniques and show how such a universal set of fast geometric quantum gates in a superconducting circuit architecture can be obtained in an all-geometric approach. Compared with standard non-adiabatic holonomic quantum computation, the holonomies obtained in our approach tend asymptotically to those of the adiabatic approach in the long run-time limit and thus might open up a new horizon for realizing a practical quantum computer. PMID:26687580

  19. Advanced reliability methods for structural evaluation

    NASA Technical Reports Server (NTRS)

    Wirsching, P. H.; Wu, Y.-T.

    1985-01-01

    Fast probability integration (FPI) methods, which can yield approximate solutions to such general structural reliability problems as the computation of the probabilities of complicated functions of random variables, are known to require one-tenth the computer time of Monte Carlo methods for a probability level of 0.001; lower probabilities yield even more dramatic differences. A strategy is presented in which a computer routine is run k times with selected perturbed values of the variables to obtain k solutions for a response variable Y. An approximating polynomial is fit to the k 'data' sets, and FPI methods are employed for this explicit form.

  20. Nonadiabatic conditional geometric phase shift with NMR.

    PubMed

    Xiang-Bin, W; Keiji, M

    2001-08-27

    A conditional geometric phase shift gate, which is fault tolerant to certain types of errors due to its geometric nature, was realized recently via nuclear magnetic resonance (NMR) under adiabatic conditions. However, in quantum computation, everything must be completed within the decoherence time. The adiabatic condition makes any fast conditional Berry phase (cyclic adiabatic geometric phase) shift gate impossible. Here we show that by using a newly designed sequence of simple operations with an additional vertical magnetic field, the conditional geometric phase shift gate can be run nonadiabatically. Therefore geometric quantum computation can be done at the same rate as usual quantum computation.

  1. Heterogeneous computing architecture for fast detection of SNP-SNP interactions.

    PubMed

    Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros

    2014-06-25

    The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems.

  2. Heterogeneous computing architecture for fast detection of SNP-SNP interactions

    PubMed Central

    2014-01-01

    Background The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. Results We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. Conclusions General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems. PMID:24964802

  3. Plancton: an opportunistic distributed computing project based on Docker containers

    NASA Astrophysics Data System (ADS)

    Concas, Matteo; Berzano, Dario; Bagnasco, Stefano; Lusso, Stefano; Masera, Massimo; Puccio, Maximiliano; Vallero, Sara

    2017-10-01

    The computing power of most modern commodity computers is far from being fully exploited by standard usage patterns. In this work we describe the development and setup of a virtual computing cluster based on Docker containers used as worker nodes. The facility is based on Plancton: a lightweight fire-and-forget background service. Plancton spawns and controls a local pool of Docker containers on a host with free resources, by constantly monitoring its CPU utilisation. It is designed to release the resources allocated opportunistically, whenever another demanding task is run by the host user, according to configurable policies. This is attained by killing a number of running containers. One of the advantages of a thin virtualization layer such as Linux containers is that they can be started almost instantly upon request. We will show how fast the start-up and disposal of containers eventually enables us to implement an opportunistic cluster based on Plancton daemons without a central control node, where the spawned Docker containers behave as job pilots. Finally, we will show how Plancton was configured to run up to 10 000 concurrent opportunistic jobs on the ALICE High-Level Trigger facility, by giving a considerable advantage in terms of management compared to virtual machines.
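
    The fire-and-forget policy described above can be caricatured in a few lines of control loop; the image name, CPU thresholds, and polling interval below are invented, the real daemon's policies are configurable, and psutil plus a local Docker daemon are assumed available. This is a conceptual sketch, not Plancton's implementation.

        import subprocess
        import time

        import psutil   # assumed available for CPU monitoring

        IMAGE = "pilot-job:latest"             # hypothetical pilot-job image
        SPAWN_BELOW, KILL_ABOVE = 50.0, 80.0   # invented CPU-percent thresholds
        pool = []

        while True:
            load = psutil.cpu_percent(interval=5)
            if load < SPAWN_BELOW:             # spare cycles: start another pilot
                cid = subprocess.check_output(["docker", "run", "-d", IMAGE])
                pool.append(cid.decode().strip())
            elif load > KILL_ABOVE and pool:   # host got busy: release resources
                subprocess.run(["docker", "rm", "-f", pool.pop()], check=False)
            time.sleep(5)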

  4. Fast data preprocessing with Graphics Processing Units for inverse problem solving in light-scattering measurements

    NASA Astrophysics Data System (ADS)

    Derkachov, G.; Jakubczyk, T.; Jakubczyk, D.; Archer, J.; Woźniak, M.

    2017-07-01

    Utilising the Compute Unified Device Architecture (CUDA) platform for Graphics Processing Units (GPUs) enables significant reduction of computation time at a moderate cost, by means of parallel computing. In the paper [Jakubczyk et al., Opto-Electron. Rev., 2016] we reported using GPU for Mie scattering inverse problem solving (up to 800-fold speed-up). Here we report the development of two subroutines utilising GPU at data preprocessing stages for the inversion procedure: (i) a subroutine, based on ray tracing, for finding the spherical aberration correction function, and (ii) a subroutine performing the conversion of an image to a 1D distribution of light intensity versus azimuth angle (i.e. scattering diagram), fed from a movie-reading CPU subroutine running in parallel. All subroutines are incorporated in the PikeReader application, which we make available on a GitHub repository. PikeReader returns a sequence of intensity distributions versus a common azimuth angle vector, corresponding to the recorded movie. We obtained an overall ~400-fold speed-up of calculations at data preprocessing stages using CUDA codes running on GPU in comparison to single-thread MATLAB-only code running on CPU.
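
    On a CPU, the second preprocessing subroutine - collapsing an image into intensity versus azimuth angle about a center point - reduces to a binned average over pixel angles, which is exactly the kind of per-pixel-independent work that maps well onto CUDA threads. A numpy sketch, where the center location and bin count are illustrative assumptions:

        import numpy as np

        def scattering_diagram(img, center, n_bins=360):
            """Mean pixel intensity binned by azimuth angle around `center`."""
            h, w = img.shape
            y, x = np.mgrid[0:h, 0:w]
            phi = np.arctan2(y - center[1], x - center[0])            # -pi..pi
            bins = ((phi + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
            sums = np.bincount(bins.ravel(), weights=img.ravel(),
                               minlength=n_bins)
            counts = np.bincount(bins.ravel(), minlength=n_bins)
            return sums / np.maximum(counts, 1)       # intensity vs. azimuth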

  5. Designing and Implementing an OVERFLOW Reader for ParaView and Comparing Performance Between Central Processing Units and Graphical Processing Units

    NASA Technical Reports Server (NTRS)

    Chawner, David M.; Gomez, Ray J.

    2010-01-01

    In the Applied Aerosciences and CFD branch at Johnson Space Center, computational simulations are run that face many challenges, two of which are the ability to customize software for specialized needs and the need to run simulations as fast as possible. There are many different tools that are used for running these simulations, and each one has its own pros and cons. Once these simulations are run, there needs to be software capable of visualizing the results in an appealing manner. Some of this software is open source, meaning that anyone can edit the source code to make modifications and distribute it to all other users in a future release. This is very useful, especially in this branch where many different tools are being used. File readers can be written to load any file format into a program, to ease the bridging from one tool to another. Programming such a reader requires knowledge of the file format that is being read as well as the equations necessary to obtain the derived values after loading. When running these CFD simulations, extremely large files are loaded and many derived values are calculated. These simulations usually take a few hours to complete, even on the fastest machines. Graphics processing units (GPUs) are usually used to load the graphics for computers; however, in recent years, GPUs are being used for more generic applications because of the speed of these processors. Applications run on GPUs have been known to run up to forty times faster than they would on normal central processing units (CPUs). If these CFD programs are extended to run on GPUs, the amount of time they require to complete would be much less. This would allow more simulations to be run in the same amount of time and possibly perform more complex computations.

  6. Redistribution of Mechanical Work at the Knee and Ankle Joints During Fast Running in Minimalist Shoes.

    PubMed

    Fuller, Joel T; Buckley, Jonathan D; Tsiros, Margarita D; Brown, Nicholas A T; Thewlis, Dominic

    2016-10-01

    Minimalist shoes have been suggested as a way to alter running biomechanics to improve running performance and reduce injuries. However, to date, researchers have only considered the effect of minimalist shoes at slow running speeds. To determine if runners change foot-strike pattern and alter the distribution of mechanical work at the knee and ankle joints when running at a fast speed in minimalist shoes compared with conventional running shoes. Crossover study. Research laboratory. Twenty-six trained runners (age = 30.0 ± 7.9 years [age range, 18-40 years], height = 1.79 ± 0.06 m, mass = 75.3 ± 8.2 kg, weekly training distance = 27 ± 15 km) who ran with a habitual rearfoot foot-strike pattern and had no experience running in minimalist shoes. Participants completed overground running trials at 18 km/h in minimalist and conventional shoes. Sagittal-plane kinematics and joint work at the knee and ankle joints were computed using 3-dimensional kinematic and ground reaction force data. Foot-strike pattern was classified as rearfoot, midfoot, or forefoot strike based on strike index and ankle angle at initial contact. We observed no difference in foot-strike classification between shoes (χ2(1) = 2.29, P = .13). Ankle angle at initial contact was less (2.46° versus 7.43°; t(25) = 3.34, P = .003) and strike index was greater (35.97% versus 29.04%; t(25) = 2.38, P = .03) when running in minimalist shoes compared with conventional shoes. We observed greater negative (52.87 J versus 42.46 J; t(24) = 2.29, P = .03) and positive work (68.91 J versus 59.08 J; t(24) = 2.65, P = .01) at the ankle but less negative (59.01 J versus 67.02 J; t(24) = 2.25, P = .03) and positive work (40.37 J versus 47.09 J; t(24) = 2.11, P = .046) at the knee with minimalist shoes compared with conventional shoes. Running in minimalist shoes at a fast speed caused a redistribution of work from the knee to the ankle joint. This finding suggests that runners changing from conventional to minimalist shoes for short-distance races could be at an increased risk of ankle and calf injuries but a reduced risk of knee injuries.

  7. Simulator platform for fast reactor operation and safety technology demonstration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vilim, R. B.; Park, Y. S.; Grandy, C.

    2012-07-30

    A simulator platform for visualization and demonstration of innovative concepts in fast reactor technology is described. The objective is to make the workings of fast reactor technology innovations more accessible, and to do so in a human factors environment that uses state-of-the-art visualization technologies. In this work the computer codes in use at Argonne National Laboratory (ANL) for the design of fast reactor systems are being integrated to run on this platform. This includes linking reactor systems codes with mechanical structures codes and using advanced graphics to depict the thermo-hydraulic-structure interactions that give rise to an inherently safe response to upsets. It also includes visualization of mechanical systems operation, including advanced concepts that make use of robotics for operations, in-service inspection, and maintenance.

  8. Simulating Scenes In Outer Space

    NASA Technical Reports Server (NTRS)

    Callahan, John D.

    1989-01-01

    Multimission Interactive Picture Planner, MIP, is a computer program for scientifically accurate and fast three-dimensional animation of scenes in deep space. It is versatile, reasonably comprehensive, and portable, and runs on microcomputers. New techniques were developed to rapidly perform the calculations and transformations necessary to animate scenes in scientifically accurate three-dimensional space. Written in FORTRAN 77. Primarily designed to handle Voyager, Galileo, and Space Telescope missions; it can be adapted to handle other missions.

  9. Fast-SNP: a fast matrix pre-processing algorithm for efficient loopless flux optimization of metabolic models

    PubMed Central

    Saa, Pedro A.; Nielsen, Lars K.

    2016-01-01

    Motivation: Computation of steady-state flux solutions in large metabolic models is routinely performed using flux balance analysis based on a simple LP (Linear Programming) formulation. A minimal requirement for thermodynamic feasibility of the flux solution is the absence of internal loops, which are enforced using ‘loopless constraints’. The resulting loopless flux problem is a substantially harder MILP (Mixed Integer Linear Programming) problem, which is computationally expensive for large metabolic models. Results: We developed a pre-processing algorithm that significantly reduces the size of the original loopless problem into an easier and equivalent MILP problem. The pre-processing step employs a fast matrix sparsification algorithm—Fast-SNP (fast sparse null-space pursuit)—inspired by recent results on sparse null-space pursuit (SNP). By finding a reduced feasible ‘loop-law’ matrix subject to known directionalities, Fast-SNP considerably improves the computational efficiency in several metabolic models running different loopless optimization problems. Furthermore, analysis of the topology encoded in the reduced loop matrix enabled identification of key directional constraints for the potential permanent elimination of infeasible loops in the underlying model. Overall, Fast-SNP is an effective and simple algorithm for efficient formulation of loop-law constraints, making loopless flux optimization feasible and numerically tractable at large scale. Availability and Implementation: Source code for MATLAB including examples is freely available for download at http://www.aibn.uq.edu.au/cssb-resources under Software. Optimization uses Gurobi, CPLEX or GLPK (the latter is included with the algorithm). Contact: lars.nielsen@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27559155
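
    For readers unfamiliar with the underlying LP, the sketch below sets up a toy flux balance problem in SciPy; the four-reaction network and its bounds are invented, and the loopless MILP constraints that Fast-SNP accelerates are only indicated in comments.

    ```python
    import numpy as np
    from scipy.optimize import linprog

    # Toy flux balance analysis: maximize biomass flux v3 subject to the
    # steady-state condition S @ v = 0. This is the plain LP; the loopless
    # variant adds integer variables forbidding internal cycles (here, the
    # thermodynamically infeasible A->B->A loop through v2 and v4).
    S = np.array([[ 1, -1,  0,  1],    # metabolite A
                  [ 0,  1, -1, -1]])   # metabolite B
    c = np.array([0, 0, -1, 0])        # linprog minimizes, so negate biomass
    bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]
    res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds)
    print("optimal fluxes:", res.x)    # v3 = 10; v2 and v4 may carry a loop
    ```

    Note that the LP optimum leaves the internal loop flux through v2 and v4 undetermined, which is exactly why the harder loopless MILP formulation is needed in practice.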

  10. Fast modal extraction in NASTRAN via the FEER computer program. [based on automatic matrix reduction method for lower modes of structures with many degrees of freedom

    NASA Technical Reports Server (NTRS)

    Newman, M. B.; Pipano, A.

    1973-01-01

    A new eigensolution routine, FEER (Fast Eigensolution Extraction Routine), used in conjunction with NASTRAN at Israel Aircraft Industries is described. The FEER program is based on an automatic matrix reduction scheme whereby the lower modes of structures with many degrees of freedom can be accurately extracted from a tridiagonal eigenvalue problem whose size is of the same order of magnitude as the number of required modes. The process is effected without arbitrary lumping of masses at selected node points or selection of nodes to be retained in the analysis set. The results of computational efficiency studies are presented, showing major arithmetic operation counts and actual computer run times of FEER as compared to other methods of eigenvalue extraction, including those available in the NASTRAN READ module. It is concluded that the tridiagonal reduction method used in FEER would serve as a valuable addition to NASTRAN for highly increased efficiency in obtaining structural vibration modes.
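
    The core idea, extracting only the lowest modes through a small tridiagonal (Lanczos-type) problem rather than solving the full eigenproblem, can be sketched with SciPy's shift-invert eigensolver; the spring-mass chain below is a stand-in model, and eigsh is an analogous modern routine, not FEER's code.

    ```python
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    # Model problem: a fixed-fixed chain of n unit masses and unit springs,
    # K x = lambda M x. Shift-invert about sigma=0 targets the smallest
    # eigenvalues; internally a small tridiagonal problem on the order of
    # the requested mode count is built, in the spirit of FEER.
    n = 1000
    K = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")
    M = sp.identity(n, format="csc")
    vals, vecs = eigsh(K, k=6, M=M, sigma=0, which="LM")
    print("lowest natural frequencies:", np.sqrt(vals))
    ```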

  11. DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

    PubMed

    Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

    2004-09-09

    Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments, which are used as a first step to multiple alignment, account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be greatly improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
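
    Pattern (a) is a textbook embarrassingly parallel workload. A hedged Python sketch of the same distribution strategy, with difflib's similarity ratio standing in for the real pairwise aligner:

    ```python
    from itertools import combinations
    from multiprocessing import Pool
    from difflib import SequenceMatcher

    # All pairwise comparisons are independent, so they can be farmed out
    # to worker processes without changing the final result -- the key
    # observation behind DIALIGN P's step (a).
    def align_pair(pair):
        (i, a), (j, b) = pair
        return i, j, SequenceMatcher(None, a, b).ratio()

    if __name__ == "__main__":
        seqs = ["ACGTACGT", "ACGAACGT", "TTGTACGA", "ACGTTCGT"]
        pairs = list(combinations(enumerate(seqs), 2))
        with Pool() as pool:
            for i, j, score in pool.map(align_pair, pairs):
                print(f"pair ({i}, {j}): similarity {score:.2f}")
    ```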

  12. Long-range interactions and parallel scalability in molecular simulations

    NASA Astrophysics Data System (ADS)

    Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

    2007-01-01

    Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested scalability on four different networks: Infiniband, GigaBit Ethernet, Fast Ethernet, and a nearly uniform memory architecture in which CPUs communicate by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of 128, 512, and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

  13. Fast Physically Correct Refocusing for Sparse Light Fields Using Block-Based Multi-Rate View Interpolation.

    PubMed

    Huang, Chao-Tsung; Wang, Yu-Wen; Huang, Li-Ren; Chin, Jui; Chen, Liang-Gee

    2017-02-01

    Digital refocusing faces a tradeoff between complexity and quality when using sparsely sampled light fields for low-storage applications. In this paper, we propose a fast physically correct refocusing algorithm to address this issue in a twofold way. First, view interpolation is adopted to provide photorealistic quality at infocus-defocus hybrid boundaries. To counter its conventionally high complexity, we devised a fast line-scan method specifically for refocusing, whose 1D kernel can be 30× faster than the benchmark View Synthesis Reference Software (VSRS)-1D-Fast. Second, we propose a block-based multi-rate processing flow for accelerating purely infocused or defocused regions, where a further 3-34× speedup can be achieved for high-resolution images. All candidate blocks of variable sizes can interpolate different numbers of rendered views and perform refocusing in different subsampled layers. To avoid visible aliasing and block artifacts, we determine these parameters and the simulated aperture filter through a localized filter response analysis using defocus blur statistics. The final quadtree block partitions are then optimized in terms of computation time. Extensive experimental results show superior refocusing quality and fast computation speed. In particular, the run time is comparable with conventional single-image blurring, which causes serious boundary artifacts.

  14. Superresolution radar imaging based on fast inverse-free sparse Bayesian learning for multiple measurement vectors

    NASA Astrophysics Data System (ADS)

    He, Xingyu; Tong, Ningning; Hu, Xiaowei

    2018-01-01

    Compressive sensing has been successfully applied to inverse synthetic aperture radar (ISAR) imaging of moving targets. By exploiting the block-sparse structure of the target image, sparse solutions for multiple measurement vectors (MMV) can be applied to ISAR imaging, and a substantial performance improvement can be achieved. As an effective sparse recovery method, sparse Bayesian learning (SBL) for MMV involves a matrix inverse at each iteration, so its computational complexity grows significantly with the problem size. To address this problem, we develop a fast inverse-free (IF) SBL method for MMV. A relaxed evidence lower bound (ELBO), which is computationally more amenable than the traditional ELBO used by SBL, is obtained by invoking a fundamental property of smooth functions. A variational expectation-maximization scheme is then employed to maximize the relaxed ELBO, and a computationally efficient IF-MSBL algorithm is proposed. Numerical results based on simulated and real data show that the proposed method can reconstruct row-sparse signals accurately and obtain clear superresolution ISAR images. Moreover, the running time and computational complexity are reduced to a great extent compared with traditional SBL methods.

  15. Beauty and the beast: Some perspectives on efficient model analysis, surrogate models, and the future of modeling

    NASA Astrophysics Data System (ADS)

    Hill, M. C.; Jakeman, J.; Razavi, S.; Tolson, B.

    2015-12-01

    For many environmental systems, model runtimes have remained very long as more capable computers have been used to add more processes and finer time and space discretization. Scientists have also added more parameters and kinds of observations, and many model runs are needed to explore the models. Computational demand equals run time multiplied by the number of model runs, divided by parallelization opportunities. Model exploration is conducted using sensitivity analysis, optimization, and uncertainty quantification. Sensitivity analysis is used to reveal the consequences of what may be very complex simulated relations, optimization is used to identify parameter values that fit the data best, or at least better, and uncertainty quantification is used to evaluate the precision of simulated results. The long execution times make such analyses a challenge. Methods for addressing this challenge include computationally frugal analysis of the demanding original model and a number of ingenious surrogate modeling methods. Both commonly use about 50-100 runs of the demanding original model. In this talk we consider the tradeoffs between (1) original model development decisions, (2) computationally frugal analysis of the original model, and (3) using many model runs of the fast surrogate model. Some questions of interest are as follows. If the added processes and discretization invested in (1) are compared with the restrictions and approximations in model analysis produced by long model execution times, is there a net benefit relative to the goals of the model? Are there changes to the numerical methods that could reduce the computational demands while giving up less fidelity than is compromised by using computationally frugal methods or surrogate models for model analysis? Both the computationally frugal methods and the surrogate models require that the solution of interest be a smooth function of the parameters of interest. How does the information obtained from the local methods typical of (2) compare with that from the globally averaged methods typical of (3) for typical systems? The discussion will use examples of the response of the Greenland glacier to global warming and surface water and groundwater modeling.
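
    A minimal sketch of option (3), assuming scikit-learn is available: roughly 50 runs of a stand-in "expensive" model train a Gaussian-process surrogate, which can then be queried thousands of times for sensitivity or uncertainty analysis. The one-parameter simulator is invented for illustration.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def expensive_model(theta):
        # stand-in for a simulator that takes hours per call; note it is
        # a smooth function of the parameter, as surrogate methods require
        return np.sin(3 * theta) + 0.5 * theta

    rng = np.random.default_rng(0)
    X_train = rng.uniform(0, 2, size=(50, 1))       # ~50 original-model runs
    y_train = expensive_model(X_train).ravel()
    surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.3))
    surrogate.fit(X_train, y_train)

    X_query = np.linspace(0, 2, 10000).reshape(-1, 1)
    y_hat, y_std = surrogate.predict(X_query, return_std=True)  # near-instant
    ```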

  16. Method of developing all-optical trinary JK, D-type, and T-type flip-flops using semiconductor optical amplifiers.

    PubMed

    Garai, Sisir Kumar

    2012-04-10

    To meet the demands of very fast and agile optical networks, the optical processors in a network system should have a very fast execution rate and large information handling and storage capacities. Multivalued logic operations and multistate optical flip-flops are the basic building blocks of such fast-running optical computing and data processing systems. In the past two decades, many methods of implementing all-optical flip-flops have been proposed. Most of these suffer from speed limitations because of the slow switching response of active devices. The frequency encoding technique is used here because of its many advantages: an encoded frequency preserves its identity throughout data communication irrespective of loss of light energy due to reflection, refraction, attenuation, etc. Polarization-rotation-based very fast switching of semiconductor optical amplifiers increases the processing speed, while tristate optical flip-flops increase the information handling capacity.

  17. SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision.

    PubMed

    Wiewiórka, Marek S; Messina, Antonio; Pacholewska, Alicja; Maffioletti, Sergio; Gawrysiak, Piotr; Okoniewski, Michał J

    2014-09-15

    Many time-consuming analyses of next-generation sequencing data can be addressed with modern cloud computing. Apache Hadoop-based solutions have become popular in genomics because of their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them in an interactive way. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running analyses of sequencing datasets. Tests of SparkSeq also prove that the cache and HDFS block size can be tuned for optimal performance on multiple worker nodes. Available under the open source Apache 2.0 license: https://bitbucket.org/mwiewiorka/sparkseq/.
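
    The flavour of such interactive querying can be sketched with plain PySpark; this is not SparkSeq's API, and the file name and columns below are hypothetical.

    ```python
    from pyspark.sql import SparkSession

    # Cache a sequencing-derived table once, then run ad hoc queries on it
    # interactively -- the pattern the abstract contrasts with batch jobs.
    spark = SparkSession.builder.appName("reads-demo").getOrCreate()
    reads = (spark.read.csv("reads.tsv", sep="\t", inferSchema=True)
                  .toDF("chrom", "pos", "mapq"))
    reads.cache()                  # cache tuning is key to multi-node speed
    covered = reads.filter("mapq >= 30").groupBy("chrom").count()
    covered.show()
    ```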

  18. Holographic Adaptive Laser Optics System (HALOS): Fast, Autonomous Aberration Correction

    NASA Astrophysics Data System (ADS)

    Andersen, G.; MacDonald, K.; Gelsinger-Austin, P.

    2013-09-01

    We present an adaptive optics system which uses a multiplexed hologram to deconvolve the phase aberrations in an input beam. This wavefront characterization is extremely fast, as it is based on simple measurements of the intensity of focal spots and does not require any computations. Furthermore, the system does not require a computer in the loop and is thus cheaper, less complex, and more robust. A fully functional, closed-loop prototype incorporating a 32-element MEMS mirror has been constructed. The unit has a footprint no larger than a laptop but runs at a bandwidth of 100 kHz, over an order of magnitude faster than comparable conventional systems occupying a significantly larger volume. Additionally, since the sensing is based on parallel, all-optical processing, the speed is independent of the number of actuators, running at the same bandwidth for one actuator as for a million. We are developing the HALOS technology with a view towards next-generation surveillance systems for extreme adaptive optics applications. These include imaging, lidar, and free-space optical communications for unmanned aerial vehicles and SSA. The small volume is ideal for UAVs, while the high speed and high resolution will be of great benefit to the ground-based observation of space-based objects.

  19. Symplectic molecular dynamics simulations on specially designed parallel computers.

    PubMed

    Borstnik, Urban; Janezic, Dusanka

    2005-01-01

    We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. Together, these approaches require fewer time steps, each computed in less time, enabling fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.

  20. A fast and high performance multiple data integration algorithm for identifying human disease genes

    PubMed Central

    2015-01-01

    Background: Integrating multiple data sources is indispensable for improving disease gene identification, not only because disease genes associated with similar genetic diseases tend to lie close to each other in various biological networks, but also because gene-disease associations are complex. Although various algorithms have been proposed to identify disease genes, their prediction performance and computational cost can still be improved. Results: In this study, we propose a fast and high performance multiple data integration algorithm for identifying human disease genes. A posterior probability of each candidate gene being associated with individual diseases is calculated using a Bayesian analysis method and a binary logistic regression model. Two prior probability estimation strategies and two feature vector construction methods are developed to test the performance of the proposed algorithm. Conclusions: The proposed algorithm not only generates predictions with high AUC scores but also runs very fast. When only a single PPI network is employed, the AUC score is 0.769 using F2 as feature vectors, and the average running time for each leave-one-out experiment is only around 1.5 seconds. When three biological networks are integrated, the AUC score using F3 as feature vectors increases to 0.830, and the average running time for each leave-one-out experiment is only about 12.54 seconds. This is better than many existing algorithms. PMID:26399620
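
    A schematic sketch of the scoring step, assuming scikit-learn; random numbers stand in for the network-derived features, and the actual feature construction and prior estimation strategies of the paper are not reproduced.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each candidate gene gets a feature vector, e.g. its proximity to known
    # disease genes in several biological networks, and a binary logistic
    # regression turns the features into a posterior association probability.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 3))            # proximity in 3 networks (toy)
    y = (X @ np.array([1.5, 0.8, 0.3]) + rng.normal(size=500) > 0).astype(int)

    model = LogisticRegression().fit(X, y)
    candidates = rng.normal(size=(5, 3))
    posterior = model.predict_proba(candidates)[:, 1]
    print("P(disease gene):", np.round(posterior, 3))
    ```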

  1. Merlin - Massively parallel heterogeneous computing

    NASA Technical Reports Server (NTRS)

    Wittie, Larry; Maples, Creve

    1989-01-01

    Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.

  2. The LHCb software and computing upgrade for Run 3: opportunities and challenges

    NASA Astrophysics Data System (ADS)

    Bozzi, C.; Roiser, S.; LHCb Collaboration

    2017-10-01

    The LHCb detector will be upgraded for LHC Run 3 and will be read out at 30 MHz, corresponding to the full inelastic collision rate, with major implications for the full software trigger and offline computing. If the current computing model and software framework are kept, the data storage capacity and computing power required to process data at this rate, and to generate and reconstruct equivalent samples of simulated events, will exceed the current capacity by at least one order of magnitude. A redesign of the software framework, including scheduling, the event model, the detector description, and the conditions database, is needed to fully exploit the computing power of multi- and many-core architectures and coprocessors. Data processing and the analysis model will also change towards an early streaming of different data types, in order to limit storage resources, with further implications for the data analysis workflows. Fast simulation options will make it possible to obtain a reasonable parameterization of the detector response in considerably less computing time. Finally, the upgrade of LHCb will be a good opportunity to review and implement changes in the domains of software design, testing and review, and analysis workflow and preservation. In this contribution, activities and recent results in all the above areas are presented.

  3. Fast and accurate face recognition based on image compression

    NASA Astrophysics Data System (ADS)

    Zheng, Yufeng; Blasch, Erik

    2017-05-01

    Image compression is desired for many image-related applications, especially network-based applications with bandwidth and storage constraints. Typical reports from the face recognition community concentrate on the maximal compression rate that does not decrease recognition accuracy. In general, wavelet-based face recognition methods such as EBGM (elastic bunch graph matching) and FPB (face pattern byte) perform well but run slowly due to their high computation demands, while the PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) algorithms run fast but perform poorly in face recognition. In this paper, we propose a novel face recognition method based on a standard image compression algorithm, termed compression-based (CPB) face recognition. First, all gallery images are compressed by the selected compression algorithm. Second, a mixed image is formed from the probe and a gallery image and then compressed. Third, a composite compression ratio (CCR) is computed from three compression ratios calculated from the probe, gallery, and mixed images. Finally, the CCR values are compared and the largest CCR corresponds to the matched face. The time cost of each face match is about the time of compressing the mixed face image. We tested the proposed CPB method on the "ASUMSS face database" (visible and thermal images) from 105 subjects. The face recognition accuracy with visible images is 94.76% when using JPEG compression; on the same face dataset, the accuracy of the FPB algorithm was reported as 91.43%. JPEG-compression-based (JPEG-CPB) face recognition is standard and fast, and may be integrated into a real-time imaging device.
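
    The matching rule can be sketched with zlib standing in for JPEG. The composite ratio below is an illustrative stand-in, since the paper's exact CCR formula is not given in the abstract: it measures how much better the probe-plus-gallery mixture compresses than the two inputs do separately.

    ```python
    import zlib

    # Redundancy between matching faces makes the mixed data compress
    # disproportionately well, so the largest composite ratio picks the match.
    def comp_ratio(data: bytes) -> float:
        return len(data) / len(zlib.compress(data, 9))

    def ccr(probe: bytes, gallery: bytes) -> float:
        return comp_ratio(probe + gallery) / (
            (comp_ratio(probe) + comp_ratio(gallery)) / 2)

    probe = b"face-A" * 200                       # toy stand-ins for images
    gallery = {"A": b"face-A" * 210, "B": b"face-B" * 210}
    best = max(gallery, key=lambda k: ccr(probe, gallery[k]))
    print("matched face:", best)                  # -> A
    ```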

  4. Redistribution of Mechanical Work at the Knee and Ankle Joints During Fast Running in Minimalist Shoes

    PubMed Central

    Fuller, Joel T.; Buckley, Jonathan D.; Tsiros, Margarita D.; Brown, Nicholas A. T.; Thewlis, Dominic

    2016-01-01

    Context: Minimalist shoes have been suggested as a way to alter running biomechanics to improve running performance and reduce injuries. However, to date, researchers have only considered the effect of minimalist shoes at slow running speeds. Objective: To determine if runners change foot-strike pattern and alter the distribution of mechanical work at the knee and ankle joints when running at a fast speed in minimalist shoes compared with conventional running shoes. Design: Crossover study. Setting: Research laboratory. Patients or Other Participants: Twenty-six trained runners (age = 30.0 ± 7.9 years [age range, 18−40 years], height = 1.79 ± 0.06 m, mass = 75.3 ± 8.2 kg, weekly training distance = 27 ± 15 km) who ran with a habitual rearfoot foot-strike pattern and had no experience running in minimalist shoes. Intervention(s): Participants completed overground running trials at 18 km/h in minimalist and conventional shoes. Main Outcome Measure(s): Sagittal-plane kinematics and joint work at the knee and ankle joints were computed using 3-dimensional kinematic and ground reaction force data. Foot-strike pattern was classified as rearfoot, midfoot, or forefoot strike based on strike index and ankle angle at initial contact. Results: We observed no difference in foot-strike classification between shoes (χ2(1) = 2.29, P = .13). Ankle angle at initial contact was less (2.46° versus 7.43°; t(25) = 3.34, P = .003) and strike index was greater (35.97% versus 29.04%; t(25) = 2.38, P = .03) when running in minimalist shoes compared with conventional shoes. We observed greater negative (52.87 J versus 42.46 J; t(24) = 2.29, P = .03) and positive work (68.91 J versus 59.08 J; t(24) = 2.65, P = .01) at the ankle but less negative (59.01 J versus 67.02 J; t(24) = 2.25, P = .03) and positive work (40.37 J versus 47.09 J; t(24) = 2.11, P = .046) at the knee with minimalist shoes compared with conventional shoes. Conclusions: Running in minimalist shoes at a fast speed caused a redistribution of work from the knee to the ankle joint. This finding suggests that runners changing from conventional to minimalist shoes for short-distance races could be at an increased risk of ankle and calf injuries but a reduced risk of knee injuries. PMID:27834504

  5. A Fast Code for Jupiter Atmospheric Entry

    NASA Technical Reports Server (NTRS)

    Tauber, Michael E.; Wercinski, Paul; Yang, Lily; Chen, Yih-Kanq; Arnold, James (Technical Monitor)

    1998-01-01

    A fast code was developed to calculate the forebody heating environment and heat shielding required for Jupiter atmospheric entry probes. A carbon phenolic heat shield material was assumed and, since computational efficiency was a major goal, analytic expressions were used, primarily, to calculate the heating, ablation, and required insulation. The code was verified by comparison with flight measurements from the Galileo probe's entry; the calculation required 3.5 s of CPU time on a workstation. The computed surface recessions from ablation were compared with the flight values at six body stations; on average, the predicted recession was 12.5% higher than the measured values. The forebody's mass loss was overpredicted by 5.5%, and the heat shield mass was calculated to be 15% less than the probe's actual heat shield. However, the calculated heat shield mass did not include contingencies for the various uncertainties that must be considered in the design of probes. The agreement with the Galileo probe's values was therefore considered satisfactory, especially in view of the code's fast running time and the methods' approximations.

  6. A Newton-Krylov solver for fast spin-up of online ocean tracers

    NASA Astrophysics Data System (ADS)

    Lindsay, Keith

    2017-01-01

    We present a Newton-Krylov based solver to efficiently spin up tracers in an online ocean model. We demonstrate that the solver converges, that tracer simulations initialized with the solution from the solver have small drift, and that the solver takes orders of magnitude less computational time than the brute force spin-up approach. To demonstrate the application of the solver, we use it to efficiently spin up the tracer ideal age with respect to the circulation from different time intervals in a long physics run. We then evaluate how the spun-up ideal age tracer depends on the duration of the physics run, i.e., on how equilibrated the circulation is.
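
    A minimal sketch of the approach, assuming SciPy: the spun-up state is a fixed point of one full model cycle, found by newton_krylov on the residual cycle(x) - x. The linear "circulation plus source" operator below is a toy stand-in for a year of online tracer integration.

    ```python
    import numpy as np
    from scipy.optimize import newton_krylov

    rng = np.random.default_rng(2)
    n = 100
    # stable toy transport operator (spectral radius < 1)
    A = 0.9 * np.eye(n) + rng.normal(scale=0.005, size=(n, n))

    def one_cycle(x):
        return A @ x + 1.0        # stand-in for a year of tracer transport

    # Newton-Krylov finds the equilibrium in a handful of cycle evaluations
    # instead of brute-force integration over thousands of cycles.
    x_eq = newton_krylov(lambda x: one_cycle(x) - x, np.zeros(n), f_tol=1e-10)
    print("residual drift:", np.max(np.abs(one_cycle(x_eq) - x_eq)))
    ```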

  7. An 81.6 μW FastICA processor for epileptic seizure detection.

    PubMed

    Yang, Chia-Hsiang; Shih, Yi-Hsin; Chiueh, Herming

    2015-02-01

    To improve the performance of epileptic seizure detection, independent component analysis (ICA) is applied to multi-channel signals to separate artifacts from signals of interest. FastICA is an efficient algorithm for computing ICA. To reduce energy dissipation, eigenvalue decomposition (EVD) is utilized in the preprocessing stage to reduce the convergence time of the iterative calculation of ICA components. EVD is computed efficiently through an array of processing elements running in parallel. An area-efficient EVD architecture is realized by leveraging the approximate Jacobi algorithm, leading to a 77.2% area reduction. By choosing a proper memory element and a reduced wordlength, the power and area of the storage memory are reduced by 95.6% and 51.7%, respectively. The chip area is minimized through fixed-point implementation and architectural transformations. Given a latency constraint of 0.1 s, an 86.5% area reduction is achieved compared to the direct-mapped architecture. Fabricated in 90 nm CMOS, the core area of the chip is 0.40 mm². The FastICA processor, part of an integrated epileptic control SoC, dissipates 81.6 μW at 0.32 V. The computation delay for a frame of 256 samples over 8 channels is 84.2 ms. Compared to prior work, 0.5% power dissipation, 26.7% silicon area, and a 3.4× computation speedup are achieved. The performance of the chip was verified on a human dataset.

  8. Effects of Ramadan intermittent fasting on middle-distance running performance in well-trained runners.

    PubMed

    Brisswalter, Jeanick; Bouhlel, Ezzedine; Falola, Jean Marie; Abbiss, Christopher R; Vallier, Jean Marc; Hausswirth, Christophe

    2011-09-01

    To assess whether Ramadan intermittent fasting (RIF) affects 5000-m running performance and the physiological parameters classically associated with middle-distance performance. Two experimental groups (Ramadan fasting, n = 9, vs control, n = 9) participated in 2 experimental sessions, one before RIF and the other during the last week of fasting. For each session, subjects completed 4 tests in the same order: a maximal running test, a maximal voluntary contraction (MVC) of the knee extensors, 2 rectangular submaximal exercises on a treadmill for 6 minutes at an intensity corresponding to the first ventilatory threshold (VT1), and a running performance test (5000 m). Eighteen well-trained middle-distance runners. Maximal oxygen consumption, MVC, running performance, running efficiency, submaximal VO2 kinetics parameters (VO2, VO2b, time constant τ, and amplitude A1), and anthropometric parameters were recorded or calculated. At the end of Ramadan fasting, a decrease in MVC was observed (-3.2%; P < 0.00001; η = 0.80), associated with an increase in the time constant of oxygen kinetics (+51%; P < 0.00007; η = 0.72) and a decrease in performance (-5%; P < 0.0007; η = 0.51). No effect was observed on running efficiency or maximal aerobic power. These results suggest that Ramadan-related changes in muscular performance and oxygen kinetics could affect performance during middle-distance events and need to be considered when choosing training protocols during RIF.

  9. A Fast Implementation of the ISODATA Clustering Algorithm

    NASA Technical Reports Server (NTRS)

    Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

    2005-01-01

    Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
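
    The kd-tree idea can be sketched in a few lines with SciPy; this is illustrative only, since the actual algorithm also modifies how cluster dispersion is estimated. The data shapes are invented.

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    # The kernel that dominates ISODATA's run time is assigning every pixel
    # to its nearest cluster centre; a kd-tree over the centres answers all
    # those queries in one batch.
    rng = np.random.default_rng(3)
    pixels = rng.random((200000, 4))          # e.g. 4-band image samples
    centers = rng.random((16, 4))             # current cluster centres

    dist, label = cKDTree(centers).query(pixels)   # nearest centre per pixel
    # update step: each centre becomes the mean of its members
    new_centers = np.array([
        pixels[label == k].mean(axis=0) if np.any(label == k) else centers[k]
        for k in range(16)
    ])
    ```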

  10. A Fast Implementation of the Isodata Clustering Algorithm

    NASA Technical Reports Server (NTRS)

    Memarsadeghi, Nargess; Le Moigne, Jacqueline; Mount, David M.; Netanyahu, Nathan S.

    2007-01-01

    Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.

  11. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns

    PubMed Central

    2013-01-01

    Background: It is well known that the search for homologous RNAs is more effective if both sequence and structure information are incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both of these levels, or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. Results: We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme that optimally reuses the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to a factor of 45. Our best new index-based algorithm achieves a speedup factor of 560. Conclusions: The presented methods achieve considerable speedups compared to the best previous method. This, together with the expected sublinear running time of the presented index-based algorithms, allows for the first time approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide RaligNAtor, a robust and well-documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator. PMID:23865810

  12. Towards real-time photon Monte Carlo dose calculation in the cloud

    NASA Astrophysics Data System (ADS)

    Ziegenhein, Peter; Kozin, Igor N.; Kamerling, Cornelis Ph; Oelfke, Uwe

    2017-06-01

    Near real-time application of Monte Carlo (MC) dose calculation in clinic and research is hindered by the long computational runtimes of established software. Currently, fast MC software solutions are available utilising accelerators such as graphical processing units (GPUs) or clusters based on central processing units (CPUs). Both platforms are expensive in terms of purchase costs and maintenance and, in the case of the GPU, provide only limited scalability. In this work we propose a cloud-based MC solution, which offers high scalability of accurate photon dose calculations. The MC simulations run on a private virtual supercomputer that is formed in the cloud. Computational resources can be provisioned dynamically at low cost without upfront investment in expensive hardware. A client-server software solution has been developed which controls the simulations and transports data to and from the cloud efficiently and securely. The client application integrates seamlessly into a treatment planning system. It runs the MC simulation workflow automatically and securely exchanges simulation data with the server-side application that controls the virtual supercomputer. Advanced encryption standards were used to add an additional security layer, which encrypts and decrypts patient data on-the-fly at the processor register level. We could show that our cloud-based MC framework enables near real-time dose computation. It delivers excellent linear scaling for high-resolution datasets, with absolute runtimes of 1.1 to 10.9 seconds for simulating clinical prostate and liver cases to 1% statistical uncertainty. The computation runtimes include the transportation of data to and from the cloud as well as process scheduling and synchronisation overhead. Cloud-based MC simulations offer a fast, affordable and easily accessible alternative to currently used GPU or cluster solutions for near real-time accurate dose calculations.

  13. Towards real-time photon Monte Carlo dose calculation in the cloud.

    PubMed

    Ziegenhein, Peter; Kozin, Igor N; Kamerling, Cornelis Ph; Oelfke, Uwe

    2017-06-07

    Near real-time application of Monte Carlo (MC) dose calculation in clinic and research is hindered by the long computational runtimes of established software. Currently, fast MC software solutions are available utilising accelerators such as graphical processing units (GPUs) or clusters based on central processing units (CPUs). Both platforms are expensive in terms of purchase costs and maintenance and, in the case of the GPU, provide only limited scalability. In this work we propose a cloud-based MC solution, which offers high scalability of accurate photon dose calculations. The MC simulations run on a private virtual supercomputer that is formed in the cloud. Computational resources can be provisioned dynamically at low cost without upfront investment in expensive hardware. A client-server software solution has been developed which controls the simulations and transports data to and from the cloud efficiently and securely. The client application integrates seamlessly into a treatment planning system. It runs the MC simulation workflow automatically and securely exchanges simulation data with the server-side application that controls the virtual supercomputer. Advanced encryption standards were used to add an additional security layer, which encrypts and decrypts patient data on-the-fly at the processor register level. We could show that our cloud-based MC framework enables near real-time dose computation. It delivers excellent linear scaling for high-resolution datasets, with absolute runtimes of 1.1 to 10.9 seconds for simulating clinical prostate and liver cases to 1% statistical uncertainty. The computation runtimes include the transportation of data to and from the cloud as well as process scheduling and synchronisation overhead. Cloud-based MC simulations offer a fast, affordable and easily accessible alternative to currently used GPU or cluster solutions for near real-time accurate dose calculations.

  14. S-HARP: A parallel dynamic spectral partitioner

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sohn, A.; Simon, H.

    1998-01-01

    Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP: fine-grain loop-level parallelism and coarse-grain recursive parallelism. The parallel partitioner has been implemented in Message Passing Interface on the Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64-processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15-fold speedup on 64 processors while ParaMeTiS 1.0 gives only a few-fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.

  15. xPerm: fast index canonicalization for tensor computer algebra

    NASA Astrophysics Data System (ADS)

    Martín-García, José M.

    2008-10-01

    We present a very fast implementation of the Butler-Portugal algorithm for index canonicalization with respect to permutation symmetries. It is called xPerm, and has been written as a combination of a Mathematica package and a C subroutine. The latter performs the most demanding parts of the computations and can be linked from any other program or computer algebra system. We demonstrate with tests and timings the effectively polynomial performance of the Butler-Portugal algorithm with respect to the number of indices, though we also show a case in which it is exponential. Our implementation handles generic tensorial expressions with several dozen indices in hundredths of a second, or one hundred indices in a few seconds, clearly outperforming all other current canonicalizers. The code has already been under intensive testing for several years and has been essential in recent investigations in large-scale tensor computer algebra.
    Program summary: Program title: xPerm. Catalogue identifier: AEBH_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEBH_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 93 582. No. of bytes in distributed program, including test data, etc.: 1 537 832. Distribution format: tar.gz. Programming language: C and Mathematica (version 5.0 or higher). Computer: Any computer running C and Mathematica (version 5.0 or higher). Operating system: Linux, Unix, Windows XP, MacOS. RAM: 20 Mbyte. Word size: 64 or 32 bits. Classification: 1.5, 5. Nature of problem: Canonicalization of indexed expressions with respect to permutation symmetries. Solution method: The Butler-Portugal algorithm. Restrictions: Multiterm symmetries are not considered. Running time: A few seconds with generic expressions of up to 100 indices. The xPermDoc.nb notebook supplied with the distribution takes approximately one and a half hours to execute in full.

  16. Hybrid stochastic simulation of reaction-diffusion systems with slow and fast dynamics.

    PubMed

    Strehl, Robert; Ilie, Silvana

    2015-12-21

    In this paper, we present a novel hybrid method to simulate discrete stochastic reaction-diffusion models arising in biochemical signaling pathways. We study moderately stiff systems, for which we can partition each reaction or diffusion channel into either a slow or fast subset, based on its propensity. Numerical approaches missing this distinction are often limited with respect to computational run time or approximation quality. We design an approximate scheme that remedies these pitfalls by using a new blending strategy of the well-established inhomogeneous stochastic simulation algorithm and the tau-leaping simulation method. The advantages of our hybrid simulation algorithm are demonstrated on three benchmarking systems, with special focus on approximation accuracy and efficiency.
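
    One blended update of such a scheme might look as follows; the two-species system, the propensity threshold, and the blending rule are simplifications for illustration, not the authors' algorithm.

    ```python
    import numpy as np

    # Fast channels (large propensity) advance with a Poisson tau-leap;
    # slow channels fire individually, SSA-style, within the same interval.
    rng = np.random.default_rng(4)
    x = np.array([100.0, 5.0])                      # species counts [S1, S2]
    stoich = np.array([[-1, 0], [1, -1], [0, 1]])   # 3 reaction channels
    rates = np.array([50.0, 0.1, 0.05])

    def propensities(x):
        return rates * np.array([x[0], x[0] * x[1], x[1]])

    a = propensities(x)
    fast = a > 1.0                                  # partition by propensity
    tau = 0.01
    # tau-leap the fast subset: Poisson number of firings in [t, t + tau)
    firings = rng.poisson(a[fast] * tau)
    x += firings @ stoich[fast]
    # slow subset: fire at most one reaction in the interval, exact-SSA style
    a_slow = a[~fast]
    if a_slow.size and rng.random() < 1 - np.exp(-a_slow.sum() * tau):
        j = rng.choice(np.flatnonzero(~fast), p=a_slow / a_slow.sum())
        x += stoich[j]
    print("state after one hybrid step:", x)
    ```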

  17. Fast global image smoothing based on weighted least squares.

    PubMed

    Min, Dongbo; Choi, Sunghwan; Lu, Jiangbo; Ham, Bumsub; Sohn, Kwanghoon; Do, Minh N

    2014-12-01

    This paper presents an efficient technique for performing spatially inhomogeneous edge-preserving image smoothing, called the fast global smoother. Focusing on sparse Laplacian matrices consisting of a data term and a prior term (typically defined using four or eight neighbors for a 2D image), our approach efficiently solves such global objective functions. In particular, we approximate the solution of the memory- and computation-intensive large linear system, defined over a d-dimensional spatial domain, by solving a sequence of 1D subsystems. Our separable implementation enables applying a linear-time tridiagonal matrix algorithm to solve d three-point Laplacian matrices iteratively. Our approach combines the best of two paradigms, i.e., efficient edge-preserving filters and optimization-based smoothing. Our method has a runtime comparable to the fast edge-preserving filters, but its global optimization formulation overcomes many limitations of local filtering approaches. Our method also achieves high-quality results like the state-of-the-art optimization-based techniques, but runs ∼10-30 times faster. Besides, considering the flexibility in defining an objective function, we further propose generalized fast algorithms that perform Lγ norm smoothing (0 < γ < 2) and support an aggregated (robust) data term for handling imprecise data constraints. We demonstrate the effectiveness and efficiency of our techniques in a range of image processing and computer graphics applications.
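
    The 1D building block can be sketched directly: an edge-aware weighted least squares smoother whose normal equations are tridiagonal and therefore solvable in linear time by the Thomas algorithm (here via SciPy's banded solver). The exponential weighting function is an assumed example, not the paper's exact choice.

    ```python
    import numpy as np
    from scipy.linalg import solve_banded

    # Minimize sum (u - f)^2 + lam * sum w_j (u_{j+1} - u_j)^2, whose normal
    # equations (I + lam * L_w) u = f are tridiagonal; weights w drop across
    # strong edges so the step is preserved while noise is removed.
    def smooth_1d(f, lam=20.0, sigma=0.1):
        n = f.size
        w = np.exp(-np.abs(np.diff(f)) / sigma)   # edge-aware neighbour weights
        upper = np.zeros(n)
        diag = np.ones(n)
        lower = np.zeros(n)
        upper[1:] = -lam * w                       # superdiagonal (banded layout)
        lower[:-1] = -lam * w                      # subdiagonal
        diag[:-1] += lam * w
        diag[1:] += lam * w
        ab = np.vstack([upper, diag, lower])
        return solve_banded((1, 1), ab, f)         # linear-time Thomas solve

    rng = np.random.default_rng(5)
    signal = np.concatenate([np.zeros(50), np.ones(50)]) + 0.05 * rng.normal(size=100)
    smoothed = smooth_1d(signal)
    ```

    A separable 2D smoother applies this same solve along every row and then every column, which is the paper's route to replacing one large d-dimensional system with d cheap three-point systems.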

  18. Distributed and collaborative synthetic environments

    NASA Technical Reports Server (NTRS)

    Bajaj, Chandrajit L.; Bernardini, Fausto

    1995-01-01

    Fast graphics workstations and increased computing power, together with improved interface technologies, have created new and diverse possibilities for developing and interacting with synthetic environments. A synthetic environment system is generally characterized by input/output devices that constitute the interface between the human senses and the synthetic environment generated by the computer; and a computation system running a real-time simulation of the environment. A basic need of a synthetic environment system is that of giving the user a plausible reproduction of the visual aspect of the objects with which he is interacting. The goal of our Shastra research project is to provide a substrate of geometric data structures and algorithms which allow the distributed construction and modification of the environment, efficient querying of objects attributes, collaborative interaction with the environment, fast computation of collision detection and visibility information for efficient dynamic simulation and real-time scene display. In particular, we address the following issues: (1) A geometric framework for modeling and visualizing synthetic environments and interacting with them. We highlight the functions required for the geometric engine of a synthetic environment system. (2) A distribution and collaboration substrate that supports construction, modification, and interaction with synthetic environments on networked desktop machines.

  19. Using Machine Learning as a fast emulator of physical processes within the Met Office's Unified Model

    NASA Astrophysics Data System (ADS)

    Prudden, R.; Arribas, A.; Tomlinson, J.; Robinson, N.

    2017-12-01

    The Unified Model is a numerical model of the atmosphere used at the UK Met Office (and numerous partner organisations, including the Korean Meteorological Agency, the Australian Bureau of Meteorology, and the US Air Force) for both weather and climate applications. Specifically, dynamical models such as the Unified Model are now a central part of weather forecasting. Starting from basic physical laws, these models make it possible to predict events such as storms before they have even begun to form. The Unified Model can be simply described as having two components: one component solves the Navier-Stokes equations (usually referred to as the "dynamics"); the other solves relevant sub-grid physical processes (usually referred to as the "physics"). Running weather forecasts requires substantial computing resources - for example, the UK Met Office operates the largest operational high performance computer in Europe - and the cost of a typical simulation is split roughly 50% in the "dynamics" and 50% in the "physics". There is therefore a strong incentive to reduce the cost of weather forecasts, and Machine Learning is a possible option because, once a machine learning model has been trained, it is often much faster to run than a full simulation. This is the motivation for a technique called model emulation, the idea being to build a fast statistical model which closely approximates a far more expensive simulation. In this paper we discuss the use of Machine Learning as an emulator to replace the "physics" component of the Unified Model. Various approaches and options will be presented, and the implications for further model development, operational running of forecasting systems, development of data assimilation schemes, and development of ensemble prediction techniques will be discussed.
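
    A toy version of the emulation workflow, assuming scikit-learn; the "physics" function below is a made-up radiative-cooling stand-in, not a Unified Model scheme.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Generate training pairs by running the expensive "physics" offline,
    # fit a regressor, then call the cheap regressor inside the model loop.
    def physics(T_profile):
        return -0.01 * (T_profile - 250.0)          # toy tendency (K/step)

    rng = np.random.default_rng(6)
    X = rng.uniform(200, 300, size=(5000, 20))      # temperature profiles
    Y = physics(X)                                   # offline "truth" runs

    emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
    emulator.fit(X, Y)
    tendency = emulator.predict(X[:1])               # fast surrogate call
    ```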

  20. Fast and Sensitive Alignment of Microbial Whole Genome Sequencing Reads to Large Sequence Datasets on a Desktop PC: Application to Metagenomic Datasets and Pathogen Identification

    PubMed Central

    2014-01-01

    Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance between speed and sensitivity, and as a result, species- or strain-level identification is often inaccurate and low-abundance pathogens can sometimes be missed. We have developed Taxoner, an open-source taxon assignment pipeline that includes a fast aligner (e.g., Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than, BLAST, but requires two orders of magnitude less running time, meaning that it can be run on desktop or laptop computers. Taxoner is slower than approaches that use small marker databases, but it is more sensitive due to the comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain-level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner. PMID:25077800

  1. Fast and sensitive alignment of microbial whole genome sequencing reads to large sequence datasets on a desktop PC: application to metagenomic datasets and pathogen identification.

    PubMed

    Pongor, Lőrinc S; Vera, Roberto; Ligeti, Balázs

    2014-01-01

    Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance between speed and sensitivity, and as a result, species- or strain-level identification is often inaccurate and low-abundance pathogens can sometimes be missed. We have developed Taxoner, an open-source taxon assignment pipeline that includes a fast aligner (e.g., Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than, BLAST, but requires two orders of magnitude less running time, meaning that it can be run on desktop or laptop computers. Taxoner is slower than approaches that use small marker databases, but it is more sensitive due to the comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain-level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner.

  2. VASA: Interactive Computational Steering of Large Asynchronous Simulation Pipelines for Societal Infrastructure.

    PubMed

    Ko, Sungahn; Zhao, Jieqiong; Xia, Jing; Afzal, Shehzad; Wang, Xiaoyu; Abram, Greg; Elmqvist, Niklas; Kne, Len; Van Riper, David; Gaither, Kelly; Kennedy, Shaun; Tolone, William; Ribarsky, William; Ebert, David S

    2014-12-01

    We present VASA, a visual analytics platform consisting of a desktop application, a component model, and a suite of distributed simulation components for modeling the impact of societal threats such as weather, food contamination, and traffic on critical infrastructure such as supply chains, road networks, and power grids. Each component encapsulates a high-fidelity simulation model that together form an asynchronous simulation pipeline: a system of systems of individual simulations with a common data and parameter exchange format. At the heart of VASA is the Workbench, a visual analytics application providing three distinct features: (1) low-fidelity approximations of the distributed simulation components using local simulation proxies to enable analysts to interactively configure a simulation run; (2) computational steering mechanisms to manage the execution of individual simulation components; and (3) spatiotemporal and interactive methods to explore the combined results of a simulation run. We showcase the utility of the platform using examples involving supply chains during a hurricane as well as food contamination in a fast food restaurant chain.

  3. Software Framework for Advanced Power Plant Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    John Widmann; Sorin Munteanu; Aseem Jain

    2010-08-01

    This report summarizes the work accomplished during the Phase II development effort of the Advanced Process Engineering Co-Simulator (APECS). The objective of the project is to develop the tools to efficiently combine high-fidelity computational fluid dynamics (CFD) models with process modeling software. During the course of the project, a robust integration controller was developed that can be used in any CAPE-OPEN compliant process modeling environment. The controller mediates the exchange of information between the process modeling software and the CFD software. Several approaches to reducing the time disparity between CFD simulations and process modeling have been investigated and implemented. These include enabling the CFD models to be run on a remote cluster and enabling multiple CFD models to be run simultaneously. Furthermore, computationally fast reduced-order models (ROMs) have been developed that can be 'trained' using the results from CFD simulations and then used directly within flowsheets. Unit operation models (both CFD and ROMs) can be uploaded to a model database and shared between multiple users.

  4. Accelerating Demand Paging for Local and Remote Out-of-Core Visualization

    NASA Technical Reports Server (NTRS)

    Ellsworth, David

    2001-01-01

    This paper describes a new algorithm that improves the performance of application-controlled demand paging for the out-of-core visualization of data sets that are on either local disks or disks on remote servers. The performance improvements come from better overlapping the computation with the page reading process, and by performing multiple page reads in parallel. The new algorithm can be applied to many different visualization algorithms since application-controlled demand paging is not specific to any visualization algorithm. The paper includes measurements that show that the new multi-threaded paging algorithm decreases the time needed to compute visualizations by one third when using one processor and reading data from local disk. The time needed when using one processor and reading data from remote disk decreased by up to 60%. Visualization runs using data from remote disk ran about as fast as ones using data from local disk because the remote runs were able to make use of the remote server's high performance disk array.
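
    The overlap idea can be sketched in a few lines of Python: pages are prefetched on a thread pool while the main thread processes pages that have already arrived. The page size and the read/visualize functions are illustrative placeholders, not the paper's implementation.

      from concurrent.futures import ThreadPoolExecutor

      PAGE_SIZE = 1 << 20                        # 1 MiB pages (assumption)

      def read_page(path, page_id):
          with open(path, "rb") as fh:
              fh.seek(page_id * PAGE_SIZE)
              return fh.read(PAGE_SIZE)

      def visualize(page_bytes):
          pass                                   # stand-in for the compute step

      def run(path, page_ids, workers=4):
          # All reads are issued up front; computation on early pages
          # overlaps the remaining reads, and reads proceed in parallel.
          with ThreadPoolExecutor(max_workers=workers) as pool:
              futures = [pool.submit(read_page, path, p) for p in page_ids]
              for fut in futures:
                  visualize(fut.result())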

  5. A Fast Code for Jupiter Atmospheric Entry Analysis

    NASA Technical Reports Server (NTRS)

    Yauber, Michael E.; Wercinski, Paul; Yang, Lily; Chen, Yih-Kanq

    1999-01-01

    A fast code was developed to calculate the forebody heating environment and heat shielding that is required for Jupiter atmospheric entry probes. A carbon phenolic heat shield material was assumed and, since computational efficiency was a major goal, analytic expressions were used, primarily, to calculate the heating, ablation and the required insulation. The code was verified by comparison with flight measurements from the Galileo probe's entry. The calculation required 3.5 sec of CPU time on a workstation, three to four orders of magnitude less than previous Jovian entry heat shield analyses required. The computed surface recessions from ablation were compared with the flight values at six body stations. The average absolute difference in the predicted recession was 13.7%, with the predictions running high. The forebody's mass loss was overpredicted by 5.3% and the heat shield mass was calculated to be 15% less than the probe's actual heat shield. However, the calculated heat shield mass did not include contingencies for the various uncertainties that must be considered in the design of probes. Therefore, the agreement with the Galileo probe's values was satisfactory in view of the code's fast running time and the methods' approximations.

  6. Hybrid stochastic simulation of reaction-diffusion systems with slow and fast dynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Strehl, Robert; Ilie, Silvana, E-mail: silvana@ryerson.ca

    2015-12-21

    In this paper, we present a novel hybrid method to simulate discrete stochastic reaction-diffusion models arising in biochemical signaling pathways. We study moderately stiff systems, for which we can partition each reaction or diffusion channel into either a slow or fast subset, based on its propensity. Numerical approaches missing this distinction are often limited with respect to computational run time or approximation quality. We design an approximate scheme that remedies these pitfalls by using a new blending strategy of the well-established inhomogeneous stochastic simulation algorithm and the tau-leaping simulation method. The advantages of our hybrid simulation algorithm are demonstrated on three benchmarking systems, with special focus on approximation accuracy and efficiency.
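
    A minimal sketch of the partitioning step follows; the propensity threshold and the mass-action propensities are illustrative assumptions, and the subsequent SSA and tau-leaping updates are omitted.

      import numpy as np

      def partition_channels(propensities, threshold=100.0):
          # Channels with large propensities fire often and are treated as
          # "fast" (tau-leaped); the rest are "slow" (exact SSA).
          fast = np.where(propensities >= threshold)[0]
          slow = np.where(propensities < threshold)[0]
          return slow, fast

      # Example: mass-action propensities a_j = c_j * x for three channels.
      rates = np.array([0.1, 50.0, 5.0])
      population = 10.0
      slow, fast = partition_channels(rates * population)
      print("slow channels:", slow, "fast channels:", fast)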

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krempasky, J.; Flechsig, U.; Korhonen, T.

    Synchronous monochromator and insertion device energy scans were implemented at the Surfaces/Interfaces:Microscopy (SIM) beamline in order to provide users with fast X-ray magnetic circular dichroism (XMCD) studies. A simple software control scheme is proposed, based on a fast monochromator run-time energy readback which quickly updates the insertion device requested energy during an on-the-fly X-ray absorption spectroscopy (XAS) scan. In this scheme the Plane Grating Monochromator (PGM) motion control, being much slower compared with the insertion device (APPLE-II type undulator), acts as a 'master' controlling the undulator 'slave' energy position. This master-slave software implementation exploits EPICS distributed device control over a computer network and allows for a quasi-synchronous motion control combined with the data acquisition needed for the XAS or XMCD experiment.
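
    A sketch of the master-slave coupling using the pyepics caget/caput interface is shown below; both process-variable names are hypothetical, and the real implementation runs within the EPICS control system rather than as a simple polling script.

      import time
      import epics

      MONO_ENERGY_RBV = "SIM:MONO:ENERGY_RBV"    # hypothetical readback PV
      UNDULATOR_SET = "SIM:ID:ENERGY_SET"        # hypothetical setpoint PV

      def follow_mono(duration_s=10.0, period_s=0.1):
          # The monochromator readback (master) drives the undulator
          # setpoint (slave) while an on-the-fly scan is in progress.
          t_end = time.time() + duration_s
          while time.time() < t_end:
              energy = epics.caget(MONO_ENERGY_RBV)
              if energy is not None:
                  epics.caput(UNDULATOR_SET, energy)
              time.sleep(period_s)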

  8. Analysis and Simulation of Narrowband GPS Jamming Using Digital Excision Temporal Filtering.

    DTIC Science & Technology

    1994-12-01

    the sequence of stored values from the P-code sampled at a 20 MHz rate. When correlated with a reference vector of the same length to simulate a GPS ... rate required for the GPS signals (20 MHz sampling rate for the P-code signal), the personal computer (PC) used to run the simulation could not perform ... This subroutine is used to perform a fast FFT-based biased cross correlation. Written by Capt Gerry Falen, USAF, 16 AUG 94

  9. Fast intersection detection algorithm for PC-based robot off-line programming

    NASA Astrophysics Data System (ADS)

    Fedrowitz, Christian H.

    1994-11-01

    This paper presents a method for fast and reliable collision detection in complex production cells. The algorithm is part of the PC-based robot off-line programming system of the University of Siegen (Ropsus). The method is based on a solid model which is managed by a simplified constructive solid geometry model (CSG model). The collision detection problem is divided into two steps. In the first step the complexity of the problem is reduced in linear time. In the second step the remaining solids are tested for intersection. For this, the Simplex algorithm, known from linear optimization, is used. It computes a point which is common to two convex polyhedra; the polyhedra intersect if such a point exists. For the simplified geometrical model of Ropsus the algorithm also runs in linear time, so in conjunction with the first step the resulting collision detection algorithm requires linear time overall. Moreover it computes the resultant intersection polyhedron using the dual transformation.
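
    The common-point test can be phrased as a linear programming feasibility problem: two convex polyhedra given as half-space systems A1 x <= b1 and A2 x <= b2 intersect exactly when the stacked system has a solution. The sketch below decides this with SciPy's LP solver rather than the paper's own Simplex implementation.

      import numpy as np
      from scipy.optimize import linprog

      def polyhedra_intersect(A1, b1, A2, b2):
          A = np.vstack([A1, A2])
          b = np.concatenate([b1, b2])
          c = np.zeros(A.shape[1])               # feasibility only, no objective
          res = linprog(c, A_ub=A, b_ub=b,
                        bounds=[(None, None)] * A.shape[1])
          return res.status == 0                 # 0 means a feasible point exists

      # Unit cube [0,1]^3 versus the shifted cube [0.5,1.5]^3: they overlap.
      I = np.eye(3)
      A_box = np.vstack([I, -I])
      b1 = np.array([1, 1, 1, 0, 0, 0])
      b2 = np.array([1.5, 1.5, 1.5, -0.5, -0.5, -0.5])
      print(polyhedra_intersect(A_box, b1, A_box, b2))   # True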

  10. Early melanoma diagnosis with mobile imaging.

    PubMed

    Do, Thanh-Toan; Zhou, Yiren; Zheng, Haitian; Cheung, Ngai-Man; Koh, Dawn

    2014-01-01

    We investigate a mobile imaging system for early diagnosis of melanoma. Different from previous work, we focus on smartphone-captured images, and propose a detection system that runs entirely on the smartphone. Smartphone-captured images taken under loosely-controlled conditions introduce new challenges for melanoma detection, while processing performed on the smartphone is subject to computation and memory constraints. To address these challenges, we propose to localize the skin lesion by combining fast skin detection and fusion of two fast segmentation results. We propose new features to capture color variation and border irregularity which are useful for smartphone-captured images. We also propose a new feature selection criterion to select a small set of good features used in the final lightweight system. Our evaluation confirms the effectiveness of the proposed algorithms and features. In addition, we present our system prototype which computes selected visual features from a user-captured skin lesion image, and analyzes them to estimate the likelihood of malignancy, all on an off-the-shelf smartphone.

  11. "Observation Obscurer" - Time Series Viewer, Editor and Processor

    NASA Astrophysics Data System (ADS)

    Andronov, I. L.

    The program is described, which contains a set of subroutines suitable for fast viewing and interactive filtering and processing of regularly and irregularly spaced time series. Being a 32-bit DOS application, it may be used as a default fast viewer/editor of time series in any computer shell ("commander") or in Windows. It allows the user to view the data in the "time" or "phase" mode, to remove ("obscure") or filter outlying bad points, and to make scale transformations and smoothing using a few methods (e.g. mean with phase binning, determination of the statistically optimal number of phase bins, and the "running parabola" fit (Andronov, 1997, As. Ap. Suppl., 125, 207)), and to perform time series analysis using methods such as correlation, autocorrelation and histogram analysis, and determination of extrema. Some features have been developed specially for variable star observers, e.g. the barycentric correction and the creation and fast analysis of "O-C" diagrams. The manual for "hot keys" is presented. The computer code was compiled with a 32-bit Free Pascal (www.freepascal.org).
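
    The "running parabola" idea, fitting a local quadratic in a sliding window, can be sketched as follows; the window half-width is an illustrative assumption, and Andronov's method additionally applies window weighting that is omitted here.

      import numpy as np

      def running_parabola(t, y, half_width=5):
          smoothed = np.empty(len(y), dtype=float)
          for i in range(len(t)):
              lo = max(0, i - half_width)
              hi = min(len(t), i + half_width + 1)
              coeffs = np.polyfit(t[lo:hi], y[lo:hi], deg=2)  # local quadratic
              smoothed[i] = np.polyval(coeffs, t[i])
          return smoothed

      t = np.linspace(0, 4 * np.pi, 200)
      y = np.sin(t) + 0.2 * np.random.default_rng(1).normal(size=t.size)
      print(running_parabola(t, y)[:5])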

  12. Ground reaction forces in shallow water running are affected by immersion level, running speed and gender.

    PubMed

    Haupenthal, Alessandro; Fontana, Heiliane de Brito; Ruschel, Caroline; dos Santos, Daniela Pacheco; Roesler, Helio

    2013-07-01

    To analyze the effect of depth of immersion, running speed and gender on ground reaction forces during water running. Controlled laboratory study. Twenty adults (ten male and ten female) participated by running at two levels of immersion (hip and chest) and two speed conditions (slow and fast). Data were collected using an underwater force platform. The following variables were analyzed: vertical force peak (Fy), loading rate (LR) and anterior force peak (Fx anterior). Three-factor mixed ANOVA was used to analyze data. Significant effects of immersion level, speed and gender on Fy were observed, without interaction between factors. Fy was greater when females ran fast at the hip level. There was a significant increase in LR with a reduction in the level of immersion regardless of the speed and gender. No effect of speed or gender on LR was observed. Regarding Fx anterior, significant interaction between speed and immersion level was found: in the slow condition, participants presented greater values at chest immersion, whereas, during the fast running condition, greater values were observed at hip level. The effect of gender was only significant during fast water running, with Fx anterior being greater in the men group. Increasing speed raised Fx anterior significantly irrespective of the level of immersion and gender. The magnitude of ground reaction forces during shallow water running is affected by immersion level, running speed and gender and, for this reason, these factors should be taken into account during exercise prescription. Copyright © 2012 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.

  13. Fast algorithms for transforming back and forth between a signed permutation and its equivalent simple permutation.

    PubMed

    Gog, Simon; Bader, Martin

    2008-10-01

    The problem of sorting signed permutations by reversals is a well-studied problem in computational biology. The first polynomial time algorithm was presented by Hannenhalli and Pevzner in 1995. The algorithm was improved several times, and nowadays the most efficient algorithm has a subquadratic running time. Simple permutations played an important role in the development of these algorithms. Although the latest result of Tannier et al. does not require simple permutations, the preliminary version of their algorithm as well as the first polynomial time algorithm of Hannenhalli and Pevzner use the structure of simple permutations. More precisely, the latter algorithms require a precomputation that transforms a permutation into an equivalent simple permutation. To the best of our knowledge, all published algorithms for this transformation have at least a quadratic running time. For further investigations on genome rearrangement problems, the existence of a fast algorithm for the transformation could be crucial. Another important task is the back transformation, i.e., given a sorting sequence for the simple permutation, transforming it into a sorting sequence for the original permutation. Again, the naive approach results in an algorithm with quadratic running time. In this paper, we present a linear time algorithm for transforming a permutation into an equivalent simple permutation, and an O(n log n) algorithm for the back transformation of the sorting sequence.

  14. Stochastic hybrid systems for studying biochemical processes.

    PubMed

    Singh, Abhyudai; Hespanha, João P

    2010-11-13

    Many protein and mRNA species occur at low molecular counts within cells, and hence are subject to large stochastic fluctuations in copy numbers over time. Development of computationally tractable frameworks for modelling stochastic fluctuations in population counts is essential to understand how noise at the cellular level affects biological function and phenotype. We show that stochastic hybrid systems (SHSs) provide a convenient framework for modelling the time evolution of population counts of different chemical species involved in a set of biochemical reactions. We illustrate recently developed techniques that allow fast computations of the statistical moments of the population count, without having to run computationally expensive Monte Carlo simulations of the biochemical reactions. Finally, we review different examples from the literature that illustrate the benefits of using SHSs for modelling biochemical processes.
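
    For linear propensities the moment equations close on themselves, which is what makes such computations fast. A minimal sketch for a birth-death process with production rate k and degradation rate g, where the mean m obeys dm/dt = k - g*m, is shown below; the rate constants are illustrative.

      import numpy as np
      from scipy.integrate import solve_ivp

      k, g = 10.0, 0.5                  # illustrative rate constants

      # Integrate the closed ODE for the mean; no Monte Carlo is needed.
      sol = solve_ivp(lambda t, m: k - g * m, t_span=(0.0, 20.0), y0=[0.0],
                      t_eval=np.linspace(0.0, 20.0, 5))
      print(sol.y[0])                   # mean copy number approaching k/g = 20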

  15. RTSPM: real-time Linux control software for scanning probe microscopy.

    PubMed

    Chandrasekhar, V; Mehta, M M

    2013-01-01

    Real time computer control is an essential feature of scanning probe microscopes, which have become important tools for the characterization and investigation of nanometer scale samples. Most commercial (and some open-source) scanning probe data acquisition software uses digital signal processors to handle the real time data processing and control, which adds to the expense and complexity of the control software. We describe here scan control software that uses a single computer and a data acquisition card to acquire scan data. The computer runs an open-source real time Linux kernel, which permits fast acquisition and control while maintaining a responsive graphical user interface. Images from a simulated tuning-fork based microscope as well as a standard topographical sample are also presented, showing some of the capabilities of the software.

  16. Discrete Fourier Transform Analysis in a Complex Vector Space

    NASA Technical Reports Server (NTRS)

    Dean, Bruce H.

    2009-01-01

    Alternative computational strategies for the Discrete Fourier Transform (DFT) have been developed using analysis of geometric manifolds. This approach provides a general framework for performing DFT calculations, and suggests a more efficient implementation of the DFT for applications using iterative transform methods, particularly phase retrieval. The DFT can thus be implemented using fewer operations when compared to the usual DFT counterpart. The software decreases the run time of the DFT in certain applications such as phase retrieval that iteratively call the DFT function. The algorithm exploits a special computational approach based on analysis of the DFT as a transformation in a complex vector space. As such, this approach has the potential to realize a DFT computation that approaches N operations versus N log(N) operations for the equivalent Fast Fourier Transform (FFT) calculation.

  17. BioImg.org: A Catalog of Virtual Machine Images for the Life Sciences

    PubMed Central

    Dahlö, Martin; Haziza, Frédéric; Kallio, Aleksi; Korpelainen, Eija; Bongcam-Rudloff, Erik; Spjuth, Ola

    2015-01-01

    Virtualization is becoming increasingly important in bioscience, enabling assembly and provisioning of complete computer setups, including operating system, data, software, and services packaged as virtual machine images (VMIs). We present an open catalog of VMIs for the life sciences, where scientists can share information about images and optionally upload them to a server equipped with a large file system and fast Internet connection. Other scientists can then search for and download images that can be run on the local computer or in a cloud computing environment, providing easy access to bioinformatics environments. We also describe applications where VMIs aid life science research, including distributing tools and data, supporting reproducible analysis, and facilitating education. BioImg.org is freely available at: https://bioimg.org. PMID:26401099

  18. BioImg.org: A Catalog of Virtual Machine Images for the Life Sciences.

    PubMed

    Dahlö, Martin; Haziza, Frédéric; Kallio, Aleksi; Korpelainen, Eija; Bongcam-Rudloff, Erik; Spjuth, Ola

    2015-01-01

    Virtualization is becoming increasingly important in bioscience, enabling assembly and provisioning of complete computer setups, including operating system, data, software, and services packaged as virtual machine images (VMIs). We present an open catalog of VMIs for the life sciences, where scientists can share information about images and optionally upload them to a server equipped with a large file system and fast Internet connection. Other scientists can then search for and download images that can be run on the local computer or in a cloud computing environment, providing easy access to bioinformatics environments. We also describe applications where VMIs aid life science research, including distributing tools and data, supporting reproducible analysis, and facilitating education. BioImg.org is freely available at: https://bioimg.org.

  19. The viability of ADVANTG deterministic method for synthetic radiography generation

    NASA Astrophysics Data System (ADS)

    Bingham, Andrew; Lee, Hyoung K.

    2018-07-01

    Fast simulation techniques to generate synthetic radiographic images of high resolution are helpful when new radiation imaging systems are designed. However, the standard stochastic approach requires lengthy run time, with poorer statistics at higher resolution. The viability of a deterministic approach to synthetic radiography image generation was therefore explored, with the aim of quantifying the computational time decrease relative to the stochastic method. ADVANTG was compared to MCNP in multiple scenarios, including a small radiography system prototype, to simulate high resolution radiography images. By using the ADVANTG deterministic code to simulate radiography images, the computational time was found to decrease by a factor of 10 to 13 compared to the MCNP stochastic approach while retaining image quality.

  20. Plasma Physics Calculations on a Parallel Macintosh Cluster

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor; Dauger, Dean; Kokelaar, Pieter

    2000-03-01

    We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.

  1. Plasma Physics Calculations on a Parallel Macintosh Cluster

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor K.; Dauger, Dean E.; Kokelaar, Pieter R.

    We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 Mflops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.

  2. Fast Simulation of Tsunamis in Real Time

    NASA Astrophysics Data System (ADS)

    Fryer, G. J.; Wang, D.; Becker, N. C.; Weinstein, S. A.; Walsh, D.

    2011-12-01

    The U.S. Tsunami Warning Centers primarily base their wave height forecasts on precomputed tsunami scenarios, such as the SIFT model (Short-term Inundation Forecasting for Tsunamis) developed by NOAA's Center for Tsunami Research. In SIFT, tsunami simulations for about 1600 individual earthquake sources, each 100x50 km, cover the shallow subduction zones worldwide. These simulations are stored in a database and combined linearly to make up the tsunami from any great earthquake. Precomputation is necessary because the nonlinear shallow-water wave equations are too time consuming to compute during an event. While such scenario-based models are valuable, they tacitly assume all energy in a tsunami comes from thrust at the décollement. The thrust assumption is often violated (e.g., 1933 Sanriku, 2007 Kurils, 2009 Samoa), while a significant number of tsunamigenic earthquakes are completely unrelated to subduction (e.g., 1812 Santa Barbara, 1939 Accra, 1975 Kalapana). Finally, parts of some subduction zones are so poorly defined that precomputations may be of little value (e.g., 1762 Arakan, 1755 Lisbon). For all such sources, a fast means of estimating tsunami size is essential. At the Pacific Tsunami Warning Center, we have been using our model RIFT (Real-time Inundation Forecasting of Tsunamis) experimentally for two years. RIFT is fast by design: it solves only the linearized form of the equations. At 4 arc-minutes resolution, calculations for the entire Pacific take just a few minutes on an 8-processor Linux box. Part of the rationale for developing RIFT was to handle earthquakes of M 7.8 or smaller, which approach the lower limit of the more complex SIFT's abilities. For such events we currently issue a fixed warning to areas within 1,000 km of the source, which typically means a lot of over-warning. With sources defined by W-phase CMTs, exhaustive comparison with runup data shows that we can reduce the warning area significantly. Even before CMTs are available, we routinely run models based on the local tectonics, which provide a useful first estimate of the tsunami. Our runup comparisons show that Green's Law (i.e., 1-D runup estimates) works very well indeed, especially if computations are run at 2 arc-minutes. We are developing an experimental RIFT-based product showing expected runups on open coasts. While these will necessarily be rather crude, they will be a great help to emergency managers trying to assess the hazard. RIFT is typically run using a single source, but it can already handle multiple sources. In particular, it can handle multiple sources of different orientations such as 1993 Okushiri, or the décollement-splay combinations to be expected during major earthquakes in accretionary margins such as Nankai, Cascadia, and Middle America. As computers get faster and the number-crunching burden is off-loaded to GPUs, we are convinced there will still be a use for a fast, linearized modeling capability. Rather than applying scaling laws to a CMT, or distributing slip over 100x50 km sub-faults, for example, it would be preferable to model tsunamis using the output from a finite-fault analysis. To accomplish such a compute-bound task fast enough for warning purposes will demand a rapid, approximate technique like RIFT.
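
    Green's Law itself is a one-line shoaling estimate: as a linear wave moves from depth h1 to depth h2, its amplitude scales as (h1/h2)**0.25. A minimal sketch with illustrative depths:

      def greens_law(amplitude_m, h1_m, h2_m):
          # Linear shoaling: amplitude grows as the fourth root of the
          # depth ratio.
          return amplitude_m * (h1_m / h2_m) ** 0.25

      # A 0.5 m wave in 4000 m of water reaching 10 m depth:
      print(greens_law(0.5, 4000.0, 10.0))   # about 2.2 m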

  3. qtcm 0.1.2: A Python Implementation of the Neelin-Zeng Quasi-Equilibrium Tropical Circulation model

    NASA Astrophysics Data System (ADS)

    Lin, J. W.-B.

    2008-10-01

    Historically, climate models have been developed incrementally and in compiled languages like Fortran. While the use of legacy compiled languages results in fast, time-tested code, the resulting model is limited in its modularity and cannot take advantage of functionality available with modern computer languages. Here we describe an effort at using the open-source, object-oriented language Python to create more flexible climate models: the package qtcm, a Python implementation of the intermediate-level Neelin-Zeng Quasi-Equilibrium Tropical Circulation model (QTCM1) of the atmosphere. The qtcm package retains the core numerics of QTCM1, written in Fortran to optimize model performance, but uses Python structures and utilities to wrap the QTCM1 Fortran routines and manage model execution. The resulting "mixed language" modeling package allows order and choice of subroutine execution to be altered at run time, and model analysis and visualization to be integrated interactively with model execution at run time. This flexibility facilitates more complex scientific analysis using less complex code than would be possible using traditional languages alone, and provides tools to transform the traditional "formulate hypothesis → write and test code → run model → analyze results" sequence into a feedback loop that can be executed automatically by the computer.
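
    The run-time flexibility being described can be sketched in pure Python: if the model's steps are plain callables held in a list, their order and membership can be edited between (or during) runs. The step names below are invented for illustration and do not reproduce qtcm's actual interface.

      def dynamics(state):
          state["u"] = state.get("u", 0.0) + 1.0     # stand-in computation

      def physics(state):
          state["T"] = state.get("T", 300.0) - 0.5   # stand-in computation

      def diagnostics(state):
          print(sorted(state.items()))

      run_list = [dynamics, physics, diagnostics]    # editable at run time

      state = {}
      for step in run_list:                          # one "timestep"
          step(state)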

  4. qtcm 0.1.2: a Python implementation of the Neelin-Zeng Quasi-Equilibrium Tropical Circulation Model

    NASA Astrophysics Data System (ADS)

    Lin, J. W.-B.

    2009-02-01

    Historically, climate models have been developed incrementally and in compiled languages like Fortran. While the use of legacy compiled languages results in fast, time-tested code, the resulting model is limited in its modularity and cannot take advantage of functionality available with modern computer languages. Here we describe an effort at using the open-source, object-oriented language Python to create more flexible climate models: the package qtcm, a Python implementation of the intermediate-level Neelin-Zeng Quasi-Equilibrium Tropical Circulation model (QTCM1) of the atmosphere. The qtcm package retains the core numerics of QTCM1, written in Fortran to optimize model performance, but uses Python structures and utilities to wrap the QTCM1 Fortran routines and manage model execution. The resulting "mixed language" modeling package allows order and choice of subroutine execution to be altered at run time, and model analysis and visualization to be integrated interactively with model execution at run time. This flexibility facilitates more complex scientific analysis using less complex code than would be possible using traditional languages alone, and provides tools to transform the traditional "formulate hypothesis → write and test code → run model → analyze results" sequence into a feedback loop that can be executed automatically by the computer.

  5. A Python Implementation of an Intermediate-Level Tropical Circulation Model and Implications for How Modeling Science is Done

    NASA Astrophysics Data System (ADS)

    Lin, J. W. B.

    2015-12-01

    Historically, climate models have been developed incrementally and in compiled languages like Fortran. While the use of legacy compiled languages results in fast, time-tested code, the resulting model is limited in its modularity and cannot take advantage of functionality available with modern computer languages. Here we describe an effort at using the open-source, object-oriented language Python to create more flexible climate models: the package qtcm, a Python implementation of the intermediate-level Neelin-Zeng Quasi-Equilibrium Tropical Circulation model (QTCM1) of the atmosphere. The qtcm package retains the core numerics of QTCM1, written in Fortran to optimize model performance, but uses Python structures and utilities to wrap the QTCM1 Fortran routines and manage model execution. The resulting "mixed language" modeling package allows order and choice of subroutine execution to be altered at run time, and model analysis and visualization to be integrated interactively with model execution at run time. This flexibility facilitates more complex scientific analysis using less complex code than would be possible using traditional languages alone, and provides tools to transform the traditional "formulate hypothesis → write and test code → run model → analyze results" sequence into a feedback loop that can be executed automatically by the computer.

  6. Fast computation of high energy elastic collision scattering angle for electric propulsion plume simulation

    NASA Astrophysics Data System (ADS)

    Araki, Samuel J.

    2016-11-01

    In the plumes of Hall thrusters and ion thrusters, high energy ions experience elastic collisions with slow neutral atoms. These collisions involve a process of momentum exchange, altering the initial velocity vectors of the collision pair. In addition to the momentum exchange process, ions and atoms can exchange electrons, resulting in slow charge-exchange ions and fast atoms. In these simulations, it is particularly important to accurately perform computations of ion-atom elastic collisions in determining the plume current profile and assessing the integration of spacecraft components. The existing models are capable of accurate calculation but are slow enough that the collision calculation can become a bottleneck of plume simulations. This study investigates methods to accelerate an ion-atom elastic collision calculation that includes both momentum- and charge-exchange processes. The scattering angles are pre-computed through a classical approach with an ab initio spin-orbit-free potential and are stored in a two-dimensional array as functions of impact parameter and energy. When performing a collision calculation for an ion-atom pair, the scattering angle is computed by a table lookup and multiple linear interpolations, given the relative energy and a randomly determined impact parameter. In order to further accelerate the calculations, the number of collision calculations is reduced by properly defining two cut-off cross-sections for the elastic scattering. In the MCC method, the target atom needs to be sampled; however, it is confirmed that the initial target atom velocity does not play a significant role in typical electric propulsion plume simulations, so the sampling process is unnecessary. With these implementations, the computational run-time to perform a collision calculation is reduced significantly compared to previous methods, while retaining the accuracy of the high-fidelity models.
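
    The lookup step can be sketched as bilinear interpolation into a pre-computed (energy, impact parameter) table; the grids and the table values below are placeholders rather than the ab initio results.

      import numpy as np

      energies = np.linspace(1.0, 100.0, 50)      # energy grid (assumed units)
      impacts = np.linspace(0.0, 20.0, 40)        # impact parameter grid
      table = np.random.default_rng(2).random((50, 40))   # placeholder angles

      def lookup_angle(E, b):
          # Locate the grid cell, then blend its four corners.
          i = np.clip(np.searchsorted(energies, E) - 1, 0, len(energies) - 2)
          j = np.clip(np.searchsorted(impacts, b) - 1, 0, len(impacts) - 2)
          fE = (E - energies[i]) / (energies[i + 1] - energies[i])
          fb = (b - impacts[j]) / (impacts[j + 1] - impacts[j])
          return ((1 - fE) * (1 - fb) * table[i, j]
                  + fE * (1 - fb) * table[i + 1, j]
                  + (1 - fE) * fb * table[i, j + 1]
                  + fE * fb * table[i + 1, j + 1])

      print(lookup_angle(12.3, 4.5))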

  7. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment

    PubMed Central

    2013-01-01

    Background Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so-called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. Results In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Conclusion Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA. PMID:24564200

  8. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    PubMed

    Nagar, Anurag; Hahsler, Michael

    2013-01-01

    Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so-called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA.
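
    The alignment-free comparison at the heart of the method can be sketched by representing each segment as a p-mer frequency vector and comparing vectors directly; the Manhattan distance used below is a simple stand-in for the paper's edit-distance approximation.

      from collections import Counter

      def pmer_counts(seq, p=3):
          # Frequency table of all overlapping substrings of length p.
          return Counter(seq[i:i + p] for i in range(len(seq) - p + 1))

      def pmer_distance(seq_a, seq_b, p=3):
          ca, cb = pmer_counts(seq_a, p), pmer_counts(seq_b, p)
          return sum(abs(ca[k] - cb[k]) for k in set(ca) | set(cb))

      # Similar segments yield small distances without any alignment.
      print(pmer_distance("ACGTACGTGG", "ACGTACCTGG"))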

  9. hybridMANTIS: a CPU-GPU Monte Carlo method for modeling indirect x-ray detectors with columnar scintillators

    NASA Astrophysics Data System (ADS)

    Sharma, Diksha; Badal, Andreu; Badano, Aldo

    2012-04-01

    The computational modeling of medical imaging systems often requires obtaining a large number of simulated images with low statistical uncertainty, which translates into prohibitive computing times. We describe a novel hybrid approach for Monte Carlo simulations that maximizes utilization of CPUs and GPUs in modern workstations. We apply the method to the modeling of indirect x-ray detectors using a new and improved version of the code MANTIS, an open source software tool used for the Monte Carlo simulations of indirect x-ray imagers. We first describe a GPU implementation of the physics and geometry models in fastDETECT2 (the optical transport model) and a serial CPU version of the same code. We discuss its new features like on-the-fly column geometry and columnar crosstalk in relation to the MANTIS code, and point out areas where our model provides more flexibility for the modeling of realistic columnar structures in large area detectors. Second, we modify PENELOPE (the open source software package that handles the x-ray and electron transport in MANTIS) to allow direct output of location and energy deposited during x-ray and electron interactions occurring within the scintillator. This information is then handled by optical transport routines in fastDETECT2. A load balancer dynamically allocates optical transport showers to the GPU and CPU computing cores. Our hybridMANTIS approach achieves a significant speed-up factor of 627 when compared to MANTIS and of 35 when compared to the same code running only in a CPU instead of a GPU. Using hybridMANTIS, we successfully hide hours of optical transport time by running it in parallel with the x-ray and electron transport, thus shifting the computational bottleneck from optical to x-ray transport. The new code requires much less memory than MANTIS and, as a result, allows us to efficiently simulate large area detectors.

  10. Aggregated channels network for real-time pedestrian detection

    NASA Astrophysics Data System (ADS)

    Ghorban, Farzin; Marín, Javier; Su, Yu; Colombo, Alessandro; Kummert, Anton

    2018-04-01

    Convolutional neural networks (CNNs) have demonstrated their superiority in numerous computer vision tasks, yet their computational cost is prohibitive for many real-time applications such as pedestrian detection, which is usually performed on low-consumption hardware. In order to alleviate this drawback, most strategies focus on using a two-stage cascade approach. Essentially, in the first stage a fast method generates a significant but reduced amount of high quality proposals that later, in the second stage, are evaluated by the CNN. In this work, we propose a novel detection pipeline that further benefits from the two-stage cascade strategy. More concretely, the enriched and subsequently compressed features used in the first stage are reused as the CNN input. As a consequence, a simpler network architecture, adapted for such small input sizes, allows us to achieve real-time performance and obtain results close to the state-of-the-art while running significantly faster without the use of a GPU. In particular, considering that the proposed pipeline runs at frame rate, the achieved performance is highly competitive. We furthermore demonstrate that the proposed pipeline on its own can serve as an effective proposal generator.

  11. Discrete Event-based Performance Prediction for Temperature Accelerated Dynamics

    NASA Astrophysics Data System (ADS)

    Junghans, Christoph; Mniszewski, Susan; Voter, Arthur; Perez, Danny; Eidenbenz, Stephan

    2014-03-01

    We present an example of a new class of tools that we call application simulators, parameterized fast-running proxies of large-scale scientific applications using parallel discrete event simulation (PDES). We demonstrate our approach with a TADSim application simulator that models the Temperature Accelerated Dynamics (TAD) method, which is an algorithmically complex member of the Accelerated Molecular Dynamics (AMD) family. The essence of the TAD application is captured without the computational expense and resource usage of the full code. We use TADSim to quickly characterize the runtime performance and algorithmic behavior for the otherwise long-running simulation code. We further extend TADSim to model algorithm extensions to standard TAD, such as speculative spawning of the compute-bound stages of the algorithm, and predict performance improvements without having to implement such a method. Focused parameter scans have allowed us to study algorithm parameter choices over far more scenarios than would be possible with the actual simulation. This has led to interesting performance-related insights into the TAD algorithm behavior and suggested extensions to the TAD method.

  12. GPU real-time processing in NA62 trigger system

    NASA Astrophysics Data System (ADS)

    Ammendola, R.; Biagioni, A.; Chiozzi, S.; Cretaro, P.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P. S.; Pastorelli, E.; Piandani, R.; Piccini, M.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.

    2017-01-01

    A commercial Graphics Processing Unit (GPU) is used to build a fast Level 0 (L0) trigger system tested parasitically with the TDAQ (Trigger and Data Acquisition) system of the NA62 experiment at CERN. In particular, the parallel computing power of the GPU is exploited to perform real-time fitting in the Ring Imaging CHerenkov (RICH) detector. Direct GPU communication using an FPGA-based board has been used to reduce the data transmission latency. The performance of the system for multi-ring reconstruction obtained during the NA62 physics run will be presented.

  13. Propulsion strategy in the gait of primary school children; the effect of age and speed.

    PubMed

    Lye, Jillian; Parkinson, Stephanie; Diamond, Nicola; Downs, Jenny; Morris, Susan

    2016-12-01

    The strategy used to generate power for forward propulsion in walking and running has recently been highlighted as a marker of gait maturation and elastic energy recycling. This study investigated ankle and hip power generation as a propulsion strategy (PS) during the late stance/early swing phases of walking and running in typically developing (TD) children (15: six to nine years; 17: nine to 13 years) using three-dimensional gait analysis. Peak ankle power generation at push-off (peakA2), peak hip power generation in early swing (peakH3) and propulsion strategy (PS) [peakA2/(peakA2+peakH3)] were calculated to provide the relative contribution of ankle power to total propulsion. Mean PS values decreased as speed increased for comfortable walking (p<0.001), fast walking (p<0.001) and fast running (p<0.001), and less consistently during jogging (p=0.054). PS varied with age (p<0.001) only during fast walking. At any speed of fast walking, older children generated more peakA2 (p=0.001) and less peakH3 (p=0.001) than younger children. While the kinetics of running propulsion appear to be developed by age six years, the skills of fast walking appeared to require additional neuromuscular maturity. These findings support the concept that running is a skill that matures early for TD children. Copyright © 2016 Elsevier B.V. All rights reserved.

  14. Inferring muscle functional roles of the ostrich pelvic limb during walking and running using computer optimization.

    PubMed

    Rankin, Jeffery W; Rubenson, Jonas; Hutchinson, John R

    2016-05-01

    Owing to their cursorial background, ostriches (Struthio camelus) walk and run with high metabolic economy, can reach very fast running speeds and quickly execute cutting manoeuvres. These capabilities are believed to be a result of their ability to coordinate muscles to take advantage of specialized passive limb structures. This study aimed to infer the functional roles of ostrich pelvic limb muscles during gait. Existing gait data were combined with a newly developed musculoskeletal model to generate simulations of ostrich walking and running that predict muscle excitations, force and mechanical work. Consistent with previous avian electromyography studies, predicted excitation patterns showed that individual muscles tended to be excited primarily during only stance or swing. Work and force estimates show that ostrich gaits are partially hip-driven with the bi-articular hip-knee muscles driving stance mechanics. Conversely, the knee extensors acted as brakes, absorbing energy. The digital extensors generated large amounts of both negative and positive mechanical work, with increased magnitudes during running, providing further evidence that ostriches make extensive use of tendinous elastic energy storage to improve economy. The simulations also highlight the need to carefully consider non-muscular soft tissues that may play a role in ostrich gait. © 2016 The Authors.

  15. Muscular strategy shift in human running: dependence of running speed on hip and ankle muscle performance.

    PubMed

    Dorn, Tim W; Schache, Anthony G; Pandy, Marcus G

    2012-06-01

    Humans run faster by increasing a combination of stride length and stride frequency. In slow and medium-paced running, stride length is increased by exerting larger support forces during ground contact, whereas in fast running and sprinting, stride frequency is increased by swinging the legs more rapidly through the air. Many studies have investigated the mechanics of human running, yet little is known about how the individual leg muscles accelerate the joints and centre of mass during this task. The aim of this study was to describe and explain the synergistic actions of the individual leg muscles over a wide range of running speeds, from slow running to maximal sprinting. Experimental gait data from nine subjects were combined with a detailed computer model of the musculoskeletal system to determine the forces developed by the leg muscles at different running speeds. For speeds up to 7 m s⁻¹, the ankle plantarflexors, soleus and gastrocnemius, contributed most significantly to vertical support forces and hence increases in stride length. At speeds greater than 7 m s⁻¹, these muscles shortened at relatively high velocities and had less time to generate the forces needed for support. Thus, above 7 m s⁻¹, the strategy used to increase running speed shifted to the goal of increasing stride frequency. The hip muscles, primarily the iliopsoas, gluteus maximus and hamstrings, achieved this goal by accelerating the hip and knee joints more vigorously during swing. These findings provide insight into the strategies used by the leg muscles to maximise running performance and have implications for the design of athletic training programs.

  16. Climbing favours the tripod gait over alternative faster insect gaits

    NASA Astrophysics Data System (ADS)

    Ramdya, Pavan; Thandiackal, Robin; Cherney, Raphael; Asselborn, Thibault; Benton, Richard; Ijspeert, Auke Jan; Floreano, Dario

    2017-02-01

    To escape danger or catch prey, running vertebrates rely on dynamic gaits with minimal ground contact. By contrast, most insects use a tripod gait that maintains at least three legs on the ground at any given time. One prevailing hypothesis for this difference in fast locomotor strategies is that tripod locomotion allows insects to rapidly navigate three-dimensional terrain. To test this, we computationally discovered fast locomotor gaits for a model based on Drosophila melanogaster. Indeed, the tripod gait emerges to the exclusion of many other possible gaits when optimizing fast upward climbing with leg adhesion. By contrast, novel two-legged bipod gaits are fastest on flat terrain without adhesion in the model and in a hexapod robot. Intriguingly, when adhesive leg structures in real Drosophila are covered, animals exhibit atypical bipod-like leg coordination. We propose that the requirement to climb vertical terrain may drive the prevalence of the tripod gait over faster alternative gaits with minimal ground contact.

  17. Climbing favours the tripod gait over alternative faster insect gaits

    PubMed Central

    Ramdya, Pavan; Thandiackal, Robin; Cherney, Raphael; Asselborn, Thibault; Benton, Richard; Ijspeert, Auke Jan; Floreano, Dario

    2017-01-01

    To escape danger or catch prey, running vertebrates rely on dynamic gaits with minimal ground contact. By contrast, most insects use a tripod gait that maintains at least three legs on the ground at any given time. One prevailing hypothesis for this difference in fast locomotor strategies is that tripod locomotion allows insects to rapidly navigate three-dimensional terrain. To test this, we computationally discovered fast locomotor gaits for a model based on Drosophila melanogaster. Indeed, the tripod gait emerges to the exclusion of many other possible gaits when optimizing fast upward climbing with leg adhesion. By contrast, novel two-legged bipod gaits are fastest on flat terrain without adhesion in the model and in a hexapod robot. Intriguingly, when adhesive leg structures in real Drosophila are covered, animals exhibit atypical bipod-like leg coordination. We propose that the requirement to climb vertical terrain may drive the prevalence of the tripod gait over faster alternative gaits with minimal ground contact. PMID:28211509

  18. Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

    PubMed

    Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter

    2015-01-01

    To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
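
    For reference, the recurrence that PaSWAS parallelizes on the GPU is the standard Smith-Waterman dynamic program, sketched below on the CPU with the usual illustrative scoring values; traceback and the multiple-hit reporting are omitted.

      import numpy as np

      def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
          H = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
          best = 0
          for i in range(1, len(a) + 1):
              for j in range(1, len(b) + 1):
                  diag = H[i - 1, j - 1] + (match if a[i - 1] == b[j - 1]
                                            else mismatch)
                  # Local alignment: scores never drop below zero.
                  H[i, j] = max(0, diag, H[i - 1, j] + gap, H[i, j - 1] + gap)
                  best = max(best, H[i, j])
          return best

      print(smith_waterman("ACACACTA", "AGCACACA"))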

  19. EMHP: an accurate automated hole masking algorithm for single-particle cryo-EM image processing.

    PubMed

    Berndsen, Zachary; Bowman, Charles; Jang, Haerin; Ward, Andrew B

    2017-12-01

    The Electron Microscopy Hole Punch (EMHP) is a streamlined suite of tools for quick assessment, sorting and hole masking of electron micrographs. With recent advances in single-particle electron cryo-microscopy (cryo-EM) data processing allowing for the rapid determination of protein structures using a smaller computational footprint, we saw the need for a fast and simple tool for data pre-processing that could run independent of existing high-performance computing (HPC) infrastructures. EMHP provides a data preprocessing platform in a small package that requires minimal python dependencies to function. https://www.bitbucket.org/chazbot/emhp Apache 2.0 License. bowman@scripps.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  20. Automatic computation of 2D cardiac measurements from B-mode echocardiography

    NASA Astrophysics Data System (ADS)

    Park, JinHyeong; Feng, Shaolei; Zhou, S. Kevin

    2012-03-01

    We propose a robust and fully automatic algorithm which computes the 2D echocardiography measurements recommended by the American Society of Echocardiography. The algorithm employs knowledge-based imaging technologies which can learn the expert's knowledge from training images and the expert's annotations. Based on the models constructed in the learning stage, the algorithm searches for the initial locations of the landmark points for the measurements by utilizing the heart structure of the left ventricle, including the mitral valve and aortic valve. It employs a pseudo anatomic M-mode image, generated by accumulating the line images in the 2D parasternal long axis view along time, to refine the measurement landmark points. Experimental results with a large volume of data show that the algorithm runs fast and is robust, with accuracy comparable to that of experts.

  1. Inverse dynamics of adaptive structures used as space cranes

    NASA Technical Reports Server (NTRS)

    Das, S. K.; Utku, S.; Wada, B. K.

    1990-01-01

    As a precursor to the real-time control of fast moving adaptive structures used as space cranes, a formulation is given for the flexibility-induced motion relative to the nominal motion (i.e., the motion that assumes no flexibility) and for obtaining the open loop time varying driving forces. An algorithm is proposed for the computation of the relative motion and driving forces. The governing equations are given in matrix form with explicit functional dependencies. A simulator is developed to implement the algorithm on a digital computer. In the formulations, the distributed mass of the crane is lumped by two schemes, viz., 'trapezoidal' lumping and 'Simpson's rule' lumping. The effects of the mass lumping schemes are shown by simulator runs.

  2. On the Relevancy of Efficient, Integrated Computer and Network Monitoring in HEP Distributed Online Environment

    NASA Astrophysics Data System (ADS)

    Carvalho, D.; Gavillet, Ph.; Delgado, V.; Albert, J. N.; Bellas, N.; Javello, J.; Miere, Y.; Ruffinoni, D.; Smith, G.

    Large scientific equipment is controlled by computer systems whose complexity is growing, driven on the one hand by the volume and variety of the information, its distributed nature, and the sophistication of its treatment, and on the other hand by the fast evolution of the computer and network market. Some people call these systems generically Large-Scale Distributed Data Intensive Information Systems, or Distributed Computer Control Systems (DCCS) for those systems dealing more with real-time control. Taking advantage of (or forced by) the distributed architecture, the tasks are more and more often implemented as client-server applications. In this framework the monitoring of the computer nodes, the communications network and the applications becomes of primary importance for ensuring the safe running and guaranteed performance of the system. With the future generation of HEP experiments, such as those at the LHC, in view, it is proposed to integrate the various functions of DCCS monitoring into one general purpose multi-layer system.

  3. Evaluation of nonlinear structural dynamic responses using a fast-running spring-mass formulation

    NASA Astrophysics Data System (ADS)

    Benjamin, A. S.; Altman, B. S.; Gruda, J. D.

    In today's world, accurate finite-element simulations of large nonlinear systems may require meshes composed of hundreds of thousands of degrees of freedom. Even with today's fast computers and the promise of ever-faster ones in the future, central processing unit (CPU) expenditures for such problems could be measured in days. Many contemporary engineering problems, such as those found in risk assessment, probabilistic structural analysis, and structural design optimization, cannot tolerate the cost or turnaround time for such CPU-intensive analyses, because these applications require a large number of cases to be run with different inputs. For many risk assessment applications, analysts would prefer running times to be measurable in minutes. There is therefore a need for approximation methods which can solve such problems far more efficiently than the very detailed methods and yet maintain an acceptable degree of accuracy. For this purpose, we have been working on two methods of approximation: neural networks and spring-mass models. This paper presents our work and results to date for spring-mass modeling and analysis, since we are further along in this area than in the neural network formulation. It describes the physical and numerical models contained in a code we developed called STRESS, which stands for 'Spring-mass Transient Response Evaluation for structural Systems'. The paper also presents results for a demonstration problem, and compares these with results obtained for the same problem using PRONTO3D, a state-of-the-art finite element code which was also developed at Sandia.
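
    As an illustration of the kind of model STRESS evaluates, the sketch below time-steps a small lumped spring-mass system with a generic central-difference scheme. The stiffness, mass and load values are hypothetical, and this is not the STRESS implementation itself.

        import numpy as np

        def central_difference(M, K, f, x0, v0, dt, steps):
            """Integrate M x'' + K x = f(t) with diagonal (lumped) mass M."""
            Minv = 1.0 / M
            a0 = Minv * (f(0.0) - K @ x0)
            x_prev = x0 - dt * v0 + 0.5 * dt**2 * a0   # fictitious step at t = -dt
            x = x0.copy()
            history = [x0.copy()]
            for n in range(steps):
                acc = Minv * (f(n * dt) - K @ x)
                x_prev, x = x, 2 * x - x_prev + dt**2 * acc
                history.append(x.copy())
            return np.array(history)

        K = np.array([[2.0, -1.0], [-1.0, 1.0]])       # two-spring chain
        M = np.array([1.0, 1.0])
        step_load = lambda t: np.array([0.0, 1.0])     # suddenly applied tip force
        resp = central_difference(M, K, step_load, np.zeros(2), np.zeros(2), 0.05, 200)
        print(resp[-1])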

  4. A chest-shape target automatic detection method based on Deformable Part Models

    NASA Astrophysics Data System (ADS)

    Zhang, Mo; Jin, Weiqi; Li, Li

    2016-10-01

    Automatic weapon platforms are an important research direction both domestically and overseas; they must quickly find the object to be shot against a complex background, so fast detection of a given target is the foundation of further tasks. Considering that the chest-shape target is a common target in shooting practice, this paper takes the chest-shape target as its subject and studies an automatic target detection method based on Deformable Part Models. The algorithm computes Histograms of Oriented Gradients (HOG) features of the target and trains a model using a latent-variable Support Vector Machine (SVM); in this model, the target image is divided into several parts, yielding a root filter and part filters; finally, the algorithm detects the target over the HOG feature pyramid using a sliding window. The running time of extracting the HOG pyramid can be shortened by 36% with a lookup table. The results indicate that this algorithm can detect the chest-shape target in natural environments indoors or outdoors. The true positive rate of detection reaches 76% with many hard samples, and the false positive rate approaches 0. Running on a PC (Intel(R) Core(TM) i5-4200H CPU) with C++ code, the detection time for images with a resolution of 640 × 480 is 2.093 s. Given the runtime libraries that Texas Instruments provides for image pyramids and convolution on the DM642 and other hardware, the detection algorithm is expected to be implementable on a hardware platform, and it has application prospects in actual systems.
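
    The sliding-window scan at the heart of such a detector is easy to sketch. The toy below scores one filter over a crude image pyramid; it uses raw grayscale patches instead of HOG features and omits the part filters and latent SVM training, so it only gestures at the root-filter stage described above.

        import numpy as np

        def pyramid(img, scale=0.8, min_size=32):
            while min(img.shape) >= min_size:
                yield img
                h, w = (int(d * scale) for d in img.shape)
                ys = np.linspace(0, img.shape[0] - 1, h).astype(int)
                xs = np.linspace(0, img.shape[1] - 1, w).astype(int)
                img = img[np.ix_(ys, xs)]          # cheap nearest-neighbour resize

        def detect(img, filt, thresh):
            fh, fw = filt.shape
            hits = []
            for level, im in enumerate(pyramid(img)):
                for y in range(im.shape[0] - fh + 1):
                    for x in range(im.shape[1] - fw + 1):
                        score = float(np.sum(im[y:y+fh, x:x+fw] * filt))
                        if score > thresh:
                            hits.append((level, y, x, score))
            return hits

        rng = np.random.default_rng(0)
        img = rng.random((64, 64))                 # stand-in for a camera frame
        filt = np.ones((8, 8)) / 64                # stand-in for a trained root filter
        print(len(detect(img, filt, thresh=0.55)))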

  5. FAST - FREEDOM ASSEMBLY SEQUENCING TOOL PROTOTYPE

    NASA Technical Reports Server (NTRS)

    Borden, C. S.

    1994-01-01

    FAST is a project management tool designed to optimize the assembly sequence of Space Station Freedom. An appropriate assembly sequence coordinates engineering, design, utilization, transportation availability, and operations requirements. Since complex designs tend to change frequently, FAST assesses the system level effects of detailed changes and produces output metrics that identify preferred assembly sequences. FAST incorporates Space Shuttle integration, Space Station hardware, on-orbit operations, and programmatic drivers as either precedence relations or numerical data. Hardware sequencing information can either be input directly and evaluated via the "specified" mode of operation or evaluated from the input precedence relations in the "flexible" mode. In the specified mode, FAST takes as its input a list of the cargo elements assigned to each flight. The program determines positions for the cargo elements that maximize the center of gravity (c.g.) margin. These positions are restricted by the geometry of the cargo elements and the location of attachment fittings both in the orbiter and on the cargo elements. FAST calculates every permutation of cargo element location according to its height, trunnion fitting locations, and required intercargo element spacing. Each cargo element is tested in both its normal and reversed orientation (rotated 180 degrees). The best solution is that which maximizes the c.g. margin for each flight. In the flexible mode, FAST begins with the first flight and determines all feasible combinations of cargo elements according to mass, volume, EVA, and precedence relation constraints. The program generates an assembly sequence that meets mass, volume, position, EVA, and precedence constraints while minimizing the total number of Shuttle flights required. Issues associated with ground operations, spacecraft performance, logistics requirements and user requirements will be addressed in future versions of the model. FAST is written in C-Language and has been implemented on DEC VAX series computers running VMS. The program is distributed in executable form. The source code is also provided, but it cannot be compiled without the Tree Manipulation Based Routines (TMBR) package from the Jet Propulsion Laboratory, which is not currently available from COSMIC. The main memory requirement is based on the data used to drive the FAST program. All applications should easily run on an installation with 10Mb of main memory. FAST was developed in 1990 and is a copyrighted work with all copyright vested in NASA. DEC, VAX and VMS are trademarks of Digital Equipment Corporation.
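
    The "specified mode" search lends itself to a small illustration: enumerate orderings and 180-degree flips of the cargo elements and keep the arrangement whose combined centre of gravity lies closest to a target station. Everything below (element names, lengths, c.g. offsets, bay length, unit masses) is invented, and the real FAST also honors trunnion fittings and spacing constraints.

        from itertools import permutations

        elements = [("node", 4.0, 1.8), ("truss", 6.0, 3.1), ("lab", 5.0, 2.2)]
        BAY, CG_TARGET = 18.0, 9.0                 # bay length and target c.g. station

        def best_arrangement(elements):
            best = None
            for order in permutations(elements):
                for flips in range(2 ** len(order)):        # each bit = flip one element
                    pos = moment = mass = 0.0
                    for k, (name, length, cg) in enumerate(order):
                        local_cg = length - cg if flips >> k & 1 else cg
                        moment += pos + local_cg            # unit masses for brevity
                        mass += 1.0
                        pos += length
                    if pos > BAY:
                        continue                            # does not fit in the bay
                    margin = -abs(moment / mass - CG_TARGET)
                    if best is None or margin > best[0]:
                        best = (margin, order, flips)
            return best

        print(best_arrangement(elements))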

  6. FAST User Guide

    NASA Technical Reports Server (NTRS)

    Walatka, Pamela P.; Clucas, Jean; McCabe, R. Kevin; Plessel, Todd; Potter, R.; Cooper, D. M. (Technical Monitor)

    1994-01-01

    The Flow Analysis Software Toolkit, FAST, is a software environment for visualizing data. FAST is a collection of separate programs (modules) that run simultaneously and allow the user to examine the results of numerical and experimental simulations. The user can load data files, perform calculations on the data, visualize the results of these calculations, construct scenes of 3D graphical objects, and plot, animate and record the scenes. Computational Fluid Dynamics (CFD) visualization is the primary intended use of FAST, but FAST can also assist in the analysis of other types of data. FAST combines the capabilities of such programs as PLOT3D, RIP, SURF, and GAS into one environment with modules that share data. Sharing data between modules eliminates the drudgery of transferring data between programs. All the modules in the FAST environment have a consistent, highly interactive graphical user interface. Most commands are entered by pointing and clicking. The modular construction of FAST makes it flexible and extensible. The environment can be custom configured and new modules can be developed and added as needed. The following modules have been developed for FAST: VIEWER, FILE IO, CALCULATOR, SURFER, TOPOLOGY, PLOTTER, TITLER, TRACER, ARCGRAPH, GQ, SURFERU, SHOTET, and ISOLEVU. A utility is also included to make the inclusion of user-defined modules in the FAST environment easy. The VIEWER module is the central control for the FAST environment. From VIEWER, the user can change object attributes, interactively position objects in three-dimensional space, define and save scenes, create animations, spawn new FAST modules, add additional view windows, and save and execute command scripts. The FAST User Guide uses text and FAST MAPS (graphical representations of the entire user interface) to guide the user through the use of FAST. Chapters include: Maps, Overview, Tips, Getting Started Tutorial, a separate chapter for each module, file formats, and system administration.

  7. FALCON: a toolbox for the fast contextualization of logical networks

    PubMed Central

    De Landtsheer, Sébastien; Trairatphisan, Panuwat; Lucarelli, Philippe; Sauter, Thomas

    2017-01-01

    Motivation: Mathematical modelling of regulatory networks allows for the discovery of knowledge at the system level. However, existing modelling tools are often computation-heavy and do not offer intuitive ways to explore the model, to test hypotheses or to interpret the results biologically. Results: We have developed a computational approach to contextualize logical models of regulatory networks with biological measurements based on a probabilistic description of rule-based interactions between the different molecules. Here, we propose a Matlab toolbox, FALCON, to automatically and efficiently build and contextualize networks, which includes a pipeline for conducting parameter analysis, knockouts and easy and fast model investigation. The contextualized models could then provide qualitative and quantitative information about the network and suggest hypotheses about biological processes. Availability and implementation: FALCON is freely available for non-commercial users on GitHub under the GPLv3 licence. The toolbox, installation instructions, full documentation and test datasets are available at https://github.com/sysbiolux/FALCON. FALCON runs under Matlab (MathWorks) and requires the Optimization Toolbox. Contact: thomas.sauter@uni.lu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28673016

  8. FALCON: a toolbox for the fast contextualization of logical networks.

    PubMed

    De Landtsheer, Sébastien; Trairatphisan, Panuwat; Lucarelli, Philippe; Sauter, Thomas

    2017-11-01

    Mathematical modelling of regulatory networks allows for the discovery of knowledge at the system level. However, existing modelling tools are often computation-heavy and do not offer intuitive ways to explore the model, to test hypotheses or to interpret the results biologically. We have developed a computational approach to contextualize logical models of regulatory networks with biological measurements based on a probabilistic description of rule-based interactions between the different molecules. Here, we propose a Matlab toolbox, FALCON, to automatically and efficiently build and contextualize networks, which includes a pipeline for conducting parameter analysis, knockouts and easy and fast model investigation. The contextualized models could then provide qualitative and quantitative information about the network and suggest hypotheses about biological processes. FALCON is freely available for non-commercial users on GitHub under the GPLv3 licence. The toolbox, installation instructions, full documentation and test datasets are available at https://github.com/sysbiolux/FALCON. FALCON runs under Matlab (MathWorks) and requires the Optimization Toolbox. thomas.sauter@uni.lu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  9. Real-time machine vision system using FPGA and soft-core processor

    NASA Astrophysics Data System (ADS)

    Malik, Abdul Waheed; Thörnberg, Benny; Meng, Xiaozhou; Imran, Muhammad

    2012-06-01

    This paper presents a machine vision system for real-time computation of distance and angle of a camera from reference points in the environment. Image pre-processing, component labeling and feature extraction modules were modeled at Register Transfer (RT) level and synthesized for implementation on field programmable gate arrays (FPGA). The extracted image component features were sent from the hardware modules to a soft-core processor, MicroBlaze, for computation of distance and angle. A CMOS imaging sensor operating at a clock frequency of 27 MHz was used in our experiments to produce a video stream at the rate of 75 frames per second. Image component labeling and feature extraction modules were running in parallel with a total latency of 13 ms. The MicroBlaze was interfaced with the component labeling and feature extraction modules through a Fast Simplex Link (FSL). The latency for computing distance and angle of the camera from the reference points was measured to be 2 ms on the MicroBlaze, running at 100 MHz clock frequency. In this paper, we present the performance analysis, device utilization and power consumption for the designed system. The FPGA based machine vision system that we propose has high frame speed, low latency and a power consumption that is much lower compared to commercially available smart camera solutions.

  10. Massively Parallel Processing for Fast and Accurate Stamping Simulations

    NASA Astrophysics Data System (ADS)

    Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu

    2005-08-01

    The competitive automotive market drives automotive manufacturers to speed up vehicle development cycles and reduce lead time. Fast tooling development is one of the key areas supporting fast and short vehicle development programs (VDP). In the past ten years, stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. Stamping simulation and formability analysis has become a critical business segment in GM's math-based die engineering process. As simulation becomes one of the major production tools in the engineering factory, speed and accuracy are two of the most important measures of stamping simulation technology. The speed and time-in-system of forming analysis become even more critical to supporting fast VDPs and tooling readiness. Since 1997, General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology had matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed-memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DMP/MPP technology, as well as performance benchmarks, are discussed in this publication.

  11. A Fast Implementation of the ISOCLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

    2003-01-01

    Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well known k-means clustering method, in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, and so the final number of clusters may be different from the number k supplied as part of the input. The algorithm is described later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm. For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.
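
    For readers unfamiliar with ISODATA-style clustering, the sketch below shows the basic assign/update loop with naive split and merge heuristics. All thresholds are invented, and the kd-tree filtering acceleration that is the paper's actual contribution is deliberately omitted.

        import numpy as np

        def isoclus(X, k, iters=10, split_std=1.5, merge_dist=0.5, seed=0):
            rng = np.random.default_rng(seed)
            C = X[rng.choice(len(X), k, replace=False)]
            for _ in range(iters):
                d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
                labels = d.argmin(axis=1)
                members = [X[labels == j] for j in range(len(C))]
                members = [m for m in members if len(m)]
                C = np.array([m.mean(axis=0) for m in members])
                new_C = []                        # split clusters with a large spread
                for c, m in zip(C, members):
                    s = m.std(axis=0)
                    if s.max() > split_std and len(m) > 2:
                        offset = np.where(s == s.max(), s.max(), 0.0)
                        new_C += [c + offset, c - offset]
                    else:
                        new_C.append(c)
                keep = []                         # merge centres that are too close
                for c in new_C:
                    if all(np.linalg.norm(c - q) > merge_dist for q in keep):
                        keep.append(c)
                C = np.array(keep)
            return C

        X = np.random.default_rng(1).normal(size=(300, 2))
        print(len(isoclus(X, k=6)))               # final cluster count may differ from k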

  12. Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Villa, Oreste; Tumeo, Antonino; Secchi, Simone

    Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform to simulate large scale machines. In this paper, we introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes into account contention. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10% of accuracy. Emulation is only 25 to 200 times slower than real time.

  13. High-Throughput Computation and the Applicability of Monte Carlo Integration in Fatigue Load Estimation of Floating Offshore Wind Turbines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Graf, Peter A.; Stewart, Gordon; Lackner, Matthew

    Long-term fatigue loads for floating offshore wind turbines are hard to estimate because they require the evaluation of the integral of a highly nonlinear function over a wide variety of wind and wave conditions. Current design standards involve scanning over a uniform rectangular grid of metocean inputs (e.g., wind speed and direction and wave height and period), which becomes intractable in high dimensions as the number of required evaluations grows exponentially with dimension. Monte Carlo integration offers a potentially efficient alternative because it has theoretical convergence proportional to the inverse of the square root of the number of samples, which is independent of dimension. In this paper, we first report on the integration of the aeroelastic code FAST into NREL's systems engineering tool, WISDEM, and the development of a high-throughput pipeline capable of sampling from arbitrary distributions, running FAST on a large scale, and postprocessing the results into estimates of fatigue loads. Second, we use this tool to run a variety of studies aimed at comparing grid-based and Monte Carlo-based approaches to calculating long-term fatigue loads. We observe that for more than a few dimensions, the Monte Carlo approach can represent a large improvement in computational efficiency, but that as nonlinearity increases, the effectiveness of Monte Carlo is correspondingly reduced. The present work sets the stage for future research focusing on using advanced statistical methods for analysis of wind turbine fatigue as well as extreme loads.
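
    The dimension argument can be made concrete with a toy integrand standing in for a FAST run: under a fixed evaluation budget, a tensor-product grid needs exponentially many points per axis, while plain Monte Carlo sampling does not care about dimension. The integrand below is invented purely for this comparison.

        import numpy as np

        rng = np.random.default_rng(0)
        # stand-in for "fatigue load as a function of metocean inputs"
        load = lambda u: np.sin(3 * u[..., 0]) ** 2 * np.exp(u[..., 1]) + u[..., 2:].sum(-1)

        dim, n = 4, 20_000
        levels = int(round(n ** (1 / dim)))        # points per axis under the budget
        axes = [np.linspace(0, 1, levels)] * dim
        grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, dim)
        grid_est = load(grid).mean()

        mc_est = load(rng.random((n, dim))).mean() # same budget, any dimension
        print(levels, grid_est, mc_est)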

  14. A Wideband Fast Multipole Method for the two-dimensional complex Helmholtz equation

    NASA Astrophysics Data System (ADS)

    Cho, Min Hyung; Cai, Wei

    2010-12-01

    A Wideband Fast Multipole Method (FMM) for the 2D Helmholtz equation is presented. It can evaluate the interactions between N particles governed by the fundamental solution of the 2D complex Helmholtz equation in a fast manner for a wide range of complex wave numbers k, which was not easy with the original FMM due to the instability of the diagonalized conversion operator. This paper includes a description of the theoretical background, the FMM algorithm, the software structure, and some test runs.

    Program summary
    Program title: 2D-WFMM
    Catalogue identifier: AEHI_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHI_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 4636
    No. of bytes in distributed program, including test data, etc.: 82 582
    Distribution format: tar.gz
    Programming language: C
    Computer: Any
    Operating system: Any operating system with gcc version 4.2 or newer
    Has the code been vectorized or parallelized?: Multi-core processors with shared memory
    RAM: Depends on the number of particles N and the wave number k
    Classification: 4.8, 4.12
    External routines: OpenMP (http://openmp.org/wp/)
    Nature of problem: Evaluate the interaction between N particles governed by the fundamental solution of the 2D Helmholtz equation with complex k.
    Solution method: Multilevel Fast Multipole Algorithm in a hierarchical quad-tree structure with a cutoff level that combines the low-frequency and high-frequency methods.
    Running time: Depends on the number of particles N, the wave number k, and the number of CPU cores. CPU time increases as N log N.

  15. A static data flow simulation study at Ames Research Center

    NASA Technical Reports Server (NTRS)

    Barszcz, Eric; Howard, Lauri S.

    1987-01-01

    Demands in computational power, particularly in the area of computational fluid dynamics (CFD), led NASA Ames Research Center to study advanced computer architectures. One architecture being studied is the static data flow architecture based on research done by Jack B. Dennis at MIT. To improve understanding of this architecture, a static data flow simulator, written in Pascal, has been implemented for use on a Cray X-MP/48. A matrix multiply and a two-dimensional fast Fourier transform (FFT), two algorithms used in CFD work at Ames, have been run on the simulator. Execution times can vary by a factor of more than 2 depending on the partitioning method used to assign instructions to processing elements. Service time for matching tokens has proved to be a major bottleneck. Loop control and array address calculation overhead can double the execution time. The best sustained MFLOPS rates were less than 50% of the maximum capability of the machine.

  16. Slow Orbit Feedback at the ALS Using Matlab

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Portmann, G.

    1999-03-25

    The third generation Advanced Light Source (ALS) produces extremely bright and finely focused photon beams using undulators, wigglers, and bend magnets. In order to position the photon beams accurately, a slow global orbit feedback system has been developed. The dominant causes of orbit motion at the ALS are temperature variation and insertion device motion. This type of motion can be removed using slow global orbit feedback with a data rate of a few Hertz. The remaining orbit motion in the ALS is only 1-3 micron rms. Slow orbit feedback does not require high computational throughput. At the ALS, the global orbit feedback algorithm, based on the singular value decomposition method, is coded in MATLAB and runs on a control room workstation. Using the MATLAB environment to develop, test, and run the storage ring control algorithms has proven to be a fast and efficient way to operate the ALS.
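
    The core of such a correction is a truncated pseudo-inverse of the measured orbit response matrix. A sketch, in Python rather than MATLAB, with a synthetic response matrix and orbit reading instead of ALS data:

        import numpy as np

        def orbit_correction(R, x, sv_cutoff=1e-3):
            """Solve R @ dtheta = -x for corrector kicks via truncated SVD."""
            U, s, Vt = np.linalg.svd(R, full_matrices=False)
            s_inv = np.where(s > sv_cutoff * s.max(), 1.0 / s, 0.0)  # drop tiny modes
            return -(Vt.T * s_inv) @ (U.T @ x)

        rng = np.random.default_rng(2)
        R = rng.normal(size=(40, 12))              # 40 BPMs, 12 correctors (made up)
        x = rng.normal(scale=5e-6, size=40)        # ~5 micron rms orbit reading
        dtheta = orbit_correction(R, x)
        print(np.linalg.norm(x + R @ dtheta))      # residual orbit after correction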

  17. FAST: A fully asynchronous and status-tracking pattern for geoprocessing services orchestration

    NASA Astrophysics Data System (ADS)

    Wu, Huayi; You, Lan; Gui, Zhipeng; Gao, Shuang; Li, Zhenqiang; Yu, Jingmin

    2014-09-01

    Geoprocessing service orchestration (GSO) provides a unified and flexible way to implement cross-application, long-lived, and multi-step geoprocessing service workflows by coordinating geoprocessing services collaboratively. Usually, geoprocessing services and geoprocessing service workflows are data- and/or computing-intensive. This intensity can make the execution of a workflow time-consuming. Because it initiates an execution request without blocking other client-side interactions, an asynchronous mechanism is especially appropriate for GSO workflows. Many critical problems remain to be solved in existing asynchronous patterns for GSO, including difficulties in improving performance, tracking status, and clarifying the workflow structure. These problems make it challenging to achieve orchestration performance efficiency, to make statuses instantly available, and to construct clearly structured GSO workflows. A Fully Asynchronous and Status-Tracking (FAST) pattern that adopts asynchronous interactions throughout the whole communication tier of a workflow is proposed for GSO. The proposed FAST pattern includes a mechanism that actively pushes the latest status to clients instantly and economically. An independent proxy was designed to isolate the status-tracking logic from the geoprocessing business logic, which helps form a clear GSO workflow structure. A workflow was implemented in the FAST pattern to simulate the flooding process in the Poyang Lake region. Experimental results show that the proposed FAST pattern can efficiently tackle data/computing-intensive geoprocessing tasks. The performance of all collaborative partners was improved by the asynchronous mechanism throughout the communication tier. The status-tracking mechanism helps users retrieve the latest running status of a GSO workflow efficiently and instantly. The clear structure of the GSO workflow lowers the barriers for geospatial domain experts and model designers to compose asynchronous GSO workflows. Most importantly, it provides better support for locating and diagnosing potential exceptions.
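
    The status-push idea reduces to a few lines: a long-running task reports progress into a queue, and an isolated proxy forwards each update to clients instead of making them poll. The sketch below is a single-process analogue with invented names and timings, not the FAST implementation.

        import queue, threading, time

        status_q = queue.Queue()

        def geoprocess(task_id, steps=5):
            for i in range(steps):
                time.sleep(0.1)                    # stand-in for real geoprocessing work
                status_q.put((task_id, f"{(i + 1) * 100 // steps}%"))
            status_q.put((task_id, "done"))

        def status_proxy():
            """Isolated from the business logic; pushes updates as they arrive."""
            while True:
                task_id, state = status_q.get()
                print(f"[push] task {task_id}: {state}")   # would notify subscribers
                if state == "done":
                    break

        worker = threading.Thread(target=geoprocess, args=("flood-sim",))
        worker.start()
        status_proxy()
        worker.join()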

  18. Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks

    NASA Astrophysics Data System (ADS)

    Esfandiar, Pooya; Bonchi, Francesco; Gleich, David F.; Greif, Chen; Lakshmanan, Laks V. S.; On, Byung-Won

    Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.
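
    For context, the pairwise Katz score is the sum over path lengths l of beta^l (A^l)_ij. The sketch below evaluates it by a truncated series and checks it against the closed form (I - beta A)^(-1) - I on a random toy graph; the paper's Lanczos/quadrature bounds and top-k algorithm are not reproduced here.

        import numpy as np

        def katz_pair(A, i, j, beta=0.05, terms=50):
            v = np.zeros(len(A)); v[j] = 1.0
            score, Av = 0.0, v
            for l in range(1, terms + 1):
                Av = A @ Av                        # Av holds A^l e_j after l steps
                score += beta ** l * Av[i]
            return score

        rng = np.random.default_rng(3)
        A = (rng.random((30, 30)) < 0.1).astype(float)
        A = np.triu(A, 1); A = A + A.T             # random undirected simple graph
        exact = (np.linalg.inv(np.eye(30) - 0.05 * A) - np.eye(30))[4, 7]
        print(katz_pair(A, 4, 7), exact)           # the two values agree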

  19. Distributed user interfaces for clinical ubiquitous computing applications.

    PubMed

    Bång, Magnus; Larsson, Anders; Berglund, Erik; Eriksson, Henrik

    2005-08-01

    Ubiquitous computing with multiple interaction devices requires new interface models that support user-specific modifications to applications and facilitate the fast development of active workspaces. We have developed NOSTOS, a computer-augmented work environment for clinical personnel, to explore new user interface paradigms for ubiquitous computing. NOSTOS uses several devices such as digital pens, an active desk, and walk-up displays that allow the system to track documents and activities in the workplace. We present the distributed user interface (DUI) model that allows standalone applications to distribute their user interface components to several devices dynamically at run-time. This mechanism permits clinicians to develop their own user interfaces and forms for clinical information systems to match their specific needs. We discuss the underlying technical concepts of DUIs and show how service discovery, component distribution, events, and layout management are dealt with in the NOSTOS system. Our results suggest that DUIs--and similar network-based user interfaces--will be a prerequisite of future mobile user interfaces and essential to developing clinical multi-device environments.

  20. CAMAC throughput of a new RISC-based data acquisition computer at the DIII-D tokamak

    NASA Astrophysics Data System (ADS)

    Vanderlaan, J. F.; Cummings, J. W.

    1993-10-01

    The amount of experimental data acquired per plasma discharge at DIII-D has continued to grow. The largest shot size in May 1991 was 49 Mbyte; in May 1992, 66 Mbyte; and in April 1993, 80 Mbyte. The increasing load has prompted the installation of a new Motorola 88100-based MODCOMP computer to supplement the existing core of three older MODCOMP data acquisition CPUs. New Kinetic Systems CAMAC serial highway driver hardware runs on the 88100 VME bus. The new operating system is the MODCOMP REAL/IX version of AT&T System V UNIX with real-time extensions and networking capabilities; future plans call for installation of additional computers of this type for tokamak and neutral beam control functions. Experiences with the CAMAC hardware and software will be chronicled, including observations of data throughput. The Enhanced Serial Highway crate controller is advertised as twice as fast as the previous crate controller, and faster computer I/O speeds are expected to increase data rates further.

  1. Convolutional networks for fast, energy-efficient neuromorphic computing

    PubMed Central

    Esser, Steven K.; Merolla, Paul A.; Arthur, John V.; Cassidy, Andrew S.; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J.; McKinstry, Jeffrey L.; Melano, Timothy; Barch, Davis R.; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D.; Modha, Dharmendra S.

    2016-01-01

    Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware’s underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer. PMID:27651489

  2. Convolutional networks for fast, energy-efficient neuromorphic computing.

    PubMed

    Esser, Steven K; Merolla, Paul A; Arthur, John V; Cassidy, Andrew S; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J; McKinstry, Jeffrey L; Melano, Timothy; Barch, Davis R; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D; Modha, Dharmendra S

    2016-10-11

    Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware's underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.

  3. Fast Katz and commuters: efficient estimation of social relatedness in large networks.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    On, Byung-Won; Lakshmanan, Laks V. S.; Greif, Chen

    Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.

  4. Geowall: Investigations into low-cost stereo display technologies

    USGS Publications Warehouse

    Steinwand, Daniel R.; Davis, Brian; Weeks, Nathan

    2003-01-01

    Recently, the combination of new projection technology, fast, low-cost graphics cards, and Linux-powered personal computers has made it possible to provide a stereoprojection and stereoviewing system that is much more affordable than previous commercial solutions. These Geowall systems are low-cost visualization systems built with commodity off-the-shelf components, run on open-source (and other) operating systems, and use open-source applications software. In short, they are "Beowulf-class" visualization systems that provide a cost-effective way for the U.S. Geological Survey to broaden participation in the visualization community and view stereoimagery and three-dimensional models.

  5. Sensor Authentication: Embedded Processor Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Svoboda, John

    2012-09-25

    Described is the C code running on the embedded Microchip 32-bit PIC32MX575F256H located on the INL-developed noise analysis circuit board. The code performs the following functions: controls the noise analysis circuit board preamplifier voltage gains of 1, 10, 100, and 1000; initializes the analog-to-digital conversion hardware, input channel selection, Fast Fourier Transform (FFT) function, USB communications interface, and internal memory allocations; initiates high-resolution 4096-point 200 kHz data acquisition; computes the complex 2048-point FFT and FFT magnitude; services the Host command set; transfers raw data to the Host; transfers FFT results to the Host; and performs communication error checking.
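
    A host-side reference for the acquisition-and-FFT step might look like the following; the tone frequency, window choice and amplitude are invented, and the actual firmware is C on the PIC32, not Python.

        import numpy as np

        fs, n = 200_000, 4096                      # 200 kHz sampling, 4096-point record
        t = np.arange(n) / fs
        signal = 0.5 * np.sin(2 * np.pi * 17_000 * t)   # hypothetical 17 kHz tone
        spectrum = np.fft.rfft(signal * np.hanning(n))
        mag = np.abs(spectrum)[:n // 2]            # 2048-point magnitude spectrum
        peak_bin = int(mag.argmax())
        print(peak_bin * fs / n)                   # ~17000 Hz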

  6. PconsD: ultra rapid, accurate model quality assessment for protein structure prediction.

    PubMed

    Skwark, Marcin J; Elofsson, Arne

    2013-07-15

    Clustering methods are often needed for accurately assessing the quality of modeled protein structures. Recent blind evaluation of quality assessment methods in CASP10 showed that there is little difference between many different methods as far as ranking models and selecting best model are concerned. When comparing many models, the computational cost of the model comparison can become significant. Here, we present PconsD, a fast, stream-computing method for distance-driven model quality assessment that runs on consumer hardware. PconsD is at least one order of magnitude faster than other methods of comparable accuracy. The source code for PconsD is freely available at http://d.pcons.net/. Supplementary benchmarking data are also available there. arne@bioinfo.se Supplementary data are available at Bioinformatics online.
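
    Distance-driven consensus scoring can be sketched in a few lines: score each model by the average agreement of its C-alpha distance matrix with those of the other models. The agreement function and the random "structures" below are stand-ins; this is not the PconsD algorithm or its stream-computing implementation.

        import numpy as np

        def distance_matrix(coords):
            return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

        def consensus_scores(models, d0=3.0):
            D = [distance_matrix(m) for m in models]
            n = len(models)
            S = np.zeros((n, n))
            for a in range(n):
                for b in range(n):
                    # agreement in (0, 1]; identical distance matrices give 1
                    S[a, b] = np.mean(1.0 / (1.0 + (np.abs(D[a] - D[b]) / d0) ** 2))
            return S.mean(axis=1)                  # per-model consensus quality

        rng = np.random.default_rng(4)
        models = [10 * rng.normal(size=(50, 3)) for _ in range(5)]  # fake structures
        print(consensus_scores(models))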

  7. A fast, robust and tunable synthetic gene oscillator.

    PubMed

    Stricker, Jesse; Cookson, Scott; Bennett, Matthew R; Mather, William H; Tsimring, Lev S; Hasty, Jeff

    2008-11-27

    One defining goal of synthetic biology is the development of engineering-based approaches that enable the construction of gene-regulatory networks according to 'design specifications' generated from computational modelling. This approach provides a systematic framework for exploring how a given regulatory network generates a particular phenotypic behaviour. Several fundamental gene circuits have been developed using this approach, including toggle switches and oscillators, and these have been applied in new contexts such as triggered biofilm development and cellular population control. Here we describe an engineered genetic oscillator in Escherichia coli that is fast, robust and persistent, with tunable oscillatory periods as fast as 13 min. The oscillator was designed using a previously modelled network architecture comprising linked positive and negative feedback loops. Using a microfluidic platform tailored for single-cell microscopy, we precisely control environmental conditions and monitor oscillations in individual cells through multiple cycles. Experiments reveal remarkable robustness and persistence of oscillations in the designed circuit; almost every cell exhibited large-amplitude fluorescence oscillations throughout observation runs. The oscillatory period can be tuned by altering inducer levels, temperature and the media source. Computational modelling demonstrates that the key design principle for constructing a robust oscillator is a time delay in the negative feedback loop, which can mechanistically arise from the cascade of cellular processes involved in forming a functional transcription factor. The positive feedback loop increases the robustness of the oscillations and allows for greater tunability. Examination of our refined model suggested the existence of a simplified oscillator design without positive feedback, and we construct an oscillator strain confirming this computational prediction.

  8. FMM-Yukawa: An adaptive fast multipole method for screened Coulomb interactions

    NASA Astrophysics Data System (ADS)

    Huang, Jingfang; Jia, Jun; Zhang, Bo

    2009-11-01

    A Fortran program package is introduced for the rapid evaluation of the screened Coulomb interactions of N particles in three dimensions. The method utilizes an adaptive oct-tree structure, and is based on the new version of the fast multipole method in which exponential expansions are used to diagonalize the multipole-to-local translations. The program and its full description, as well as several closely related packages, are also available at http://www.fastmultipole.org/. This paper is a brief review of the program and its performance.

    Program summary
    Catalogue identifier: AEEQ_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEQ_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GPL 2.0
    No. of lines in distributed program, including test data, etc.: 12 385
    No. of bytes in distributed program, including test data, etc.: 79 222
    Distribution format: tar.gz
    Programming language: Fortran77 and Fortran90
    Computer: Any
    Operating system: Any
    RAM: Depends on the number of particles, their distribution, and the adaptive tree structure
    Classification: 4.8, 4.12
    Nature of problem: To evaluate the screened Coulomb potential and force field of N charged particles, and to evaluate a convolution-type integral where the Green's function is the fundamental solution of the modified Helmholtz equation.
    Solution method: An adaptive oct-tree is generated, and a new version of the fast multipole method is applied in which the "multipole-to-local" translation operator is diagonalized.
    Restrictions: Only three and six significant digits accuracy options are provided in this version.
    Unusual features: Most of the code is written in Fortran77. Functions for memory allocation from Fortran90 and above are used in one subroutine.
    Additional comments: For supplementary information see http://www.fastmultipole.org/
    Running time: The running time varies depending on the number of particles (denoted by N) in the system and their distribution. The running time scales linearly as a function of N for nearly uniform particle distributions. For three-digit accuracy, the solver breaks even with the direct summation method at about N = 750.
    References: [1] L. Greengard, J. Huang, A new version of the fast multipole method for screened Coulomb interactions in three dimensions, J. Comput. Phys. 180 (2002) 642-658.

  9. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    NASA Astrophysics Data System (ADS)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power in the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared on performance using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.

  10. [Ironman Triathlon].

    PubMed

    Knechtle, Beat; Nikolaidis, Pantelis T; Rosemann, Thomas; Rüst, Christoph A

    2016-06-22

    Every year, thousands of triathletes try to qualify for the «Ironman Hawaii» (3,8 km swimming, 180 km cycling and 42,195 km running), the World Championship of long-distance triathletes. In this overview, we present the recent findings in literature with the most important variables with an influence on Ironman triathlon performance. The most important performance-influencing factors for a fast Ironman race time for both women and men are a large training volume and a high intensity in training, a large volume being more important than a high intensity, a low percentage of body fat, an ideal age of 30–35 years, a fast personal best in the Olympic distance triathlon (1,5 km swimming, 40 km cycling and 10 km running), a fast personal best in marathon running and origin from the United States of America.

  11. Online Calibration of the TPC Drift Time in the ALICE High Level Trigger

    NASA Astrophysics Data System (ADS)

    Rohr, David; Krzewicki, Mikolaj; Zampolli, Chiara; Wiechula, Jens; Gorbunov, Sergey; Chauvin, Alex; Vorobyev, Ivan; Weber, Steffen; Schweda, Kai; Lindenstruth, Volker

    2017-06-01

    A Large Ion Collider Experiment (ALICE) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN. The high level trigger (HLT) is a compute cluster, which reconstructs collisions as recorded by the ALICE detector in real-time. It employs a custom online data-transport framework to distribute data and workload among the compute nodes. ALICE employs subdetectors that are sensitive to environmental conditions such as pressure and temperature, e.g., the time projection chamber (TPC). A precise reconstruction of particle trajectories requires calibration of these detectors. Performing calibration in real time in the HLT improves the online reconstructions and renders certain offline calibration steps obsolete, speeding up offline physics analysis. For LHC Run 3, starting in 2020 when data reduction will rely on reconstructed data, online calibration becomes a necessity. Reconstructed particle trajectories build the basis for the calibration, making fast online tracking mandatory. The main detectors used for this purpose are the TPC and Inner Tracking System. Reconstructing the trajectories in the TPC is the most compute-intense step. We present several improvements to the ALICE HLT developed to facilitate online calibration. The main new development for online calibration is a wrapper that can run ALICE offline analysis and calibration tasks inside the HLT. In addition, we have added asynchronous processing capabilities to support long-running calibration tasks in the HLT framework, which otherwise runs event-synchronously. In order to improve the resiliency, an isolated process performs the asynchronous operations such that even a fatal error does not disturb data taking. We have complemented the original loop-free HLT chain with ZeroMQ data-transfer components. The ZeroMQ components facilitate a feedback loop that inserts the calibration result created at the end of the chain back into tracking components at the beginning of the chain, after a short delay. All these new features are implemented in a general way, such that they have use cases aside from online calibration. In order to gather sufficient statistics for the calibration, the asynchronous calibration component must process enough events per time interval. Since the calibration is valid only for a certain time period, the delay until the feedback loop provides updated calibration data must not be too long. A first full-scale test of the online calibration functionality was performed during the 2015 heavy-ion run under real conditions. Since then, online calibration has been enabled and benchmarked in 2016 proton-proton data taking. We present a timing analysis of this first online-calibration test, which concludes that the HLT is capable of online TPC drift time calibration fast enough to calibrate the tracking via the feedback loop. We compare the calibration results with the offline calibration and present a comparison of the residuals of the TPC cluster coordinates with respect to offline reconstruction.

  12. Transport Simulations for Fast Ignition on NIF

    NASA Astrophysics Data System (ADS)

    Strozzi, D. J.; Tabak, M.; Grote, D. P.; Town, R. P. J.; Kemp, A. J.

    2009-11-01

    Calculations of the transport and deposition of a relativistic electron beam into fast-ignition fuel configurations are presented. The hybrid PIC code LSP is used, run in implicit mode and with fluid background particles. The electron beam distribution is chosen based on explicit PIC simulations of the short-pulse LPI. These generally display two hot-electron temperatures, one close to the ponderomotive scaling and one that is much lower. Fast-electron collisions utilize the formulae of J. R. Davies [S. Atzeni et al., Plasma Phys. Controlled Fusion 51 (2009)], and are done with a conservative, relativistic grid-based method similar to Lemons et al., J. Comput. Phys. 228 (2009). We include energy loss off both bound and free electrons in partially-ionized media (such as a gold cone), and have started to use realistic ionization and non-ideal EOS models. We have found the fractional energy coupling into the dense fuel is higher for CD than DT targets, due to the enhanced resistivity and resulting magnetic fields. The coupling enhancement due to magnetic fields and beam characteristics (such as angular spectrum) will be quantified.

  13. Another Program For Generating Interactive Graphics

    NASA Technical Reports Server (NTRS)

    Costenbader, Jay; Moleski, Walt; Szczur, Martha; Howell, David; Engelberg, Norm; Li, Tin P.; Misra, Dharitri; Miller, Philip; Neve, Leif; Wolf, Karl

    1991-01-01

    The VAX/Ultrix version of the Transportable Applications Environment Plus (TAE+) computer program provides an integrated, portable software environment for developing and running interactive window, text, and graphical-object-based application software systems. It enables a programmer or nonprogrammer to easily construct a custom software interface between user and application program, and to move the resulting interface program and its application program to different computers. When used throughout a company for a wide range of applications, it makes both application program and computer seem transparent, with noticeable improvements in the learning curve. Available in forms suitable for the following six groups of computers: DEC VAXstation and other VMS VAX computers, Macintosh II computers running A/UX, Apollo Domain Series 3000, DEC VAX and reduced-instruction-set-computer workstations running Ultrix, Sun 3- and 4-series workstations running Sun OS and IBM RT/PCs and PS/2 computers running AIX, and HP 9000 S

  14. Turbulence modeling of free shear layers for high-performance aircraft

    NASA Technical Reports Server (NTRS)

    Sondak, Douglas L.

    1993-01-01

    The High Performance Aircraft (HPA) Grand Challenge of the High Performance Computing and Communications (HPCC) program involves the computation of the flow over a high performance aircraft. A variety of free shear layers, including mixing layers over cavities, impinging jets, blown flaps, and exhaust plumes, may be encountered in such flowfields. Since these free shear layers are usually turbulent, appropriate turbulence models must be utilized in computations in order to accurately simulate these flow features. The HPCC program is relying heavily on parallel computers. A Navier-Stokes solver (POVERFLOW) utilizing the Baldwin-Lomax algebraic turbulence model was developed and tested on a 128-node Intel iPSC/860. Algebraic turbulence models run very fast and give good results for many flowfields. For complex flowfields such as those mentioned above, however, they are often inadequate. It was therefore deemed that a two-equation turbulence model would be required for the HPA computations. The k-epsilon two-equation turbulence model was implemented on the Intel iPSC/860. Both the Chien low-Reynolds-number model and a generalized wall-function formulation were included.

  15. Fast algorithms for computing phylogenetic divergence time.

    PubMed

    Crosby, Ralph W; Williams, Tiffani L

    2017-12-06

    The inference of species divergence time is a key step in most phylogenetic studies. Methods have been available for the last ten years to perform the inference, but the performance of the methods does not yet scale well to studies with hundreds of taxa and thousands of DNA base pairs. For example, a study of 349 primate taxa was estimated to require over 9 months of processing time. In this work, we present a new algorithm, AncestralAge, that significantly improves the performance of the divergence time process. As part of AncestralAge, we demonstrate a new method for the computation of phylogenetic likelihood, and our experiments show a 90% improvement in likelihood computation time on the aforementioned dataset of 349 primate taxa with over 60,000 DNA base pairs. Additionally, we show that our new method for the computation of the Bayesian prior on node ages reduces the running time for this computation on the 349 taxa dataset by 99%. Through the use of these new algorithms we open up the ability to perform divergence time inference on large phylogenetic studies.

  16. PHYSICO: An UNIX based Standalone Procedure for Computation of Individual and Group Properties of Protein Sequences.

    PubMed

    Gupta, Parth Sarthi Sen; Banerjee, Shyamashree; Islam, Rifat Nawaz Ul; Mondal, Sudipta; Mondal, Buddhadev; Bandyopadhyay, Amal K

    2014-01-01

    In the genomic and proteomic era, efficient and automated analysis of the sequence properties of proteins has become an important task in bioinformatics. There are general public licensed (GPL) software tools to perform part of the job. However, computation of the mean properties of a large number of orthologous sequences is not possible with the above-mentioned GPL tools. Further, there is no GPL software or server that can calculate window-dependent sequence properties for a large number of sequences in a single run. To overcome these limitations, we have developed a standalone procedure, PHYSICO, which performs the various stages of computation in a single run, based on the type of input provided in either RAW-FASTA or BLOCK-FASTA format, and produces Excel output for: a) composition, class composition, mean molecular weight, isoelectric point, aliphatic index and GRAVY; b) column-based compositions, variability and difference matrix; c) 25 kinds of window-dependent sequence properties. The program is fast, efficient, error-free and user-friendly. Calculation of the mean and standard deviation of homologous sequence sets, for comparison purposes when relevant, is another attribute of the program; a feature seldom seen in existing GPL software. PHYSICO is freely available to non-commercial/academic users on formal request to the corresponding author akbanerjee@biotech.buruniv.ac.in.

  18. New technologies for advanced three-dimensional optimum shape design in aeronautics

    NASA Astrophysics Data System (ADS)

    Dervieux, Alain; Lanteri, Stéphane; Malé, Jean-Michel; Marco, Nathalie; Rostaing-Schmidt, Nicole; Stoufflet, Bruno

    1999-05-01

    The analysis of complex flows around realistic aircraft geometries is becoming more and more predictive. In order to obtain this result, the complexity of flow analysis codes has been constantly increasing, involving more refined fluid models and sophisticated numerical methods. These codes can only run on top computers, exhausting their memory and CPU capabilities. It is, therefore, difficult to introduce the best analysis codes into a shape optimization loop: most previous works in the optimum shape design field used only simplified analysis codes. Moreover, as the most popular optimization methods are the gradient-based ones, the more complex the flow solver, the more difficult it is to compute the sensitivity code. However, emerging technologies are making such an ambitious project, that of including a state-of-the-art flow analysis code in an optimization loop, feasible. Among those technologies, there are three important issues that this paper wishes to address: shape parametrization, automated differentiation and parallel computing. Shape parametrization allows faster optimization by reducing the number of design variables; in this work, it relies on a hierarchical multilevel approach. The sensitivity code can be obtained using automated differentiation. The automated approach is based on software manipulation tools, which allow the differentiation to be quick and the resulting differentiated code to be rather fast and reliable. In addition, the parallel algorithms implemented in this work allow the resulting optimization software to run on increasingly larger geometries.

  19. Improved performance in NASTRAN (R)

    NASA Technical Reports Server (NTRS)

    Chan, Gordon C.

    1989-01-01

    Three areas of improvement in the 1989 release of COSMIC/NASTRAN were incorporated recently that make the analysis program run faster on large problems. Actual log files and timings of a few test samples run on IBM, CDC, VAX, and CRAY computers were compiled. The speed improvement is proportional to the problem size and the number of continuation cards. Vectorizing certain operations in BANDIT makes it run twice as fast on some large problems using structural elements with many node points. BANDIT is a built-in NASTRAN processor that optimizes the structural matrix bandwidth. The VAX matrix packing routine BLDPK was modified so that it now packs a column of a matrix 3 to 9 times faster. The denser and bigger the matrix, the greater the speed improvement. This improvement makes a host of routines and modules that involve matrix operations run significantly faster, and saves disk space for dense matrices. A UNIX version, converted from 1988 COSMIC/NASTRAN, was tested successfully on a Silicon Graphics computer using the UNIX V operating system with Berkeley 4.3 extensions. The utility modules INPUTT5 and OUTPUT5 were expanded to handle table data as well as matrices. Both INPUTT5 and OUTPUT5 are general input/output modules that read and write FORTRAN files with or without format. More informative user messages are echoed from the PARAMR, PARAMD, and SCALAR modules to ensure that proper data values and data types are being handled. Two new utility modules, GINOFILE and DATABASE, were written for the 1989 release. Seven rigid elements were added to COSMIC/NASTRAN: CRROD, CRBAR, CRTRPLT, CRBE1, CRBE2, CRBE3, and CRSPLINE.

  20. A Fast Implementation of the ISOCLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

    2003-01-01

    Unsupervised clustering is a fundamental building block in numerous image processing applications. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute the coordinates of a set of cluster centers in d-space, such that those centers minimize the mean squared distance from each data point to its nearest center. This clustering algorithm is similar to another well-known clustering method, called k-means. One significant feature of ISOCLUS over k-means is that the actual number of clusters reported might be fewer or more than the number supplied as part of the input. The algorithm uses different heuristics to determine whether to merge or split clusters. As ISOCLUS can run very slowly, particularly on large data sets, there has been a growing interest in the remote sensing community in computing it efficiently. We have developed a faster implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm of Kanungo et al. They showed that, by using a kd-tree data structure for storing the data, it is possible to reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm, and we show that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times. This adaptation involves computing a number of cluster statistics that are needed for ISOCLUS but not for k-means. Both the k-means and ISOCLUS algorithms are based on iterative schemes, in which nearest neighbors are calculated until some convergence criterion is satisfied. Each iteration requires that the nearest center for each data point be computed. Naively, this requires O(kn) time, where k denotes the current number of centers. Traditional techniques for accelerating nearest neighbor searching involve storing the k centers in a data structure. However, because of the iterative nature of the algorithm, this data structure would need to be rebuilt with each new iteration. Our approach is to store the data points in a kd-tree data structure. The assignment of points to nearest neighbors is carried out by a filtering process, which successively eliminates centers that cannot possibly be the nearest neighbor for a given region of space. This algorithm is significantly faster, because large groups of data points can be assigned to their nearest center in a single operation. Preliminary results on a number of real Landsat datasets show that our revised ISOCLUS-like scheme runs about twice as fast.
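
    The filtering idea is easiest to appreciate against the simpler tree-based baseline it improves on. Below is a minimal sketch (in Python, using scipy's cKDTree) of one tree-accelerated assignment-and-update step; note that, unlike the paper's method, it rebuilds a tree over the centers at each iteration instead of storing the data points in a kd-tree and pruning candidate centers region by region, so it illustrates the setting rather than the filtering algorithm itself.

```python
# Minimal sketch of a tree-accelerated ISODATA/k-means style iteration.
# NOTE: this rebuilds a kd-tree over the *centers* each iteration, which
# already avoids the naive O(kn) scan; the paper's filtering method instead
# stores the *data points* in a kd-tree so that whole regions of space can
# be assigned to their nearest center in a single operation.
import numpy as np
from scipy.spatial import cKDTree

def assign_and_update(points, centers):
    """One assignment + center-update step. points: (n, d), centers: (k, d)."""
    tree = cKDTree(centers)
    _, nearest = tree.query(points)          # index of nearest center per point
    new_centers = np.array([
        points[nearest == j].mean(axis=0) if np.any(nearest == j) else centers[j]
        for j in range(len(centers))
    ])
    return nearest, new_centers

rng = np.random.default_rng(0)
pts = rng.normal(size=(10_000, 4))           # n points in d-space
ctrs = pts[rng.choice(len(pts), 8, replace=False)]
for _ in range(10):                          # merge/split heuristics omitted
    labels, ctrs = assign_and_update(pts, ctrs)
```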

  1. Toward an automated parallel computing environment for geosciences

    NASA Astrophysics Data System (ADS)

    Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping

    2007-08-01

    Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, to take full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate high-performance computing to be integrated with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.

  2. High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away

    NASA Astrophysics Data System (ADS)

    Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.

    2012-09-01

    By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons learned for continuing development. Applicability of Cloud Computing: Commercial Cloud providers generally charge for all operations, including processing, transfer of input and output data, and for storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory-bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyberinfrastructure environments having different architectures. We have used the Pegasus Workflow Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing) involves establishing a distributed environment, where issues of, e.g., remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services.
In most of our work, we provisioned compute resources using a custom application, called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.

  3. Efficient multitasking of Choleski matrix factorization on CRAY supercomputers

    NASA Technical Reports Server (NTRS)

    Overman, Andrea L.; Poole, Eugene L.

    1991-01-01

    A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
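
    For reference, the kernel being multitasked here is ordinary Choleski factorization, A = LL^T. The sketch below is a plain dense, column-oriented version in Python; the variable-band storage scheme and the CRAY microtasking/autotasking layers described above are deliberately omitted.

```python
# Textbook dense Choleski (Cholesky) factorization A = L L^T, column by
# column. This shows only the serial kernel; it does not reproduce the
# paper's variable-band storage or its parallel implementations.
import numpy as np

def cholesky_dense(A):
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # diagonal entry: remove contributions of previously computed columns
        L[j, j] = np.sqrt(A[j, j] - np.dot(L[j, :j], L[j, :j]))
        # update of the column below the diagonal, vectorized over rows
        L[j+1:, j] = (A[j+1:, j] - L[j+1:, :j] @ L[j, :j]) / L[j, j]
    return L

M = np.random.rand(6, 6)
A = M @ M.T + 6 * np.eye(6)      # symmetric positive definite test matrix
L = cholesky_dense(A)
assert np.allclose(L @ L.T, A)
```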

  4. Program For Evaluation Of Reliability Of Ceramic Parts

    NASA Technical Reports Server (NTRS)

    Nemeth, N.; Janosik, L. A.; Gyekenyesi, J. P.; Powers, Lynn M.

    1996-01-01

    CARES/LIFE predicts probability of failure of monolithic ceramic component as function of service time. Assesses risk that component fractures prematurely as result of subcritical crack growth (SCG). Effect of proof testing of components prior to service also considered. Coupled to such commercially available finite-element programs as ANSYS, ABAQUS, MARC, MSC/NASTRAN, and COSMOS/M. Also retains all capabilities of previous CARES code, which includes estimation of fast-fracture component reliability and Weibull parameters from inert strength (without SCG contributing to failure) specimen data. Estimates parameters that characterize SCG from specimen data as well. Written in ANSI FORTRAN 77 to be machine-independent. Program runs on any computer in which sufficient addressable memory (at least 8MB) and FORTRAN 77 compiler available. For IBM-compatible personal computer with minimum 640K memory, limited program available (CARES/PC, COSMIC number LEW-15248).
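
    At its core, the fast-fracture part of such an analysis evaluates a Weibull failure law for a stressed component. A toy sketch follows; the two-parameter form and all numbers are illustrative assumptions, not CARES/LIFE internals.

```python
# Illustrative two-parameter Weibull fast-fracture model of the kind used in
# ceramic reliability analysis: P_f = 1 - exp(-(sigma/sigma_0)^m).
# sigma_0 (characteristic strength) and m (Weibull modulus) are invented here.
import math

def weibull_failure_probability(sigma, sigma_0, m):
    """Probability of fast fracture at applied stress sigma (units of sigma_0)."""
    return 1.0 - math.exp(-((sigma / sigma_0) ** m))

for stress in (200.0, 300.0, 400.0):         # MPa, hypothetical load cases
    p = weibull_failure_probability(stress, sigma_0=350.0, m=10.0)
    print(f"sigma = {stress:5.1f} MPa -> P_f = {p:.4f}")
```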

  5. An efficient parallel algorithm for matrix-vector multiplication

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hendrickson, B.; Leland, R.; Plimpton, S.

    The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
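
    As a rough illustration of the kernel (not the paper's hypercube-oriented algorithm with its O(n/√p + log(p)) communication cost), here is a minimal 1-D row-block matrix-vector product written with mpi4py; the problem size and the assumption that the row count divides evenly among processors are for convenience only.

```python
# Minimal parallel matrix-vector product, 1-D row-block decomposition.
# The paper's algorithm uses a 2-D decomposition with lower communication
# cost; this simpler layout just shows the structure of the computation.
# Run with e.g.: mpirun -n 4 python matvec.py  (assumes p divides n)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, p = comm.Get_rank(), comm.Get_size()

n = 1024
rows = n // p
A_local = np.random.default_rng(rank).random((rows, n))  # my rows of A
x = np.empty(n)
if rank == 0:
    x[:] = np.random.default_rng(42).random(n)
comm.Bcast(x, root=0)                    # every rank needs the full vector

y_local = A_local @ x                    # local partial result
y = np.empty(n)
comm.Allgather(y_local, y)               # assemble y = A x on all ranks
if rank == 0:
    print("||y|| =", np.linalg.norm(y))
```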

  6. POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph

    NASA Astrophysics Data System (ADS)

    Poat, M. D.; Lauret, J.; Betts, W.

    2015-12-01

    The STAR online computing infrastructure has become an intensive, dynamic system used for first-hand data collection and analysis, resulting in a dense collection of data output. As we have transitioned to our current state, inefficient, limited storage systems have become an impediment to fast feedback to online shift crews. A centrally accessible, scalable and redundant distributed storage system has become a necessity in this environment. OpenStack Swift Object Storage and Ceph Object Storage are two eye-opening technologies, as community use and development have led to success elsewhere. In this contribution, OpenStack Swift and Ceph have been put to the test with single and parallel I/O tests, emulating real-world scenarios for data processing and workflows. The Ceph file system storage, offering a POSIX-compliant file system mounted similarly to an NFS share, was of particular interest as it aligned with our requirements and was retained as our solution. I/O performance tests run against the Ceph POSIX file system presented surprising results indicating true potential for fast I/O and reliability. STAR's online compute farm has historically been used for job submission and first-hand data analysis. Reusing the online compute farm to support both a storage cluster and job submission will be an efficient use of the current infrastructure.

  7. Effects of general, specific and combined warm-up on explosive muscular performance

    PubMed Central

    Henriquez–Olguín, C; Beltrán, AR; Ramírez, MA; Labarca, C; Cornejo, M; Álvarez, C; Ramírez-Campillo, R

    2015-01-01

    The purpose of this study was to compare the acute effects of general, specific and combined warm-up (WU) on explosive performance. Healthy male (n = 10) subjects participated in six WU protocols in a crossover randomized study design. Protocols were: passive rest (PR; 15 min of passive rest), running (Run; 5 min of running at 70% of maximum heart rate), stretching (STR; 5 min of static stretching exercise), jumping [Jump; 5 min of jumping exercises – 3x8 countermovement jumps (CMJ) and 3x8 drop jumps from 60 cm (DJ60)], and combined (COM; protocols Run+STR+Jump combined). Immediately before and after each WU, subjects were assessed for explosive concentric-only (i.e. squat jump – SJ), slow stretch-shortening cycle (i.e. CMJ), fast stretch-shortening cycle (i.e. DJ60) and contact time (CT) muscle performance. PR significantly reduced SJ performance (p =0.007). Run increased SJ (p =0.0001) and CMJ (p =0.002). STR increased CMJ (p =0.048). Specific WU (i.e. Jump) increased SJ (p =0.001), CMJ (p =0.028) and DJ60 (p =0.006) performance. COM increased CMJ performance (p =0.006). Jump was superior in SJ performance vs. PR (p =0.001). Jump reduced (p =0.03) CT in DJ60. In conclusion, general, specific and combined WU increase slow stretch-shortening cycle (SSC) muscle performance, but only specific WU increases fast SSC muscle performance. Therefore, to increase fast SSC performance, specific fast SSC muscle actions must be included during the WU. PMID:26060335

  8. Effects of changing speed on knee and ankle joint load during walking and running.

    PubMed

    de David, Ana Cristina; Carpes, Felipe Pivetta; Stefanyshyn, Darren

    2015-01-01

    Joint moments can be used as an indicator of joint loading and have potential application for sports performance and injury prevention. The effects of changing walking and running speeds on joint moments for the different planes of motion still are debatable. Here, we compared knee and ankle moments during walking and running at different speeds. Data were collected from 11 recreational male runners to determine knee and ankle joint moments during different conditions. Conditions include walking at a comfortable speed (self-selected pacing), fast walking (fastest speed possible), slow running (speed corresponding to 30% slower than running) and running (at 4 m · s(-1) ± 10%). A different joint moment pattern was observed between walking and running. We observed a general increase in joint load for sagittal and frontal planes as speed increased, while the effects of speed were not clear in the transverse plane moments. Although differences tend to be more pronounced when gait changed from walking to running, the peak moments, in general, increased when speed increased from comfortable walking to fast walking and from slow running to running mainly in the sagittal and frontal planes. Knee flexion moment was higher in walking than in running due to larger knee extension. Results suggest caution when recommending walking over running in an attempt to reduce knee joint loading. The different effects of speed increments during walking and running should be considered with regard to the prevention of injuries and for rehabilitation purposes.

  9. MEKS: A program for computation of inclusive jet cross sections at hadron colliders

    NASA Astrophysics Data System (ADS)

    Gao, Jun; Liang, Zhihua; Soper, Davison E.; Lai, Hung-Liang; Nadolsky, Pavel M.; Yuan, C.-P.

    2013-06-01

    EKS is a numerical program that predicts differential cross sections for production of single-inclusive hadronic jets and jet pairs at next-to-leading order (NLO) accuracy in a perturbative QCD calculation. We describe MEKS 1.0, an upgraded EKS program with increased numerical precision, suitable for comparisons to the latest experimental data from the Large Hadron Collider and Tevatron. The program integrates the regularized parton-level matrix elements over the kinematical phase space for production of two and three partons using the VEGAS algorithm. It stores the generated weighted events in finely binned two-dimensional histograms for fast offline analysis. A user interface allows one to customize computation of inclusive jet observables. Results of a benchmark comparison of the MEKS program and the commonly used FastNLO program are also documented. Program Summary: Program title: MEKS 1.0 Catalogue identifier: AEOX_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOX_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 9234 No. of bytes in distributed program, including test data, etc.: 51997 Distribution format: tar.gz Programming language: Fortran (main program), C (CUBA library and analysis program). Computer: All. Operating system: Any UNIX-like system. RAM: ~300 MB Classification: 11.1. External routines: LHAPDF (https://lhapdf.hepforge.org/) Nature of problem: Computation of differential cross sections for inclusive production of single hadronic jets and jet pairs at next-to-leading order accuracy in perturbative quantum chromodynamics. Solution method: Upon subtraction of infrared singularities, the hard-scattering matrix elements are integrated over available phase space using an optimized VEGAS algorithm. Weighted events are generated and filled into a finely binned two-dimensional histogram, from which the final cross sections with typical experimental binning and cuts are computed by an independent analysis program. Monte Carlo sampling of event weights is tuned automatically to get better efficiency. Running time: Depends on details of the calculation and sought numerical accuracy. See benchmark performance in Section 4. The tests provided take approximately 27 min for the jetbin run and a few seconds for jetana.
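
    The weighted-event bookkeeping that makes the offline analysis fast can be sketched in a few lines: sample phase-space points, attach a weight to each, and accumulate the weights on a fine two-dimensional grid from which coarser experimental binnings can later be built. Everything below (the variables, the toy weight, the bin counts) is invented for illustration.

```python
# Sketch of the weighted-event, finely binned 2-D histogram pattern described
# above. The "matrix element" weight is a toy function, not QCD.
import numpy as np

rng = np.random.default_rng(1)
n_events = 200_000
pt = rng.uniform(20.0, 500.0, n_events)   # mock jet transverse momentum
y = rng.uniform(-3.0, 3.0, n_events)      # mock jet rapidity
w = 1.0 / pt**3                           # stand-in for |M|^2 times Jacobian

# fine grid: 480 x 60 cells of summed event weights
H, pt_edges, y_edges = np.histogram2d(
    pt, y, bins=(480, 60), range=((20, 500), (-3, 3)), weights=w)

# offline analysis: a coarse experimental binning is just a re-sum of cells
coarse = H.reshape(48, 10, 6, 10).sum(axis=(1, 3))   # 48 pT bins x 6 y bins
assert np.isclose(coarse.sum(), w.sum())             # no weight lost
```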

  10. Evaluation of the Ross fast solution of Richards’ equation in unfavourable conditions for standard finite element methods

    NASA Astrophysics Data System (ADS)

    Crevoisier, David; Chanzy, André; Voltz, Marc

    2009-06-01

    Ross [Ross PJ. Modeling soil water and solute transport - fast, simplified numerical solutions. Agron J 2003;95:1352-61] developed a fast, simplified method for solving Richards' equation. This non-iterative 1D approach, using Brooks and Corey [Brooks RH, Corey AT. Hydraulic properties of porous media. Hydrol. papers, Colorado St. Univ., Fort Collins; 1964] hydraulic functions, allows a significant reduction in computing time while maintaining the accuracy of the results. The first aim of this work is to confirm these results in a more extensive set of problems, including those that would lead to serious numerical difficulties for the standard numerical method. The second aim is to validate a generalisation of the Ross method to other mathematical representations of hydraulic functions. The Ross method is compared with the standard finite element model, Hydrus-1D [Simunek J, Sejna M, Van Genuchten MTh. The HYDRUS-1D and HYDRUS-2D codes for estimating unsaturated soil hydraulic and solutes transport parameters. Agron Abstr 357; 1999]. Computing time, accuracy of results and robustness of numerical schemes are monitored in 1D simulations involving different types of homogeneous soils, grids and hydrological conditions. The Ross method associated with modified Van Genuchten hydraulic functions [Vogel T, Cislerova M. On the reliability of unsaturated hydraulic conductivity calculated from the moisture retention curve. Transport Porous Media 1988;3:1-15] proves in every tested scenario to be more robust numerically, and the computing time/accuracy compromise is seen to be particularly improved on coarse grids. The Ross method ran from 1.25 to 14 times faster than Hydrus-1D.
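
    For orientation, the Brooks and Corey hydraulic functions referenced above can be written in a few lines. The sketch below uses the Burdine form of the relative conductivity, K_r = Se^((2+3λ)/λ); all parameter values are illustrative, not taken from the paper.

```python
# Brooks-Corey water retention and hydraulic conductivity (one common form).
# Se = (h_b/h)^lambda for suction h > h_b, Se = 1 otherwise (saturated).
import numpy as np

def brooks_corey(h, h_b=0.2, lam=0.5, theta_r=0.05, theta_s=0.43, Ks=1e-5):
    """h: suction head (m, positive). Returns (theta, K) arrays."""
    h = np.asarray(h, dtype=float)
    Se = np.where(h > h_b, (h_b / h) ** lam, 1.0)  # effective saturation
    theta = theta_r + Se * (theta_s - theta_r)     # volumetric water content
    K = Ks * Se ** ((2.0 + 3.0 * lam) / lam)       # conductivity (m/s)
    return theta, K

theta, K = brooks_corey([0.1, 0.5, 2.0, 10.0])
print(theta, K)
```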

  11. Fast-GPU-PCC: A GPU-Based Technique to Compute Pairwise Pearson's Correlation Coefficients for Time Series Data-fMRI Study.

    PubMed

    Eslami, Taban; Saeed, Fahad

    2018-04-20

    Functional magnetic resonance imaging (fMRI) is a non-invasive brain imaging technique, which has been regularly used for studying the brain's functional activities in the past few years. A widely used measure for capturing functional associations in the brain is Pearson's correlation coefficient. Pearson's correlation is widely used for constructing functional networks and studying the dynamic functional connectivity of the brain. These are useful measures for understanding the effects of brain disorders on connectivities among brain regions. The fMRI scanners produce a huge number of voxels, and using traditional central processing unit (CPU)-based techniques for computing pairwise correlations is very time consuming, especially when a large number of subjects are being studied. In this paper, we propose a graphics processing unit (GPU)-based algorithm called Fast-GPU-PCC for computing pairwise Pearson's correlation coefficients. Based on the symmetric property of Pearson's correlation, this approach returns the N(N-1)/2 correlation coefficients located in the strictly upper triangular part of the correlation matrix. Storing correlations in a one-dimensional array with the order proposed in this paper is useful for further usage. Our experiments on real and synthetic fMRI data for different numbers of voxels and varying lengths of time series show that the proposed approach outperformed state-of-the-art GPU-based techniques as well as the sequential CPU-based versions. We show that Fast-GPU-PCC runs 62 times faster than the CPU-based version and about 2 to 3 times faster than two other state-of-the-art GPU-based methods.
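
    The underlying computation is easy to state on the CPU: z-score each voxel time series, take one matrix product to get every pairwise coefficient, and keep the strictly upper triangle as a flat array. A numpy sketch follows (the flat ordering here is the usual row-major pair order; the paper defines its own layout):

```python
# CPU reference for pairwise Pearson correlation of N time series of length T,
# returning the N(N-1)/2 upper-triangle coefficients as a 1-D array.
import numpy as np

def pairwise_pearson_upper(X):
    """X: (N, T), one time series per row."""
    Z = X - X.mean(axis=1, keepdims=True)
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)  # unit-norm rows
    C = Z @ Z.T                                    # full correlation matrix
    iu = np.triu_indices(len(X), k=1)              # strictly upper triangle
    return C[iu]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 120))                    # 500 voxels, 120 volumes
r = pairwise_pearson_upper(X)
assert r.shape == (500 * 499 // 2,)
```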

  12. A computationally fast, reduced model for simulating landslide dynamics and tsunamis generated by landslides in natural terrains

    NASA Astrophysics Data System (ADS)

    Mohammed, F.

    2016-12-01

    Landslide hazards such as fast-moving debris flows, slow-moving landslides, and other mass flows cause numerous fatalities, injuries, and damage. Landslide occurrences in fjords, bays, and lakes can additionally generate tsunamis with locally extremely high wave heights and runups. Two-dimensional depth-averaged models can, with the appropriate assumptions, simulate the entire lifecycle of the three-dimensional landslide dynamics and tsunami propagation efficiently and accurately. Landslide rheology is defined using viscous fluids, visco-plastic fluids, and granular material to account for the possible landslide source materials. Saturated and unsaturated rheologies are further included to simulate debris flows, debris avalanches, mudflows, and rockslides. The models are obtained by reducing the fully three-dimensional Navier-Stokes equations, with the internal rheological definition of the landslide material and the water body, under appropriate scaling assumptions, to depth-averaged two-dimensional models. The landslide and tsunami models are coupled to include the interaction between the landslide and the water body for tsunami generation. The reduced models are solved numerically with a fast semi-implicit, finite-volume, shock-capturing algorithm. The well-balanced, positivity-preserving algorithm accurately accounts for the wet-dry interface transition for the landslide runout, the landslide-water body interface, and the tsunami wave flooding on land. The models are implemented as a General-Purpose computing on Graphics Processing Units (GPGPU) suite, either coupled or run independently within the suite. The GPGPU implementation provides up to 1000 times speedup over a CPU-based serial computation. This enables simulations of multiple scenarios of hazard realizations, providing a basis for probabilistic hazard assessment. The models have been successfully validated against experiments, past studies, and field data for landslides and tsunamis.

  13. Electronic mail.

    PubMed Central

    Pallen, M.

    1995-01-01

    Electronic mail (email) has many advantages over other forms of communication: it is easy to use, free of charge, fast, and delivers information in a digital format. As a text only medium, email is usually less formal in style than conventional correspondence and may contain acronyms and other features, such as smileys, that are peculiar to the Internet. Email client programs that run on your own microcomputer render email powerful and easy to use. With suitable encoding methods, email can be used to send any kind of computer file, including pictures, sounds, programs, and movies. Numerous biomedical electronic mailing lists and other Internet services are accessible by email. PMID:8520343

  14. Onboard Run-Time Goal Selection for Autonomous Operations

    NASA Technical Reports Server (NTRS)

    Rabideau, Gregg; Chien, Steve; McLaren, David

    2010-01-01

    We describe an efficient, online goal selection algorithm for use onboard spacecraft and its use for selecting goals at runtime. Our focus is on the re-planning that must be performed in a timely manner on the embedded system where computational resources are limited. In particular, our algorithm generates near optimal solutions to problems with fully specified goal requests that oversubscribe available resources but have no temporal flexibility. By using a fast, incremental algorithm, goal selection can be postponed in a "just-in-time" fashion allowing requests to be changed or added at the last minute. This enables shorter response cycles and greater autonomy for the system under control.
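
    As a purely hypothetical illustration of the problem setting (oversubscribed resources, goals fixed in time, selection that can be re-run at the last minute), a greedy priority-ordered selector might look like the following; this is not the algorithm from the paper.

```python
# Hypothetical greedy goal selection under a single resource budget.
# Goals have no temporal flexibility, so selection reduces to choosing
# which fully specified requests to keep. Re-running select_goals() after
# requests change gives the "just-in-time" behaviour described above.
from dataclasses import dataclass

@dataclass
class Goal:
    name: str
    priority: float   # higher is more valuable
    resource: float   # e.g. energy or data volume consumed

def select_goals(requests, budget):
    chosen, used = [], 0.0
    for g in sorted(requests, key=lambda g: g.priority, reverse=True):
        if used + g.resource <= budget:
            chosen.append(g)
            used += g.resource
    return chosen, used

goals = [Goal("image_A", 9, 40), Goal("downlink", 8, 70), Goal("image_B", 5, 30)]
print(select_goals(goals, budget=100.0))
```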

  15. On the Use of Statistics in Design and the Implications for Deterministic Computer Experiments

    NASA Technical Reports Server (NTRS)

    Simpson, Timothy W.; Peplinski, Jesse; Koch, Patrick N.; Allen, Janet K.

    1997-01-01

    Perhaps the most prevalent use of statistics in engineering design is through Taguchi's parameter and robust design -- using orthogonal arrays to compute signal-to-noise ratios in a process of design improvement. In our view, however, there is an equally exciting use of statistics in design that could become just as prevalent: it is the concept of metamodeling whereby statistical models are built to approximate detailed computer analysis codes. Although computers continue to get faster, analysis codes always seem to keep pace so that their computational time remains non-trivial. Through metamodeling, approximations of these codes are built that are orders of magnitude cheaper to run. These metamodels can then be linked to optimization routines for fast analysis, or they can serve as a bridge for integrating analysis codes across different domains. In this paper we first review metamodeling techniques that encompass design of experiments, response surface methodology, Taguchi methods, neural networks, inductive learning, and kriging. We discuss their existing applications in engineering design and then address the dangers of applying traditional statistical techniques to approximate deterministic computer analysis codes. We conclude with recommendations for the appropriate use of metamodeling techniques in given situations and how common pitfalls can be avoided.
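
    As a concrete instance of the metamodeling idea, the sketch below fits a quadratic response surface to a handful of runs of a stand-in "expensive" analysis function and then evaluates the cheap surrogate; the function, the design points, and the basis are all invented for illustration.

```python
# Quadratic response-surface metamodel fitted by least squares.
import numpy as np

def expensive_analysis(x1, x2):            # placeholder for a real analysis code
    return np.sin(x1) + 0.5 * x2**2 + 0.1 * x1 * x2

def basis(X):                              # quadratic basis in two variables
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(30, 2))       # "design of experiments" (random here)
y = expensive_analysis(X[:, 0], X[:, 1])   # 30 expensive runs

coef, *_ = np.linalg.lstsq(basis(X), y, rcond=None)
surrogate = lambda Xq: basis(Xq) @ coef    # orders of magnitude cheaper to run

Xq = np.array([[0.5, -1.0]])
print(surrogate(Xq)[0], expensive_analysis(0.5, -1.0))
```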

  16. A Three-Dimensional Unsteady CFD Model of Compressor Stability

    NASA Technical Reports Server (NTRS)

    Chima, Rodrick V.

    2006-01-01

    A three-dimensional unsteady CFD code called CSTALL has been developed and used to investigate compressor stability. The code solved the Euler equations through the entire annulus and all blade rows. Blade row turning, losses, and deviation were modeled using body force terms which required input data at stations between blade rows. The input data was calculated using a separate Navier-Stokes turbomachinery analysis code run at one operating point near stall, and was scaled to other operating points using overall characteristic maps. No information about the stalled characteristic was used. CSTALL was run in a 2-D throughflow mode for very fast calculations of operating maps and estimation of stall points. Calculated pressure ratio characteristics for NASA stage 35 agreed well with experimental data, and results with inlet radial distortion showed the expected loss of range. CSTALL was also run in a 3-D mode to investigate inlet circumferential distortion. Calculated operating maps for stage 35 with 120 degree distortion screens showed a loss in range and pressure rise. Unsteady calculations showed rotating stall with two part-span stall cells. The paper describes the body force formulation in detail, examines the computed results, and concludes with observations about the code.

  17. MIGS-GPU: Microarray Image Gridding and Segmentation on the GPU.

    PubMed

    Katsigiannis, Stamos; Zacharia, Eleni; Maroulis, Dimitris

    2017-05-01

    Complementary DNA (cDNA) microarray is a powerful tool for simultaneously studying the expression level of thousands of genes. Nevertheless, the analysis of microarray images remains an arduous and challenging task due to the poor quality of the images that often suffer from noise, artifacts, and uneven background. In this study, the MIGS-GPU [Microarray Image Gridding and Segmentation on Graphics Processing Unit (GPU)] software for gridding and segmenting microarray images is presented. MIGS-GPU's computations are performed on the GPU by means of the compute unified device architecture (CUDA) in order to achieve fast performance and increase the utilization of available system resources. Evaluation on both real and synthetic cDNA microarray images showed that MIGS-GPU provides better performance than state-of-the-art alternatives, while the proposed GPU implementation achieves significantly lower computational times compared to the respective CPU approaches. Consequently, MIGS-GPU can be an advantageous and useful tool for biomedical laboratories, offering a user-friendly interface that requires minimum input in order to run.

  18. CUDA-based acceleration of collateral filtering in brain MR images

    NASA Astrophysics Data System (ADS)

    Li, Cheng-Yuan; Chang, Herng-Hua

    2017-02-01

    Image denoising is one of the fundamental and essential tasks within image processing. In medical imaging, finding an effective algorithm that can remove random noise in MR images is important. This paper proposes an effective noise reduction method for brain magnetic resonance (MR) images. Our approach is based on the collateral filter, which is a more powerful method than the bilateral filter in many cases. However, the computation of the collateral filter algorithm is quite time-consuming. To solve this problem, we improved the collateral filter algorithm with parallel computing using the GPU. We adopted CUDA, an application programming interface for GPUs by NVIDIA, to accelerate the computation. Our experimental evaluation on an Intel Xeon CPU E5-2620 v3 2.40GHz with an NVIDIA Tesla K40c GPU indicated that the proposed implementation runs dramatically faster than the traditional collateral filter. We believe that the proposed framework has established a general blueprint for achieving fast and robust filtering in a wide variety of medical image denoising applications.

  19. Using NetMeeting for remote configuration of the Otto Bock C-Leg: technical considerations.

    PubMed

    Lemaire, E D; Fawcett, J A

    2002-08-01

    Telehealth has the potential to be a valuable tool for technical and clinical support of computer controlled prosthetic devices. This pilot study examined the use of Internet-based, desktop video conferencing for remote configuration of the Otto Bock C-Leg. Laboratory tests involved connecting two computers running Microsoft NetMeeting over a local area network (IP protocol). Over 56 kb/s, DSL/cable, and 10 Mb/s LAN speeds, a prosthetist remotely configured a user's C-Leg by using Application Sharing, Live Video, and Live Audio. A similar test between sites in Ottawa and Toronto, Canada was limited by the notebook computer's 28 kb/s modem. At the 28 kb/s Internet connection speed, NetMeeting's application sharing feature was not able to update the remote Sliders window fast enough to display peak toe loads and peak knee angles. These results support the use of NetMeeting as an accessible and cost-effective tool for remote C-Leg configuration, provided that sufficient Internet data transfer speed is available.

  20. Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers

    NASA Technical Reports Server (NTRS)

    Wissink, Andrew M.; Lyrintzis, Anastasios S.; Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

    1996-01-01

    This paper presents parallel implementations of two codes used in a combined CFD/Kirchhoff methodology to predict the aerodynamic and aeroacoustic properties of helicopters. The rotorcraft Navier-Stokes code, TURNS, computes the aerodynamic flowfield near the helicopter blades and the Kirchhoff acoustics code computes the noise in the far field, using the TURNS solution as input. The overall parallel strategy adds MPI message passing calls to the existing serial codes to allow for communication between processors. As a result, the total code modifications required for parallel execution are relatively small. The biggest bottleneck in running the TURNS code in parallel comes from the LU-SGS algorithm that solves the implicit system of equations. We use a new hybrid domain decomposition implementation of LU-SGS to obtain good parallel performance on the SP-2. TURNS demonstrates excellent parallel speedups for quasi-steady and unsteady three-dimensional calculations of a helicopter blade in forward flight. The execution rate attained by the code on 114 processors is six times faster than the same cases run on one processor of the Cray C-90. The parallel Kirchhoff code also shows excellent parallel speedups and fast execution rates. As a performance demonstration, unsteady acoustic pressures are computed at 1886 far-field observer locations for a sample acoustics problem. The calculation requires over two hundred hours of CPU time on one C-90 processor but takes only a few hours on 80 processors of the SP-2. The resultant far-field acoustic field is analyzed with state-of-the-art audio and video rendering of the propagating acoustic signals.

  1. ROBNCA: robust network component analysis for recovering transcription factor activities.

    PubMed

    Noor, Amina; Ahmad, Aitzaz; Serpedin, Erchin; Nounou, Mohamed; Nounou, Hazem

    2013-10-01

    Network component analysis (NCA) is an efficient method for reconstructing transcription factor activity (TFA), which makes use of the gene expression data and prior information available about transcription factor (TF)-gene regulations. Most of the contemporary algorithms either exhibit the drawback of inconsistency and poor reliability, or suffer from prohibitive computational complexity. In addition, the existing algorithms do not possess the ability to counteract the presence of outliers in the microarray data. Hence, robust and computationally efficient algorithms are needed to enable practical applications. We propose ROBust Network Component Analysis (ROBNCA), a novel iterative algorithm that explicitly models the possible outliers in the microarray data. An attractive feature of the ROBNCA algorithm is the derivation of a closed form solution for estimating the connectivity matrix, which was not available in prior contributions. The ROBNCA algorithm is compared with FastNCA and the non-iterative NCA (NI-NCA). ROBNCA estimates the TF activity profiles as well as the TF-gene control strength matrix with a much higher degree of accuracy than FastNCA and NI-NCA, irrespective of varying noise, correlation and/or amount of outliers in the case of synthetic data. The ROBNCA algorithm is also tested on Saccharomyces cerevisiae data and Escherichia coli data, and it is observed to outperform the existing algorithms. The run time of the ROBNCA algorithm is comparable to that of FastNCA, and is hundreds of times faster than NI-NCA. The ROBNCA software is available at http://people.tamu.edu/~amina/ROBNCA.

  2. On Your Mark! Get Set! Go!

    ERIC Educational Resources Information Center

    Rowland, Veronica

    2006-01-01

    This article describes how second- and third-grade students joined their teacher, Mary Ann McTiernan, a marathon runner from Cape Town, South Africa, in a one-mile run every Thursday morning while she was training. Mary Ann's students had been asking, "Where do you run?" "How far do you go?" "How fast can you run?"…

  3. Absolute comparison of simulated and experimental protein-folding dynamics

    NASA Astrophysics Data System (ADS)

    Snow, Christopher D.; Nguyen, Houbi; Pande, Vijay S.; Gruebele, Martin

    2002-11-01

    Protein folding is difficult to simulate with classical molecular dynamics. Secondary structure motifs such as α-helices and β-hairpins can form in 0.1-10µs (ref. 1), whereas small proteins have been shown to fold completely in tens of microseconds. The longest folding simulation to date is a single 1-µs simulation of the villin headpiece; however, such single runs may miss many features of the folding process as it is a heterogeneous reaction involving an ensemble of transition states. Here, we have used a distributed computing implementation to produce tens of thousands of 5-20-ns trajectories (700µs) to simulate mutants of the designed mini-protein BBA5. The fast relaxation dynamics these predict were compared with the results of laser temperature-jump experiments. Our computational predictions are in excellent agreement with the experimentally determined mean folding times and equilibrium constants. The rapid folding of BBA5 is due to the swift formation of secondary structure. The convergence of experimentally and computationally accessible timescales will allow the comparison of absolute quantities characterizing in vitro and in silico (computed) protein folding.

  4. Next generation simulation tools: the Systems Biology Workbench and BioSPICE integration.

    PubMed

    Sauro, Herbert M; Hucka, Michael; Finney, Andrew; Wellock, Cameron; Bolouri, Hamid; Doyle, John; Kitano, Hiroaki

    2003-01-01

    Researchers in quantitative systems biology make use of a large number of different software packages for modelling, analysis, visualization, and general data manipulation. In this paper, we describe the Systems Biology Workbench (SBW), a software framework that allows heterogeneous application components--written in diverse programming languages and running on different platforms--to communicate and use each other's capabilities via a fast, binary encoded-message system. Our goal was to create a simple, high-performance, open-source software infrastructure which is easy to implement and understand. SBW enables applications (potentially running on separate, distributed computers) to communicate via a simple network protocol. The interfaces to the system are encapsulated in client-side libraries that we provide for different programming languages. We describe in this paper the SBW architecture, a selection of current modules, including Jarnac, JDesigner, and SBWMeta-tool, and the close integration of SBW into BioSPICE, which enables both frameworks to share tools and complement and strengthen each other's capabilities.

  5. Periodic spring-mass running over uneven terrain through feedforward control of landing conditions.

    PubMed

    Palmer, Luther R; Eaton, Caitrin E

    2014-09-01

    This work pursues a feedforward control algorithm for high-speed legged locomotion over uneven terrain. Being able to rapidly negotiate uneven terrain without visual or a priori information about the terrain will allow legged systems to be used in time-critical applications and alongside fast-moving humans or vehicles. The algorithm is shown here implemented on a spring-loaded inverted pendulum model in simulation, and can be configured either to approach a fixed running height over uneven terrain or for self-stable terrain following. Offline search identifies unique landing conditions that achieve a desired apex height with a constant stride period over varying ground levels. Because the time between the apex and touchdown events is directly related to ground height, the landing conditions can be computed in real time as continuous functions of this falling time. Enforcing a constant stride period reduces the need for inertial sensing of the apex event, which is nontrivial for physical systems, and allows for clocked feedforward control of the swing leg.
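
    The enabling observation is that flight from apex to touchdown is ballistic, so the fall time alone fixes the landing state, and landing conditions can be stored as continuous functions of that time. A toy sketch follows; the constants and the leg-angle policy are invented placeholders, whereas the paper derives its landing conditions by offline search.

```python
# Ballistic apex-to-touchdown kinematics for a spring-loaded inverted
# pendulum: the landing state follows from the fall time alone.
import math

G = 9.81          # gravity, m/s^2
Y_APEX = 1.05     # apex hip height (m), hypothetical
V_X = 4.5         # forward speed at apex (m/s), hypothetical

def landing_state(t_fall):
    """Landing state as a continuous function of apex-to-touchdown time."""
    y_td = Y_APEX - 0.5 * G * t_fall**2    # hip height at touchdown
    vy_td = -G * t_fall                    # vertical velocity at touchdown
    return y_td, V_X, vy_td

def touchdown_leg_angle(t_fall):
    # invented feedforward policy: steeper leg for longer falls
    return math.radians(68.0) + 0.25 * t_fall

for t in (0.10, 0.18, 0.26):               # lower ground => longer fall time
    y, vx, vy = landing_state(t)
    print(f"t={t:.2f} s  y_td={y:.3f} m  vy={vy:.2f} m/s  "
          f"theta_td={math.degrees(touchdown_leg_angle(t)):.1f} deg")
```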

  6. The Weekly Fab Five: Things You Should Do Every Week To Keep Your Computer Running in Tip-Top Shape.

    ERIC Educational Resources Information Center

    Crispen, Patrick

    2001-01-01

    Describes five steps that school librarians should follow every week to keep their computers running at top efficiency. Explains how to update virus definitions; run Windows update; run ScanDisk to repair errors on the hard drive; run a disk defragmenter; and backup all data. (LRW)

  7. Radiative corrections from heavy fast-roll fields during inflation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jain, Rajeev Kumar; Sandora, McCullen; Sloth, Martin S., E-mail: jain@cp3.dias.sdu.dk, E-mail: sandora@cp3.dias.sdu.dk, E-mail: sloth@cp3.dias.sdu.dk

    2015-06-01

    We investigate radiative corrections to the inflaton potential from heavy fields undergoing a fast-roll phase transition. We find that a logarithmic one-loop correction to the inflaton potential involving this field can induce a temporary running of the spectral index. The induced running can be a short burst of strong running, which may be related to the observed anomalies on large scales in the cosmic microwave spectrum, or extend over many e-folds, sustaining an effectively constant running to be searched for in the future. We implement this in a general class of models, where effects are mediated through a heavy messenger field sitting in its minimum. Interestingly, within the present framework it is a generic outcome that a large running implies a small field model with a vanishing tensor-to-scalar ratio, circumventing the normal expectation that small field models typically lead to an unobservably small running of the spectral index. An observable level of tensor modes can also be accommodated, but, surprisingly, this requires running to be induced by a curvaton. If upcoming observations are consistent with a small tensor-to-scalar ratio as predicted by small field models of inflation, then the present study serves as an explicit example contrary to the general expectation that the running will be unobservable.

  9. Program For Generating Interactive Displays

    NASA Technical Reports Server (NTRS)

    Costenbader, Jay; Moleski, Walt; Szczur, Martha; Howell, David; Engelberg, Norm; Li, Tin P.; Misra, Dharitri; Miller, Philip; Neve, Leif; Wolf, Karl; hide

    1991-01-01

    Sun/Unix version of Transportable Applications Environment Plus (TAE+) computer program provides integrated, portable software environment for developing and running interactive window, text, and graphical-object-based application software systems. Enables programmer or nonprogrammer to construct easily custom software interface between user and application program and to move resulting interface program and its application program to different computers. TAE+ viewed as productivity tool for application developers and application end users, who benefit from resultant consistent and well-designed user interface sheltering them from intricacies of computer. Available in form suitable for following six different groups of computers: DEC VAXstation and other VMS VAX computers; Macintosh II computers running A/UX; Apollo Domain Series 3000; DEC VAX and reduced-instruction-set-computer workstations running Ultrix; Sun 3- and 4-series workstations running Sun OS and IBM RT/PC and PS/2 compute

  10. Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer.

    PubMed

    Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas

    2016-04-01

    Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes through systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop station; however, due to multiple invocations for a large number of subtasks the full task requires significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods and a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new software tool, mpiWrapper, has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface is used to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. The mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .
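
    A stripped-down master-worker pattern in the same spirit can be written with mpi4py in a few lines: rank 0 hands out command lines one at a time, and each worker runs them as ordinary non-parallel Linux processes. The real tool's dual-thread design, fault handling, and resubmission on node failure are omitted, and the echo commands are placeholders.

```python
# Minimal master-worker wrapper for running serial programs in parallel.
# Run with e.g.: mpirun -n 8 python wrapper.py
import subprocess
from mpi4py import MPI

comm = MPI.COMM_WORLD

if comm.Get_rank() == 0:                               # master
    tasks = [f"echo subtask {i}" for i in range(20)]   # placeholder commands
    status = MPI.Status()
    active = comm.Get_size() - 1
    while active > 0:
        comm.recv(source=MPI.ANY_SOURCE, status=status)   # "ready" or "done"
        if tasks:
            comm.send(tasks.pop(0), dest=status.Get_source())
        else:
            comm.send(None, dest=status.Get_source())  # no more work: stop
            active -= 1
else:                                                  # worker
    comm.send("ready", dest=0)
    while True:
        cmd = comm.recv(source=0)
        if cmd is None:
            break
        subprocess.run(cmd, shell=True, check=False)   # launch serial program
        comm.send("done", dest=0)
```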

  11. A Web-based Distributed Voluntary Computing Platform for Large Scale Hydrological Computations

    NASA Astrophysics Data System (ADS)

    Demir, I.; Agliamzanov, R.

    2014-12-01

    Distributed volunteer computing can enable researchers and scientists to form large parallel computing environments that harness the computing power of the millions of computers on the Internet, and use them to run large-scale environmental simulations and models that serve the common good of local communities and the world. Recent developments in web technologies and standards allow client-side scripting languages to run at speeds close to native applications and to utilize the power of Graphics Processing Units (GPUs). Using a client-side scripting language like JavaScript, we have developed an open distributed computing framework that makes it easy for researchers to write their own hydrologic models and run them on volunteer computers. Website owners can easily enable their sites so that visitors can volunteer their computing resources to help run advanced hydrological models and simulations. Using a web-based system allows users to start volunteering their computational resources within seconds, without installing any software. The framework distributes the model simulation to thousands of nodes in small spatial and computational chunks. A relational database system is utilized for managing data connections and queue management for the distributed computing nodes. In this paper, we present a web-based distributed volunteer computing platform that enables large-scale hydrological simulations and model runs in an open and integrated environment.
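
    As a concrete illustration of the queue management mentioned above, here is a small server-side sketch (not the authors' platform, which is JavaScript-based) of a relational task queue from which volunteer nodes claim model chunks and post back results; the table layout and column names are hypothetical.

```python
# A hypothetical relational queue for volunteer nodes, using the sqlite3
# standard-library module purely for illustration.
import sqlite3

db = sqlite3.connect("volunteer_queue.db")
db.execute("""CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY,
    chunk BLOB,                      -- small spatial/computational piece of the model
    state TEXT DEFAULT 'pending',    -- pending -> claimed -> done
    result BLOB)""")

def claim_task():
    """Atomically hand one pending chunk to a connecting browser node."""
    with db:  # the connection context manager wraps this in a transaction
        row = db.execute(
            "SELECT id, chunk FROM tasks WHERE state='pending' LIMIT 1").fetchone()
        if row is None:
            return None
        db.execute("UPDATE tasks SET state='claimed' WHERE id=?", (row[0],))
        return row

def complete_task(task_id, result):
    """Store the result posted back by the volunteer node."""
    with db:
        db.execute("UPDATE tasks SET state='done', result=? WHERE id=?",
                   (result, task_id))
```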

  12. Model-Driven Development for scientific computing. Computations of RHEED intensities for a disordered surface. Part I

    NASA Astrophysics Data System (ADS)

    Daniluk, Andrzej

    2010-03-01

    Scientific computing is the field of study concerned with constructing mathematical models and numerical solution techniques, and with using computers to analyse and solve scientific and engineering problems. Model-Driven Development (MDD) has been proposed as a means to support the software development process through the use of a model-centric approach. This paper surveys the core MDD technology that was used to develop an application that allows computation of the RHEED intensities dynamically for a disordered surface.
    New version program summary:
    Program title: RHEED1DProcess
    Catalogue identifier: ADUY_v4_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADUY_v4_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 31 971
    No. of bytes in distributed program, including test data, etc.: 3 039 820
    Distribution format: tar.gz
    Programming language: Embarcadero C++ Builder
    Computer: Intel Core Duo-based PC
    Operating system: Windows XP, Vista, 7
    RAM: more than 1 GB
    Classification: 4.3, 7.2, 6.2, 8, 14
    Catalogue identifier of previous version: ADUY_v3_0
    Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 2394
    Does the new version supersede the previous version?: No
    Nature of problem: An application that implements numerical simulations should be constructed according to the CSFAR rules: clear and well-documented, simple, fast, accurate, and robust. A clearly written, externally and internally documented program is much easier to understand and modify. A simple program is much less prone to error and is more easily modified than one that is complicated. Simplicity and clarity also help make the program flexible. Making the program fast has economic benefits. It also allows flexibility because some of the features that make a program efficient can be traded off for greater accuracy. Making the program fast also has the benefit of allowing longer calculations with better resolution. The compromise between speed and accuracy has always posed one of the most troublesome challenges for the programmer. Almost all advances in numerical analysis have come about in trying to reach these twin goals. Changes in the basic algorithms will give greater improvements in accuracy and speed than using special numerical tricks or changing programming language. A robust program works correctly over a broad spectrum of input data.
    Solution method: The computational model of the program is based on the use of a dynamical diffraction theory in which the electrons are taken to be diffracted by a potential that is periodic in the dimension perpendicular to the surface. In the case of a disordered surface we can use the proportional model of the scattering potential, in which the potential of a partially filled layer is taken to be the product of the coverage of this layer and the potential of a fully filled layer: U(θ,z) = Σ_n θ_n(t/τ) U_n(1,z), where U_n(1,z) stands for the potential of the fully filled nth layer and θ_n(t/τ) U_n(1,z) for the potential of the growing layer.
    Reasons for new version: Responding to user feedback, the RHEEDGr_09 program has been upgraded to a standard that allows carrying out computations of the RHEED intensities for a disordered surface. The functionality and documentation of the program have also been improved.
    Summary of revisions: The logical structure of the Platform-Specific Model of the RHEEDGr_09 program has been modified according to the scheme shown in Fig. 1*. The class diagram in Fig. 1* is a static view of the main platform-specific elements of the RHEED1DProcess architecture. Fig. 2* provides a dynamic view by showing a simplified creation-and-destruction sequence diagram for the process. Fig. 3* shows the RHEED1DProcess use case model. As can be seen in Figs. 2-3*, the RHEED1DProcess has been designed as a slave process that runs as a separate thread inside each transaction generated by the master Growth09 program (see pii:S0010-4655(09)00386-5, A. Daniluk, Model-Driven Development for scientific computing. Computations of RHEED intensities for a disordered surface. Part II). The RHEED1DProcess requires the user to provide the appropriate parameters for the crystal structure under investigation. These parameters are loaded from the parameters.ini file at run-time. Instructions on the preparation of the .ini files can be found in the new distribution. The RHEED1DProcess also requires the user to provide the appropriate values of the layer coverage profiles. These values are loaded at run-time from the CoverageProfiles.dat file (generated by the Growth09 master application). The RHEED1DProcess enables carrying out one-dimensional dynamical calculations for the fcc lattice with a two-atom basis and the fcc lattice with a one-atom basis, but the zeroth Fourier component of the scattering potential in the TRHEED1D::crystPotUg() function can be modified according to users' specific application requirements. * The figures mentioned can be downloaded; see "Supplementary material" below. Unusual features: The program is distributed in the form of the main projects RHEED1DProcess.cbproj and Graph2D0x.cbproj with associated files, and should be compiled using Embarcadero RAD Studio 2010 along with the Together visual-modelling platform. The program should be compiled with English/USA regional and language options. Additional comments: This version of the RHEED program is designed to run in conjunction with the GROWTH09 (ADVL_v3_0) program. It does not replace the previous, stand-alone, RHEEDGR-09 (ADUY_v3_0) version. Running time: The typical running time is machine and user-parameters dependent. References: [1] OMG, Model Driven Architecture Guide Version 1.0.1, 2003.
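
    For concreteness, here is a short numpy sketch of the proportional scattering-potential model quoted in the solution method above; it is not taken from the RHEED1DProcess source, and the depth grid, layer positions, and coverages are hypothetical placeholders.

```python
# Sketch of the proportional model: the potential of a partially filled
# layer is its coverage times the full-layer potential, summed over layers.
import numpy as np

z = np.linspace(0.0, 20.0, 400)           # depth grid (arbitrary units)
layer_pos = np.array([2.0, 6.0, 10.0])    # hypothetical layer positions
U_full = np.exp(-(z[None, :] - layer_pos[:, None])**2)  # U_n(1, z) per layer
coverage = np.array([1.0, 0.6, 0.1])      # theta_n(t/tau) per growing layer

# U(theta, z) = sum_n theta_n(t/tau) * U_n(1, z)
U_disordered = (coverage[:, None] * U_full).sum(axis=0)
```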

  13. Small target detection using objectness and saliency

    NASA Astrophysics Data System (ADS)

    Zhang, Naiwen; Xiao, Yang; Fang, Zhiwen; Yang, Jian; Wang, Li; Li, Tao

    2017-10-01

    We are motivated by the need for a generic object detection algorithm that achieves high recall for small targets in complex scenes with acceptable computational efficiency. We propose a novel object detection algorithm which has high localization quality at acceptable computational cost. Firstly, we obtain the objectness map as in BING [1] and use non-maximum suppression (NMS) to get the top N points. Then, the k-means algorithm is used to cluster them into K classes according to their location. We set the center points of the K classes as seed points. For each seed point, an object potential region is extracted. Finally, a fast salient object detection algorithm [2] is applied to the object potential regions to highlight object-like pixels, and a series of efficient post-processing operations are proposed to locate the targets. Our method runs at 5 FPS on 1000×1000 images, and significantly outperforms previous methods on small targets in cluttered backgrounds.
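
    A brief sketch of the seeding step, under stated assumptions: a random stand-in replaces the BING objectness map and a plain top-N selection replaces full NMS. It is not the authors' implementation.

```python
# Cluster the top-N objectness peaks into K spatial groups whose centers
# become the object-potential-region seed points.
import numpy as np
from sklearn.cluster import KMeans

objectness = np.random.rand(1000, 1000)   # stand-in for the BING objectness map
N, K = 500, 8

# Take the N strongest responses (a simplification of NMS over the map).
flat = np.argsort(objectness.ravel())[-N:]
points = np.column_stack(np.unravel_index(flat, objectness.shape))

# Cluster point locations; the K cluster centers serve as seed points.
seeds = KMeans(n_clusters=K, n_init=10).fit(points).cluster_centers_
```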

  14. Efficient image compression algorithm for computer-animated images

    NASA Astrophysics Data System (ADS)

    Yfantis, Evangelos A.; Au, Matthew Y.; Miel, G.

    1992-10-01

    An image compression algorithm is described. The algorithm is an extension of the run-length image compression algorithm, and its implementation is relatively easy. This algorithm was implemented and compared with other existing popular compression algorithms and with Lempel-Ziv (LZ) coding. The Lempel-Ziv algorithm is available as a utility in the UNIX operating system and is also referred to as the UNIX uncompress. Sometimes our algorithm is best in terms of saving memory space, and sometimes one of the competing algorithms is best. The algorithm is lossless, and the intent is for it to be used on computer-animated images. Comparisons made with the LZ algorithm indicate that the decompression time using our algorithm is faster than that using the LZ algorithm. Once the data are in memory, a relatively simple and fast transformation is applied to uncompress the file.
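
    The paper extends run-length coding; for concreteness, the textbook run-length baseline it builds on looks like the following sketch (the paper's extension itself is not reproduced here).

```python
# Plain run-length encoding/decoding over bytes.
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Encode bytes as (count, value) pairs."""
    runs = []
    for b in data:
        if runs and runs[-1][1] == b:
            runs[-1] = (runs[-1][0] + 1, b)   # extend the current run
        else:
            runs.append((1, b))               # start a new run
    return runs

def rle_decode(runs) -> bytes:
    return b"".join(bytes([v]) * n for n, v in runs)

assert rle_decode(rle_encode(b"aaaabbbcc")) == b"aaaabbbcc"
```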

  15. Sparse PDF Volumes for Consistent Multi-Resolution Volume Rendering.

    PubMed

    Sicat, Ronell; Krüger, Jens; Möller, Torsten; Hadwiger, Markus

    2014-12-01

    This paper presents a new multi-resolution volume representation called sparse pdf volumes, which enables consistent multi-resolution volume rendering based on probability density functions (pdfs) of voxel neighborhoods. These pdfs are defined in the 4D domain jointly comprising the 3D volume and its 1D intensity range. Crucially, the computation of sparse pdf volumes exploits data coherence in 4D, resulting in a sparse representation with surprisingly low storage requirements. At run time, we dynamically apply transfer functions to the pdfs using simple and fast convolutions. Whereas standard low-pass filtering and down-sampling incur visible differences between resolution levels, the use of pdfs facilitates consistent results independent of the resolution level used. We describe the efficient out-of-core computation of large-scale sparse pdf volumes, using a novel iterative simplification procedure of a mixture of 4D Gaussians. Finally, our data structure is optimized to facilitate interactive multi-resolution volume rendering on GPUs.
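
    As a hedged illustration of the run-time step described above (and not the paper's GPU implementation): once each voxel stores a pdf over a discretized intensity range, applying a 1D transfer function reduces to a per-voxel expectation, i.e., a dot product.

```python
# Apply a transfer function to per-voxel pdfs; all arrays are hypothetical.
import numpy as np

bins = 64
pdf = np.random.dirichlet(np.ones(bins), size=(32, 32, 32))  # per-voxel pdfs
tf = np.linspace(0.0, 1.0, bins)   # hypothetical 1D transfer function (opacity)

# Expected opacity per voxel: sum_v pdf(x, v) * tf(v)
opacity = pdf @ tf                 # shape (32, 32, 32)
```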

  16. Predictions of Supersonic Jet Mixing and Shock-Associated Noise Compared With Measured Far-Field Data

    NASA Technical Reports Server (NTRS)

    Dahl, Milo D.

    2010-01-01

    Codes for predicting supersonic jet mixing and broadband shock-associated noise were assessed using a database containing noise measurements of a jet issuing from a convergent nozzle. Two types of codes were used to make predictions. Fast running codes containing empirical models were used to compute both the mixing noise component and the shock-associated noise component of the jet noise spectrum. One Reynolds-averaged, Navier-Stokes-based code was used to compute only the shock-associated noise. To enable the comparisons of the predicted component spectra with data, the measured total jet noise spectra were separated into mixing noise and shock-associated noise components. Comparisons were made for 1/3-octave spectra and some power spectral densities using data from jets operating at 24 conditions covering essentially 6 fully expanded Mach numbers with 4 total temperature ratios.

  17. Remote Visualization and Remote Collaboration On Computational Fluid Dynamics

    NASA Technical Reports Server (NTRS)

    Watson, Val; Lasinski, T. A. (Technical Monitor)

    1995-01-01

    A new technology has been developed for remote visualization that provides remote, 3D, high resolution, dynamic, interactive viewing of scientific data (such as fluid dynamics simulations or measurements). Based on this technology, some World Wide Web sites on the Internet are providing fluid dynamics data for educational or testing purposes. This technology is also being used for remote collaboration in joint university, industry, and NASA projects in computational fluid dynamics and wind tunnel testing. Previously, remote visualization of dynamic data was done using video format (transmitting pixel information) such as video conferencing or MPEG movies on the Internet. The concept for this new technology is to send the raw data (e.g., grids, vectors, and scalars) along with viewing scripts over the Internet and have the pixels generated by a visualization tool running on the viewer's local workstation. The visualization tool that is currently used is FAST (Flow Analysis Software Toolkit).

  18. [Method of file sorting for mini- and microcomputers].

    PubMed

    Chau, N; Legras, B; Benamghar, L; Martin, J

    1983-05-01

    The authors describe a new method for sorting files which belongs to the class of direct-addressing sorting methods. It makes use of a variant of the classical technique of 'virtual memory'. It is particularly well suited to mini- and micro-computers which have a small core memory (32 K words, for example) and are fitted with a direct-access peripheral device, such as a disc unit. When the file to be sorted is medium-sized (some thousands of records), the running of the program essentially occurs inside the core memory and, consequently, the method becomes very fast. This is very important because most medical files handled in our laboratory are in this category. However, the method is also suitable for big computers and large files, and its implementation is easy. It does not require any magnetic tape unit, and it seems to us to be one of the fastest methods available.
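
    A minimal sketch of the direct-addressing idea (the authors' virtual-memory variant is not reproduced): each record goes straight into a slot computed from its key, so a medium-sized file sorts in a single pass in core.

```python
# Stable one-pass placement for small integer keys in [0, key_range).
def direct_address_sort(records, key, key_range):
    buckets = [[] for _ in range(key_range)]
    for rec in records:
        buckets[key(rec)].append(rec)   # direct addressing by key
    return [rec for bucket in buckets for rec in bucket]

# Example: sort hypothetical patient records by a small integer code.
recs = [("dupont", 3), ("martin", 1), ("legras", 3), ("chau", 0)]
print(direct_address_sort(recs, key=lambda r: r[1], key_range=4))
```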

  19. Semantic 3d City Model to Raster Generalisation for Water Run-Off Modelling

    NASA Astrophysics Data System (ADS)

    Verbree, E.; de Vries, M.; Gorte, B.; Oude Elberink, S.; Karimlou, G.

    2013-09-01

    Water run-off modelling applied within urban areas requires an appropriately detailed surface model represented by a raster height grid. Accurate simulations at this scale level have to take into account small but important water barriers and flow channels given by the large-scale map definitions of buildings, street infrastructure, and other terrain objects. Thus, these 3D features have to be rasterised such that each cell represents the height of the object class as well as possible given the cell size limitations. Small grid cells will result in realistic run-off modelling but with unacceptable computation times; larger grid cells with averaged height values will result in less realistic run-off modelling but fast computation times. This paper introduces a height grid generalisation approach in which the surface characteristics that most influence the water run-off flow are preserved. The first step is to create a detailed surface model (1:1.000), combining high-density laser data with a detailed topographic base map. The topographic map objects are triangulated to a set of TIN-objects by taking into account the semantics of the different map object classes. These TIN objects are then rasterised to two grids with a 0.5 m cell spacing: one grid for the object class labels and the other for the TIN-interpolated height values. The next step is to generalise both raster grids to a lower resolution using a procedure that considers the class label of each cell and that of its neighbours. The results of this approach are tested and validated by water run-off model runs for height grids of different cell spacings at a pilot area in Amersfoort (the Netherlands). Two national datasets were used in this study: the large scale Topographic Base map (BGT, map scale 1:1.000), and the National height model of the Netherlands AHN2 (10 points per square meter on average). Comparison between the original AHN2 height grid and the semantically enriched and then generalised height grids shows that water barriers are better preserved with the new method. This research confirms the idea that topographical information, mainly the boundary locations and object classes, can enrich the height grid for this hydrological application.
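
    A minimal numpy sketch in the spirit of the class-aware coarsening described above (not the authors' procedure): the grids, class codes, and the barrier rule are hypothetical simplifications in which any coarse cell containing water-barrier cells keeps their maximum height instead of the plain block average, so small barriers survive the coarsening.

```python
import numpy as np

BARRIER = 1   # hypothetical label for water-blocking classes (buildings, kerbs)

def generalise(height, label, f):
    """Coarsen an (N*f, M*f) fine grid to (N, M) cells, f cells per side."""
    n, m = height.shape[0] // f, height.shape[1] // f
    h = height[:n*f, :m*f].reshape(n, f, m, f)
    l = label[:n*f, :m*f].reshape(n, f, m, f)
    mean = h.mean(axis=(1, 3))                               # plain block average
    barrier_max = np.where(l == BARRIER, h, -np.inf).max(axis=(1, 3))
    has_barrier = (l == BARRIER).any(axis=(1, 3))
    # Keep the barrier height where a barrier is present, else the average.
    return np.where(has_barrier, barrier_max, mean)
```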

  20. Easy and Fast Reconstruction of a 3D Avatar with an RGB-D Sensor.

    PubMed

    Mao, Aihua; Zhang, Hong; Liu, Yuxin; Zheng, Yinglong; Li, Guiqing; Han, Guoqiang

    2017-05-12

    This paper proposes a new easy and fast 3D avatar reconstruction method using an RGB-D sensor. Users can easily implement human body scanning and modeling just with a personal computer and a single RGB-D sensor such as a Microsoft Kinect within a small workspace in their home or office. To make the reconstruction of 3D avatars easy and fast, a new data capture strategy is proposed for efficient human body scanning, which captures only 18 frames from six views with a close scanning distance to fully cover the body; meanwhile, efficient alignment algorithms are presented to locally align the data frames in the single view and then globally align them in multi-views based on pairwise correspondence. In this method, we do not adopt shape priors or subdivision tools to synthesize the model, which helps to reduce modeling complexity. Experimental results indicate that this method can obtain accurate reconstructed 3D avatar models, and the running performance is faster than that of similar work. This research offers a useful tool for manufacturers to quickly and economically create 3D avatars for product design, entertainment and online shopping.

  1. Cloud computing for comparative genomics

    PubMed Central

    2010-01-01

    Background Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. Conclusions The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems. PMID:20482786

  2. Cloud computing for comparative genomics.

    PubMed

    Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A; Pivovarov, Rimma; Patil, Prasad; Tonellato, Peter J

    2010-05-18

    Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.

  3. Metadyn View: Fast web-based viewer of free energy surfaces calculated by metadynamics

    NASA Astrophysics Data System (ADS)

    Hošek, Petr; Spiwok, Vojtěch

    2016-01-01

    Metadynamics is a highly successful enhanced sampling technique for simulation of molecular processes and prediction of their free energy surfaces. An in-depth analysis of data obtained by this method is as important as the simulation itself. Although there are several tools to compute free energy surfaces from metadynamics data, they usually lack user friendliness and a built-in visualization component. Here we introduce Metadyn View as a fast and user friendly viewer of bias potential/free energy surfaces calculated by metadynamics in the Plumed package. It is based on modern web technologies including HTML5, JavaScript and Cascading Style Sheets (CSS). It can be used by visiting the web site and uploading a HILLS file. It calculates the bias potential/free energy surface on the client-side, so it can run online or offline without the necessity of installing additional web engines. Moreover, it includes tools for measurement of free energies and free energy differences and data/image export.
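
    What such a viewer computes can be sketched in a few lines (this is not Metadyn View's JavaScript): the bias potential is the sum of the deposited Gaussian hills, and the free energy estimate is its negative. A single collective variable and the usual Plumed HILLS column order (time, cv, sigma, height, ...) are assumed and should be checked per file.

```python
import numpy as np

data = np.loadtxt("HILLS")                 # hypothetical Plumed HILLS file
centers, sigmas, heights = data[:, 1], data[:, 2], data[:, 3]

s = np.linspace(centers.min(), centers.max(), 500)
# Bias potential V(s) = sum_i w_i * exp(-(s - s_i)^2 / (2 sigma_i^2))
V = (heights[:, None]
     * np.exp(-(s[None, :] - centers[:, None])**2
              / (2.0 * sigmas[:, None]**2))).sum(axis=0)
F = -V - (-V).min()                        # free energy estimate, shifted to zero
```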

  4. Reducing Time to Science: Unidata and JupyterHub Technology Using the Jetstream Cloud

    NASA Astrophysics Data System (ADS)

    Chastang, J.; Signell, R. P.; Fischer, J. L.

    2017-12-01

    Cloud computing can accelerate scientific workflows, discovery, and collaborations by reducing research and data friction. We describe the deployment of Unidata and JupyterHub technologies on the NSF-funded XSEDE Jetstream cloud. With the aid of virtual machines and Docker technology, we deploy a Unidata JupyterHub server co-located with a Local Data Manager (LDM), THREDDS data server (TDS), and RAMADDA geoscience content management system. We provide Jupyter Notebooks and the pre-built Python environments needed to run them. The notebooks can be used for instruction and as templates for scientific experimentation and discovery. We also supply a large quantity of NCEP forecast model results to allow data-proximate analysis and visualization. In addition, users can transfer data using Globus command line tools, and perform their own data-proximate analysis and visualization with Notebook technology. These data can be shared with others via a dedicated TDS server for scientific distribution and collaboration. There are many benefits of this approach. Not only is the cloud computing environment fast, reliable and scalable, but scientists can analyze, visualize, and share data using only their web browser. No local specialized desktop software or a fast internet connection is required. This environment will enable scientists to spend less time managing their software and more time doing science.

  5. Fast-Running Aeroelastic Code Based on Unsteady Linearized Aerodynamic Solver Developed

    NASA Technical Reports Server (NTRS)

    Reddy, T. S. R.; Bakhle, Milind A.; Keith, T., Jr.

    2003-01-01

    The NASA Glenn Research Center has been developing aeroelastic analyses for turbomachines for use by NASA and industry. An aeroelastic analysis consists of a structural dynamic model, an unsteady aerodynamic model, and a procedure to couple the two models. The structural models are well developed. Hence, most of the development for the aeroelastic analysis of turbomachines has involved adapting and using unsteady aerodynamic models. Two methods are used in developing unsteady aerodynamic analysis procedures for the flutter and forced response of turbomachines: (1) the time domain method and (2) the frequency domain method. Codes based on time domain methods require considerable computational time and, hence, cannot be used during the design process. Frequency domain methods eliminate the time dependence by assuming harmonic motion and, hence, require less computational time. For simplicity, early frequency domain analysis methods neglected the important physics of steady loading. A fast-running unsteady aerodynamic code, LINFLUX, which includes steady loading and is based on the frequency domain method, has been modified for flutter and response calculations. LINFLUX solves the unsteady linearized Euler equations for calculating the unsteady aerodynamic forces on the blades, starting from a steady nonlinear aerodynamic solution. First, we obtained a steady aerodynamic solution for a given flow condition using the nonlinear unsteady aerodynamic code TURBO. A blade vibration analysis was done to determine the frequencies and mode shapes of the vibrating blades, and an interface code was used to convert the steady aerodynamic solution to a form required by LINFLUX. A preprocessor was used to interpolate the mode shapes from the structural dynamic mesh onto the computational dynamics mesh. Then, we used LINFLUX to calculate the unsteady aerodynamic forces for a given mode, frequency, and phase angle. A postprocessor read these unsteady pressures and calculated the generalized aerodynamic forces, eigenvalues, and response amplitudes. The eigenvalues determine the flutter frequency and damping. As a test case, the flutter of a helical fan was calculated with LINFLUX and compared with calculations from TURBO-AE, a nonlinear time domain code, and from ASTROP2, a code based on linear unsteady aerodynamics.

  6. A 3D interactive method for estimating body segmental parameters in animals: application to the turning and running performance of Tyrannosaurus rex.

    PubMed

    Hutchinson, John R; Ng-Thow-Hing, Victor; Anderson, Frank C

    2007-06-21

    We developed a method based on interactive B-spline solids for estimating and visualizing biomechanically important parameters for animal body segments. Although the method is most useful for assessing the importance of unknowns in extinct animals, such as body contours, muscle bulk, or inertial parameters, it is also useful for non-invasive measurement of segmental dimensions in extant animals. Points measured directly from bodies or skeletons are digitized and visualized on a computer, and then a B-spline solid is fitted to enclose these points, allowing quantification of segment dimensions. The method is computationally fast enough so that software implementations can interactively deform the shape of body segments (by warping the solid) or adjust the shape quantitatively (e.g., expanding the solid boundary by some percentage or a specific distance beyond measured skeletal coordinates). As the shape changes, the resulting changes in segment mass, center of mass (CM), and moments of inertia can be recomputed immediately. Volumes of reduced or increased density can be embedded to represent lungs, bones, or other structures within the body. The method was validated by reconstructing an ostrich body from a fleshed and defleshed carcass and comparing the estimated dimensions to empirically measured values from the original carcass. We then used the method to calculate the segmental masses, centers of mass, and moments of inertia for an adult Tyrannosaurus rex, with measurements taken directly from a complete skeleton. We compare these results to other estimates, using the model to compute the sensitivities of unknown parameter values based upon 30 different combinations of trunk, lung and air sac, and hindlimb dimensions. The conclusion that T. rex was not an exceptionally fast runner remains strongly supported by our models; the main area of ambiguity for estimating running ability seems to be estimating fascicle lengths, not body dimensions. Additionally, the craniad position of the CM in all of our models reinforces the notion that T. rex did not stand or move with extremely columnar, elephantine limbs. It required some flexion in the limbs to stand still, but how much flexion depends directly on where its CM is assumed to lie. Finally we used our model to test an unsolved problem in dinosaur biomechanics: how fast a huge biped like T. rex could turn. Depending on the assumptions, our whole body model integrated with a musculoskeletal model estimates that turning 45 degrees on one leg could be achieved slowly, in about 1-2 s.
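
    The inertial bookkeeping the method performs as the shape is warped can be sketched as follows; this is a hedged illustration over a voxelised density field rather than the paper's B-spline solids, with lungs or air sacs representable simply as low-density voxels.

```python
import numpy as np

def inertial_properties(density, voxel, origin=np.zeros(3)):
    """density: 3D array of kg/m^3; voxel: cubic voxel edge length in m."""
    dm = (density * voxel**3).ravel()              # mass of each voxel
    mass = dm.sum()
    # Voxel-centre coordinates in world space.
    coords = (np.indices(density.shape).reshape(3, -1).T + 0.5) * voxel + origin
    cm = (coords * dm[:, None]).sum(axis=0) / mass
    r = coords - cm                                # offsets from the centre of mass
    r2 = (r**2).sum(axis=1)
    # Inertia tensor about the CM: I = sum dm * (|r|^2 * Id - r r^T)
    I = (dm[:, None, None] *
         (r2[:, None, None] * np.eye(3) - r[:, :, None] * r[:, None, :])).sum(axis=0)
    return mass, cm, I

# A low-density cavity (e.g., a lung) is just a region of smaller density:
# body = np.full((40, 80, 40), 1000.0); body[10:20, 30:50, 15:25] = 200.0
```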

  7. Real-time Experiment Interface for Biological Control Applications

    PubMed Central

    Lin, Risa J.; Bettencourt, Jonathan; White, John A.; Christini, David J.; Butera, Robert J.

    2013-01-01

    The Real-time Experiment Interface (RTXI) is a fast and versatile real-time biological experimentation system based on Real-Time Linux. RTXI is open source and free, can be used with an extensive range of experimentation hardware, and can be run on Linux or Windows computers (when using the Live CD). RTXI is currently used extensively for two experiment types: dynamic patch clamp and closed-loop stimulation pattern control in neural and cardiac single cell electrophysiology. RTXI includes standard plug-ins for implementing commonly used electrophysiology protocols with synchronized stimulation, event detection, and online analysis. These and other user-contributed plug-ins can be found on the website (http://www.rtxi.org). PMID:21096883

  8. Dominant takeover regimes for genetic algorithms

    NASA Technical Reports Server (NTRS)

    Noever, David; Baskaran, Subbiah

    1995-01-01

    The genetic algorithm (GA) is a machine-based optimization routine which connects evolutionary learning to natural genetic laws. The present work addresses the problem of obtaining the dominant takeover regimes in the GA dynamics. Estimated GA run times are computed for slow and fast convergence in the limits of high and low fitness ratios. Using Euler's device for obtaining partial sums in closed forms, the result relaxes the previously held requirements for long time limits. Analytical solutions reveal that appropriately accelerated regimes can mark the ascendancy of the most fit solution. In virtually all cases, the weak (logarithmic) dependence of convergence time on problem size demonstrates the potential for the GA to solve large N-P complete problems.
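
    A toy simulation in the spirit of the result (not the paper's analytical derivation) illustrates the weak growth of takeover time with population size under proportional selection:

```python
# Selection-only GA: count generations for one copy of a fitter genotype
# to take over a population of size N.
import random

def takeover_time(N, fitness_ratio=2.0, seed=0):
    rng = random.Random(seed)
    pop = [1] + [0] * (N - 1)                 # one copy of the fit genotype
    gens = 0
    while 0 < sum(pop) < N:                   # stop on takeover or loss by drift
        weights = [fitness_ratio if g else 1.0 for g in pop]
        pop = rng.choices(pop, weights=weights, k=N)
        gens += 1
    return gens if sum(pop) == N else None    # None: the fit genotype drifted out

for N in (100, 1000, 10000):
    t, seed = None, 0
    while t is None:                          # retry seeds where drift wins
        t = takeover_time(N, seed=seed)
        seed += 1
    print(N, t)                               # growth with N is strikingly slow
```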

  9. Benchmarking and performance analysis of the CM-2. [SIMD computer

    NASA Technical Reports Server (NTRS)

    Myers, David W.; Adams, George B., II

    1988-01-01

    A suite of benchmarking routines testing communication, basic arithmetic operations, and selected kernel algorithms written in LISP and PARIS was developed for the CM-2. Experiment runs are automated via a software framework that sequences individual tests, allowing for unattended overnight operation. Multiple measurements are made and treated statistically to generate well-characterized results from the noisy values given by cm:time. The results obtained provide a comparison with similar, but less extensive, testing done on a CM-1. Tests were chosen to aid the algorithmist in constructing fast, efficient, and correct code on the CM-2, as well as gain insight into what performance criteria are needed when evaluating parallel processing machines.

  10. Interactive analysis of geographically distributed population imaging data collections over light-path data networks

    NASA Astrophysics Data System (ADS)

    van Lew, Baldur; Botha, Charl P.; Milles, Julien R.; Vrooman, Henri A.; van de Giessen, Martijn; Lelieveldt, Boudewijn P. F.

    2015-03-01

    The cohort size required in epidemiological imaging genetics studies often mandates the pooling of data from multiple hospitals. Patient data, however, is subject to strict privacy protection regimes, and physical data storage may be legally restricted to a hospital network. To enable biomarker discovery, fast data access and interactive data exploration must be combined with high-performance computing resources, while respecting privacy regulations. We present a system using fast and inherently secure light-paths to access distributed data, thereby obviating the need for a central data repository. A secure private cloud computing framework facilitates interactive, computationally intensive exploration of this geographically distributed, privacy sensitive data. As a proof of concept, MRI brain imaging data hosted at two remote sites were processed in response to a user command at a third site. The system was able to automatically start virtual machines, run a selected processing pipeline and write results to a user accessible database, while keeping data locally stored in the hospitals. Individual tasks took approximately 50% longer compared to a locally hosted blade server, but the cloud infrastructure reduced the total elapsed time by a factor of 40 using 70 virtual machines in the cloud. We demonstrated that the combination of light-paths and a private cloud is a viable means of building an analysis infrastructure for secure data analysis. The system requires further work in the areas of error handling, load balancing and secure support of multiple users.

  11. A comparison of two types of running wheel in terms of mouse preference, health, and welfare.

    PubMed

    Walker, Michael; Mason, Georgia

    2018-07-01

    Voluntary wheel running occurs in mice of all strains, sexes, and ages. Mice find voluntary wheel running rewarding, and it leads to numerous health benefits. For this reason wheels are used both to enhance welfare and to create models of exercise. However, many designs of running wheel are used. This makes between-study comparisons difficult, as this variability could potentially affect the amount, pattern, and/or intensity of running behaviour, and thence the wheels' effects on welfare and exercise-related changes in anatomy and physiology. This study therefore evaluated two commercially available models, chosen because safe for group-housed mice: Bio Serv®'s "fast-trac" wheel combo and Ware Manufacturing Inc.'s stainless steel mesh 5″ upright wheel. Working with a total of three hundred and fifty one female C57BL/6, DBA/2 and BALB/c mice, we assessed these wheels' relative utilization by mice when access was free; the strength of motivation for each wheel-type when access required crossing an electrified grid; and the impact each wheel had on mouse well-being (inferred from acoustic startle responses and neophobia) and exercise-related anatomical changes (BMI; heart and hind limb masses). Mice ran more on the "fast-trac" wheel regardless of whether both wheel-types were available at once, or only if one was present. In terms of motivation, subjects required to work to access a single wheel worked equally hard for both wheel-types (even if locked and thus not useable for running), but if provided with one working wheel for free and the other type of wheel (again unlocked) accessible via crossing the electrified grid, the "fast-trac" wheel emerged as more motivating, as the Maximum Price Paid for the Ware metal wheel was lower than that paid for the "fast-trac" plastic wheel, at least for C57BL/6s and DBA/2s. No deleterious consequences were noted with either wheel in terms of health and welfare, but only mice with plastic wheels developed significantly larger hearts and hind limbs than control animals with locked wheels. Thus, where differences emerged, Bio Serv®'s "fast-trac" wheel combos appeared to better meet the aims of exercise provision than Ware Manufacturing's steel upright wheels. Copyright © 2018 Elsevier Inc. All rights reserved.

  12. Comparing Effects of Feedstock and Run Conditions on Pyrolysis Products Produced at Pilot-Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dunning, Timothy C; Gaston, Katherine R; Wilcox, Esther

    2018-01-19

    Fast pyrolysis is a promising pathway for mass production of liquid transportable biofuels. The Thermochemical Process Development Unit (TCPDU) pilot plant at NREL is conducting research to support the Bioenergy Technologies Office's 2017 goal of a $3 per gallon biofuel. In preparation for down-selection of feedstock and run conditions, four different feedstocks were run under three different run conditions. The products produced were characterized extensively. Hot pyrolysis vapors and light gases were analyzed on a slip stream, and oil and char samples were characterized post-run.

  13. The Effects of Treadmill Running on Aging Laryngeal Muscle Structure

    PubMed Central

    Kletzien, Heidi; Russell, John A.; Connor, Nadine P.

    2015-01-01

    Levels of Evidence: NA (animal study). Objective: Age-related changes in laryngeal muscle structure and function may contribute to deficits in voice and swallowing observed in elderly people. We hypothesized that treadmill running, an exercise that increases respiratory drive to upper airway muscles, would induce changes in thyroarytenoid muscle myosin heavy chain (MHC) isoforms consistent with a fast-slow transformation in muscle fiber type. Study Design: Randomized parallel group controlled trial. Methods: Fifteen young adult and 14 old Fischer 344/Brown Norway rats received either treadmill running or no exercise (5 days/week for 8 weeks). Myosin heavy chain isoform composition in the thyroarytenoid muscle was examined at the end of 8 weeks. Results: Significant age and treatment effects were found. The young adult group had the greatest proportion of superfast contracting MHCIIL. The treadmill running group had the lowest proportion of MHCIIL and the greatest proportion of MHCIIx. Conclusion: Thyroarytenoid muscle structure was affected both by age and by treadmill running in a fast-slow transition that is characteristic of exercise manipulations in other skeletal muscles. PMID:26256100

  14. Fast and robust shape diameter function.

    PubMed

    Chen, Shuangmin; Liu, Taijun; Shu, Zhenyu; Xin, Shiqing; He, Ying; Tu, Changhe

    2018-01-01

    The shape diameter function (SDF) is a scalar function defined on a closed manifold surface, measuring the neighborhood diameter of the object at each point. Due to its pose oblivious property, SDF is widely used in shape analysis, segmentation and retrieval. However, computing SDF is computationally expensive since one has to place an inverted cone at each point and then average the penetration distances for a number of rays inside the cone. Furthermore, the shape diameters are highly sensitive to local geometric features as well as the normal vectors, hence diminishing their applications to real-world meshes which often contain rich geometric details and/or various types of defects, such as noise and gaps. In order to increase the robustness of SDF and extend it to a wide range of 3D models, we define SDF by offsetting the input object a little bit. This seemingly minor change brings three significant benefits: First, it allows us to compute SDF in a robust manner since the offset surface is able to give reliable normal vectors. Second, it runs many times faster since at each point we only need to compute the penetration distance along a single direction, rather than tens of directions. Third, our method does not require watertight surfaces as the input; it supports both point clouds and meshes with noise and gaps. Extensive experimental results show that the offset-surface based SDF is robust to noise and insensitive to geometric details, and it also runs about 10 times faster than the existing method. We also exhibit its usefulness using two typical applications including shape retrieval and shape segmentation, and observe a significant improvement over the existing SDF.

  15. Quantification of uncertainties in the tsunami hazard for Cascadia using statistical emulation

    NASA Astrophysics Data System (ADS)

    Guillas, S.; Day, S. J.; Joakim, B.

    2016-12-01

    We present new high resolution simulations of tsunami wave propagation and coastal inundation for the Cascadia region in the Pacific Northwest. The coseismic representation in this analysis is novel, and more realistic than in previous studies, as we jointly parametrize multiple aspects of the seabed deformation. Due to the large computational cost of such simulators, statistical emulation is required in order to carry out uncertainty quantification tasks, as emulators efficiently approximate simulators. The emulator replaces the tsunami model VOLNA with a fast surrogate, so we are able to efficiently propagate uncertainties from the source characteristics to wave heights, in order to probabilistically assess tsunami hazard for Cascadia. We employ a new method for the design of the computer experiments in order to reduce the number of runs while maintaining good approximation properties of the emulator. Out of the initial nine parameters, mostly describing the geometry and time variation of the seabed deformation, we drop two parameters since these turn out not to have an influence on the resulting tsunami waves at the coast. We model the impact of another parameter linearly, as its influence on the wave heights is identified as linear. We combine this screening approach with the sequential design algorithm MICE (Mutual Information for Computer Experiments), which adaptively selects the input values at which to run the computer simulator in order to maximize the expected information gain (mutual information) over the input space. As a result, the emulation is made possible and accurate. Starting from distributions of the source parameters that encapsulate geophysical knowledge of the possible source characteristics, we derive distributions of the tsunami wave heights along the coastline.
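
    The emulation step can be sketched with an off-the-shelf Gaussian-process regressor standing in for the authors' emulator; the VOLNA runs, the MICE design, and the real parametrization are replaced here by hypothetical stand-ins.

```python
# Fit a GP surrogate to a few expensive "simulator" runs, then query it
# cheaply across the retained source-parameter space.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 7))              # 7 retained source parameters
y = np.sin(3 * X[:, 0]) + X[:, 1]          # stand-in for simulated wave height

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X, y)
X_new = rng.uniform(size=(10000, 7))       # cheap to evaluate vs. the simulator
mean, std = gp.predict(X_new, return_std=True)   # predictions with uncertainty
```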

  16. Seismic waveform modeling over cloud

    NASA Astrophysics Data System (ADS)

    Luo, Cong; Friederich, Wolfgang

    2016-04-01

    With fast-growing computational technologies, numerical simulation of seismic wave propagation has achieved huge successes. Obtaining synthetic waveforms through numerical simulation receives an increasing amount of attention from seismologists. However, computational seismology is a data-intensive research field, and the numerical packages usually come with a steep learning curve. Users are expected to master a considerable amount of computer knowledge and data processing skills. Training users to operate the numerical packages and to correctly access and utilize the computational resources is a troublesome task. In addition, access to HPC is also a common difficulty for many users. To solve these problems, a cloud-based solution dedicated to shallow seismic waveform modeling has been developed with state-of-the-art web technologies. It is a web platform integrating both software and hardware in a multilayer architecture: a well-designed SQL database serves as the data layer, while HPC and a dedicated pipeline for it form the business layer. Through this platform, users no longer need to compile and manipulate various packages on a local machine within a local network to perform a simulation. By providing professional access to the computational code through its interfaces and delivering our computational resources to users over the cloud, the platform lets users customize simulations at expert level and submit and run jobs through it.

  17. Scalable Metropolis Monte Carlo for simulation of hard shapes

    NASA Astrophysics Data System (ADS)

    Anderson, Joshua A.; Eric Irrgang, M.; Glotzer, Sharon C.

    2016-07-01

    We design and implement a scalable hard particle Monte Carlo simulation toolkit (HPMC), and release it open source as part of HOOMD-blue. HPMC runs in parallel on many CPUs and many GPUs using domain decomposition. We employ BVH trees instead of cell lists on the CPU for fast performance, especially with large particle size disparity, and optimize inner loops with SIMD vector intrinsics on the CPU. Our GPU kernel proposes many trial moves in parallel on a checkerboard and uses a block-level queue to redistribute work among threads and avoid divergence. HPMC supports a wide variety of shape classes, including spheres/disks, unions of spheres, convex polygons, convex spheropolygons, concave polygons, ellipsoids/ellipses, convex polyhedra, convex spheropolyhedra, spheres cut by planes, and concave polyhedra. NVT and NPT ensembles can be run in 2D or 3D triclinic boxes. Additional integration schemes permit Frenkel-Ladd free energy computations and implicit depletant simulations. In a benchmark system of a fluid of 4096 pentagons, HPMC performs 10 million sweeps in 10 min on 96 CPU cores on XSEDE Comet. The same simulation would take 7.6 h in serial. HPMC also scales to large system sizes, and the same benchmark with 16.8 million particles runs in 1.4 h on 2048 GPUs on OLCF Titan.

  18. Adaptive Integration of Nonsmooth Dynamical Systems

    DTIC Science & Technology

    2017-10-11

    controlled time stepping method to interactively design running robots. [1] John Shepherd, Samuel Zapolsky, and Evan M. Drumwright, “Fast multi-body...” Started working in simulation after attempting to use software like this to test software running on my robots. The libraries that produce these beautiful results have failed at simulating robotic manipulation. Postulate: It is easier to

  19. TH-A-19A-08: Intel Xeon Phi Implementation of a Fast Multi-Purpose Monte Carlo Simulation for Proton Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Souris, K; Lee, J; Sterpin, E

    2014-06-15

    Purpose: Recent studies have demonstrated the capability of graphics processing units (GPUs) to compute dose distributions using Monte Carlo (MC) methods within clinical time constraints. However, GPUs have a rigid vectorial architecture that favors the implementation of simplified particle transport algorithms, adapted to specific tasks. Our new, fast, and multipurpose MC code, named MCsquare, runs on Intel Xeon Phi coprocessors. This technology offers 60 independent cores, and therefore more flexibility to implement fast and yet generic MC functionalities, such as prompt gamma simulations. Methods: MCsquare implements several models and hence allows users to make their own tradeoff between speed and accuracy. A 200 MeV proton beam is simulated in a heterogeneous phantom using Geant4 and two configurations of MCsquare. The first one is the most conservative and accurate. The method of fictitious interactions handles the interfaces and secondary charged particles emitted in nuclear interactions are fully simulated. The second, faster configuration simplifies interface crossings and simulates only secondary protons after nuclear interaction events. Integral depth-dose and transversal profiles are compared to those of Geant4. Moreover, the production profile of prompt gammas is compared to PENH results. Results: Integral depth dose and transversal profiles computed by MCsquare and Geant4 are within 3%. The production of secondaries from nuclear interactions is slightly inaccurate at interfaces for the fastest configuration of MCsquare but this is unlikely to have any clinical impact. The computation time varies between 90 seconds for the most conservative settings to merely 59 seconds in the fastest configuration. Finally prompt gamma profiles are also in very good agreement with PENH results. Conclusion: Our new, fast, and multi-purpose Monte Carlo code simulates prompt gammas and calculates dose distributions in less than a minute, which complies with clinical time constraints. It has been successfully validated with Geant4. This work has been financially supported by InVivoIGT, a public/private partnership between UCL and IBA.

  20. System and method for controlling power consumption in a computer system based on user satisfaction

    DOEpatents

    Yang, Lei; Dick, Robert P; Chen, Xi; Memik, Gokhan; Dinda, Peter A; Shy, Alex; Ozisikyilmaz, Berkin; Mallik, Arindam; Choudhary, Alok

    2014-04-22

    Systems and methods for controlling power consumption in a computer system. For each of a plurality of interactive applications, the method changes a frequency at which a processor of the computer system runs, receives an indication of user satisfaction, determines a relationship between the changed frequency and the user satisfaction of the interactive application, and stores the determined relationship information. The determined relationship can distinguish between different users and different interactive applications. A frequency may be selected from the discrete frequencies at which the processor of the computer system runs based on the determined relationship information for a particular user and a particular interactive application running on the processor of the computer system. The processor may be adapted to run at the selected frequency.
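
    An illustrative sketch of the selection step (not the patented method itself): given stored satisfaction-versus-frequency relationships per user and application, pick the lowest discrete frequency that keeps the user satisfied. All names and numbers below are hypothetical.

```python
# Hypothetical learned (user, application) -> {frequency: satisfaction} data.
satisfaction = {
    ("alice", "browser"): {1.2e9: 0.95, 1.8e9: 0.97, 2.4e9: 0.98},
    ("alice", "game"):    {1.2e9: 0.40, 1.8e9: 0.80, 2.4e9: 0.97},
}

def pick_frequency(user, app, threshold=0.9):
    """Lowest discrete frequency whose recorded satisfaction clears the threshold."""
    curve = satisfaction[(user, app)]
    ok = [f for f, s in sorted(curve.items()) if s >= threshold]
    return ok[0] if ok else max(curve)   # fall back to the top frequency

print(pick_frequency("alice", "game"))       # -> 2.4 GHz for a demanding app
print(pick_frequency("alice", "browser"))    # -> 1.2 GHz suffices here
```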

  1. Fast Running Urban Dispersion Model for Radiological Dispersal Device (RDD) Releases: Model Description and Validation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gowardhan, Akshay; Neuscamman, Stephanie; Donetti, John

    Aeolus is an efficient three-dimensional computational fluid dynamics code based on the finite volume method, developed for predicting transport and dispersion of contaminants in a complex urban area. It solves the time dependent incompressible Navier-Stokes equation on a regular Cartesian staggered grid using a fractional step method. It also solves a scalar transport equation for temperature, using the Boussinesq approximation. The model also includes a Lagrangian dispersion model for predicting the transport and dispersion of atmospheric contaminants. The model can be run in an efficient Reynolds Average Navier-Stokes (RANS) mode with a run time of several minutes, or a more detailed Large Eddy Simulation (LES) mode with a run time of hours for a typical simulation. This report describes the model components, including details on the physics models used in the code, as well as several model validation efforts. Aeolus wind and dispersion predictions are compared to field data from the Joint Urban Field Trials 2003 conducted in Oklahoma City (Allwine et al 2004), including both continuous and instantaneous releases. Newly implemented Aeolus capabilities include a decay chain model and an explosive Radiological Dispersal Device (RDD) source term; these capabilities are described. Aeolus predictions using the buoyant explosive RDD source are validated against two experimental data sets: the Green Field explosive cloud rise experiments conducted in Israel (Sharon et al 2012) and the Full-Scale RDD Field Trials conducted in Canada (Green et al 2016).

  2. Mechanistic prediction of fission-gas behavior during in-cell transient heating tests on LWR fuel using the GRASS-SST and FASTGRASS computer codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rest, J; Gehl, S M

    1979-01-01

    GRASS-SST and FASTGRASS are mechanistic computer codes for predicting fission-gas behavior in UO2-base fuels during steady-state and transient conditions. FASTGRASS was developed in order to satisfy the need for a fast-running alternative to GRASS-SST. Although based on GRASS-SST, FASTGRASS is approximately an order of magnitude quicker in execution. The GRASS-SST transient analysis has evolved through comparisons of code predictions with the fission-gas release and physical phenomena that occur during reactor operation and transient direct-electrical-heating (DEH) testing of irradiated light-water reactor fuel. The FASTGRASS calculational procedure is described in this paper, along with models of key physical processes included in both FASTGRASS and GRASS-SST. Predictions of fission-gas release obtained from GRASS-SST and FASTGRASS analyses are compared with experimental observations from a series of DEH tests. The major conclusion is that the computer codes should include an improved model for the evolution of the grain-edge porosity.

  3. Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

    NASA Astrophysics Data System (ADS)

    Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

    2008-04-01

    This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
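
    The 'work farm' pattern itself is hardware-agnostic; below is a host-side Python analogue (hedged, clearly not MPPA object code) with one input stream, a parallel set of identical workers, and one ordered output stream.

```python
from multiprocessing import Pool

def worker(frame):
    # Stand-in for a per-object kernel, e.g. processing one video frame.
    return sum(frame) % 256

if __name__ == "__main__":
    frames = [list(range(i, i + 64)) for i in range(1000)]   # input stream
    with Pool(processes=8) as pool:
        # imap preserves stream order while the workers run concurrently.
        for out in pool.imap(worker, frames):
            pass  # consume the output stream
```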

  4. Limits to high-speed simulations of spiking neural networks using general-purpose computers.

    PubMed

    Zenke, Friedemann; Gerstner, Wulfram

    2014-01-01

    To understand how the central nervous system performs computations using recurrent neuronal circuitry, simulations have become an indispensable tool for theoretical neuroscience. To study neuronal circuits and their ability to self-organize, increasing attention has been directed toward synaptic plasticity. In particular spike-timing-dependent plasticity (STDP) creates specific demands for simulations of spiking neural networks. On the one hand a high temporal resolution is required to capture the millisecond timescale of typical STDP windows. On the other hand network simulations have to evolve over hours up to days, to capture the timescale of long-term plasticity. To do this efficiently, fast simulation speed is the crucial ingredient rather than large neuron numbers. Using different medium-sized network models consisting of several thousands of neurons and off-the-shelf hardware, we compare the simulation speed of the simulators: Brian, NEST and Neuron as well as our own simulator Auryn. Our results show that real-time simulations of different plastic network models are possible in parallel simulations in which numerical precision is not a primary concern. Even so, the speed-up margin of parallelism is limited and boosting simulation speeds beyond one tenth of real-time is difficult. By profiling simulation code we show that the run times of typical plastic network simulations encounter a hard boundary. This limit is partly due to latencies in the inter-process communications and thus cannot be overcome by increased parallelism. Overall, these results show that to study plasticity in medium-sized spiking neural networks, adequate simulation tools are readily available which run efficiently on small clusters. However, to run simulations substantially faster than real-time, special hardware is a prerequisite.

  5. TADSim: Discrete Event-based Performance Prediction for Temperature Accelerated Dynamics

    DOE PAGES

    Mniszewski, Susan M.; Junghans, Christoph; Voter, Arthur F.; ...

    2015-04-16

Next-generation high-performance computing will require more scalable and flexible performance prediction tools to evaluate software-hardware co-design choices relevant to scientific applications and hardware architectures. Here, we present a new class of tools called application simulators: parameterized fast-running proxies of large-scale scientific applications using parallel discrete event simulation. Parameterized choices for the algorithmic method and hardware options provide a rich space for design exploration and allow us to quickly find well-performing software-hardware combinations. We demonstrate our approach with a TADSim simulator that models the temperature-accelerated dynamics (TAD) method, an algorithmically complex and parameter-rich member of the accelerated molecular dynamics (AMD) family of molecular dynamics methods. The essence of the TAD application is captured without the computational expense and resource usage of the full code. We accomplish this by identifying the time-intensive elements, quantifying algorithm steps in terms of those elements, abstracting them out, and replacing them by the passage of time. We use TADSim to quickly characterize the runtime performance and algorithmic behavior for the otherwise long-running simulation code. We extend TADSim to model algorithm extensions, such as speculative spawning of the compute-bound stages, and predict performance improvements without having to implement such a method. Validation against the actual TAD code shows close agreement for the evolution of an example physical system, a silver surface. Finally, focused parameter scans have allowed us to study algorithm parameter choices over far more scenarios than would be possible with the actual simulation. This has led to interesting performance-related insights and suggested extensions.
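
    The core idea of an application simulator, replacing time-intensive algorithm stages by the passage of simulated time, can be illustrated with a toy discrete event loop (plain Python, not the TADSim code; stage names and durations are hypothetical):

        import heapq

        def simulate(stage_durations, repeats=3):
            """Advance a clock through repeated algorithm stages by event time alone."""
            clock, events = 0.0, [(0.0, 0)]     # (event time, stage counter)
            while events:
                clock, i = heapq.heappop(events)
                if i < repeats * len(stage_durations):
                    nxt = clock + stage_durations[i % len(stage_durations)]
                    heapq.heappush(events, (nxt, i + 1))
            return clock

        # e.g., an MD block followed by a transition check, repeated three times
        print(f"predicted runtime: {simulate([120.0, 5.0]):.1f} s")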

  6. Specvis: Free and open-source software for visual field examination.

    PubMed

    Dzwiniel, Piotr; Gola, Mateusz; Wójcik-Gryciuk, Anna; Waleszczyk, Wioletta J

    2017-01-01

Visual field impairment affects more than 100 million people globally. However, because access to appropriate ophthalmic healthcare in less developed regions is limited by the associated costs and expertise, this number may be an underestimate. Improved access to affordable diagnostic software designed for visual field examination could slow the progression of diseases such as glaucoma by allowing early diagnosis and intervention. We have developed Specvis, a free and open-source application written in the Java programming language that can run on any personal computer to meet this requirement (http://www.specvis.pl/). Specvis was tested on glaucomatous, retinitis pigmentosa, and stroke patients, and the results were compared to results using the Medmont M700 Automated Static Perimeter. The application was also tested for inter-test intrapersonal variability. The results from both validation studies indicated low inter-test intrapersonal variability and suitable reliability for a fast and simple assessment of visual field impairment. Specvis easily identifies visual field areas of zero sensitivity and allows sensitivity levels to be evaluated throughout the visual field. Thus, Specvis is a new, reliable application that can be successfully used for visual field examination and can fill the gap between confrontation and perimetry tests. The main advantages of Specvis over existing methods are its availability (free), affordability (runs on any personal computer), and reliability (comparable to high-cost solutions).

  7. Specvis: Free and open-source software for visual field examination

    PubMed Central

    Dzwiniel, Piotr; Gola, Mateusz; Wójcik-Gryciuk, Anna

    2017-01-01

Visual field impairment affects more than 100 million people globally. However, because access to appropriate ophthalmic healthcare in less developed regions is limited by the associated costs and expertise, this number may be an underestimate. Improved access to affordable diagnostic software designed for visual field examination could slow the progression of diseases such as glaucoma by allowing early diagnosis and intervention. We have developed Specvis, a free and open-source application written in the Java programming language that can run on any personal computer to meet this requirement (http://www.specvis.pl/). Specvis was tested on glaucomatous, retinitis pigmentosa, and stroke patients, and the results were compared to results using the Medmont M700 Automated Static Perimeter. The application was also tested for inter-test intrapersonal variability. The results from both validation studies indicated low inter-test intrapersonal variability and suitable reliability for a fast and simple assessment of visual field impairment. Specvis easily identifies visual field areas of zero sensitivity and allows sensitivity levels to be evaluated throughout the visual field. Thus, Specvis is a new, reliable application that can be successfully used for visual field examination and can fill the gap between confrontation and perimetry tests. The main advantages of Specvis over existing methods are its availability (free), affordability (runs on any personal computer), and reliability (comparable to high-cost solutions). PMID:29028825

  8. Large-scale virtual screening on public cloud resources with Apache Spark.

    PubMed

    Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola

    2017-01-01

Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the message passing interface, relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against approximately 2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries. Our implementation is named Spark-VS and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).
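
    A minimal PySpark sketch of the map-reduce screening flow described above follows; dock() and the compounds.smi input path are hypothetical stand-ins for invoking a real docking engine on one compound per library entry.

        from pyspark import SparkContext

        def dock(smiles):
            # Placeholder score; a real pipeline would invoke a docking engine here.
            return (smiles, float(len(smiles) % 10))

        sc = SparkContext(appName="toy-virtual-screen")
        library = sc.textFile("compounds.smi")                 # one SMILES per line
        scores = library.map(dock)                             # map: dock each compound
        top = scores.takeOrdered(100, key=lambda kv: -kv[1])   # reduce: keep best 100
        sc.stop()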

  9. Active Acoustics using Bellhop-DRDC: Run Time Tests and Suggested Configurations for a Tracking Exercise in Shallow Scotian Waters

    DTIC Science & Technology

    2005-05-01

...simulated test scenario to obtain the transmission loss and reverberation diagrams for 18 elements (one source, one towed array, and 16 buoys)... were recorded using a 1.5 GHz Pentium 4 processor. The test results indicate that the Bellhop program runs fast enough to provide the required acoustic... was determined that the Bellhop program will be fast enough for these clients. Future plans: It is intended to integrate further enhancements that...

  10. Proximity of fast food restaurants to schools: do neighborhood income and type of school matter?

    PubMed

    Simon, Paul A; Kwan, David; Angelescu, Aida; Shih, Margaret; Fielding, Jonathan E

    2008-09-01

    To investigate the proximity of fast food restaurants to public schools and examine proximity by neighborhood income and school level (elementary, middle, or high school). Geocoded school and restaurant databases from 2005 and 2003, respectively, were used to determine the percentage of schools with one or more fast food restaurants within 400 m and 800 m of all public schools in Los Angeles County, California. Single-factor analysis of variance (ANOVA) models were run to examine fast food restaurant proximity to schools by median household income of the surrounding census tract and by school level. Two-factor ANOVA models were run to assess the additional influence of neighborhood level of commercialization. Overall, 23.3% and 64.8% of schools had one or more fast food restaurants located within 400 m and 800 m, respectively. Fast food restaurant proximity was greater for high schools than for middle and elementary schools, and was inversely related to neighborhood income for schools in the highest commercial areas. No association with income was observed in less commercial areas. Fast food restaurants are located in close proximity to many schools in this large metropolitan area, especially high schools and schools located in low income highly commercial neighborhoods. Further research is needed to assess the relationship between fast food proximity and student dietary practices and obesity risk.
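
    For readers unfamiliar with the statistical test used, a single-factor ANOVA of school-to-restaurant distances grouped by income tertile can be run as below (scipy.stats shown for illustration; the distance values are fabricated, not the study's data):

        from scipy.stats import f_oneway

        # School-to-restaurant distances (m) grouped by income tertile (fabricated).
        low    = [310, 250, 420, 180, 390]
        middle = [450, 380, 520, 610, 300]
        high   = [640, 720, 510, 580, 690]

        f_stat, p_value = f_oneway(low, middle, high)
        print(f"F = {f_stat:.2f}, p = {p_value:.3f}")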

  11. 4P: fast computing of population genetics statistics from large DNA polymorphism panels

    PubMed Central

    Benazzo, Andrea; Panziera, Alex; Bertorelle, Giorgio

    2015-01-01

Massive DNA sequencing has significantly increased the amount of data available for population genetics and molecular ecology studies. However, the parallel computation of simple statistics within and between populations from large panels of polymorphic sites is not yet available, making the exploratory analyses of a set or subset of data a very laborious task. Here, we present 4P (parallel processing of polymorphism panels), a stand-alone software program for the rapid computation of genetic variation statistics (including the joint frequency spectrum) from millions of DNA variants in multiple individuals and multiple populations. It handles a standard input file format commonly used to store DNA variation from empirical or simulation experiments. The computational performance of 4P was evaluated using large SNP (single nucleotide polymorphism) datasets from human genomes or obtained by simulations. 4P was faster or much faster than other comparable programs, and the impact of parallel computing using multicore computers or servers was evident. 4P is a useful tool for biologists who need a simple and rapid computer program to run exploratory population genetics analyses in large panels of genomic data. It is also particularly suitable to analyze multiple data sets produced in simulation studies. Unix, Windows, and MacOS versions are provided, as well as the source code for easier pipeline implementations. PMID:25628874
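
    As an example of the kind of statistic 4P computes, the following NumPy sketch (not 4P's own code) builds a folded site frequency spectrum from a genotype matrix; the matrix is randomly generated purely for illustration.

        import numpy as np

        rng = np.random.default_rng(1)
        genotypes = rng.integers(0, 3, size=(50, 10_000))  # 50 diploids x 10k SNPs (0/1/2)

        alt = genotypes.sum(axis=0)                        # alternate-allele count per site
        n_alleles = 2 * genotypes.shape[0]
        minor = np.minimum(alt, n_alleles - alt)           # fold the spectrum
        sfs = np.bincount(minor, minlength=n_alleles // 2 + 1)
        print(sfs[:10])                                    # low-frequency classes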

  12. TU-AB-BRA-12: Impact of Image Registration Algorithms On the Prediction of Pathological Response with Radiomic Textures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yip, S; Coroller, T; Niu, N

    2015-06-15

Purpose: Tumor regions-of-interest (ROI) can be propagated from the pre- onto the post-treatment PET/CT images using image registration of their CT counterparts, providing an automatic way to compute texture features on longitudinal scans. This exploratory study assessed the impact of image registration algorithms on textures to predict pathological response. Methods: Forty-six esophageal cancer patients (1 tumor/patient) underwent PET/CT scans before and after chemoradiotherapy. Patients were classified into responders and non-responders after the surgery. Physician-defined tumor ROIs on pre-treatment PET were propagated onto the post-treatment PET using rigid and ten deformable registration algorithms. One co-occurrence, two run-length and size zone matrix textures were computed within all ROIs. The relative difference of each texture at different treatment time-points was used to predict the pathologic responders. Their predictive value was assessed using the area under the receiver-operating-characteristic curve (AUC). Propagated ROIs and texture quantification resulting from different algorithms were compared using overlap volume (OV) and coefficient of variation (CoV), respectively. Results: Tumor volumes were better captured by ROIs propagated by deformable rather than the rigid registration. The OV between rigidly and deformably propagated ROIs was 69%. The deformably propagated ROIs were found to be similar (OV∼80%) except for fast-demons (OV∼60%). Rigidly propagated ROIs with run-length matrix textures failed to significantly differentiate between responders and non-responders (AUC=0.65, p=0.07), while the differentiation was significant with other textures (AUC=0.69–0.72, p<0.03). Among the deformable algorithms, fast-demons was the least predictive (AUC=0.68–0.71, p<0.04). ROIs propagated by all other deformable algorithms with any texture significantly predicted pathologic responders (AUC=0.71–0.78, p<0.01) despite substantial variation in texture quantification (CoV>70%). Conclusion: Propagated ROIs using deformable registration for all textures can lead to accurate prediction of pathologic response, potentially expediting the temporal texture analysis process. However, rigid and fast-demons deformable algorithms are not recommended due to their inferior performance compared to other algorithms. The project was supported in part by a Kaye Scholar Award.

  13. Dual-comb spectroscopy of water vapor with a free-running semiconductor disk laser.

    PubMed

    Link, S M; Maas, D J H C; Waldburger, D; Keller, U

    2017-06-16

    Dual-comb spectroscopy offers the potential for high accuracy combined with fast data acquisition. Applications are often limited, however, by the complexity of optical comb systems. Here we present dual-comb spectroscopy of water vapor using a substantially simplified single-laser system. Very good spectroscopy measurements with fast sampling rates are achieved with a free-running dual-comb mode-locked semiconductor disk laser. The absolute stability of the optical comb modes is characterized both for free-running operation and with simple microwave stabilization. This approach drastically reduces the complexity for dual-comb spectroscopy. Band-gap engineering to tune the center wavelength from the ultraviolet to the mid-infrared could optimize frequency combs for specific gas targets, further enabling dual-comb spectroscopy for a wider range of industrial applications. Copyright © 2017, American Association for the Advancement of Science.

  14. Generating unstructured nuclear reactor core meshes in parallel

    DOE PAGES

    Jain, Rajeev; Tautges, Timothy J.

    2014-10-24

Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulation problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate on a standalone desktop computer because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during reactor assembly and reactor core mesh generation processes. We highlight several reactor core examples including a very high temperature reactor, a full-core model of the Korean MONJU reactor, a ¼ pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with a XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.

  15. A 500 megabyte/second disk array

    NASA Technical Reports Server (NTRS)

    Ruwart, Thomas M.; Okeefe, Matthew T.

    1994-01-01

Applications at the Army High Performance Computing Research Center's (AHPCRC) Graphic and Visualization Laboratory (GVL) at the University of Minnesota require a tremendous amount of I/O bandwidth, and this appetite for data is growing. Silicon Graphics workstations are used to perform the post-processing, visualization, and animation of multi-terabyte datasets produced by scientific simulations performed on AHPCRC supercomputers. The M.A.X. (Maximum Achievable Xfer) was designed to find the maximum achievable I/O performance of the Silicon Graphics CHALLENGE/Onyx-class machines that run these applications. Running a fully configured Onyx machine with 12 150-MHz R4400 processors, 512 MB of 8-way interleaved memory, and 31 fast/wide SCSI-2 channels, each with a Ciprico disk array controller, we were able to achieve a maximum sustained transfer rate of 509.8 megabytes per second. However, after analyzing the results it became clear that the true maximum transfer rate is somewhat beyond this figure, and we will need to do further testing with more disk array controllers in order to find the true maximum.

  16. Pedestrian detection in crowded scenes with the histogram of gradients principle

    NASA Astrophysics Data System (ADS)

    Sidla, O.; Rosner, M.; Lypetskyy, Y.

    2006-10-01

This paper describes a close-to-real-time, scale-invariant implementation of a pedestrian detector system which is based on the Histogram of Oriented Gradients (HOG) principle. Salient HOG features are first selected from a manually created, very large database of samples with an evolutionary optimization procedure that directly trains a polynomial Support Vector Machine (SVM). Real-time operation is achieved by a cascaded 2-step classifier which first uses a very fast linear SVM (with the same features as the polynomial SVM) to reject most of the irrelevant detections and then computes the decision function with a polynomial SVM on the remaining set of candidate detections. Scale invariance is achieved by running the detector of constant size on scaled versions of the original input images and by clustering the results over all resolutions. The pedestrian detection system has been implemented in two versions: i) full body detection, and ii) upper body only detection. The latter is especially suited for very busy and crowded scenarios. On a state-of-the-art PC it is able to run at a frequency of 8 - 20 frames/sec.
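
    The HOG-plus-SVM principle, with scale invariance from scanning an image pyramid, can be demonstrated with OpenCV's stock people detector (shown below for illustration; this is not the authors' cascaded linear/polynomial SVM, and street.jpg is a hypothetical input):

        import cv2

        hog = cv2.HOGDescriptor()
        hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

        img = cv2.imread("street.jpg")                     # hypothetical input image
        rects, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)
        for (x, y, w, h) in rects:                         # one box per detection
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imwrite("detections.jpg", img)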

  17. A distributed control system for the lower-hybrid current drive system on the Tokamak de Varennes

    NASA Astrophysics Data System (ADS)

    Bagdoo, J.; Guay, J. M.; Chaudron, G.-A.; Decoste, R.; Demers, Y.; Hubbard, A.

    1990-08-01

An rf current drive system with an output power of 1 MW at 3.7 GHz is under development for the Tokamak de Varennes. The control system is based on an Ethernet local-area network of programmable logic controllers as the front end, personal computers as consoles, and CAMAC-based DSP processors. The DSP processors ensure the PID control of the phase and rf power of each klystron, and the fast protection of high-power rf hardware, all within a 40 μs loop. Slower control and protection, event sequencing, and the run-time database are provided by the programmable logic controllers, which communicate with the consoles via the LAN. The latter run commercial process-control console software. The LAN protocol respects the first four layers of the ISO/OSI 802.3 standard. Synchronization with the tokamak control system is provided by commercially available CAMAC timing modules which trigger shot-related events and reference waveform generators. A detailed description of each subsystem and a performance evaluation of the system will be presented.
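
    The per-klystron phase and power loops are standard discrete PID controllers; a generic single step of such a loop looks like the following (illustrative gains, with Python used only to show the arithmetic; the real 40 μs loop runs on the DSP processors):

        def pid_step(setpoint, measured, state, kp=1.0, ki=0.1, kd=0.01, dt=40e-6):
            error = setpoint - measured
            state["integral"] += error * dt
            derivative = (error - state["prev_error"]) / dt
            state["prev_error"] = error
            return kp * error + ki * state["integral"] + kd * derivative

        state = {"integral": 0.0, "prev_error": 0.0}
        drive = pid_step(setpoint=1.0, measured=0.8, state=state)  # one loop iteration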

  18. Gravitational tree-code on graphics processing units: implementation in CUDA

    NASA Astrophysics Data System (ADS)

    Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

    2010-05-01

We present a new very fast tree-code which runs on massively parallel Graphical Processing Units (GPUs) with the NVIDIA CUDA architecture. The tree construction and calculation of multipole moments are carried out on the host CPU, while the force calculation, which consists of tree walks and the evaluation of interaction lists, is carried out on the GPU. In this way we achieve a sustained performance of about 100 GFLOP/s and data transfer rates of about 50 GB/s. It takes about a second to compute forces on a million particles with an opening angle of θ ≈ 0.5. The code has a convenient user interface and is freely available for use. http://castle.strw.leidenuniv.nl/software/octgrav.html

  19. A 640-MHz 32-megachannel real-time polyphase-FFT spectrum analyzer

    NASA Technical Reports Server (NTRS)

    Zimmerman, G. A.; Garyantes, M. F.; Grimm, M. J.; Charny, B.

    1991-01-01

    A polyphase fast Fourier transform (FFT) spectrum analyzer being designed for NASA's Search for Extraterrestrial Intelligence (SETI) Sky Survey at the Jet Propulsion Laboratory is described. By replacing the time domain multiplicative window preprocessing with polyphase filter processing, much of the processing loss of windowed FFTs can be eliminated. Polyphase coefficient memory costs are minimized by effective use of run length compression. Finite word length effects are analyzed, producing a balanced system with 8 bit inputs, 16 bit fixed point polyphase arithmetic, and 24 bit fixed point FFT arithmetic. Fixed point renormalization midway through the computation is seen to be naturally accommodated by the matrix FFT algorithm proposed. Simulation results validate the finite word length arithmetic analysis and the renormalization technique.
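
    The polyphase front end can be summarized in a few lines: a prototype window of K taps per channel is applied, the K weighted segments are summed, and one FFT is taken across the M channels. A NumPy sketch under illustrative sizes and a stand-in Hamming prototype (not the flight design):

        import numpy as np

        M, K = 32, 8                      # channels, taps per channel
        h = np.hamming(M * K)             # stand-in prototype low-pass filter
        x = np.random.randn(M * K)        # one input block

        y = (x * h).reshape(K, M).sum(axis=0)  # polyphase accumulation of K segments
        spectrum = np.fft.fft(y)               # one M-point FFT across channels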

  20. HALOS: fast, autonomous, holographic adaptive optics

    NASA Astrophysics Data System (ADS)

    Andersen, Geoff P.; Gelsinger-Austin, Paul; Gaddipati, Ravi; Gaddipati, Phani; Ghebremichael, Fassil

    2014-08-01

We present progress on our holographic adaptive laser optics system (HALOS): a compact, closed-loop aberration correction system that uses a multiplexed hologram to deconvolve the phase aberrations in an input beam. The wavefront characterization is based on simple, parallel measurements of the intensity of fixed focal spots and does not require any complex calculations. As such, the system does not require a computer and is thus much cheaper and less complex than conventional approaches. We present details of a fully functional, closed-loop prototype incorporating a 32-element MEMS mirror, operating at a bandwidth of over 10 kHz. Additionally, since the all-optical sensing is performed in parallel, the speed is independent of actuator number, running at the same bandwidth for one actuator as for a million.

  1. How to Run FAST Simulations.

    PubMed

    Zimmerman, M I; Bowman, G R

    2016-01-01

    Molecular dynamics (MD) simulations are a powerful tool for understanding enzymes' structures and functions with full atomistic detail. These physics-based simulations model the dynamics of a protein in solution and store snapshots of its atomic coordinates at discrete time intervals. Analysis of the snapshots from these trajectories provides thermodynamic and kinetic properties such as conformational free energies, binding free energies, and transition times. Unfortunately, simulating biologically relevant timescales with brute force MD simulations requires enormous computing resources. In this chapter we detail a goal-oriented sampling algorithm, called fluctuation amplification of specific traits, that quickly generates pertinent thermodynamic and kinetic information by using an iterative series of short MD simulations to explore the vast depths of conformational space. © 2016 Elsevier Inc. All rights reserved.
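
    A schematic of the iterative loop described above (not the authors' code): each round ranks the states discovered so far by a reward balancing progress toward a structural goal against breadth of exploration, then seeds short simulations from the top-ranked states. The integer states and the order_parameter, visits, and run_short_md helpers are toy stand-ins.

        visits = {}                      # how often each state has been seeded

        def order_parameter(s):          # directed reward: progress toward the goal
            return -abs(s - 10)

        def run_short_md(seed):          # stand-in for one short MD trajectory
            visits[seed] = visits.get(seed, 0) + 1
            return seed + 1              # "discovers" a neighboring state

        def fast_round(states, n_seeds=3, alpha=1.0):
            reward = lambda s: order_parameter(s) + alpha / (1 + visits.get(s, 0))
            seeds = sorted(states, key=reward, reverse=True)[:n_seeds]
            return [run_short_md(s) for s in seeds]

        states = [0]
        for _ in range(10):              # ten rounds of adaptive seeding
            states += fast_round(states)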

  2. Density-based parallel skin lesion border detection with webCL

    PubMed Central

    2015-01-01

Background Dermoscopy is a highly effective and noninvasive imaging technique used in diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand-drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. Methods A fast density-based skin lesion border detection method has been implemented in parallel with a new parallel technology called WebCL. WebCL utilizes client-side computing capabilities to use available hardware resources such as multi-core CPUs and GPUs. The developed WebCL-parallel density-based skin lesion border detection method runs efficiently from internet browsers. Results Previous research indicates that one of the highest accuracy rates can be achieved using density-based clustering techniques for skin lesion border detection. While these algorithms do have unfavorable time complexities, this effect can be mitigated when they are implemented in parallel. In this study, the density-based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on heterogeneous platforms (e.g. tablets, smartphones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in clinical settings. For this, we used WebCL, an emerging technology that enables an HTML5 Web browser to execute code in parallel on heterogeneous platforms. We describe WebCL and our parallel algorithm design. In addition, we tested the parallel code on 100 dermoscopy images and show the execution speedups with respect to the serial version. Results indicate that the parallel (WebCL) and serial versions of the density-based lesion border detection method generate the same accuracy rates for the 100 dermoscopy images: the mean border error is 6.94%, mean recall is 76.66%, and mean precision is 99.29%. Moreover, the WebCL version's speedup factor for lesion border detection on the 100 dermoscopy images averages around 491.2. Conclusions When the large number of high-resolution dermoscopy images encountered in a usual clinical setting is considered, along with the critical importance of early detection and diagnosis of melanoma before metastasis, the importance of fast processing of dermoscopy images becomes obvious. In this paper, we introduce WebCL and its use for biomedical image processing applications. WebCL is a JavaScript binding of OpenCL, which takes advantage of GPU computing from a web browser. Therefore, the WebCL-parallel version of density-based skin lesion border detection introduced in this study can supplement expert dermatologists and aid them in early diagnosis of skin lesions. While WebCL is currently an emerging technology, a full adoption of WebCL into the HTML5 standard would allow this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources, including multi-cores and GPUs on a local machine, and allows compiled code to run directly from the Web browser. PMID:26423836

  3. Density-based parallel skin lesion border detection with webCL.

    PubMed

    Lemon, James; Kockara, Sinan; Halic, Tansel; Mete, Mutlu

    2015-01-01

Dermoscopy is a highly effective and noninvasive imaging technique used in diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand-drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. A fast density-based skin lesion border detection method has been implemented in parallel with a new parallel technology called WebCL. WebCL utilizes client-side computing capabilities to use available hardware resources such as multi-core CPUs and GPUs. The developed WebCL-parallel density-based skin lesion border detection method runs efficiently from internet browsers. Previous research indicates that one of the highest accuracy rates can be achieved using density-based clustering techniques for skin lesion border detection. While these algorithms do have unfavorable time complexities, this effect can be mitigated when they are implemented in parallel. In this study, the density-based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on heterogeneous platforms (e.g. tablets, smartphones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in clinical settings. For this, we used WebCL, an emerging technology that enables an HTML5 Web browser to execute code in parallel on heterogeneous platforms. We describe WebCL and our parallel algorithm design. In addition, we tested the parallel code on 100 dermoscopy images and show the execution speedups with respect to the serial version. Results indicate that the parallel (WebCL) and serial versions of the density-based lesion border detection method generate the same accuracy rates for the 100 dermoscopy images: the mean border error is 6.94%, mean recall is 76.66%, and mean precision is 99.29%. Moreover, the WebCL version's speedup factor for lesion border detection on the 100 dermoscopy images averages around 491.2. When the large number of high-resolution dermoscopy images encountered in a usual clinical setting is considered, along with the critical importance of early detection and diagnosis of melanoma before metastasis, the importance of fast processing of dermoscopy images becomes obvious. In this paper, we introduce WebCL and its use for biomedical image processing applications. WebCL is a JavaScript binding of OpenCL, which takes advantage of GPU computing from a web browser. Therefore, the WebCL-parallel version of density-based skin lesion border detection introduced in this study can supplement expert dermatologists and aid them in early diagnosis of skin lesions. While WebCL is currently an emerging technology, a full adoption of WebCL into the HTML5 standard would allow this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources, including multi-cores and GPUs on a local machine, and allows compiled code to run directly from the Web browser.
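
    The serial baseline of density-based border detection can be sketched with scikit-learn's DBSCAN on candidate lesion pixels (illustrative only; the WebCL version parallelizes the density computation per pixel, and the image and thresholds here are fabricated):

        import numpy as np
        from sklearn.cluster import DBSCAN

        img = np.random.rand(128, 128)            # stand-in grayscale dermoscopy image
        ys, xs = np.nonzero(img < 0.2)            # candidate (dark) lesion pixels
        coords = np.column_stack([xs, ys])

        labels = DBSCAN(eps=3.0, min_samples=10).fit_predict(coords)
        lesion = coords[labels == 0]              # pixels in the first dense cluster
        print(f"{len(lesion)} pixels in the detected region")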

  4. Methods for building an inexpensive computer-controlled olfactometer for temporally precise experiments

    PubMed Central

    Lundström, Johan N.; Gordon, Amy R.; Alden, Eva C.; Boesveldt, Sanne; Albrecht, Jessica

    2010-01-01

Many human olfactory experiments call for fast and stable stimulus-rise times as well as exact and stable stimulus-onset times. Due to these temporal demands, an olfactometer is often needed. However, an olfactometer is a piece of equipment that either comes with a high price tag or requires a high degree of technical expertise to build and/or run. Here, we detail the construction of an olfactometer that is built almost exclusively from “off-the-shelf” parts, requires little technical knowledge to assemble, has a relatively low price tag, and is controlled by E-Prime, a turnkey-ready and easily programmable software package commonly used in psychological experiments. The olfactometer can present either solid or liquid odor sources, and it exhibits a fast stimulus-rise time and a fast and stable stimulus-onset time. We provide a detailed description of the olfactometer construction, a list of its individual parts and prices, and potential modifications to the design. In addition, we present odor onset and concentration curves as measured with a photoionization detector, together with corresponding GC/MS analyses of signal-intensity drop (5.9%) over a longer period of use. Finally, we present data from behavioral and psychophysiological recordings demonstrating that the olfactometer is suitable for use during event-related EEG experiments. PMID:20688109

  5. Vertical-angle control system in the LLMC

    NASA Astrophysics Data System (ADS)

    Li, Binhua; Yang, Lei; Tie, Qiongxian; Mao, Wei

    2000-10-01

A control system for the vertical angle transmission used in the Lower Latitude Meridian Circle (LLMC) is described in this paper. The transmission system can change the zenith distance of the tube quickly and precisely. It works in three modes: fast motion, slow motion, and lock. In the fast and slow motion modes, the tube of the instrument is driven by a fast-motion or a slow-motion stepper motor, respectively. In the lock mode, a locking mechanism driven by a lock stepper motor engages. These three motors are controlled together by a single-chip microcontroller, which is controlled in turn by a host personal computer. The slow motion mechanism and its rotational step angle are discussed in detail because the mechanism has not been used before. The hardware structure of this control system, based on a microcontroller, is then described. The control process of the system during a normal observation is introduced and divided into eleven steps. All the steps are programmed in our control software in C++ and/or in ASM. The C++ control program runs on the host PC, while the ASM control program runs on the microcontroller system. The structures and functions of these programs are presented, and some details and techniques for programming are discussed as well.

  6. Be Active Your Way: A Guide for Adults

    MedlinePlus

Examples of vigorous activities to try include: aerobic dance, basketball, fast dancing, jumping rope, martial arts (such as karate), race walking, jogging, or running, heavy gardening (digging, hoeing), and hiking uphill.

  7. Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators

    PubMed Central

    Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew

    2014-01-01

Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as the Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved a substantial speedup over the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs over both the sequential implementation and a parallelized OpenMP implementation. An implementation of OpenMP on the Intel MIC coprocessor provided speedups with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950
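
    The kernel being accelerated in studies like this one is a 2D explicit stencil update; a NumPy version of one step is sketched below for orientation (grid size, coefficient, and boundary handling are illustrative, and this is not the paper's OpenACC/OpenCL code):

        import numpy as np

        u = np.zeros((512, 512))
        u[256, 256] = 1.0                          # initial stimulus
        D, dt = 0.1, 0.01
        for _ in range(100):                       # explicit time stepping
            lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                   np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
            u = u + dt * D * lap                   # one diffusion update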

  8. Pre-Hardware Optimization and Implementation Of Fast Optics Closed Control Loop Algorithms

    NASA Technical Reports Server (NTRS)

    Kizhner, Semion; Lyon, Richard G.; Herman, Jay R.; Abuhassan, Nader

    2004-01-01

One of the main heritage tools used in scientific and engineering data spectrum analysis is the Fourier Integral Transform and its high-performance digital equivalent, the Fast Fourier Transform (FFT). The FFT is particularly useful in two-dimensional (2-D) image processing (FFT2) within optical systems control. However, timing constraints of a fast optics closed control loop would require a supercomputer to run the software implementation of the FFT2 and its inverse, as well as other representative image processing algorithms, such as numerical image folding and fringe feature extraction. A laboratory supercomputer is not always available even for ground operations and is not feasible for a flight project. However, the computationally intensive algorithms still warrant alternative implementation using reconfigurable computing (RC) technologies such as Digital Signal Processors (DSP) and Field Programmable Gate Arrays (FPGA), which provide low-cost compact super-computing capabilities. We present a new RC hardware implementation and utilization architecture that significantly reduces the computational complexity of a few basic image-processing algorithms, such as FFT2, image folding, and phase diversity, for the NASA Solar Viewing Interferometer Prototype (SVIP) using a cluster of DSPs and FPGAs. The DSP cluster utilization architecture also assures avoidance of a single point of failure, while using commercially available hardware. This, combined with pre-hardware optimization of the control algorithms, for the first time allows construction of image-based 800 Hertz (Hz) optics closed control loops on board a spacecraft, based on the SVIP ground instrument. That spacecraft is the proposed Earth Atmosphere Solar Occultation Imager (EASI) to study the greenhouse gases CO2, C2H, H2O, O3, O2, and N2O from the Lagrange-2 point in space. This paper provides an advanced insight into a new type of science capabilities for future space exploration missions based on on-board image processing for control and for robotics missions using vision sensors. It presents a top-level description of the technologies required for the design and construction of SVIP and EASI and to advance spatial-spectral imaging and large-scale space interferometry science and engineering.
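
    For reference, the FFT2 workload at the center of this record is a few lines in NumPy; the paper's contribution is executing this class of computation on DSP/FPGA hardware fast enough for an 800 Hz loop. The image content and the frequency-domain step below are illustrative.

        import numpy as np

        frame = np.random.rand(256, 256)                 # stand-in sensor frame
        spectrum = np.fft.fft2(frame)                    # forward 2-D FFT
        filtered = spectrum * (np.abs(spectrum) > 1.0)   # toy frequency-domain step
        recovered = np.fft.ifft2(filtered).real          # inverse 2-D FFT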

  9. Simulating three dimensional wave run-up over breakwaters covered by antifer units

    NASA Astrophysics Data System (ADS)

    Najafi-Jilani, A.; Niri, M. Zakiri; Naderi, Nader

    2014-06-01

The paper presents a numerical analysis of wave run-up over rubble-mound breakwaters covered by antifer units, using a technique integrating Computer-Aided Design (CAD) and Computational Fluid Dynamics (CFD) software. Direct application of the Navier-Stokes equations within the armour blocks is used to provide a more reliable approach to simulating wave run-up over breakwaters. A well-tested Reynolds-averaged Navier-Stokes (RANS) Volume of Fluid (VOF) code (Flow-3D) was adopted for the CFD computations. The computed results were compared with experimental data to check the validity of the model. Numerical results showed that the direct three-dimensional (3D) simulation method can deliver accurate results for wave run-up over rubble-mound breakwaters. The results also showed that the placement pattern of antifer units has a great impact on wave run-up values, such that changing the placement pattern from regular to double pyramid can reduce wave run-up by approximately 30%. An analysis was performed to investigate the influences of surface roughness, energy dissipation in the pores of the armour layer, and reduced wave run-up due to inflow into the armour and stone layers.

  10. A highly efficient multi-core algorithm for clustering extremely large datasets

    PubMed Central

    2010-01-01

Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities of current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. The computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network-based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
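
    A minimal sketch of the shared-memory strategy described above: the expensive assignment step of k-means is split across worker processes on one machine. This is illustrative Python, not the authors' Java implementation; data are random, and convergence handling is omitted for brevity.

        import numpy as np
        from multiprocessing import Pool

        def assign(args):
            chunk, centers = args
            d = ((chunk[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            return d.argmin(axis=1)               # nearest center per point

        if __name__ == "__main__":
            X = np.random.rand(100_000, 20)
            centers = X[np.random.choice(len(X), 8, replace=False)]
            chunks = np.array_split(X, 4)         # one chunk per core
            with Pool(4) as pool:
                labels = np.concatenate(pool.map(assign, [(c, centers) for c in chunks]))
            centers = np.array([X[labels == k].mean(axis=0) for k in range(8)])  # update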

  11. Endurance running ability at adolescence as a predictor of blood pressure levels and hypertension in men: a 25-year follow-up study.

    PubMed

    Mikkelsson, L; Kaprio, J; Kautiainen, H; Nupponen, H; Tikkanen, M J; Kujala, U M

    2005-01-01

    The aim was to study whether aerobic fitness measured by a maximal endurance running test at adolescence predicts prevalence of hypertension or blood pressure levels in adulthood. From the 413 (197 slow runners and 216 fast runners) participating in a 2000-meter running test at adolescence in 1976 and responding to a health and fitness questionnaire in 2001, 29 subjects (15 very slow runners and 14 very fast runners) participated in a clinical follow-up study in 2001. Compared to those who were fast runners in adolescence, those who were slow runners tended to have higher age-adjusted risk of hypertension at follow-up (OR 2.7, 95 % CI 0.9 to 7.5; p=0.07). The result persisted after further adjustment for body mass index at follow-up (OR 2.9, 95 % CI 1.0 to 8.3; p=0.05). Diastolic blood pressure was higher for very slow runners at adolescence compared to very fast runners, the age-adjusted mean diastolic blood pressure being 90 mm Hg (95 % CI 86 to 93) vs. 83 mm Hg (95 % CI 80 to 87), age-adjusted p=0.013. High endurance type fitness in adolescence predicts low risk of hypertension and low resting diastolic blood pressure levels in adult men.

  12. An Epoch of Reionization simulation pipeline based on BEARS

    NASA Astrophysics Data System (ADS)

    Krause, Fabian; Thomas, Rajat M.; Zaroubi, Saleem; Abdalla, Filipe B.

    2018-10-01

The quest to unlock the mysteries of the Epoch of Reionization (EoR) is well poised, with many experiments at diverse wavelengths beginning to gather data. Despite these efforts, we remain uncertain about the various factors that influence the EoR, which include the nature of the sources, their spectral characteristics (blackbody temperatures, power-law indices), clustering properties, efficiency, duty cycle, etc. Given these physical uncertainties that define the EoR, we need fast and efficient computational methods to model and analyze the data in order to provide confidence bounds on the parameters that influence the brightness temperature at 21 cm. Towards this goal we developed a pipeline that combines dark-matter-only N-body simulations with exact 1-dimensional radiative transfer computations to approximate exact 3-dimensional radiative transfer. Because these simulations are about two to three orders of magnitude faster than the exact 3-dimensional methods, they can be used to explore the parameter space of the EoR systematically. A fast scheme like this pipeline could be incorporated into a Bayesian framework for parameter estimation. In this paper we detail the construction of the pipeline and describe how to use the software, which is being made publicly available. We show the results of running the pipeline for four test cases of sources with various spectral energy distributions and compare their outputs using various statistics.

  13. WinSCP for Windows File Transfers | High-Performance Computing | NREL

    Science.gov Websites

WinSCP can be used to securely transfer files between your local computer running Microsoft Windows and a remote computer running Linux.

  14. libRoadRunner: a high performance SBML simulation and analysis library

    PubMed Central

    Somogyi, Endre T.; Bouteiller, Jean-Marie; Glazier, James A.; König, Matthias; Medley, J. Kyle; Swat, Maciej H.; Sauro, Herbert M.

    2015-01-01

    Motivation: This article presents libRoadRunner, an extensible, high-performance, cross-platform, open-source software library for the simulation and analysis of models expressed using Systems Biology Markup Language (SBML). SBML is the most widely used standard for representing dynamic networks, especially biochemical networks. libRoadRunner is fast enough to support large-scale problems such as tissue models, studies that require large numbers of repeated runs and interactive simulations. Results: libRoadRunner is a self-contained library, able to run both as a component inside other tools via its C++ and C bindings, and interactively through its Python interface. Its Python Application Programming Interface (API) is similar to the APIs of MATLAB (www.mathworks.com) and SciPy (http://www.scipy.org/), making it fast and easy to learn. libRoadRunner uses a custom Just-In-Time (JIT) compiler built on the widely used LLVM JIT compiler framework. It compiles SBML-specified models directly into native machine code for a variety of processors, making it appropriate for solving extremely large models or repeated runs. libRoadRunner is flexible, supporting the bulk of the SBML specification (except for delay and non-linear algebraic equations) including several SBML extensions (composition and distributions). It offers multiple deterministic and stochastic integrators, as well as tools for steady-state analysis, stability analysis and structural analysis of the stoichiometric matrix. Availability and implementation: libRoadRunner binary distributions are available for Mac OS X, Linux and Windows. The library is licensed under Apache License Version 2.0. libRoadRunner is also available for ARM-based computers such as the Raspberry Pi. http://www.libroadrunner.org provides online documentation, full build instructions, binaries and a git source repository. Contacts: hsauro@u.washington.edu or somogyie@indiana.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26085503
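
    A minimal usage sketch of the Python interface described above; model.xml is a hypothetical SBML file name, and the call pattern follows the documented load/simulate workflow.

        import roadrunner

        rr = roadrunner.RoadRunner("model.xml")  # JIT-compiles the SBML model
        result = rr.simulate(0, 10, 100)         # simulate t = 0..10 over 100 points
        print(result[:5])                        # time course rows: time + species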

  15. libRoadRunner: a high performance SBML simulation and analysis library.

    PubMed

    Somogyi, Endre T; Bouteiller, Jean-Marie; Glazier, James A; König, Matthias; Medley, J Kyle; Swat, Maciej H; Sauro, Herbert M

    2015-10-15

This article presents libRoadRunner, an extensible, high-performance, cross-platform, open-source software library for the simulation and analysis of models expressed using Systems Biology Markup Language (SBML). SBML is the most widely used standard for representing dynamic networks, especially biochemical networks. libRoadRunner is fast enough to support large-scale problems such as tissue models, studies that require large numbers of repeated runs and interactive simulations. libRoadRunner is a self-contained library, able to run both as a component inside other tools via its C++ and C bindings, and interactively through its Python interface. Its Python Application Programming Interface (API) is similar to the APIs of MATLAB (www.mathworks.com) and SciPy (http://www.scipy.org/), making it fast and easy to learn. libRoadRunner uses a custom Just-In-Time (JIT) compiler built on the widely used LLVM JIT compiler framework. It compiles SBML-specified models directly into native machine code for a variety of processors, making it appropriate for solving extremely large models or repeated runs. libRoadRunner is flexible, supporting the bulk of the SBML specification (except for delay and non-linear algebraic equations) including several SBML extensions (composition and distributions). It offers multiple deterministic and stochastic integrators, as well as tools for steady-state analysis, stability analysis and structural analysis of the stoichiometric matrix. libRoadRunner binary distributions are available for Mac OS X, Linux and Windows. The library is licensed under Apache License Version 2.0. libRoadRunner is also available for ARM-based computers such as the Raspberry Pi. http://www.libroadrunner.org provides online documentation, full build instructions, binaries and a git source repository. hsauro@u.washington.edu or somogyie@indiana.edu Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.

  16. RAPPORT: running scientific high-performance computing applications on the cloud.

    PubMed

    Cohen, Jeremy; Filippis, Ioannis; Woodbridge, Mark; Bauer, Daniela; Hong, Neil Chue; Jackson, Mike; Butcher, Sarah; Colling, David; Darlington, John; Fuchs, Brian; Harvey, Matt

    2013-01-28

    Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.

  17. Block-Parallel Data Analysis with DIY2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morozov, Dmitriy; Peterka, Tom

DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.

  18. Transient Vibration Prediction for Rotors on Ball Bearings Using Load-dependent Non-linear Bearing Stiffness

    NASA Technical Reports Server (NTRS)

    Fleming, David P.; Poplawski, J. V.

    2002-01-01

Rolling-element bearing forces vary nonlinearly with bearing deflection. Thus, an accurate rotordynamic transient analysis requires bearing forces to be determined at each step of the transient solution. Analyses have been carried out to show the effect of accurate bearing transient forces (accounting for non-linear speed- and load-dependent bearing stiffness) as compared to the conventional use of an average rolling-element bearing stiffness. Bearing forces were calculated by COBRA-AHS (Computer Optimized Ball and Roller Bearing Analysis - Advanced High Speed) and supplied to the rotordynamics code ARDS (Analysis of Rotor Dynamic Systems) for accurate simulation of rotor transient behavior. COBRA-AHS is a fast-running 5 degree-of-freedom computer code able to calculate high speed rolling-element bearing load-displacement data for radial and angular contact ball bearings and also for cylindrical and tapered roller bearings. Results show that use of nonlinear bearing characteristics is essential for accurate prediction of rotordynamic behavior.

  19. Multicore-based 3D-DWT video encoder

    NASA Astrophysics Data System (ADS)

    Galiano, Vicente; López-Granado, Otoniel; Malumbres, Manuel P.; Migallón, Hector

    2013-12-01

    Three-dimensional wavelet transform (3D-DWT) encoders are good candidates for applications like professional video editing, video surveillance, multi-spectral satellite imaging, etc. where a frame must be reconstructed as quickly as possible. In this paper, we present a new 3D-DWT video encoder based on a fast run-length coding engine. Furthermore, we present several multicore optimizations to speed-up the 3D-DWT computation. An exhaustive evaluation of the proposed encoder (3D-GOP-RL) has been performed, and we have compared the evaluation results with other video encoders in terms of rate/distortion (R/D), coding/decoding delay, and memory consumption. Results show that the proposed encoder obtains good R/D results for high-resolution video sequences with nearly in-place computation using only the memory needed to store a group of pictures. After applying the multicore optimization strategies over the 3D DWT, the proposed encoder is able to compress a full high-definition video sequence in real-time.
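
    A toy run-length encoder of the kind used as the fast entropy stage in such an encoder (illustrative only; the real coder operates on quantized 3D-DWT coefficients rather than a hand-written list):

        def rle_encode(symbols):
            out, prev, run = [], symbols[0], 1
            for s in symbols[1:]:
                if s == prev:
                    run += 1
                else:
                    out.append((prev, run))
                    prev, run = s, 1
            out.append((prev, run))
            return out

        print(rle_encode([0, 0, 0, 5, 5, 0, 0, 0, 0, 7]))  # [(0, 3), (5, 2), (0, 4), (7, 1)]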

  20. Oak Ridge National Laboratory Support of Non-light Water Reactor Technologies: Capabilities Assessment for NRC Near-term Implementation Action Plans for Non-light Water Reactors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Belles, Randy; Jain, Prashant K.; Powers, Jeffrey J.

The Oak Ridge National Laboratory (ORNL) has a rich history of support for light water reactor (LWR) and non-LWR technologies. That history involves the operation of 13 reactors at ORNL, including the graphite reactor dating back to World War II, two aqueous homogeneous reactors, two molten salt reactors (MSRs), a fast-burst health physics reactor, and seven LWRs. Operation of the High Flux Isotope Reactor (HFIR) has been ongoing since 1965. Expertise exists among the ORNL staff to provide non-LWR training; support evaluation of non-LWR licensing and safety issues; perform modeling and simulation using advanced computational tools; run laboratory experiments using equipment such as the liquid salt component test facility; and perform in-depth fuel performance and thermal-hydraulic technology reviews using a vast suite of computer codes and tools. Summaries of this expertise are included in this paper.

  1. Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

    NASA Astrophysics Data System (ADS)

    Laracuente, Nicholas; Grossman, Carl

    2013-03-01

    We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are better suited to emulating hardware autocorrelators than traditional CPU-based software applications because they emphasize parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics PCI card in a 3.2 GHz Intel i5 CPU-based computer running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College.
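
    The multi-tau binning logic referenced here can be summarized in a few lines of CPU-side NumPy. This is a plain reference sketch of the scheme (lag spacing doubling as the signal is coarsened 2:1), not the paper's GPU memory mapping; the choice of m = 16 lags per level and the Poisson test trace are assumptions.

    import numpy as np

    def multi_tau_autocorr(counts, m=16, levels=8):
        """Unnormalized multi-tau autocorrelation estimate.

        Level 0 covers lags 1..m at full resolution; each later level
        halves the time resolution (adjacent-bin averaging) and covers
        lags m/2+1..m in the coarsened units, so lag spacing doubles.
        """
        x = np.asarray(counts, dtype=float)
        taus, g = [], []
        dt = 1
        for level in range(levels):
            lags = range(1, m + 1) if level == 0 else range(m // 2 + 1, m + 1)
            for k in lags:
                if k >= len(x):
                    return np.array(taus), np.array(g)
                taus.append(k * dt)
                g.append(np.mean(x[:-k] * x[k:]))    # <x(t) x(t+tau)>
            n = len(x) // 2
            x = 0.5 * (x[:2 * n:2] + x[1:2 * n:2])   # coarsen 2:1
            dt *= 2
        return np.array(taus), np.array(g)

    # Example: correlate a fake photon-count trace
    rng = np.random.default_rng(0)
    taus, g = multi_tau_autocorr(rng.poisson(5, size=1 << 16))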

  2. Sparse PDF Volumes for Consistent Multi-Resolution Volume Rendering

    PubMed Central

    Sicat, Ronell; Krüger, Jens; Möller, Torsten; Hadwiger, Markus

    2015-01-01

    This paper presents a new multi-resolution volume representation called sparse pdf volumes, which enables consistent multi-resolution volume rendering based on probability density functions (pdfs) of voxel neighborhoods. These pdfs are defined in the 4D domain jointly comprising the 3D volume and its 1D intensity range. Crucially, the computation of sparse pdf volumes exploits data coherence in 4D, resulting in a sparse representation with surprisingly low storage requirements. At run time, we dynamically apply transfer functions to the pdfs using simple and fast convolutions. Whereas standard low-pass filtering and down-sampling incur visible differences between resolution levels, the use of pdfs facilitates consistent results independent of the resolution level used. We describe the efficient out-of-core computation of large-scale sparse pdf volumes, using a novel iterative simplification procedure of a mixture of 4D Gaussians. Finally, our data structure is optimized to facilitate interactive multi-resolution volume rendering on GPUs. PMID:26146475

  3. GEANT4 distributed computing for compact clusters

    NASA Astrophysics Data System (ADS)

    Harrawood, Brian P.; Agasthya, Greeshma A.; Lakshmanan, Manu N.; Raterman, Gretchen; Kapadia, Anuj J.

    2014-11-01

    A new technique for distribution of GEANT4 processes is introduced to simplify running a simulation in a parallel environment such as a tightly coupled computer cluster. Using a new C++ class derived from the GEANT4 toolkit, multiple runs forming a single simulation are managed across a local network of computers with a simple inter-node communication protocol. The class is integrated with the GEANT4 toolkit and is designed to scale from a single symmetric multiprocessing (SMP) machine to compact clusters ranging in size from tens to thousands of nodes. User designed 'work tickets' are distributed to clients using a client-server work flow model to specify the parameters for each individual run of the simulation. The new g4DistributedRunManager class was developed and well tested in the course of our Neutron Stimulated Emission Computed Tomography (NSECT) experiments. It will be useful for anyone running GEANT4 for large discrete data sets such as covering a range of angles in computed tomography, calculating dose delivery with multiple fractions or simply speeding the throughput of a single model.
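
    As a rough illustration of the client-server 'work ticket' flow, a server can hand one parameter ticket to each client that connects. The sketch below is a hypothetical stand-in: the actual g4DistributedRunManager protocol is part of the authors' C++ class and is not described here, so the ticket contents (a tomography angle) and the JSON-over-TCP framing are assumptions.

    import json
    import socket
    import threading
    import time

    # Hypothetical work-ticket server: one ticket per client connection.
    TICKETS = [{"run_id": i, "angle_deg": 3.6 * i} for i in range(100)]

    def serve(host="127.0.0.1", port=5555):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind((host, port))
            srv.listen()
            tickets = iter(TICKETS)
            while True:
                conn, _ = srv.accept()
                with conn:
                    ticket = next(tickets, None)   # None means "no work left"
                    conn.sendall(json.dumps(ticket).encode() + b"\n")
                if ticket is None:
                    return

    threading.Thread(target=serve, daemon=True).start()
    time.sleep(0.2)                                # let the server start

    while True:  # one worker node; each pass fetches and runs one ticket
        with socket.create_connection(("127.0.0.1", 5555)) as c:
            ticket = json.loads(c.makefile().readline())
        if ticket is None:
            break
        # ...configure and launch the GEANT4 run for this ticket here...
        print("running", ticket)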

  4. A novel algorithm for fast grasping of unknown objects using C-shape configuration

    NASA Astrophysics Data System (ADS)

    Lei, Qujiang; Chen, Guangming; Meijer, Jonathan; Wisse, Martijn

    2018-02-01

    Increasing grasping efficiency is very important for robots grasping unknown objects, especially in unfamiliar environments. To achieve this, a new algorithm is proposed based on the C-shape configuration. Specifically, the geometric model of the under-actuated gripper used is approximated as a C-shape. To obtain an appropriate graspable position, this C-shape configuration is fitted to the geometric model of an unknown object, which is constructed from a single-view partial point cloud. To examine the algorithm in simulation, a comparison of commonly used motion planners is made. The motion planner with the highest number of solved runs, the lowest computing time and the shortest path length is chosen to execute the grasps found by this grasping algorithm. The simulation results demonstrate that excellent grasping efficiency is achieved by adopting our algorithm. To validate this algorithm, experimental tests were carried out using a UR5 robot arm and an under-actuated gripper. The experimental results show that steady grasping actions are obtained. Hence, this research provides a novel algorithm for fast grasping of unknown objects.

  5. Fast, multi-channel real-time processing of signals with microsecond latency using graphics processing units.

    PubMed

    Rath, N; Kato, S; Levesque, J P; Mauel, M E; Navratil, G A; Peng, Q

    2014-04-01

    Fast digital signal processing (DSP) has many applications. Typical hardware options for performing DSP are field-programmable gate arrays (FPGAs), application-specific integrated DSP chips, or general-purpose personal computer systems. This paper presents a novel DSP platform that has been developed for feedback control on the HBT-EP tokamak device. The system runs all signal processing exclusively on a Graphics Processing Unit (GPU) to achieve real-time performance with latencies below 8 μs. Signals are transferred into and out of the GPU using PCI Express peer-to-peer direct-memory-access transfers without involvement of the central processing unit or host memory. Tests were performed on the feedback control system of the HBT-EP tokamak using forty 16-bit floating-point inputs and outputs each and a sampling rate of up to 250 kHz. Signals were digitized by a D-TACQ ACQ196 module, processing was done on an NVIDIA GTX 580 GPU programmed in CUDA, and analog output was generated by D-TACQ AO32CPCI modules.

  6. Conditions Database for the Belle II Experiment

    NASA Astrophysics Data System (ADS)

    Wood, L.; Elsethagen, T.; Schram, M.; Stephan, E.

    2017-10-01

    The Belle II experiment at KEK is preparing for first collisions in 2017. Processing the large amounts of data that will be produced will require conditions data to be readily available to systems worldwide in a fast and efficient manner that is straightforward for both the user and maintainer. The Belle II conditions database was designed with a straightforward goal: make it as easily maintainable as possible. To this end, HEP-specific software tools were avoided as much as possible and industry standard tools used instead. HTTP REST services were selected as the application interface, which provide a high-level interface to users through the use of standard libraries such as curl. The application interface itself is written in Java and runs in an embedded Payara-Micro Java EE application server. Scalability at the application interface is provided by use of Hazelcast, an open source In-Memory Data Grid (IMDG) providing distributed in-memory computing and supporting the creation and clustering of new application interface instances as demand increases. The IMDG provides fast and efficient access to conditions data via in-memory caching.
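
    Because the interface is plain HTTP REST, any standard client can query it. The sketch below uses Python's requests library; the base URL, endpoint path, parameter names and payload fields are hypothetical placeholders for illustration, not the actual Belle II API.

    import requests

    # Hypothetical conditions-database query over HTTP REST. Host, path
    # and parameter names are illustrative, not the real Belle II service.
    BASE = "https://conditions.example.org/rest/v1"

    resp = requests.get(
        f"{BASE}/globalTags/release-01/payloads",
        params={"experiment": 7, "run": 1234},
        timeout=10,
    )
    resp.raise_for_status()

    for payload in resp.json():
        # each entry might carry a name, revision and download URL
        print(payload["name"], payload["revision"], payload["url"])

    The same kind of query works from the shell with a standard tool, e.g. curl 'https://conditions.example.org/rest/v1/globalTags/release-01/payloads?experiment=7&run=1234' (again with a placeholder URL).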

  7. The MROI fast tip-tilt correction and target acquisition system

    NASA Astrophysics Data System (ADS)

    Young, John; Buscher, David; Fisher, Martin; Haniff, Christopher; Rea, Alexander; Seneta, Eugene B.; Sun, Xiaowei; Wilson, Donald; Farris, Allen; Olivares, Andres; Selina, Robert

    2012-07-01

    The fast tip-tilt correction system for the Magdalena Ridge Observatory Interferometer (MROI) is being designed and fabricated by the University of Cambridge. The design of the system is currently at an advanced stage and the performance of its critical subsystems has been verified in the laboratory. The system has been designed to meet a demanding set of specifications including satisfying all performance requirements in ambient temperatures down to -5 °C, maintaining the stability of the tip-tilt fiducial over a 5 °C temperature change without recourse to an optical reference, and a target acquisition mode with a 60” field-of-view. We describe the important technical features of the system, which uses an Andor electron-multiplying CCD camera protected by a thermal enclosure, a transmissive optical system with mounts incorporating passive thermal compensation, and custom control software running under Xenomai real-time Linux. We also report results from laboratory tests that demonstrate (a) the high stability of the custom optic mounts and (b) the low readout and compute latencies that will allow us to achieve a 40 Hz closed-loop bandwidth on bright targets.

  8. Computational Methods for Feedback Controllers for Aerodynamics Flow Applications

    DTIC Science & Technology

    2007-08-15

    Iteration number and y-translation histories from the runs are concatenated and plotted in MATLAB (Cobalt version 4.0):

    >> Fy = [unf(:,8); runA(:,8); runB(:,8); runC(:,8); runD(:,8); runE(:,8)];
    >> Oy = [unf(:,23); runA(:,23); runB(:,23); runC(:,23); runD(:,23); runE(:,23)];
    >> Iter = [unf(:,1); runA(:,1); runB(:,1); runC(:,1); runD(:,1); runE(:,1)];
    >> plot(Fy)

  9. Proposal for grid computing for nuclear applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Idris, Faridah Mohamad; Ismail, Saaidi; Haris, Mohd Fauzi B.

    2014-02-12

    The use of computer clusters for computational sciences, including computational physics, is vital as it provides the computing power to crunch big numbers at a faster rate. In compute-intensive applications that require high resolution, such as Monte Carlo simulation, the use of computer clusters in a grid form, which supplies computational power to any node within the grid that needs it, has become a necessity. In this paper, we describe how clusters running a specific application can use resources within the grid to speed up the computing process.

  10. Dynamic multistation photometer

    DOEpatents

    Bauer, Martin L.; Johnson, Wayne F.; Lakomy, Dale G.

    1977-01-01

    A portable fast analyzer is provided that uses a magnetic clutch/brake to rapidly accelerate the analyzer rotor, and employs a microprocessor for automatic analyzer operation. The rotor is held stationary while the drive motor is run up to speed. When it is desired to mix the sample(s) and reagent(s), the brake is deenergized and the clutch is energized, whereupon the rotor is very rapidly accelerated to the running speed. The parallel-path rotor that is used allows the samples and reagents to be mixed the moment they are spun out into the rotor cuvettes, and data acquisition begins immediately. The analyzer thus has special utility for fast reactions.

  11. Fast running restricts evolutionary change of the vertebral column in mammals.

    PubMed

    Galis, Frietson; Carrier, David R; van Alphen, Joris; van der Mije, Steven D; Van Dooren, Tom J M; Metz, Johan A J; ten Broek, Clara M A

    2014-08-05

    The mammalian vertebral column is highly variable, reflecting adaptations to a wide range of lifestyles, from burrowing in moles to flying in bats. However, in many taxa, the number of trunk vertebrae is surprisingly constant. We argue that this constancy results from strong selection against initial changes of these numbers in fast running and agile mammals, whereas such selection is weak in slower-running, sturdier mammals. The rationale is that changes of the number of trunk vertebrae require homeotic transformations from trunk into sacral vertebrae, or vice versa, and mutations toward such transformations generally produce transitional lumbosacral vertebrae that are incompletely fused to the sacrum. We hypothesize that such incomplete homeotic transformations impair flexibility of the lumbosacral joint and thereby threaten survival in species that depend on axial mobility for speed and agility. Such transformations will only marginally affect performance in slow, sturdy species, so that sufficient individuals with transitional vertebrae survive to allow eventual evolutionary changes of trunk vertebral numbers. We present data on fast and slow carnivores and artiodactyls and on slow afrotherians and monotremes that strongly support this hypothesis. The conclusion is that the selective constraints on the count of trunk vertebrae stem from a combination of developmental and biomechanical constraints.

  12. Voluntary resistance running with short distance enhances spatial memory related to hippocampal BDNF signaling.

    PubMed

    Lee, Min Chul; Okamoto, Masahiro; Liu, Yu Fan; Inoue, Koshiro; Matsui, Takashi; Nogami, Haruo; Soya, Hideaki

    2012-10-15

    Although voluntary running has beneficial effects on hippocampal cognitive functions if done abundantly, it is still uncertain whether resistance running would be the same. For this purpose, voluntary resistance wheel running (RWR) with a load is a suitable model, since it allows increased work levels and resultant muscular adaptation in fast-twitch muscle. Here, we examined whether RWR would have potential effects on hippocampal cognitive functions with enhanced hippocampal brain-derived neurotrophic factor (BDNF), as does wheel running without a load (WR). Ten-week-old male Wistar rats were assigned randomly to sedentary (Sed), WR, and RWR (to a maximum load of 30% of body weight) groups for 4 wk. We found that in RWR, work levels increased with load, but running distance decreased by about half, which elicited muscular adaptation for fast-twitch plantaris muscle without causing any negative stress effects. Both RWR and WR led to improved spatial learning and memory as well as gene expressions of hippocampal BDNF signaling-related molecules. RWR increased hippocampal BDNF, tyrosine-related kinase B (TrkB), and cAMP response element-binding (CREB) protein levels, whereas WR increased only BDNF. With both exercise groups, there were correlations between spatial memory and BDNF protein (r = 0.41), p-CREB protein (r = 0.44), and work levels (r = 0.77). These results suggest that RWR plays a beneficial role in hippocampus-related cognitive functions associated with hippocampal BDNF signaling, even with short distances, and that work levels rather than running distance are more determinant of exercise-induced beneficial effects in wheel running with and without a load.

  13. Fast analysis of molecular dynamics trajectories with graphics processing units-Radial distribution function histogramming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levine, Benjamin G., E-mail: ben.levine@temple.edu; Stone, John E., E-mail: johns@ks.uiuc.edu; Kohlmeyer, Axel, E-mail: akohlmey@temple.edu

    2011-05-01

    The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.

  14. Fast Analysis of Molecular Dynamics Trajectories with Graphics Processing Units—Radial Distribution Function Histogramming

    PubMed Central

    Stone, John E.; Kohlmeyer, Axel

    2011-01-01

    The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU’s memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 seconds per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis. PMID:21547007
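
    A plain NumPy version of the rate-limiting step described in the two records above (histogramming all pair distances for one frame) makes a useful CPU baseline. This sketch omits the g(r) normalization and the paper's tiling and atomic-update strategy, and its O(N^2) distance matrix is exactly what makes million-atom selections demand the GPU approach; the positions and bin settings below are made up.

    import numpy as np

    def pair_distance_histogram(pos_a, pos_b, r_max, n_bins):
        """Histogram of all pair distances between selections A and B."""
        diff = pos_a[:, None, :] - pos_b[None, :, :]     # (Na, Nb, 3)
        dist = np.sqrt((diff ** 2).sum(axis=-1))         # (Na, Nb)
        return np.histogram(dist, bins=n_bins, range=(0.0, r_max))

    rng = np.random.default_rng(0)
    a = rng.uniform(0.0, 10.0, size=(500, 3))            # selection A positions
    b = rng.uniform(0.0, 10.0, size=(500, 3))            # selection B positions
    hist, edges = pair_distance_histogram(a, b, r_max=10.0, n_bins=100)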

  15. Multiple elastic scattering of electrons in condensed matter

    NASA Astrophysics Data System (ADS)

    Jablonski, A.

    2017-01-01

    Since the 1940s, much attention has been devoted to the problem of accurate theoretical description of electron transport in condensed matter. The information needed for describing different aspects of the electron transport is the angular distribution of electron directions after multiple elastic collisions. This distribution can be expanded into a series of Legendre polynomials with coefficients A_l. In the present work, a database of these coefficients for all elements up to uranium (Z=92) and a dense grid of electron energies varying from 50 to 5000 eV has been created. The database makes possible the following applications: (i) accurate interpolation of the coefficients A_l for any element and any energy from the above range, (ii) fast calculations of the differential and total elastic-scattering cross sections, (iii) determination of the angular distribution of directions after multiple collisions, (iv) calculations of the probability of elastic backscattering from solids, and (v) calculations of the calibration curves for determination of the inelastic mean free paths of electrons. The last two applications provide data with comparable accuracy to Monte Carlo simulations, yet the running time is decreased by several orders of magnitude. All of the above applications are implemented in the Fortran program MULTI_SCATT. Numerous illustrative runs of this program are described. Despite the relatively large volume of the database of coefficients A_l, the program MULTI_SCATT can be readily run on personal computers.
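
    Application (iii), rebuilding the angular distribution from tabulated coefficients, is a single Legendre series evaluation. The NumPy sketch below assumes the common (2l+1)/4π weighting convention, and the A_l values are placeholders rather than entries from the database.

    import numpy as np
    from numpy.polynomial.legendre import legval

    # Evaluate f(theta) = sum_l (2l+1)/(4*pi) * A_l * P_l(cos(theta)).
    # The A_l below are placeholder values, not database entries.
    A = np.array([1.0, 0.62, 0.38, 0.21, 0.10, 0.04])
    l = np.arange(A.size)
    c = (2 * l + 1) / (4 * np.pi) * A        # series coefficients for P_l

    theta = np.linspace(0.0, np.pi, 181)     # scattering angle grid
    f = legval(np.cos(theta), c)             # angular distribution estimate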

  16. Grid Oriented Implementation of the Tephra Model

    NASA Astrophysics Data System (ADS)

    Coltelli, M.; D'Agostino, M.; Drago, A.; Pistagna, F.; Prestifilippo, M.; Reitano, D.; Scollo, S.; Spata, G.

    2009-04-01

    TEPHRA is a two-dimensional advection-diffusion model implemented by Bonadonna et al. [2005] that describes the sedimentation of particles from volcanic plumes. The model is used by INGV - Istituto Nazionale di Geofisica e Vulcanologia, Sezione di Catania, to forecast tephra dispersion during Etna volcanic events. Every day, weather forecasts provided by the Italian Air Force Meteorological Office in Rome and by the hydrometeorological service of ARPA in Emilia Romagna are processed by the TEPHRA model, together with volcanological parameters, to simulate two different eruptive scenarios of Mt. Etna (corresponding to the 1998 and 2002-03 Etna eruptions). The model outputs are plotted on maps and transferred to Civil Protection, which is responsible for issuing public warnings and planning mitigation measures. The TEPHRA model is implemented in ANSI-C using MPI commands to maximize parallel computation. Currently the model runs on an INGV Beowulf cluster. To obtain better performance, we ported it to the PI2S2 Sicilian grid infrastructure within the "PI2S2 Project" (2006-2008). We configured the application to run on the grid using gLite middleware, analyzed the obtained performance, and compared it with that obtained on the local cluster. As TEPHRA needs to run in a short time so that the dispersion maps can be transferred quickly to Civil Protection, we also worked to minimize and stabilize grid job-scheduling time by using a customized high-priority queue called the Emergency Queue.

  17. Simultaneous Quantitative Detection of Helicobacter Pylori Based on a Rapid and Sensitive Testing Platform using Quantum Dots-Labeled Immunochromatographic Test Strips

    NASA Astrophysics Data System (ADS)

    Zheng, Yu; Wang, Kan; Zhang, Jingjing; Qin, Weijian; Yan, Xinyu; Shen, Guangxia; Gao, Guo; Pan, Fei; Cui, Daxiang

    2016-02-01

    Quantum dots-labeled urea-enzyme antibody-based rapid immunochromatographic test strips have been developed as quantitative fluorescence point-of-care tests (POCTs) to detect Helicobacter pylori. Presented in this study is a new test strip reader designed to run on tablet personal computers (PCs), making it portable for outdoor detection even without an alternating current (AC) power supply. A Wi-Fi module was integrated into the reader to improve its portability. Patient information is loaded by a barcode scanner, and an application designed to run on tablet PCs was developed to handle the acquired images, using the k-means vision algorithm for image processing. Different concentrations of various human blood samples were tested to evaluate the stability and accuracy of the fabricated device. Results demonstrate that the reader provides easy, rapid, simultaneous, quantitative detection of Helicobacter pylori. The proposed test strip reader is lighter than existing detection readers and can run for long durations without an AC power supply, confirming its advantages for outdoor detection. Given its fast detection speed and high accuracy, the proposed reader combined with quantum dots-labeled test strips is suitable for POCTs and shows great potential in applications such as screening patients for Helicobacter pylori infection in the near future.

  18. SSL - THE SIMPLE SOCKETS LIBRARY

    NASA Technical Reports Server (NTRS)

    Campbell, C. E.

    1994-01-01

    The Simple Sockets Library (SSL) allows C programmers to develop systems of cooperating programs using Berkeley streaming Sockets running under the TCP/IP protocol over Ethernet. The SSL provides a simple way to move information between programs running on the same or different machines and does so with little overhead. The SSL can create three types of Sockets: namely a server, a client, and an accept Socket. The SSL's Sockets are designed to be used in a fashion reminiscent of the use of FILE pointers so that a C programmer who is familiar with reading and writing files will immediately feel comfortable with reading and writing with Sockets. The SSL consists of three parts: the library, PortMaster, and utilities. The user of the SSL accesses it by linking programs to the SSL library. The PortMaster initializes connections between clients and servers. The PortMaster also supports a "firewall" facility to keep out socket requests from unapproved machines. The "firewall" is a file which contains Internet addresses for all approved machines. There are three utilities provided with the SSL. SKTDBG can be used to debug programs that make use of the SSL. SPMTABLE lists the servers and port numbers on requested machine(s). SRMSRVR tells the PortMaster to forcibly remove a server name from its list. The package also includes two example programs: multiskt.c, which makes multiple accepts on one server, and sktpoll.c, which repeatedly attempts to connect a client to some server at one second intervals. SSL is a machine independent library written in the C-language for computers connected via Ethernet using the TCP/IP protocol. It has been successfully compiled and implemented on a variety of platforms, including Sun series computers running SunOS, DEC VAX series computers running VMS, SGI computers running IRIX, DECstations running ULTRIX, DEC alpha AXPs running OSF/1, IBM RS/6000 computers running AIX, IBM PC and compatibles running BSD/386 UNIX and HP Apollo 3000/4000/9000/400T computers running HP-UX. SSL requires 45K of RAM to run under SunOS and 80K of RAM to run under VMS. For use on IBM PC series computers and compatibles running DOS, SSL requires Microsoft C 6.0 and the Wollongong TCP/IP package. Source code for sample programs and debugging tools are provided. The documentation is available on the distribution medium in TeX and PostScript formats. The standard distribution medium for SSL is a .25 inch streaming magnetic tape cartridge (QIC-24) in UNIX tar format. It is also available on a 3.5 inch diskette in UNIX tar format and a 5.25 inch 360K MS-DOS format diskette. The SSL was developed in 1992 and was updated in 1993.
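
    The server/client/accept trio and the FILE-pointer-style usage described above map directly onto standard Berkeley stream sockets. The sketch below shows that generic pattern in Python, where socket.makefile gives the same file-like read/write feel; it is plain Python sockets as an illustration of the concept, not the SSL's C API or its PortMaster.

    import socket
    import threading
    import time

    # Generic Berkeley-sockets pattern (server, accept, client) with
    # file-like reads and writes; an illustration, not the SSL itself.
    def run_server(host="127.0.0.1", port=6000):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.bind((host, port))
            srv.listen()                      # the "server" socket
            conn, _ = srv.accept()            # the "accept" socket
            with conn, conn.makefile("rw") as f:
                f.write(f.readline())         # echo one line back
                f.flush()

    threading.Thread(target=run_server, daemon=True).start()
    time.sleep(0.2)                           # let the server start listening

    with socket.create_connection(("127.0.0.1", 6000)) as c:  # "client" socket
        with c.makefile("rw") as f:
            f.write("hello over TCP\n")
            f.flush()
            print(f.readline().strip())       # -> hello over TCP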

  19. Japan’s Nuclear Future: Policy Debate, Prospects, and U.S. Interests

    DTIC Science & Technology

    2008-05-09

    raised in particular over the construction of an industrial-scale reprocessing facility in Japan. Additionally, fast breeder reactors also produce more ... Nuclear Fuel Cycle Engineering Laboratories. A fast breeder reactor is a fast neutron reactor that produces more plutonium than it consumes, which can ... Japan Nuclear Fuel Limited (JNFL) has built and is currently running active testing on a large-scale commercial reprocessing plant at Rokkasho-mura

  20. Cross-Approximate Entropy parallel computation on GPUs for biomedical signal analysis. Application to MEG recordings.

    PubMed

    Martínez-Zarzuela, Mario; Gómez, Carlos; Díaz-Pernas, Francisco Javier; Fernández, Alberto; Hornero, Roberto

    2013-10-01

    Cross-Approximate Entropy (Cross-ApEn) is a useful measure to quantify the statistical dissimilarity of two time series. In spite of the advantage of Cross-ApEn over its one-dimensional counterpart (Approximate Entropy), only a few studies have applied it to biomedical signals, mainly due to its high computational cost. In this paper, we propose a fast GPU-based implementation of the Cross-ApEn that makes its use over a large amount of multidimensional data feasible. The scheme followed is fully scalable, and thus maximizes the use of the GPU regardless of the number of neural signals being processed. The approach consists of processing many trials or epochs simultaneously, independently of their origin. In the case of MEG data, these trials can come from different input channels or subjects. The proposed implementation achieves an average speedup greater than 250x against a CPU parallel version running on a processor containing six cores. A dataset of 30 subjects containing 148 MEG channels (49 epochs of 1024 samples per channel) can be analyzed using our development in about 30 min. The same processing takes 5 days on six cores and 15 days when running on a single core. The speedup is much larger if compared to a basic sequential Matlab implementation, which would need 58 days per subject. To our knowledge, this is the first contribution of Cross-ApEn measure computation using GPUs. This study demonstrates that this hardware is, to date, the best option for the signal processing of biomedical data with Cross-ApEn. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
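
    For reference, Cross-ApEn(m, r) itself fits in a few lines of NumPy and serves as the kind of CPU baseline the GPU version is measured against. Standardizing both series and expressing the tolerance r in units of the standard deviation are common conventions assumed here, not details taken from the paper; the default r = 0.5 keeps the toy demo away from zero-match templates.

    import numpy as np

    # NumPy reference for Cross-ApEn(m, r) between equal-length series x, y.
    # Small r can yield zero-match templates (log of zero), which production
    # code must guard against.
    def cross_apen(x, y, m=2, r=0.5):
        x = (x - x.mean()) / x.std()
        y = (y - y.mean()) / y.std()

        def phi(mm):
            xm = np.lib.stride_tricks.sliding_window_view(x, mm)
            ym = np.lib.stride_tricks.sliding_window_view(y, mm)
            # Chebyshev distance between every x-template and y-template
            d = np.abs(xm[:, None, :] - ym[None, :, :]).max(axis=-1)
            c = (d <= r).mean(axis=1)     # match fraction per x-template
            return np.log(c).mean()

        return phi(m) - phi(m + 1)

    rng = np.random.default_rng(1)
    x, y = rng.standard_normal(500), rng.standard_normal(500)
    print(cross_apen(x, y))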

  1. An algorithm for fast elastic wave simulation using a vectorized finite difference operator

    NASA Astrophysics Data System (ADS)

    Malkoti, Ajay; Vedanti, Nimisha; Tiwari, Ram Krishna

    2018-07-01

    Modern geophysical imaging techniques exploit the full wavefield information which can be simulated numerically. These numerical simulations are computationally expensive due to several factors, such as a large number of time steps and nodes, a big derivative stencil and a huge model size. Besides these constraints, it is also important to reformulate the numerical derivative operator for improved efficiency. In this paper, we have introduced a vectorized derivative operator over the staggered grid with shifted coordinate systems. The operator increases the efficiency of simulation by exploiting the fact that each variable can be represented in the form of a matrix. This operator allows updating all nodes of a variable defined on the staggered grid, in a manner similar to the collocated grid scheme, and thereby reduces the computational run-time considerably. Here we demonstrate an application of this operator to simulate seismic wave propagation in elastic media (Marmousi model), by discretizing the equations on a staggered grid. We have compared the performance of this operator in three programming languages, which reveals that it can increase the execution speed by a factor of at least 2-3 for FORTRAN and MATLAB, and nearly 100 for Python. We have further carried out various tests in MATLAB to analyze the effect of model size and the number of time steps on total simulation run-time. We find that there is an additional, though small, computational overhead for each step, and it depends on the total number of time steps used in the simulation. A MATLAB code package, 'FDwave', for the proposed simulation scheme is available upon request.
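
    The vectorized-operator idea can be seen in miniature with NumPy: instead of looping over grid nodes, one whole-array expression updates every interior node of a staggered-grid field at once. The fourth-order coefficients (9/8, -1/24) are the standard staggered-grid weights; the elastic update shown in the comment is schematic, and none of this is the authors' FDwave code.

    import numpy as np

    # 4th-order staggered-grid x-derivative, evaluated half a cell to the
    # right of each node, as one vectorized expression over the whole grid.
    C1, C2 = 9.0 / 8.0, -1.0 / 24.0

    def dx_staggered(f, dx):
        """f has shape (nz, nx); the result is valid on interior columns."""
        d = np.zeros_like(f)
        d[:, 2:-2] = (C1 * (f[:, 3:-1] - f[:, 2:-2]) +
                      C2 * (f[:, 4:] - f[:, 1:-3])) / dx
        return d

    # Schematic elastic update step: every interior node advances at once,
    # e.g.  vx += (dt / rho) * (dx_staggered(sxx, dx) + dz_staggered(sxz, dz))
    nz, nx = 50, 200
    f = np.sin(np.linspace(0.0, 2.0 * np.pi, nx))[None, :] * np.ones((nz, 1))
    dfdx = dx_staggered(f, dx=2.0 * np.pi / (nx - 1))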

  2. BROCCOLI: Software for fast fMRI analysis on many-core CPUs and GPUs

    PubMed Central

    Eklund, Anders; Dufort, Paul; Villani, Mattias; LaConte, Stephen

    2014-01-01

    Analysis of functional magnetic resonance imaging (fMRI) data is becoming ever more computationally demanding as temporal and spatial resolutions improve, and large, publicly available data sets proliferate. Moreover, methodological improvements in the neuroimaging pipeline, such as non-linear spatial normalization, non-parametric permutation tests and Bayesian Markov Chain Monte Carlo approaches, can dramatically increase the computational burden. Despite these challenges, there do not yet exist any fMRI software packages which leverage inexpensive and powerful graphics processing units (GPUs) to perform these analyses. Here, we therefore present BROCCOLI, a free software package written in OpenCL (Open Computing Language) that can be used for parallel analysis of fMRI data on a large variety of hardware configurations. BROCCOLI has, for example, been tested with an Intel CPU, an Nvidia GPU, and an AMD GPU. These tests show that parallel processing of fMRI data can lead to significantly faster analysis pipelines. This speedup can be achieved on relatively standard hardware, but further, dramatic speed improvements require only a modest investment in GPU hardware. BROCCOLI (running on a GPU) can perform non-linear spatial normalization to a 1 mm3 brain template in 4–6 s, and run a second level permutation test with 10,000 permutations in about a minute. These non-parametric tests are generally more robust than their parametric counterparts, and can also enable more sophisticated analyses by estimating complicated null distributions. Additionally, BROCCOLI includes support for Bayesian first-level fMRI analysis using a Gibbs sampler. The new software is freely available under GNU GPL3 and can be downloaded from github (https://github.com/wanderine/BROCCOLI/). PMID:24672471

  3. Simulation of LHC events on a million threads

    NASA Astrophysics Data System (ADS)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.; Papka, M. E.; Benjamin, D. P.

    2015-12-01

    Demand for Grid resources is expected to double during LHC Run II as compared to Run I; the capacity of the Grid, however, will not double. The HEP community must consider how to bridge this computing gap by targeting larger compute resources and using the available compute resources as efficiently as possible. Argonne's Mira, the fifth fastest supercomputer in the world, can run roughly five times the number of parallel processes that the ATLAS experiment typically uses on the Grid. We ported Alpgen, a serial x86 code, to run as a parallel application under MPI on the Blue Gene/Q architecture. By analysis of the Alpgen code, we reduced the memory footprint to allow running 64 threads per node, utilizing the four hardware threads available per core on the PowerPC A2 processor. Event generation and unweighting, typically run as independent serial phases, are coupled together in a single job in this scenario, reducing intermediate writes to the filesystem. By these optimizations, we have successfully run LHC proton-proton physics event generation at the scale of a million threads, filling two-thirds of Mira.

  4. Running Jobs on the Peregrine System | High-Performance Computing | NREL

    Science.gov Websites

    Documentation for running jobs on the Peregrine high-performance computing (HPC) system, covering running different types of jobs, batch job scheduling policies (queue names, limits, etc.), requesting different node types, and sample batch scripts.

  5. Fast words boundaries localization in text fields for low quality document images

    NASA Astrophysics Data System (ADS)

    Ilin, Dmitry; Novikov, Dmitriy; Polevoy, Dmitry; Nikolaev, Dmitry

    2018-04-01

    The paper examines the problem of precise localization of word boundaries in document text zones. Document processing on a mobile device consists of document localization, perspective correction, localization of individual fields, finding words in separate zones, segmentation and recognition. While capturing an image with a mobile digital camera under uncontrolled conditions, digital noise, perspective distortions or glares may occur. Further document processing is complicated by its specifics: layout elements, complex background, static text, document security elements, and a variety of text fonts. Moreover, the problem of word boundary localization has to be solved at runtime on a mobile CPU with limited computing capabilities under the specified restrictions. At the moment, there are several groups of methods optimized for different conditions. Methods for scanned printed text are quick but limited to images of high quality. Methods for text in the wild have an excessively high computational complexity and thus are hardly suitable for running on mobile devices as part of a mobile document recognition system. The method presented in this paper solves a more specialized problem than the task of finding text in natural images. It uses local features, a sliding window and a lightweight neural network in order to achieve an optimal speed-precision ratio. The algorithm takes 12 ms per field running on an ARM processor of a mobile device. The error rate for boundary localization on a test sample of 8000 fields is 0.3

  6. A fast Monte Carlo toolkit on GPU for treatment plan dose recalculation in proton therapy

    NASA Astrophysics Data System (ADS)

    Senzacqua, M.; Schiavi, A.; Patera, V.; Pioli, S.; Battistoni, G.; Ciocca, M.; Mairani, A.; Magro, G.; Molinelli, S.

    2017-10-01

    In the context of particle therapy, a crucial role is played by Treatment Planning Systems (TPSs), tools that compute and optimize the treatment plan. Nowadays one of the major issues related to TPSs in particle therapy is the large CPU time needed. We developed a software toolkit (FRED) for reducing dose recalculation time by exploiting Graphics Processing Unit (GPU) hardware. Thanks to their high parallelization capability, GPUs significantly reduce the computation time, by up to a factor of 100 with respect to software running on a standard CPU. The transport of proton beams in the patient is accurately described through Monte Carlo methods. The physical processes reproduced are Multiple Coulomb Scattering, energy straggling and nuclear interactions of protons with the main nuclei composing biological tissues. The FRED toolkit does not rely on the water-equivalent translation of tissues, but exploits the Computed Tomography anatomical information by reconstructing and simulating the atomic composition of each crossed tissue. FRED can be used as an efficient tool for dose recalculation on the day of the treatment: in about one minute on standard hardware, it can provide the dose map obtained by combining the treatment plan, earlier computed by the TPS, with the current patient anatomic arrangement.

  7. Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

    PubMed Central

    Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438

  8. Geant4 Computing Performance Benchmarking and Monitoring

    DOE PAGES

    Dotti, Andrea; Elvira, V. Daniel; Folger, Gunter; ...

    2015-12-23

    Performance evaluation and analysis of large scale computing applications is essential for optimal use of resources. As detector simulation is one of the most compute intensive tasks and Geant4 is the simulation toolkit most widely used in contemporary high energy physics (HEP) experiments, it is important to monitor Geant4 through its development cycle for changes in computing performance and to identify problems and opportunities for code improvements. All Geant4 development and public releases are being profiled with a set of applications that utilize different input event samples, physics parameters, and detector configurations. Results from multiple benchmarking runs are compared to previous public and development reference releases to monitor CPU and memory usage. Observed changes are evaluated and correlated with code modifications. Besides the full summary of call stack and memory footprint, a detailed call graph analysis is available to Geant4 developers for further analysis. The set of software tools used in the performance evaluation procedure, in both sequential and multi-threaded modes, includes FAST, IgProf and Open|Speedshop. In conclusion, the scalability of the CPU time and memory performance in multi-threaded applications is evaluated by measuring event throughput and memory gain as a function of the number of threads for selected event samples.

  9. The Impact of Odor--Reward Memory on Chemotaxis in Larval "Drosophila"

    ERIC Educational Resources Information Center

    Schleyer, Michael; Reid, Samuel F.; Pamir, Evren; Saumweber, Timo; Paisios, Emmanouil; Davies, Alexander; Gerber, Bertram; Louis, Matthieu

    2015-01-01

    How do animals adaptively integrate innate with learned behavioral tendencies? We tackle this question using chemotaxis as a paradigm. Chemotaxis in the "Drosophila" larva largely results from a sequence of runs and oriented turns. Thus, the larvae minimally need to determine (i) how fast to run, (ii) when to initiate a turn, and (iii)…

  10. The Red Queen Visits Minkowski Space

    ERIC Educational Resources Information Center

    Low, Robert J.

    2007-01-01

    When Alice went "Through the Looking Glass", she found herself in a situation where she had to run as fast as she could in order to stay still. In accordance with the dictum that truth is stranger than fiction, we will see that it is possible to find a situation in special relativity where running towards one's target is actually…

  11. DAMQT: A package for the analysis of electron density in molecules

    NASA Astrophysics Data System (ADS)

    López, Rafael; Rico, Jaime Fernández; Ramírez, Guillermo; Ema, Ignacio; Zorrilla, David

    2009-09-01

    DAMQT is a package for the analysis of the electron density in molecules and the fast computation of the density, density deformations, electrostatic potential and field, and Hellmann-Feynman forces. The method is based on the partition of the electron density into atomic fragments by means of a least-deformation criterion. Each atomic fragment of the density is expanded in regular spherical harmonics times radial factors, which are piecewise represented in terms of analytical functions. This representation is used for the fast evaluation of the electrostatic potential and field generated by the electron density and nuclei, as well as for the computation of the Hellmann-Feynman forces on the nuclei. An analysis of the atomic and molecular deformations of the density can also be carried out, yielding a picture that connects with several concepts of empirical structural chemistry.
    Program summary:
    Program title: DAMQT1.0
    Catalogue identifier: AEDL_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDL_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GPLv3
    No. of lines in distributed program, including test data, etc.: 278 356
    No. of bytes in distributed program, including test data, etc.: 31 065 317
    Distribution format: tar.gz
    Programming language: Fortran 90 and C++
    Computer: Any
    Operating system: Linux, Windows (XP, Vista)
    RAM: 190 Mbytes
    Classification: 16.1
    External routines: Trolltech's Qt (4.3 or higher) (http://www.qtsoftware.com/products), OpenGL (1.1 or higher) (http://www.opengl.org/), GLUT 3.7 (http://www.opengl.org/resources/libraries/glut/)
    Nature of problem: Analysis of the molecular electron density and density deformations, including fast evaluation of the electrostatic potential, electric field and Hellmann-Feynman forces on nuclei.
    Solution method: The method of Deformed Atoms in Molecules, reported elsewhere [1], is used for partitioning the molecular electron density into atomic fragments, which are further expanded in spherical harmonics times radial factors. The partition is used for defining molecular density deformations and for the fast calculation of several properties associated with the density.
    Restrictions: The current version is limited to 120 atoms, 2000 contracted functions, and l = 5 in basis functions. The density must come from an LCAO calculation (any level) with spherical (not Cartesian) Gaussian functions.
    Unusual features: The program contains an OPEN statement for binary (stream) files in file GOPENMOL.F90. This statement does not have a standard syntax in Fortran 90. Two possibilities are considered in conditional compilation: Intel's ifort and the Fortran 2003 standard. The latter applies to compilers other than ifort (gfortran uses this one, for instance).
    Additional comments: The distribution file for this program is over 30 Mbytes and therefore is not delivered directly when download or e-mail is requested. Instead, an html file giving details of how the program can be obtained is sent.
    Running time: Largely dependent on the system size and the module run (from fractions of a second to hours).
    References: [1] J. Fernández Rico, R. López, I. Ema, G. Ramírez, J. Mol. Struct. (Theochem) 727 (2005) 115.

  12. Using MathCad to Evaluate Exact Integral Formulations of Spacecraft Orbital Heats for Primitive Surfaces at Any Orientation

    NASA Technical Reports Server (NTRS)

    Pinckney, John

    2010-01-01

    With the advent of high-speed computing, Monte Carlo ray tracing techniques have become the preferred method for evaluating spacecraft orbital heats. Monte Carlo has its greatest advantage where there are many interacting surfaces. However, Monte Carlo programs are specialized programs that suffer from some inaccuracy, long calculation times and high purchase cost. A general orbital heating integral is presented here that is accurate, fast and runs on MathCad, a generally available engineering mathematics program. The integral is easy to read, understand and alter. It can be applied to unshaded primitive surfaces at any orientation. The method is limited to direct heating calculations. This integral formulation can be used for quick orbit evaluations and spot-checking Monte Carlo results.

  13. A Virtual Trip to the Schwarzschild-De Sitter Black Hole

    NASA Astrophysics Data System (ADS)

    Bakala, Pavel; Hledík, Stanislav; Stuchlík, Zdenĕk; Truparová, Kamila; Čermák, Petr

    2008-09-01

    We developed a realistic, fully general relativistic computer code for simulation of optical projection in a strong, spherically symmetric gravitational field. The standard theoretical analysis of optical projection for an observer in the vicinity of a Schwarzschild black hole is extended to black hole spacetimes with a repulsive cosmological constant, i.e., Schwarzschild-de Sitter (SdS) spacetimes. The influence of the cosmological constant is investigated for static observers and for observers radially free-falling from the static radius. The simulation includes effects of gravitational lensing, multiple images, Doppler and gravitational frequency shift, as well as the amplification of intensity. The code generates images of the static observer's sky and movie simulations for radially free-falling observers. Techniques of parallel programming are applied to achieve high performance and fast runs of the simulation code.

  14. Constraints on the pre-impact orbits of Solar system giant impactors

    NASA Astrophysics Data System (ADS)

    Jackson, Alan P.; Gabriel, Travis S. J.; Asphaug, Erik I.

    2018-03-01

    We provide a fast method for computing constraints on impactor pre-impact orbits, applying this to the late giant impacts in the Solar system. These constraints can be used to make quick, broad comparisons of different collision scenarios, identifying some immediately as low-probability events, and narrowing the parameter space in which to target follow-up studies with expensive N-body simulations. We benchmark our parameter space predictions, finding good agreement with existing N-body studies for the Moon. We suggest that high-velocity impact scenarios in the inner Solar system, including all currently proposed single impact scenarios for the formation of Mercury, should be disfavoured. This leaves a multiple hit-and-run scenario as the most probable currently proposed for the formation of Mercury.

  15. Computational Approaches to Simulation and Optimization of Global Aircraft Trajectories

    NASA Technical Reports Server (NTRS)

    Ng, Hok Kwan; Sridhar, Banavar

    2016-01-01

    This study examines three possible approaches to improving the speed of generating wind-optimal routes for air traffic at the national or global level: (a) using the resources of a supercomputer, (b) running the computations on multiple commercially available computers, and (c) implementing those same algorithms in NASA's Future ATM Concepts Evaluation Tool (FACET); it compares these to a standard implementation run on a single CPU. Wind-optimal aircraft trajectories are computed using global air traffic schedules. The run time and wait time on the supercomputer for trajectory optimization using various numbers of CPUs, ranging from 80 to 10,240 units, are compared with the total computational time for running the same computation on a single desktop computer and on multiple commercially available computers, to assess the potential for computational enhancement through parallel processing on computer clusters. This study also re-implements the trajectory optimization algorithm to further reduce computational time through algorithm modifications, and integrates it with FACET so that the new features, which calculate time-optimal routes between worldwide airport pairs in a wind field, can be used with existing FACET applications. The implementations of the trajectory optimization algorithms use the MATLAB, Python, and Java programming languages. The performance evaluations compare their computational efficiencies based on the potential application of optimized trajectories. The paper shows that in the absence of special privileges on a supercomputer, a cluster of commercially available computers provides a feasible approach for national and global air traffic system studies.

  16. Statistical Compression for Climate Model Output

    NASA Astrophysics Data System (ADS)

    Hammerling, D.; Guinness, J.; Soh, Y. J.

    2017-12-01

    Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to accurately modeling the original dataset (one year of daily mean temperature data), particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured while allowing for fast decompression and conditional emulation on modest computers.
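
    A toy version of this compress/decompress loop shows the mechanics: store block means plus a pooled residual standard deviation, then decompress either as a conditional expectation (smooth) or as a conditional simulation that honors the stored means (realistic roughness). The iid Gaussian within-block model assumed here is far simpler than the paper's nonstationary space-time model, and all names and block sizes are illustrative.

    import numpy as np

    def compress(x, block=8):
        """Summary statistics: one mean per block plus a pooled residual SD."""
        blocks = x.reshape(-1, block)
        return blocks.mean(axis=1), blocks.std(axis=1).mean()

    def cond_expectation(means, block=8):
        return np.repeat(means, block)          # best estimate; oversmoothed

    def cond_simulation(means, sigma, block=8, rng=None):
        rng = rng or np.random.default_rng()
        z = rng.normal(0.0, sigma, size=(len(means), block))
        z -= z.mean(axis=1, keepdims=True)      # respect the stored block means
        return (means[:, None] + z).ravel()

    x = np.cumsum(np.random.default_rng(2).normal(size=1024))  # fake series
    means, sigma = compress(x)                  # 1024 values -> 128 + 1 numbers
    smooth = cond_expectation(means)            # conditional expectation
    rough = cond_simulation(means, sigma)       # conditional simulation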

  17. PCEMCAN - Probabilistic Ceramic Matrix Composites Analyzer: User's Guide, Version 1.0

    NASA Technical Reports Server (NTRS)

    Shah, Ashwin R.; Mital, Subodh K.; Murthy, Pappu L. N.

    1998-01-01

    PCEMCAN (Probabilistic CEramic Matrix Composites ANalyzer) is an integrated computer code developed at NASA Lewis Research Center that simulates uncertainties associated with the constituent properties, manufacturing process, and geometric parameters of fiber-reinforced ceramic matrix composites and quantifies their random thermomechanical behavior. The PCEMCAN code can perform deterministic as well as probabilistic analyses to predict thermomechanical properties. This user's guide details the step-by-step procedure to create an input file and to update/modify the material properties database required to run the PCEMCAN computer code. An overview of the geometric conventions, micromechanical unit cell, nonlinear constitutive relationship and probabilistic simulation methodology is also provided in the manual. Fast probability integration as well as Monte Carlo simulation methods are available for the uncertainty simulation. The various options available in the code to simulate probabilistic material properties and to quantify the sensitivity of the primitive random variables are described. Deterministic as well as probabilistic results are described using demonstration problems. For a detailed theoretical description of the deterministic and probabilistic analyses, the user is referred to the companion documents "Computational Simulation of Continuous Fiber-Reinforced Ceramic Matrix Composite Behavior," NASA TP-3602, 1996, and "Probabilistic Micromechanics and Macromechanics for Ceramic Matrix Composites," NASA TM 4766, June 1997.

  18. FastMag: Fast micromagnetic simulator for complex magnetic structures (invited)

    NASA Astrophysics Data System (ADS)

    Chang, R.; Li, S.; Lubarda, M. V.; Livshitz, B.; Lomakin, V.

    2011-04-01

    A fast micromagnetic simulator (FastMag) for general problems is presented. FastMag solves the Landau-Lifshitz-Gilbert equation and can handle multiscale problems with high computational efficiency. The simulator derives its high performance from efficient methods for evaluating the effective field and from implementations on massively parallel graphics processing unit (GPU) architectures. FastMag discretizes the computational domain into tetrahedral elements and therefore is highly flexible for general problems. The magnetostatic field is computed via the superposition principle for both volume and surface parts of the computational domain. This is accomplished by implementing efficient quadrature rules and analytical integration for overlapping elements in which the integral kernel is singular. Thus, discretized superposition integrals are computed using a nonuniform grid interpolation method, which evaluates the field from N sources at N collocated observers in O(N) operations. This approach allows handling objects of arbitrary shape, allows easy calculation of the field outside the magnetized domains, does not require solving a linear system of equations, and requires little memory. FastMag is implemented on GPUs, with GPU-CPU speed-ups of two orders of magnitude. Simulations are shown of a large array of magnetic dots and of a recording head fully discretized down to the exchange length, with over a hundred million tetrahedral elements, on an inexpensive desktop computer.

  19. WinHPC System | High-Performance Computing | NREL

    Science.gov Websites

    NREL's WinHPC system is a computing cluster running the Microsoft Windows operating system. It allows users to run jobs requiring a Windows environment, such as ANSYS and MATLAB.

  20. Cloud-based calculators for fast and reliable access to NOAA's geomagnetic field models

    NASA Astrophysics Data System (ADS)

    Woods, A.; Nair, M. C.; Boneh, N.; Chulliat, A.

    2017-12-01

    While the Global Positioning System (GPS) provides accurate point locations, it does not provide pointing directions. Therefore, the absolute directional information provided by the Earth's magnetic field is of primary importance for navigation and for the pointing of technical devices such as aircraft, satellites and, lately, mobile phones. The major magnetic sources that affect compass-based navigation are the Earth's core, its magnetized crust and the electric currents in the ionosphere and magnetosphere. The NOAA/CIRES Geomagnetism group (ngdc.noaa.gov/geomag/) develops and distributes models that describe all of these important sources to aid navigation. Our geomagnetic models are used on a variety of platforms including airplanes, ships, submarines and smartphones. While the magnetic field from the Earth's core can be described with relatively few parameters and is suitable for offline computation, the magnetic sources from the Earth's crust, ionosphere and magnetosphere require either significant computational resources or real-time capabilities and are not suitable for offline calculation. This is especially important for small navigational devices or embedded systems, where computational resources are limited. Recognizing the need for fast and reliable access to our geomagnetic field models, we developed cloud-based application program interfaces (APIs) for NOAA's ionospheric and magnetospheric magnetic field models. In this paper we describe the need for reliable magnetic calculators, the challenges faced in running geomagnetic field models in the cloud in real time, and the feedback from our user community. We discuss lessons learned harvesting and validating the data which powers our cloud services, as well as our strategies for maintaining near real-time service, including load-balancing, real-time monitoring, and instance cloning. We also briefly discuss progress on NOAA's Big Earth Data Initiative (BEDI) funded project to develop an API interface to our Enhanced Magnetic Model (EMM).

  1. Synthesis of MCMC and Belief Propagation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahn, Sungsoo; Chertkov, Michael; Shin, Jinwoo

    Markov Chain Monte Carlo (MCMC) and Belief Propagation (BP) are the most popular algorithms for computational inference in Graphical Models (GM). In principle, MCMC is an exact probabilistic method which, however, often suffers from exponentially slow mixing. In contrast, BP is a deterministic method that is typically fast and empirically very successful, but in general lacks control of accuracy over loopy graphs. In this paper, we introduce MCMC algorithms correcting the approximation error of BP, i.e., we provide a way to compensate for BP errors via a consecutive BP-aware MCMC. Our framework is based on the Loop Calculus (LC) approach, which allows the BP error to be expressed as a sum of weighted generalized loops. Although the full series is computationally intractable, it is known that a truncated series, summing up all 2-regular loops, is computable in polynomial time for planar pairwise binary GMs and also provides a highly accurate approximation empirically. Motivated by this, we first propose a polynomial-time approximation MCMC scheme for the truncated series of general (non-planar) pairwise binary models. Our main idea here is to use the Worm algorithm, known to provide fast mixing in other (related) problems, and then design an appropriate rejection scheme to sample 2-regular loops. Furthermore, we also design an efficient rejection-free MCMC scheme for approximating the full series. The main novelty underlying our design is in utilizing the concept of cycle basis, which provides an efficient decomposition of the generalized loops. In essence, the proposed MCMC schemes run on a transformed GM built upon the non-trivial BP solution, and our experiments show that this synthesis of BP and MCMC outperforms both direct MCMC and bare BP schemes.
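
    For readers who want the BP baseline being corrected made concrete, a compact sum-product implementation for a pairwise binary model is sketched below. The flooding (synchronous) message schedule and the dictionary-of-factors interface are illustrative choices for this example, not the paper's setup.

        import numpy as np

        def bp_marginals(n, edges, psi, psi2, iters=100):
            # Sum-product BP for p(x) ~ prod_i psi[i][x_i] * prod_(i,j) psi2[(i,j)][x_i, x_j]
            # over binary variables; returns approximate marginals b_i(x_i).
            nbrs = {i: [] for i in range(n)}
            pot = {}
            for i, j in edges:
                nbrs[i].append(j); nbrs[j].append(i)
                pot[(i, j)] = psi2[(i, j)]; pot[(j, i)] = psi2[(i, j)].T
            msg = {key: np.ones(2) for key in pot}
            for _ in range(iters):
                new = {}
                for (i, j) in pot:
                    h = psi[i].copy()
                    for k in nbrs[i]:
                        if k != j:
                            h = h * msg[(k, i)]
                    m = pot[(i, j)].T @ h              # marginalize over x_i
                    new[(i, j)] = m / m.sum()
                msg = new
            beliefs = []
            for i in range(n):
                b = psi[i].copy()
                for k in nbrs[i]:
                    b = b * msg[(k, i)]
                beliefs.append(b / b.sum())
            return beliefs

        # Two-node model with an attractive coupling; BP is exact on trees.
        psi = [np.array([0.9, 0.1]), np.array([0.5, 0.5])]
        psi2 = {(0, 1): np.array([[2.0, 1.0], [1.0, 2.0]])}
        print(bp_marginals(2, [(0, 1)], psi, psi2))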

  2. Improving fast generation of halo catalogues with higher order Lagrangian perturbation theory

    NASA Astrophysics Data System (ADS)

    Munari, Emiliano; Monaco, Pierluigi; Sefusatti, Emiliano; Castorina, Emanuele; Mohammad, Faizan G.; Anselmi, Stefano; Borgani, Stefano

    2017-03-01

    We present the latest version of PINOCCHIO, a code that generates catalogues of dark matter haloes in an approximate but fast way with respect to an N-body simulation. This code version implements a new on-the-fly production of halo catalogues on the past light cone with continuous time sampling, and the computation of particle and halo displacements is extended up to third-order Lagrangian perturbation theory (LPT), in contrast with previous versions that used the Zel'dovich approximation. We run PINOCCHIO on the same initial configuration as a reference N-body simulation, so that the comparison extends to the object-by-object level. We consider haloes at redshifts 0 and 1, using different LPT orders either for halo construction or to compute halo final positions. We compare the clustering properties of PINOCCHIO haloes with those from the simulation by computing the power spectrum and two-point correlation function in real and redshift space (monopole and quadrupole), the bispectrum and the phase difference of halo distributions. We find that 2LPT and 3LPT give noticeable improvements. 3LPT provides the best agreement with N-body when it is used to displace haloes, while 2LPT gives better results for constructing haloes. At the highest orders, linear bias is typically recovered at a few per cent level. In Fourier space and using 3LPT for halo displacements, the halo power spectrum is recovered to within 10 per cent up to kmax ∼ 0.5 h Mpc^-1. The results presented in this paper have interesting implications for the generation of large ensembles of mock surveys for the scientific exploitation of data from big surveys.
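
    To make the Zel'dovich baseline concrete (the first-order LPT used by earlier versions), here is a minimal 2-D sketch of computing displacements from a linear density field via FFT; the 2-D reduction and grid/box parameters are illustrative, and PINOCCHIO's actual implementation is 3-D with growth-factor scaling and higher LPT orders.

        import numpy as np

        def zeldovich_displacement(delta, box):
            # First-order LPT: div(psi) = -delta, i.e. psi_k = i k delta_k / k^2.
            N = delta.shape[0]
            k = 2 * np.pi * np.fft.fftfreq(N, d=box / N)
            kx, ky = np.meshgrid(k, k, indexing="ij")
            k2 = kx**2 + ky**2
            k2[0, 0] = 1.0                        # avoid 0/0; zero mode is dropped below
            dk = np.fft.fft2(delta)
            dk[0, 0] = 0.0
            psi_x = np.fft.ifft2(1j * kx / k2 * dk).real
            psi_y = np.fft.ifft2(1j * ky / k2 * dk).real
            return psi_x, psi_y                   # particle positions: x = q + D(t) psi(q)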

  3. Fast running restricts evolutionary change of the vertebral column in mammals

    PubMed Central

    Galis, Frietson; Carrier, David R.; van Alphen, Joris; van der Mije, Steven D.; Van Dooren, Tom J. M.; Metz, Johan A. J.; ten Broek, Clara M. A.

    2014-01-01

    The mammalian vertebral column is highly variable, reflecting adaptations to a wide range of lifestyles, from burrowing in moles to flying in bats. However, in many taxa, the number of trunk vertebrae is surprisingly constant. We argue that this constancy results from strong selection against initial changes of these numbers in fast running and agile mammals, whereas such selection is weak in slower-running, sturdier mammals. The rationale is that changes of the number of trunk vertebrae require homeotic transformations from trunk into sacral vertebrae, or vice versa, and mutations toward such transformations generally produce transitional lumbosacral vertebrae that are incompletely fused to the sacrum. We hypothesize that such incomplete homeotic transformations impair flexibility of the lumbosacral joint and thereby threaten survival in species that depend on axial mobility for speed and agility. Such transformations will only marginally affect performance in slow, sturdy species, so that sufficient individuals with transitional vertebrae survive to allow eventual evolutionary changes of trunk vertebral numbers. We present data on fast and slow carnivores and artiodactyls and on slow afrotherians and monotremes that strongly support this hypothesis. The conclusion is that the selective constraints on the count of trunk vertebrae stem from a combination of developmental and biomechanical constraints. PMID:25024205

  4. High Performance Computing (HPC) Innovation Service Portal Pilots Cloud Computing (HPC-ISP Pilot Cloud Computing)

    DTIC Science & Technology

    2011-08-01

    [Search-result snippet; the recoverable figure captions read "Architectural diagram of running Blender on Amazon EC2 through Nimbis" and, from a streaming-data classification example, "Example input images (top left); all digit prototypes (cluster centers) found, with size proportional to frequency".]

  5. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

    PubMed

    Ren, Shaoqing; He, Kaiming; Girshick, Ross; Sun, Jian

    2017-06-01

    State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
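
    The RPN head is compact enough to sketch directly. The PyTorch module below mirrors the design described above: a shared 3x3 convolution followed by sibling 1x1 convolutions emitting 2k objectness scores and 4k box coordinates for k anchors per position. Channel widths follow the VGG-16 setting; anchor generation, proposal decoding, and NMS are omitted.

        import torch
        import torch.nn as nn

        class RPNHead(nn.Module):
            def __init__(self, in_channels=512, num_anchors=9):
                super().__init__()
                self.conv = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
                self.cls = nn.Conv2d(512, num_anchors * 2, kernel_size=1)  # objectness
                self.reg = nn.Conv2d(512, num_anchors * 4, kernel_size=1)  # box deltas

            def forward(self, features):
                h = torch.relu(self.conv(features))
                return self.cls(h), self.reg(h)

        # One forward pass over a VGG-16 conv5-sized feature map.
        scores, deltas = RPNHead()(torch.randn(1, 512, 38, 50))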

  6. Active Flash: Performance-Energy Tradeoffs for Out-of-Core Processing on Non-Volatile Memory Devices

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boboila, Simona; Kim, Youngjae; Vazhkudai, Sudharshan S

    2012-01-01

    In this abstract, we study the performance and energy tradeoffs involved in migrating data analysis into the flash device, a process we refer to as Active Flash. The Active Flash paradigm is similar to 'active disks', a concept that has received considerable attention. Active Flash allows us to move processing closer to data, thereby minimizing data movement costs and reducing power consumption. It enables true out-of-core computation. The conventional definition of out-of-core solvers refers to an approach to process data that is too large to fit in the main memory and, consequently, requires access to disk. However, in Active Flash, processing outside the host CPU literally frees the core and achieves real 'out-of-core' analysis. Moving analysis to data has long been desirable, not just at this level, but at all levels of the system hierarchy. However, this requires a detailed study of the tradeoffs involved in achieving analysis turnaround under an acceptable energy envelope. To this end, we first need to evaluate whether there is enough computing power on the flash device to warrant such an exploration. Flash processors require decent computing power to run the internal logic pertaining to the Flash Translation Layer (FTL), which is responsible for operations such as address translation, garbage collection (GC) and wear-leveling. Modern SSDs are composed of multiple packages and several flash chips within a package. The packages are connected using multiple I/O channels to offer high I/O bandwidth. SSD computing power is also expected to be high enough to exploit such inherent internal parallelism within the drive to increase the bandwidth and to handle fast I/O requests. More recently, SSD devices are being equipped with powerful processing units and are even embedded with multicore CPUs (e.g., the ARM Cortex-A9 embedded processor is advertised to reach 2 GHz and deliver 5000 DMIPS; the OCZ RevoDrive X2 SSD has 4 SandForce controllers, each with a 780 MHz Tensilica core). Efforts that take advantage of the available computing cycles on the processors on SSDs to run auxiliary tasks other than actual I/O requests are beginning to emerge. Kim et al. investigate database scan operations in the context of processing on SSDs, and propose dedicated hardware logic to speed up scans. Also, cluster architectures have been explored which consist of low-power embedded CPUs coupled with small local flash to achieve fast, parallel access to data. Processor utilization on the SSD is highly dependent on workloads and, therefore, the processors can be idle during periods with no I/O accesses. We propose to use the available processing capability on the SSD to run tasks that can be offloaded from the host. This paper makes the following contributions: (1) We have investigated Active Flash and its potential to optimize the total energy cost, including power consumption on the host and the flash device; (2) We have developed analytical models to analyze the performance-energy tradeoffs for Active Flash by treating the SSD as a black box, which is particularly valuable given the proprietary nature of SSD internal hardware; and (3) We have enhanced a well-known SSD simulator (from MSR) to implement 'on-the-fly' data compression using Active Flash. Our results provide a window into striking a balance between energy consumption and application performance.

  7. INDDGO: Integrated Network Decomposition & Dynamic programming for Graph Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Groer, Christopher S; Sullivan, Blair D; Weerapurage, Dinesh P

    2012-10-01

    It is well known that dynamic programming algorithms can utilize tree decompositions to solve some NP-hard problems on graphs, where the complexity is polynomial in the number of nodes and edges in the graph but exponential in the width of the underlying tree decomposition. However, there has been relatively little computational work done to determine the practical utility of such dynamic programming algorithms. We have developed software to construct tree decompositions using various heuristics and have created a fast, memory-efficient dynamic programming implementation for solving maximum weighted independent set. We describe our software and the algorithms we have implemented, focusing on memory-saving techniques for the dynamic programming. We compare the running time and memory usage of our implementation with other techniques for solving maximum weighted independent set, including a commercial integer programming solver and a semi-definite programming solver. Our results indicate that it is possible to solve some instances where the underlying decomposition has width much larger than suggested by the literature. For certain types of problems, our dynamic programming code runs several times faster than these other methods.
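
    The dynamic program is easiest to see in the width-1 special case, where the tree decomposition is just the tree itself. The sketch below solves maximum weighted independent set on a tree; INDDGO's actual implementation generalizes this recursion to the bags of an arbitrary tree decomposition, with the memory-saving techniques the abstract mentions.

        def max_weight_independent_set(adj, w, root=0):
            # adj: adjacency lists of a tree; w: node weights.
            # take[v]/skip[v]: best weight in v's subtree with v included/excluded.
            parent, order, stack = {root: None}, [], [root]
            while stack:                              # iterative DFS
                v = stack.pop()
                order.append(v)
                for u in adj[v]:
                    if u != parent[v]:
                        parent[u] = v
                        stack.append(u)
            take, skip = {}, {}
            for v in reversed(order):                 # children before parents
                kids = [u for u in adj[v] if parent.get(u) == v]
                take[v] = w[v] + sum(skip[u] for u in kids)
                skip[v] = sum(max(take[u], skip[u]) for u in kids)
            return max(take[root], skip[root])

        adj = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
        w = {0: 1, 1: 4, 2: 2, 3: 3, 4: 3}
        print(max_weight_independent_set(adj, w))     # 8, from nodes {2, 3, 4}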

  8. PELE web server: atomistic study of biomolecular systems at your fingertips.

    PubMed

    Madadkar-Sobhani, Armin; Guallar, Victor

    2013-07-01

    PELE, Protein Energy Landscape Exploration, our novel technology based on protein structure prediction algorithms and Monte Carlo sampling, is capable of modelling all-atom protein-ligand dynamical interactions in an efficient and fast manner, with a computational cost reduced by two orders of magnitude compared with traditional molecular dynamics techniques. PELE's heuristic approach generates trial moves based on protein and ligand perturbations followed by side chain sampling and global/local minimization. The collection of accepted steps forms a stochastic trajectory. Furthermore, several processors may be run in parallel towards a collective goal or to define several independent trajectories; the whole procedure has been parallelized using the Message Passing Interface. Here, we introduce the PELE web server, designed to make the whole process of running simulations easier and more practical by minimizing input file demands, providing a user-friendly interface and producing abstract outputs (e.g. interactive graphs and tables). The web server has been implemented in C++ using Wt (http://www.webtoolkit.eu) and MySQL (http://www.mysql.com). The PELE web server, accessible at http://pele.bsc.es, is free and open to all users with no login requirement.

  9. GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units

    PubMed Central

    Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

    2012-01-01

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/ PMID:22662128
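
    The position-by-position matching probabilities that dominate the motif-scan cost have this general shape, sketched here on the CPU with an assumed (w, 4) probability matrix and a uniform background; GPUmotif's contribution is mapping exactly this per-position independence onto GPU threads.

        import numpy as np

        BASE = {"A": 0, "C": 1, "G": 2, "T": 3}

        def motif_scan(seq, pwm, background=0.25):
            # pwm: (w, 4) matrix of base probabilities per motif position.
            w = pwm.shape[0]
            logodds = np.log(pwm / background)
            idx = np.array([BASE[c] for c in seq])
            return np.array([logodds[np.arange(w), idx[i:i + w]].sum()
                             for i in range(len(idx) - w + 1)])

        pwm = np.array([[0.7, 0.1, 0.1, 0.1],        # A
                        [0.1, 0.1, 0.7, 0.1],        # G
                        [0.1, 0.7, 0.1, 0.1]])       # C
        print(motif_scan("TTAGCAGCTT", pwm).argmax())    # offset of best "AGC" match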

  10. Mean Line Pump Flow Model in Rocket Engine System Simulation

    NASA Technical Reports Server (NTRS)

    Veres, Joseph P.; Lavelle, Thomas M.

    2000-01-01

    A mean line pump flow modeling method has been developed to provide a fast capability for modeling turbopumps of rocket engines. Based on this method, a mean line pump flow code PUMPA has been written that can predict the performance of pumps at off-design operating conditions, given the loss of the diffusion system at the design point. The pump code can model axial flow inducers, mixed-flow and centrifugal pumps. The code can model multistage pumps in series. The code features rapid input setup and computer run time, and is an effective analysis and conceptual design tool. The map generation capability of the code provides the map information needed for interfacing with a rocket engine system modeling code. The off-design and multistage modeling capabilities of the code permit parametric design space exploration of candidate pump configurations and provide pump performance data for engine system evaluation. The PUMPA code has been integrated with the Numerical Propulsion System Simulation (NPSS) code and an expander rocket engine system has been simulated. The mean line pump flow code runs as an integral part of the NPSS rocket engine system simulation and provides key pump performance information directly to the system model at all operating conditions.

  11. Comparison of maximum runup through analytical and numerical approaches for different fault parameters estimates

    NASA Astrophysics Data System (ADS)

    Kanoglu, U.; Wronna, M.; Baptista, M. A.; Miranda, J. M. A.

    2017-12-01

    The one-dimensional analytical runup theory in combination with near-shore synthetic waveforms is a promising tool for tsunami rapid early warning systems. Its application in realistic cases with complex bathymetry and initial wave conditions from inverse modelling has shown that maximum runup values can be estimated reasonably well. In this study we generate simplified bathymetry domains that resemble realistic near-shore features. We investigate the accuracy of the analytical runup formulae under variation of fault source parameters and near-shore bathymetric features. To do this we systematically vary the fault plane parameters to compute the initial tsunami wave condition. Subsequently, we use the initial conditions to run the numerical tsunami model using a coupled system of four nested grids and compare the results to the analytical estimates. Variation of the dip angle of the fault plane showed that analytical estimates differ by less than 10% for angles of 5-45 degrees in a simple bathymetric domain. These results show that the use of analytical formulae for fast runup estimates constitutes a very promising approach in simple bathymetric domains and might be implemented in hazard mapping and early warning.
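
    As a concrete example of the family of closed-form estimates being benchmarked, the classical solitary-wave runup law of Synolakis (1987) is sketched below. This is a well-known formula of the same kind, not necessarily the exact expression used in this study, and the sample numbers are invented.

        import numpy as np

        def solitary_wave_runup(H, d, beta):
            # Synolakis (1987) plane-beach runup law: R/d = 2.831 sqrt(cot(beta)) (H/d)^(5/4)
            return 2.831 * np.sqrt(1.0 / np.tan(beta)) * (H / d) ** 1.25 * d

        # Toy case: a 1 m wave in 10 m depth running up a 1-degree plane beach.
        print(solitary_wave_runup(H=1.0, d=10.0, beta=np.radians(1.0)))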

  12. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    PubMed

    Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

    2012-01-01

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  13. Design for Run-Time Monitor on Cloud Computing

    NASA Astrophysics Data System (ADS)

    Kang, Mikyung; Kang, Dong-In; Yun, Mira; Park, Gyung-Leen; Lee, Junghoon

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet, as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. Large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design a Run-Time Monitor (RTM), system software that monitors application behavior at run-time, analyzes the collected information, and optimizes resources on cloud computing. RTM monitors application software through library instrumentation, as well as the underlying hardware through performance counters, and optimizes the computing configuration based on the analyzed data.

  14. Cloud Computing for Complex Performance Codes.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Appel, Gordon John; Hadgu, Teklu; Klein, Brandon Thorin

    This report describes the use of cloud computing services for running complex public domain performance assessment problems. The work consisted of two phases: Phase 1 demonstrated that complex codes could run on several differently configured servers and compute trivial small-scale problems in a commercial cloud infrastructure. Phase 2 focused on proving that non-trivial large-scale problems could be computed in the commercial cloud environment. The cloud computing effort was successfully applied using codes of interest to the geohydrology and nuclear waste disposal modeling community.

  15. A Modular Environment for Geophysical Inversion and Run-time Autotuning using Heterogeneous Computing Systems

    NASA Astrophysics Data System (ADS)

    Myre, Joseph M.

    Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special-purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large-scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general-purpose processors and special-purpose accelerators, the speed and problem size of many simulations could be dramatically increased. Ultimately this results in enhanced simulation capabilities that allow, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH, a framework for reducing the complexity of programming heterogeneous computer systems, 2) geophysical inversion routines which can be used to characterize physical systems, and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes. Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that this environment provides scientists and engineers with means to reduce the programmatic complexity of their applications, to perform geophysical inversions for characterizing physical systems, and to determine high-performing run-time configurations of heterogeneous computing systems using a run-time autotuner.
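
    Stripped to its essence, the run-time autotuning component is an empirical search over candidate configurations, timing each and keeping the fastest. A minimal sketch with an invented kernel interface and a plain exhaustive search follows; production autotuners typically prune or model the configuration space rather than enumerating it.

        import time

        def autotune(kernel, configs, reps=5):
            # Time each candidate configuration; return the fastest one.
            best_cfg, best_t = None, float("inf")
            for cfg in configs:
                kernel(cfg)                              # warm-up run
                t0 = time.perf_counter()
                for _ in range(reps):
                    kernel(cfg)
                t = (time.perf_counter() - t0) / reps
                if t < best_t:
                    best_cfg, best_t = cfg, t
            return best_cfg, best_t

        # Example: choose a block size for a blocked reduction (a CPU stand-in
        # for tuning, say, GPU thread-block dimensions).
        data = list(range(1_000_000))
        def kernel(block):
            total = 0
            for i in range(0, len(data), block):
                total += sum(data[i:i + block])
        print(autotune(kernel, configs=[2**k for k in range(8, 16)]))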

  16. Man and Running. Russia's Best-Selling Book on Exercise, Health, and Medicine. A Worldwide Literature Search.

    ERIC Educational Resources Information Center

    Volkov, Vladimir M.; Milner, Evgeny G.

    This book attempts to systematize and generalize the data of world literature concerning the advantages of fast walking and slow running for persons with various cardiovascular diseases. The information and the fitness program outlined are based on experience and research conducted at the Nadezha Health Club in Smolensk, Russia. Major risk factors…

  17. Nonlinear Analysis of a Bolted Marine Riser Connector Using NASTRAN Substructuring

    NASA Technical Reports Server (NTRS)

    Fox, G. L.

    1984-01-01

    Results of an investigation of the behavior of a bolted, flange-type marine riser connector are reported. The method used to account for the nonlinear effect of connector separation due to bolt preload and axial tension load is described. The automated multilevel substructuring capability of COSMIC/NASTRAN was employed at considerable savings in computer run time. Simplified formulas for computer resources, i.e., computer run times for modules SDCOMP, FBS, and MPYAD, as well as disk storage space, are presented. Actual run time data on a VAX-11/780 are compared with the formulas presented.

  18. Estimating the economic opportunity cost of water use with river basin simulators in a computationally efficient way

    NASA Astrophysics Data System (ADS)

    Rougé, Charles; Harou, Julien J.; Pulido-Velazquez, Manuel; Matrosov, Evgenii S.

    2017-04-01

    The marginal opportunity cost of water refers to benefits forgone by not allocating an additional unit of water to its most economically productive use at a specific location in a river basin at a specific moment in time. Estimating the opportunity cost of water is an important contribution to water management as it can be used for better water allocation or better system operation, and can suggest where future water infrastructure could be most beneficial. Opportunity costs can be estimated using 'shadow values' provided by hydro-economic optimization models. Yet such models' use of optimization means they have difficulty accurately representing the impact of operating rules and regulatory and institutional mechanisms on actual water allocation. In this work we use more widely available river basin simulation models to estimate opportunity costs. This has been done before by adding a small quantity of water to the model at the place and time where the opportunity cost is to be computed, then running a simulation and comparing the difference in system benefits. The added system benefits per unit of water added to the system then provide an approximation of the opportunity cost. This approximation can then be used to design efficient pricing policies that provide incentives for users to reduce their water consumption. Yet this method requires one simulation run per node and per time step, which is computationally demanding for large-scale systems and short time steps (e.g., a day or a week). Moreover, opportunity cost estimates are supposed to reflect the most productive use of an additional unit of water, yet the simulation rules do not necessarily use water that way. In this work, we propose an alternative approach, which computes the opportunity cost through a double backward induction, first recursively from outlet to headwaters within the river network at each time step, then recursively backwards in time. Both backward inductions only require linear operations, and the resulting algorithm tracks the maximal benefit that can be obtained by having an additional unit of water at any node in the network and at any date in time. Results 1) can be obtained from the results of a rule-based simulation using a single post-processing run, and 2) are exactly the (gross) benefit forgone by not allocating an additional unit of water to its most productive use. The proposed method is applied to London's water resource system to track the value of storage in the city's water supply reservoirs on the Thames River throughout a weekly 85-year simulation. Results, obtained in 0.4 seconds on a single processor, reflect the environmental cost of water shortage. This fast computation allows the seasonal variations of the opportunity cost to be visualized as a function of reservoir levels, demonstrating the potential of this approach for exploring water values and their variations using simulation models with multiple runs (e.g. of stochastically generated plausible future river inflows).
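
    A toy version of the double backward induction makes the recursion explicit. The sketch below uses a single chain of nodes from headwater to outlet, hypothetical marginal benefits, and ignores storage losses and capacity limits; the real method runs over a full river network within each time step.

        import numpy as np

        T, N = 52, 5                               # weeks, nodes (headwater -> outlet)
        rng = np.random.default_rng(0)
        benefit = rng.uniform(0.0, 2.0, (T, N))    # hypothetical marginal benefits

        value = np.zeros((T + 1, N + 1))           # zero beyond the horizon and outlet
        for t in range(T - 1, -1, -1):             # backwards in time...
            for n in range(N - 1, -1, -1):         # ...and outlet to headwaters
                value[t, n] = max(benefit[t, n],   # use the extra unit here and now,
                                  value[t, n + 1], # or pass it downstream this week,
                                  value[t + 1, n]) # or store it for next week

        # value[t, n]: best achievable benefit of one extra unit at node n in week t.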

  19. Annual Historical Report - AMEDD Activities, Calendar Year 1987

    DTIC Science & Technology

    1987-01-01

    levels. Insulin levels were increased 2.2 fold with no change in glucagon. This profile is unlike that associated with weight loss due to fasting and may...were just as fast on the final PT test as the high mileage company. It appeared that for each mile run a quantifiable risk could be attached which was...various levels of fatness. Hum. Biol. 59:281-298, 1987. Knapik, J. J., B. H. Jones, C. Meredith, W. .J. Evans. Influence of 3.5 day fast on physical

  20. Influence of a 3.5 Day Fast on Physical Performance Running Heading: Fasting and Performance,

    DTIC Science & Technology

    1986-01-01

    induced atrophy. Two weeks of a hypocaloric diet has been shown to result in selective atrophy of Type II muscle fibers (28). However, it is unlikely that...J, Marliss EB, Jeejeebhoy KN. Skeletal muscle function during hypocaloric diets and fasting: a comparison with standard nutritional assessment...KN. Metabolic and structural changes in skeletal muscle during hypocaloric dieting. Am. J. Clin. Nutr. 1984; 39:503-513...29. Taylor H

  1. Scalable computing for evolutionary genomics.

    PubMed

    Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert

    2012-01-01

    Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Alongside the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.

  2. Fingerprinting Communication and Computation on HPC Machines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peisert, Sean

    2010-06-02

    How do we identify what is actually running on high-performance computing systems? Names of binaries, dynamic libraries loaded, or other elements in a submission to a batch queue can give clues, but binary names can be changed, and libraries provide limited insight and resolution on the code being run. In this paper, we present a method for "fingerprinting" code running on HPC machines using elements of communication and computation. We then discuss how that fingerprint can be used to determine if the code is consistent with certain other types of codes, what a user usually runs, or what the user requested an allocation to do. In some cases, our techniques enable us to fingerprint HPC codes using runtime MPI data with a high degree of accuracy.

  3. Computation of transform domain covariance matrices

    NASA Technical Reports Server (NTRS)

    Fino, B. J.; Algazi, V. R.

    1975-01-01

    It is often of interest in applications to compute the covariance matrix of a random process transformed by a fast unitary transform. Here, the recursive definition of fast unitary transforms is used to derive recursive relations for the covariance matrices of the transformed process. These relations lead to fast methods of computation of covariance matrices and to substantial reductions of the number of arithmetic operations required.
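
    The identity being exploited is that for y = Tx with unitary T, Cov(y) = T Cov(x) T^H. The sketch below specializes to the unitary DFT, where both matrix products reduce to one FFT pass along each axis of the covariance matrix; the paper's recursive relations generalize this saving to other fast unitary transforms.

        import numpy as np

        def dft_domain_covariance(C):
            # Cov(Fx) = F C F^H for the unitary DFT F, computed as two FFT passes.
            return np.fft.fft(np.fft.ifft(C, axis=1), axis=0)

        # Check against the explicit matrix product.
        N = 8
        A = np.random.randn(N, N)
        C = A @ A.T                                   # a valid covariance matrix
        F = np.fft.fft(np.eye(N)) / np.sqrt(N)        # unitary DFT matrix
        assert np.allclose(dft_domain_covariance(C), F @ C @ F.conj().T)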

  4. Computer-Based Video Instruction to Teach Students with Intellectual Disabilities to Verbally Respond to Questions and Make Purchases in Fast Food Restaurants

    ERIC Educational Resources Information Center

    Mechling, Linda C.; Pridgen, Leslie S.; Cronin, Beth A.

    2005-01-01

    Computer-based video instruction (CBVI) was used to teach verbal responses to questions presented by cashiers and purchasing skills in fast food restaurants. A multiple probe design across participants was used to evaluate the effectiveness of CBVI. Instruction occurred through simulations of three fast food restaurants on the computer using video…

  5. Incompressible SPH (ISPH) with fast Poisson solver on a GPU

    NASA Astrophysics Data System (ADS)

    Chow, Alex D.; Rogers, Benedict D.; Lind, Steven J.; Stansby, Peter K.

    2018-05-01

    This paper presents a fast incompressible SPH (ISPH) solver implemented to run entirely on a graphics processing unit (GPU), capable of simulating several million particles in three dimensions on a single GPU. The ISPH algorithm is implemented by converting the highly optimised open-source weakly-compressible SPH (WCSPH) code DualSPHysics to run ISPH on the GPU, combining it with the open-source linear algebra library ViennaCL for fast solutions of the pressure Poisson equation (PPE). Several challenges are addressed with this research: constructing a PPE matrix every timestep on the GPU for moving particles, optimising the limited GPU memory, and exploiting fast matrix solvers. The ISPH pressure projection algorithm is implemented as 4 separate stages, each with a particle sweep, including an algorithm for the population of the PPE matrix suitable for the GPU, and mixed precision storage methods. An accurate and robust ISPH boundary condition ideal for parallel processing is also established by adapting an existing WCSPH boundary condition for ISPH. A variety of validation cases are presented: an impulsively started plate, incompressible flow around a moving square in a box, and dambreaks (2-D and 3-D) which demonstrate the accuracy, flexibility, and speed of the methodology. Fragmentation of the free surface is shown to influence the performance of matrix preconditioners and therefore the PPE matrix solution time. The Jacobi preconditioner demonstrates robustness and reliability in the presence of fragmented flows. For a dambreak simulation, GPU speed-ups of 10-18 times over single-threaded and 1.1-4.5 times over 16-threaded CPU run times are demonstrated.
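
    The role of the Jacobi preconditioner noted above is easy to demonstrate on a small PPE-like system. The sketch below runs SciPy's conjugate gradient on a 2-D Laplacian standing in for the PPE matrix; the paper's solver instead assembles the matrix every timestep and solves it on the GPU via ViennaCL.

        import numpy as np
        import scipy.sparse as sp
        from scipy.sparse.linalg import cg

        n = 64                                        # toy grid; matrix is (n*n) x (n*n)
        T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
        A = sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))   # 2-D Laplacian (SPD)
        b = np.ones(n * n)

        M = sp.diags(1.0 / A.diagonal())              # Jacobi preconditioner diag(A)^-1
        x, info = cg(A, b, M=M)
        assert info == 0                              # 0 signals convergence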

  6. GASPRNG: GPU accelerated scalable parallel random number generator library

    NASA Astrophysics Data System (ADS)

    Gao, Shuang; Peterson, Gregory D.

    2013-04-01

    Graphics processors represent a promising technology for accelerating computational science applications. Many computational science applications require fast and scalable random number generation with good statistical properties, so they use the Scalable Parallel Random Number Generators library (SPRNG). We present the GPU Accelerated SPRNG library (GASPRNG) to accelerate SPRNG in GPU-based high performance computing systems. GASPRNG includes code for a host CPU and CUDA code for execution on NVIDIA graphics processing units (GPUs) along with a programming interface to support various usage models for pseudorandom numbers and computational science applications executing on the CPU, GPU, or both. This paper describes the implementation approach used to produce high performance and also describes how to use the programming interface. The programming interface allows a user to use GASPRNG the same way as SPRNG on traditional serial or parallel computers as well as to develop tightly coupled programs executing primarily on the GPU. We also describe how to install GASPRNG and use it. To help illustrate linking with GASPRNG, various demonstration codes are included for the different usage models. GASPRNG on a single GPU shows up to 280x speedup over SPRNG on a single CPU core and is able to scale for larger systems in the same manner as SPRNG. Because GASPRNG generates identical streams of pseudorandom numbers as SPRNG, users can be confident about the quality of GASPRNG for scalable computational science applications. Catalogue identifier: AEOI_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOI_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: UTK license. No. of lines in distributed program, including test data, etc.: 167900 No. of bytes in distributed program, including test data, etc.: 1422058 Distribution format: tar.gz Programming language: C and CUDA. Computer: Any PC or workstation with NVIDIA GPU (Tested on Fermi GTX480, Tesla C1060, Tesla M2070). Operating system: Linux with CUDA version 4.0 or later. Should also run on MacOS, Windows, or UNIX. Has the code been vectorized or parallelized?: Yes. Parallelized using MPI directives. RAM: 512 MB-732 MB (main memory on host CPU, depending on the data type of random numbers) / 512 MB (GPU global memory) Classification: 4.13, 6.5. Nature of problem: Many computational science applications are able to consume large numbers of random numbers. For example, Monte Carlo simulations are able to consume limitless random numbers for the computation as long as resources for the computing are supported. Moreover, parallel computational science applications require independent streams of random numbers to attain statistically significant results. The SPRNG library provides this capability, but at a significant computational cost. The GASPRNG library presented here accelerates the generators of independent streams of random numbers using graphical processing units (GPUs). Solution method: Multiple copies of random number generators in GPUs allow a computational science application to consume large numbers of random numbers from independent, parallel streams. GASPRNG is a random number generators library to allow a computational science application to employ multiple copies of random number generators to boost performance. Users can interface GASPRNG with software code executing on microprocessors and/or GPUs.
Running time: The tests provided take a few minutes to run.
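
    The property GASPRNG preserves, many statistically independent and reproducible streams, can be illustrated with NumPy's stream-spawning API. This is a CPU stand-in for the idea only, not the SPRNG generator family itself.

        import numpy as np

        root = np.random.SeedSequence(20130401)       # one root seed for the whole job
        streams = [np.random.default_rng(s) for s in root.spawn(4)]

        # Each "rank" (or GPU block) draws from its own independent stream;
        # re-running with the same root seed reproduces every stream exactly.
        for rank, rng in enumerate(streams):
            print(rank, rng.random(3))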

  7. Robot computer problem solving system

    NASA Technical Reports Server (NTRS)

    Becker, J. D.; Merriam, E. W.

    1974-01-01

    The conceptual, experimental, and practical phases of developing a robot computer problem solving system are outlined. Robot intelligence, conversion of the programming language SAIL to run under the TENEX monitor, and the use of the network to run several cooperating jobs at different sites are discussed.

  8. A new fast algorithm for computing complex number-theoretic transforms

    NASA Technical Reports Server (NTRS)

    Reed, I. S.; Liu, K. Y.; Truong, T. K.

    1977-01-01

    A high-radix fast Fourier transform (FFT) algorithm for computing transforms over GF(q^2), where q is a Mersenne prime, is developed to implement fast circular convolutions. This new algorithm requires substantially fewer multiplications than the conventional FFT.
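
    A minimal illustration of such a transform, assuming the GF(q^2) reading of the original: take the toy Mersenne prime q = 7, build GF(49) as a + bi with i^2 = -1, and use an element of multiplicative order N = 16 = 2^(p+1) as the transform kernel. The O(N^2) evaluation below is for clarity; the point of the algorithm is that a radix-2 FFT recursion applies to it unchanged.

        q = 7                                   # Mersenne prime 2**3 - 1 (toy size)
        N = 16                                  # power-of-two length 2**(p+1)

        def gmul(x, y):                         # multiply in GF(q^2), with i^2 = -1
            a, b = x; c, d = y
            return ((a * c - b * d) % q, (a * d + b * c) % q)

        def gpow(x, e):                         # square-and-multiply exponentiation
            r = (1, 0)
            while e:
                if e & 1:
                    r = gmul(r, x)
                x = gmul(x, x); e >>= 1
            return r

        # Brute-force an element of multiplicative order exactly N.
        omega = next((a, b) for a in range(q) for b in range(q)
                     if (a, b) != (0, 0)
                     and gpow((a, b), N) == (1, 0) and gpow((a, b), N // 2) != (1, 0))

        def ntt(x, w):                          # naive O(N^2) transform, for clarity
            out = []
            for k in range(N):
                s = (0, 0)
                for n, xn in enumerate(x):
                    t = gmul(xn, gpow(w, n * k))
                    s = ((s[0] + t[0]) % q, (s[1] + t[1]) % q)
                out.append(s)
            return out

        x = [(n % q, 0) for n in range(N)]
        X = ntt(x, omega)
        n_inv = pow(N % q, q - 2, q)            # N^-1 in GF(q)
        back = [((n_inv * a) % q, (n_inv * b) % q) for a, b in ntt(X, gpow(omega, N - 1))]
        assert back == x                        # the transform round-trips exactly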

  9. Creep force modelling for rail traction vehicles based on the Fastsim algorithm

    NASA Astrophysics Data System (ADS)

    Spiryagin, Maksym; Polach, Oldrich; Cole, Colin

    2013-11-01

    The evaluation of creep forces is a complex task and their calculation is a time-consuming process for multibody simulation (MBS). A methodology of creep force modelling at large traction creepages has been proposed by Polach [Creep forces in simulations of traction vehicles running on adhesion limit. Wear. 2005;258:992-1000; Influence of locomotive tractive effort on the forces between wheel and rail. Veh Syst Dyn. 2001(Suppl);35:7-22] adapting his previously published algorithm [Polach O. A fast wheel-rail forces calculation computer code. Veh Syst Dyn. 1999(Suppl);33:728-739]. The most common method for creep force modelling used by software packages for MBS of running dynamics is the Fastsim algorithm by Kalker [A fast algorithm for the simplified theory of rolling contact. Veh Syst Dyn. 1982;11:1-13]. However, the Fastsim code has some limitations which do not allow modelling the creep force-creepage characteristic in agreement with measurements for locomotives and other high-power traction vehicles, mainly for large traction creep under low-adhesion conditions. This paper describes a newly developed methodology based on a variable contact flexibility increasing with the ratio of the slip area to the area of adhesion. This variable contact flexibility is introduced in a modification of Kalker's code Fastsim by replacing Kalker's constant reduction factor, widely used in MBS, with a variable reduction factor together with a slip-velocity-dependent friction coefficient decreasing with increasing global creepage. The proposed methodology is presented in this work and compared with measurements for different locomotives. The modification allows use of the well-recognised Fastsim code for simulation of creep forces at large creepages in agreement with measurements, without modifying the proven modelling methodology at small creepages.
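
    For orientation, the saturation law at the heart of Polach's method is sketched below, with the adhesion-area and slip-area reduction factors k_A and k_S that this paper makes variable. Coefficient conventions differ between publications, so treat this as a schematic to be checked against Polach (1999, 2005) rather than as the calibrated model.

        import numpy as np

        def polach_creep_force(s, Q, mu, C, a, b, kA=1.0, kS=1.0):
            # s: creepage; Q: wheel load [N]; mu: friction coefficient;
            # C: contact shear stiffness; a, b: contact-ellipse semi-axes [m].
            eps = (2.0 / 3.0) * (C * np.pi * a**2 * b) / (Q * mu) * s
            return (2.0 * Q * mu / np.pi) * (kA * eps / (1.0 + (kA * eps) ** 2)
                                             + np.arctan(kS * eps))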

  10. Active Nodal Task Seeking for High-Performance, Ultra-Dependable Computing

    DTIC Science & Technology

    1994-07-01

    implementation. Figure 1 shows a hardware organization of ANTS: stand-alone computing nodes interconnected by buses. 2.1 Run Time Partitioning The...nodes in 14 respond to changing loads [27] or system reconfiguration [26]. Existing techniques are all source-initiated or server-initiated [27]. 5.1...short-running task segments. The task segments must be short-running in order that processors will become available often enough to satisfy changing

  11. GATE Monte Carlo simulation in a cloud computing environment

    NASA Astrophysics Data System (ADS)

    Rowedder, Blake Austin

    The GEANT4-based GATE is a unique and powerful Monte Carlo (MC) platform, which provides a single code library allowing the simulation of specific medical physics applications, e.g. PET, SPECT, CT, radiotherapy, and hadron therapy. However, this rigorous yet flexible platform is used only sparingly in the clinic due to its lengthy calculation time. By accessing the powerful computational resources of a cloud computing environment, GATE's runtime can be significantly reduced to clinically feasible levels without the sizable investment of a local high-performance cluster. This study investigated reliable and efficient execution of GATE MC simulations using a commercial cloud computing service. Amazon's Elastic Compute Cloud was used to launch several nodes equipped with GATE. Job data was initially broken up on the local computer, then uploaded to the worker nodes on the cloud. The results were automatically downloaded and aggregated on the local computer for display and analysis. Five simulations were repeated for every cluster size between 1 and 20 nodes. Ultimately, increasing cluster size resulted in a decrease in calculation time that could be expressed with an inverse power model. Comparing the benchmark results to the published values and error margins indicated that the simulation results were not affected by the cluster size and thus that the integrity of a calculation is preserved in a cloud computing environment. The runtime of a 53-minute simulation was decreased to 3.11 minutes when run on a 20-node cluster. The ability to improve the speed of simulation suggests that fast MC simulations are viable for imaging and radiotherapy applications. With high-performance computing continuing to become cheaper and more accessible, implementing Monte Carlo techniques with cloud computing for clinical applications will continue to become more attractive.
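
    The reported inverse power relationship between cluster size and runtime can be recovered with an ordinary curve fit. In the sketch below, only the 1-node (53 min) and 20-node (3.11 min) points come from the abstract; the intermediate timings are invented for illustration.

        import numpy as np
        from scipy.optimize import curve_fit

        nodes = np.array([1, 2, 4, 8, 16, 20])
        minutes = np.array([53.0, 27.0, 14.0, 7.3, 4.0, 3.11])   # middle values invented

        def inverse_power(n, a, b):
            return a * n ** (-b)

        (a, b), _ = curve_fit(inverse_power, nodes, minutes)
        print(f"T(n) ~ {a:.1f} * n^(-{b:.2f}) minutes")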

  12. The Impact and Promise of Open-Source Computational Material for Physics Teaching

    NASA Astrophysics Data System (ADS)

    Christian, Wolfgang

    2017-01-01

    A computer-based modeling approach to teaching must be flexible because students and teachers have different skills and varying levels of preparation. Learning how to run the "software du jour" is not the objective for integrating computational physics material into the curriculum. Learning computational thinking, how to use computation and computer-based visualization to communicate ideas, how to design and build models, and how to use ready-to-run models to foster critical thinking is the objective. Our computational modeling approach to teaching is a research-proven pedagogy that predates computers. It attempts to enhance student achievement through the Modeling Cycle. This approach was pioneered by Robert Karplus and the SCIS Project in the 1960s and 70s and later extended by the Modeling Instruction Program led by Jane Jackson and David Hestenes at Arizona State University. This talk describes a no-cost open-source computational approach aligned with a Modeling Cycle pedagogy. Our tools, curricular material, and ready-to-run examples are freely available from the Open Source Physics Collection hosted on the AAPT-ComPADRE digital library. Examples will be presented.

  13. Colt: an experiment in wormhole run-time reconfiguration

    NASA Astrophysics Data System (ADS)

    Bittner, Ray; Athanas, Peter M.; Musgrove, Mark

    1996-10-01

    Wormhole run-time reconfiguration (RTR) is an attempt to create a refined computing paradigm for high performance computational tasks. By combining concepts from field programmable gate array (FPGA) technologies with data flow computing, the Colt/Stallion architecture achieves high utilization of hardware resources, and facilitates rapid run-time reconfiguration. Targeted mainly at DSP-type operations, the Colt integrated circuit -- a prototype wormhole RTR device -- compares favorably to contemporary DSP alternatives in terms of silicon area consumed per unit computation and in computing performance. Although emphasis has been placed on signal processing applications, general purpose computation has not been overlooked. Colt is a prototype that defines an architecture not only at the chip level but also in terms of an overall system design. As this system is realized, the concept of wormhole RTR will be applied to numerical computation and DSP applications including those common to image processing, communications systems, digital filters, acoustic processing, real-time control systems and simulation acceleration.

  14. Searching for Fast Radio Bursts with the Advanced Laser Interferometer Gravitational-wave Observatory (LIGO)

    NASA Astrophysics Data System (ADS)

    Fisher, Ryan Patrick; Hughey, Brennan; Howell, Eric; LIGO Collaboration

    2018-01-01

    Although Fast Radio Bursts (FRB) are being detected with increasing frequency, their progenitor systems are still mostly a mystery. We present the plan to conduct targeted searches for gravitational-wave counterparts to these FRB events in the data from the first and second observing runs of the Advanced Laser Interferometer Gravitational-wave Observatory (LIGO).

  15. Dynamical scales for multi-TeV top-pair production at the LHC

    NASA Astrophysics Data System (ADS)

    Czakon, Michał; Heymes, David; Mitov, Alexander

    2017-04-01

    We calculate all major differential distributions with stable top-quarks at the LHC. The calculation covers the multi-TeV range that will be explored during LHC Run II and beyond. Our results are in the form of high-quality binned distributions. We offer predictions based on three different parton distribution function (pdf) sets. In the near future we will make our results available also in the more flexible fastNLO format that allows fast re-computation with any other pdf set. In order to be able to extend our calculation into the multi-TeV range we have had to derive a set of dynamic scales. Such scales are selected based on the principle of fastest perturbative convergence applied to the differential and inclusive cross-section. Many observations from our study are likely to be applicable and useful to other precision processes at the LHC. With scale uncertainty now under good control, pdfs arise as the leading source of uncertainty for TeV top production. Based on our findings, true precision in the boosted regime will likely only be possible after new and improved pdf sets appear. We expect that LHC top-quark data will play an important role in this process.

  16. Fast response air-to-fuel ratio measurements using a novel device based on a wide band lambda sensor

    NASA Astrophysics Data System (ADS)

    Regitz, S.; Collings, N.

    2008-07-01

    A crucial parameter influencing the formation of pollutant gases in internal combustion engines is the air-to-fuel ratio (AFR). During transients on gasoline and diesel engines, significant AFR excursions from target values can occur, but cycle-by-cycle AFR resolution, which is helpful in understanding the origin of deviations, is difficult to achieve with existing hardware. This is because current electrochemical devices such as universal exhaust gas oxygen (UEGO) sensors have a time constant of 50-100 ms, depending on the engine running conditions. This paper describes the development of a fast reacting device based on a wide band lambda sensor which has a maximum time constant of ~20 ms and enables cyclic AFR measurements for engine speeds of up to ~4000 rpm. The design incorporates a controlled sensor environment which results in insensitivity to sample temperature and pressure. In order to guide the development process, a computational model was developed to predict the effect of pressure and temperature on the diffusion mechanism. Investigations regarding the sensor output and response were carried out, and sensitivities to temperature and pressure are examined. Finally, engine measurements are presented.

  17. A fast non-local means algorithm based on integral image and reconstructed similar kernel

    NASA Astrophysics Data System (ADS)

    Lin, Zheng; Song, Enmin

    2018-03-01

    Image denoising is one of the essential methods in digital image processing. The non-local means (NLM) denoising approach is a remarkable denoising technique. However, the time complexity of its computation is high. In this paper, we design a fast NLM algorithm based on an integral image and a reconstructed similar kernel. First, the integral image is introduced into the traditional NLM algorithm. In doing so, it removes a great deal of repetitive operations in the parallel processing, which greatly improves the running speed of the algorithm. Secondly, in order to amend the error of the integral image, we construct a similar window resembling the Gaussian kernel in a pyramidal stacking pattern. Finally, in order to eliminate the influence produced by replacing the Gaussian-weighted Euclidean distance with the plain Euclidean distance, we propose a scheme to construct a similar kernel with a size of 3 x 3 in a neighborhood window, which reduces the effect of noise on a single pixel. Experimental results demonstrate that the proposed algorithm is about seventeen times faster than the traditional NLM algorithm, yet produces comparable results in terms of Peak Signal-to-Noise Ratio (the PSNR increased 2.9% on average) and perceptual image quality.
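
    The integral-image step that drives the speed-up is sketched below: for each patch offset, box-filtering the squared difference between the image and its shifted copy yields the patch distance at every pixel in O(1) per pixel. The window radius and filter parameter are illustrative, and the reconstructed similar kernel of the paper is omitted.

        import numpy as np

        def box_sum(img, r):
            # Sum over every (2r+1) x (2r+1) window via an integral image:
            # four lookups per output pixel (interior/"valid" region, for brevity).
            ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
            ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
            k = 2 * r + 1
            return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

        def nlm_weights_for_offset(img, dy, dx, r=3, h=10.0):
            # Patch SSD between img and its (dy, dx) shift, for all pixels at once.
            shifted = np.roll(img, (dy, dx), axis=(0, 1))
            dist = box_sum((img - shifted) ** 2, r)
            return np.exp(-dist / (h * h))            # NLM similarity weights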

  18. Fast and accurate Monte Carlo modeling of a kilovoltage X-ray therapy unit using a photon-source approximation for treatment planning in complex media.

    PubMed

    Zeinali-Rafsanjani, B; Mosleh-Shirazi, M A; Faghihi, R; Karbasi, S; Mosalaei, A

    2015-01-01

    To accurately recompute dose distributions in chest-wall radiotherapy with 120 kVp kilovoltage X-rays, an MCNP4C Monte Carlo model is presented using a fast method that obviates the need to fully model the tube components. To validate the model, the half-value layer (HVL), percentage depth doses (PDDs) and beam profiles were measured. Dose measurements were performed for a more complex situation using thermoluminescence dosimeters (TLDs) placed within a Rando phantom. The measured and computed first and second HVLs were 3.8, 10.3 mm Al and 3.8, 10.6 mm Al, respectively. The differences between measured and calculated PDDs and beam profiles in water were within 2 mm/2% for all data points. In the Rando phantom, differences for the majority of data points were within 2%. The proposed model offered an approximately 9500-fold reduction in run time compared to the conventional full simulation. The acceptable agreement, based on international criteria, between the simulations and the measurements validates the accuracy of the model for its use in treatment planning and radiobiological modeling studies of superficial therapies including chest-wall irradiation using kilovoltage beams.

  19. Acute Caffeinated Coffee Consumption Does not Improve Time Trial Performance in an 800-m Run: A Randomized, Double-Blind, Crossover, Placebo-Controlled Study.

    PubMed

    Marques, Alexandre C; Jesus, Alison A; Giglio, Bruna M; Marini, Ana C; Lobo, Patrícia C B; Mota, João F; Pimentel, Gustavo D

    2018-05-23

    Studies evaluating caffeinated coffee (CAF) can reveal ergogenic effects; however, studies on the effects of caffeinated coffee on running are scarce and controversial. To investigate the effects of CAF consumption compared to decaffeinated coffee (DEC) consumption on time trial performance in an 800-m run in overnight-fasting runners, a randomly counterbalanced, double-blind, crossover, placebo-controlled study was conducted with 12 healthy adult males with experience in amateur endurance running. Participants conducted two trials on two different occasions, one day with either CAF or DEC, with a one-week washout. After arriving at the data collection site, participants consumed the soluble CAF (5.5 mg/kg of caffeine) or DEC, and after 60 min the run was started. Before and after the 800-m race, blood pressure and lactate and glucose concentrations were measured. At the end of the run, the ratings of perceived exertion (RPE) scale was applied. The runners were light habitual caffeine consumers, with an average ingestion of 91.3 mg (range 6-420 mg/day). Time trial performances did not change between trials (DEC: 2.38 ± 0.10 vs. CAF: 2.39 ± 0.09 min, p = 0.336), nor did the RPE (DEC: 16.5 ± 2.68 vs. CAF: 17.0 ± 2.66, p = 0.326). No difference between the trials was observed for glucose and lactate concentrations, or for systolic and diastolic blood pressure levels. CAF consumption failed to enhance the time trial performance of an 800-m run in overnight-fasting runners when compared with DEC ingestion. In addition, no change was found in RPE, blood pressure levels, or blood glucose and lactate concentrations between the two trials.

  20. Virtualization and cloud computing in dentistry.

    PubMed

    Chow, Frank; Muftu, Ali; Shorter, Richard

    2014-01-01

    The use of virtualization and cloud computing has changed the way we use computers. Virtualization is a method of placing software called a hypervisor on the hardware of a computer or a host operating system. It allows a guest operating system to run on top of the physical computer as a virtual machine (i.e., a virtual computer). Virtualization allows multiple virtual computers to run on top of one physical computer and to share its hardware resources, such as printers, scanners, and modems. This increases the efficiency of computer use by decreasing costs (e.g., hardware, electricity, administration, and management), since only one physical computer needs to be running. This virtualization platform is the basis for cloud computing, and it has expanded into the areas of server and storage virtualization. One of the commonly used dental storage systems is cloud storage. Patient information is encrypted as required by the Health Insurance Portability and Accountability Act (HIPAA) and stored on off-site private cloud services for a monthly service fee. As computing needs continue to increase, so too will the need for more storage and processing power. Virtual and cloud computing will be a way for dentists to minimize costs and maximize computer efficiency in the near future. This article provides some useful information on current uses of cloud computing.

  1. LCFM - LIVING COLOR FRAME MAKER: PC GRAPHICS GENERATION AND MANAGEMENT TOOL FOR REAL-TIME APPLICATIONS

    NASA Technical Reports Server (NTRS)

    Truong, L. V.

    1994-01-01

    Computer graphics are often applied for better understanding and interpretation of data under observation. These graphics become more complicated when animation is required during "run-time", as found in many typical modern artificial intelligence and expert systems. Living Color Frame Maker is a solution to many of these real-time graphics problems. Living Color Frame Maker (LCFM) is a graphics generation and management tool for IBM or IBM-compatible personal computers. To eliminate graphics programming, the graphic designer can use LCFM to generate computer graphics frames. The graphical frames are then saved as text files, in a readable and disclosed format, which can be easily accessed and manipulated by user programs for a wide range of "real-time" visual information applications. For example, LCFM can be implemented in a frame-based expert system for visual aids in the management of systems. For monitoring, diagnosis, and/or control purposes, circuit or system diagrams can be brought to "life" by using designated video colors and intensities to symbolize the status of hardware components (via real-time feedback from sensors). Thus the status of the system itself can be displayed. The Living Color Frame Maker is user-friendly, with graphical interfaces and on-line help instructions. All options are executed using mouse commands and are displayed on a single menu for fast and easy operation. LCFM is written in C++ using the Borland C++ 2.0 compiler for IBM PC series computers and compatible computers running MS-DOS. The program requires a mouse and an EGA/VGA display. A minimum of 77K of RAM is also required for execution. The documentation is provided in electronic form on the distribution medium in WordPerfect format. A sample MS-DOS executable is provided on the distribution medium. The standard distribution medium for this program is one 5.25-inch 360K MS-DOS format diskette. The contents of the diskette are compressed using the PKWARE archiving tools. The utility to unarchive the files, PKUNZIP.EXE, is included. The Living Color Frame Maker tool was developed in 1992.

  2. Fast prediction of RNA-RNA interaction using heuristic algorithm.

    PubMed

    Montaseri, Soheila

    2015-01-01

    Interaction between two RNA molecules plays a crucial role in many medical and biological processes, such as the regulation of gene expression. In this process, an RNA molecule inhibits the translation of another RNA molecule by establishing stable interactions with it. Several algorithms have been developed to predict the structure of RNA-RNA interactions, and high computational time is a common challenge in most of them. In this context, a heuristic method is introduced to accurately predict the interaction between two RNAs based on minimum free energy (MFE). This algorithm uses a few dot matrices for finding the secondary structure of each RNA and the binding sites between the two RNAs. Furthermore, a parallel version of this method is presented, and we describe the algorithm's concurrency and parallelism for a multicore chip. The proposed algorithm has been evaluated on several datasets, including CopA-CopT, R1inv-R2inv, Tar-Tar*, DIS-DIS, and IncRNA54-RepZ in Escherichia coli bacteria. The method has high validity and efficiency, and it runs in low computational time in comparison to other approaches.

  3. Machine musicianship

    NASA Astrophysics Data System (ADS)

    Rowe, Robert

    2002-05-01

    The training of musicians begins by teaching basic musical concepts, a collection of knowledge commonly known as musicianship. Computer programs designed to implement musical skills (e.g., to make sense of what they hear, perform music expressively, or compose convincing pieces) can similarly benefit from access to a fundamental level of musicianship. Recent research in music cognition, artificial intelligence, and music theory has produced a repertoire of techniques that can make the behavior of computer programs more musical. Many of these were presented in a recently published book/CD-ROM entitled Machine Musicianship. For use in interactive music systems, we are interested in those which are fast enough to run in real time and that need only make reference to the material as it appears in sequence. This talk will review several applications that are able to identify the tonal center of musical material during performance. Beyond this specific task, the design of real-time algorithmic listening through the concurrent operation of several connected analyzers is examined. The presentation includes discussion of a library of C++ objects that can be combined to perform interactive listening and a demonstration of their capability.

  4. Astronomy research via the Internet

    NASA Astrophysics Data System (ADS)

    Ratnatunga, Kavan U.

    Small developing countries may not have a dark site with good seeing for an astronomical observatory, or be able to afford the financial commitment to set up and support such a facility. Much astronomical research today is, however, done with remote observations, such as from telescopes in space, or obtained by service observing at large facilities on the ground. Cutting-edge astronomical research can now be done with low-cost computers with a good Internet connection, giving on-line access to astronomical observations, journals and the most recent preprints. E-mail allows fast, easy collaboration between research scientists around the world. An international program with some short-term collaborative visits could mine data and publish results from available astronomical observations for a fraction of the investment and cost of running even a small local observatory. Students trained in the use of computers and software by such a program would also be more employable in the current job market. The Internet can reach you wherever you like to be and give you direct access to whatever you need for astronomical research.

  5. High performance computing environment for multidimensional image analysis

    PubMed Central

    Rao, A Ravishankar; Cecchi, Guillermo A; Magnasco, Marcelo

    2007-01-01

    Background The processing of images acquired through microscopy is a challenging task due to the large size of datasets (several gigabytes) and the fast turnaround time required. If the throughput of the image processing stage is significantly increased, it can have a major impact in microscopy applications. Results We present a high performance computing (HPC) solution to this problem. This involves decomposing the spatial 3D image into segments that are assigned to unique processors, and matched to the 3D torus architecture of the IBM Blue Gene/L machine. Communication between segments is restricted to the nearest neighbors. When running on a 2 GHz Intel CPU, the task of 3D median filtering on a typical 256-megabyte dataset takes two and a half hours, whereas by using 1024 nodes of Blue Gene, this task can be performed in 18.8 seconds, a 478× speedup. Conclusion Our parallel solution dramatically improves the performance of image processing, feature extraction and 3D reconstruction tasks. This increased throughput permits biologists to conduct unprecedented large-scale experiments with massive datasets. PMID:17634099

  6. High performance computing environment for multidimensional image analysis.

    PubMed

    Rao, A Ravishankar; Cecchi, Guillermo A; Magnasco, Marcelo

    2007-07-10

    The processing of images acquired through microscopy is a challenging task due to the large size of datasets (several gigabytes) and the fast turnaround time required. If the throughput of the image processing stage is significantly increased, it can have a major impact in microscopy applications. We present a high performance computing (HPC) solution to this problem. This involves decomposing the spatial 3D image into segments that are assigned to unique processors, and matched to the 3D torus architecture of the IBM Blue Gene/L machine. Communication between segments is restricted to the nearest neighbors. When running on a 2 GHz Intel CPU, the task of 3D median filtering on a typical 256-megabyte dataset takes two and a half hours, whereas by using 1024 nodes of Blue Gene, this task can be performed in 18.8 seconds, a 478× speedup. Our parallel solution dramatically improves the performance of image processing, feature extraction and 3D reconstruction tasks. This increased throughput permits biologists to conduct unprecedented large-scale experiments with massive datasets.

  7. Acceleration of the direct reconstruction of linear parametric images using nested algorithms.

    PubMed

    Wang, Guobao; Qi, Jinyi

    2010-03-07

    Parametric imaging using dynamic positron emission tomography (PET) provides important information for biological research and clinical diagnosis. Indirect and direct methods have been developed for reconstructing linear parametric images from dynamic PET data. Indirect methods are relatively simple and easy to implement because the image reconstruction and kinetic modeling are performed in two separate steps. Direct methods estimate parametric images directly from raw PET data and are statistically more efficient. However, the convergence rate of direct algorithms can be slow due to the coupling between the reconstruction and kinetic modeling. Here we present two fast gradient-type algorithms for direct reconstruction of linear parametric images. The new algorithms decouple the reconstruction and linear parametric modeling at each iteration by employing the principle of optimization transfer. Convergence speed is accelerated by running more sub-iterations of linear parametric estimation because the computation cost of the linear parametric modeling is much less than that of the image reconstruction. Computer simulation studies demonstrated that the new algorithms converge much faster than the traditional expectation maximization (EM) and the preconditioned conjugate gradient algorithms for dynamic PET.

  8. Quantum Mechanics/Molecular Mechanics Modeling of Enzymatic Processes: Caveats and Breakthroughs.

    PubMed

    Quesne, Matthew G; Borowski, Tomasz; de Visser, Sam P

    2016-02-18

    Nature has developed large groups of enzymatic catalysts with the aim of transforming substrates into useful products, which enables biosystems to perform all their natural functions. As such, all biochemical processes in our body (we drink, we eat, we breathe, we sleep, etc.) are governed by enzymes. One of the problems associated with research on biocatalysts is that they react so fast that details of their reaction mechanisms cannot be obtained with experimental work. In recent years, major advances in computational hardware and software have been made, and now large (bio)chemical systems can be studied using accurate computational techniques. One such technique is the quantum mechanics/molecular mechanics (QM/MM) technique, which has gained major momentum in recent years. Unfortunately, it is not a black-box method that is easily applied, but requires careful set-up procedures. In this work we give an overview of the technical difficulties and caveats of QM/MM and discuss work protocols developed in our groups for running successful QM/MM calculations.

  9. Sorting signed permutations by inversions in O(n log n) time.

    PubMed

    Swenson, Krister M; Rajan, Vaibhav; Lin, Yu; Moret, Bernard M E

    2010-03-01

    The study of genomic inversions (or reversals) has been a mainstay of computational genomics for nearly 20 years. After the initial breakthrough of Hannenhalli and Pevzner, who gave the first polynomial-time algorithm for sorting signed permutations by inversions, improved algorithms have been designed, culminating with an optimal linear-time algorithm for computing the inversion distance and a subquadratic algorithm for providing a shortest sequence of inversions--also known as sorting by inversions. Remaining open was the question of whether sorting by inversions could be done in O(n log n) time. In this article, we present a qualified answer to this question, by providing two new sorting algorithms, a simple and fast randomized algorithm and a deterministic refinement. The deterministic algorithm runs in time O(n log n + kn), where k is a data-dependent parameter. We provide the results of extensive experiments showing that both the average and the standard deviation for k are small constants, independent of the size of the permutation. We conclude (but do not prove) that almost all signed permutations can be sorted by inversions in O(n log n) time.

  10. Multi-jagged: A scalable parallel spatial partitioning algorithm

    DOE PAGES

    Deveci, Mehmet; Rajamanickam, Sivasankaran; Devine, Karen D.; ...

    2015-03-18

    Geometric partitioning is fast and effective for load-balancing dynamic applications, particularly those requiring geometric locality of data (particle methods, crash simulations). We present, to our knowledge, the first parallel implementation of a multidimensional-jagged geometric partitioner. In contrast to the traditional recursive coordinate bisection algorithm (RCB), which recursively bisects subdomains perpendicular to their longest dimension until the desired number of parts is obtained, our algorithm does recursive multi-section with a given number of parts in each dimension. By computing multiple cut lines concurrently and intelligently deciding when to migrate data while computing the partition, we minimize data movement compared to efficient implementations of recursive bisection. We demonstrate the algorithm's scalability and quality relative to the RCB implementation in Zoltan on both real and synthetic datasets. Our experiments show that the proposed algorithm performs and scales better than RCB in terms of run time without degrading the load balance. Lastly, our implementation partitions 24 billion points into 65,536 parts within a few seconds and exhibits near perfect weak scaling up to 6K cores.
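
    A toy sketch of one level of the multi-section idea (hypothetical names, serial NumPy, nothing like the Zoltan implementation): points are cut into p slabs along x by quantile lines, and each slab is then cut independently into q parts along y, which is what makes the second-dimension cuts "jagged".

        import numpy as np

        def multijagged_2d(points, p, q):
            # Assign each 2-D point to one of p*q parts: p slabs along x,
            # then q jagged cuts along y computed per slab.
            x = points[:, 0]
            xcuts = np.quantile(x, np.linspace(0, 1, p + 1)[1:-1])
            slab = np.searchsorted(xcuts, x)
            part = np.empty(len(points), dtype=int)
            for s in range(p):
                idx = np.flatnonzero(slab == s)
                ycuts = np.quantile(points[idx, 1],
                                    np.linspace(0, 1, q + 1)[1:-1])
                part[idx] = s * q + np.searchsorted(ycuts, points[idx, 1])
            return part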

  11. GRAPE-6A: A Single-Card GRAPE-6 for Parallel PC-GRAPE Cluster Systems

    NASA Astrophysics Data System (ADS)

    Fukushige, Toshiyuki; Makino, Junichiro; Kawai, Atsushi

    2005-12-01

    In this paper, we describe the design and performance of GRAPE-6A, a special-purpose computer for gravitational many-body simulations. It was designed to be used with a PC cluster, in which each node has one GRAPE-6A. Such a configuration is particularly cost-effective in running parallel tree algorithms. Though the use of parallel tree algorithms was possible with the original GRAPE-6 hardware, it was not very cost-effective since a single GRAPE-6 board was still too fast and too expensive. Therefore, we designed GRAPE-6A as a single PCI card to minimize the reproduction cost and to optimize the computing speed. The peak performance is 130 Gflops for one GRAPE-6A board and 3.1 Tflops for our 24 node cluster. We describe the implementation of the tree, TreePM and individual timestep algorithms on both a single GRAPE-6A system and GRAPE-6A cluster. Using the tree algorithm on our 16-node GRAPE-6A system, we can complete a collisionless simulation with 100 million particles (8000 steps) within 10 days.

  12. Direct-Solve Image-Based Wavefront Sensing

    NASA Technical Reports Server (NTRS)

    Lyon, Richard G.

    2009-01-01

    A method of wavefront sensing (more precisely characterized as a method of determining the deviation of a wavefront from a nominal figure) has been invented as an improved means of assessing the performance of an optical system as affected by such imperfections as misalignments, design errors, and fabrication errors. The method is implemented by software running on a single-processor computer that is connected, via a suitable interface, to the image sensor (typically, a charge-coupled device) in the system under test. The software collects a digitized single image from the image sensor and displays it on a computer monitor. The software directly solves for the wavefront in a fraction of a second, and a picture of the wavefront is displayed. The solution process involves, among other things, fast Fourier transforms. It has been reported that the wavefront is decomposed into modes of the optical system under test, but not whether this decomposition is performed as postprocessing or as part of the solution process.

  13. Freud: a software suite for high-throughput simulation analysis

    NASA Astrophysics Data System (ADS)

    Harper, Eric; Spellings, Matthew; Anderson, Joshua; Glotzer, Sharon

    Computer simulation is an indispensable tool for the study of a wide variety of systems. As simulations scale to fill petascale and exascale supercomputing clusters, so too do the size of the data produced and the difficulty of analyzing these data. We present Freud, an analysis software suite for efficient analysis of simulation data. Freud makes no assumptions about the system being analyzed, allowing general analysis methods to be applied to nearly any type of simulation. Freud includes standard analysis methods such as the radial distribution function, as well as new methods including the potential of mean force and torque and local crystal environment analysis. Freud combines a Python interface with fast, parallel C++ analysis routines to run efficiently on laptops, workstations, and supercomputing clusters. Data analysis on clusters reduces data transfer requirements, a prohibitive cost for petascale computing. Used in conjunction with simulation software, Freud allows for smart simulations that adapt to the current state of the system, enabling the study of phenomena such as nucleation and growth, intelligent investigation of phases and phase transitions, and determination of effective pair potentials.
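
    As an illustration of one analysis named above, a brute-force radial distribution function in NumPy; the actual library uses parallel C++ with optimized neighbor finding, so this is only a readable reference version with an invented signature (cubic periodic box of side length box).

        import numpy as np

        def rdf(positions, box, r_max, nbins=100):
            # O(n^2) pair distances under the minimum-image convention.
            n = len(positions)
            dr = positions[:, None, :] - positions[None, :, :]
            dr -= box * np.round(dr / box)
            dist = np.linalg.norm(dr, axis=-1)[np.triu_indices(n, k=1)]
            hist, edges = np.histogram(dist[dist < r_max], bins=nbins,
                                       range=(0.0, r_max))
            # Normalize pair counts by the ideal-gas expectation.
            rho = n / box**3
            shell = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)
            return edges[:-1], hist / (shell * rho * n / 2)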

  14. Current and planned numerical development for improving computing performance for long duration and/or low pressure transients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Faydide, B.

    1997-07-01

    This paper presents the current and planned numerical developments for improving computing performance for Cathare applications needing real time, such as simulator applications. Cathare is a thermalhydraulic code developed by CEA (DRN), IPSN, EDF and FRAMATOME for PWR safety analysis. First, the general characteristics of the code are presented, covering physical models, numerical topics, and validation strategy. Then, the current and planned applications of Cathare in the field of simulators are discussed. Some of these applications were made in the past using a simplified and fast-running version of Cathare (Cathare-Simu); the status of the numerical improvements obtained with Cathare-Simu is presented. The planned developments concern mainly the Simulator Cathare Release (SCAR) project, which deals with the use of the most recent version of Cathare inside simulators. In this frame, the numerical developments relate to speeding up the calculation process using parallel processing and to improving code reliability on a large set of NPP transients.

  15. Framework for architecture-independent run-time reconfigurable applications

    NASA Astrophysics Data System (ADS)

    Lehn, David I.; Hudson, Rhett D.; Athanas, Peter M.

    2000-10-01

    Configurable Computing Machines (CCMs) have emerged as a technology with the computational benefits of custom ASICs as well as the flexibility and reconfigurability of general-purpose microprocessors. Significant effort from the research community has focused on techniques to move this reconfigurability from a rapid application development tool to a run-time tool. This requires the ability to change the hardware design while the application is executing, and is known as Run-Time Reconfiguration (RTR). Widespread acceptance of run-time reconfigurable custom computing depends upon the existence of high-level automated design tools. Such tools must reduce the designer's effort to port applications between different platforms as the architecture, hardware, and software evolve. A Java implementation of a high-level application framework, called Janus, is presented here. In this environment, developers create Java classes that describe the structural behavior of an application. The framework allows hardware and software modules to be freely mixed and interchanged. A compilation phase of the development process analyzes the structure of the application and adapts it to the target platform. Janus is capable of structuring the run-time behavior of an application to take advantage of the memory and computational resources available.

  16. Evolution of perceived footwear comfort over a prolonged running session.

    PubMed

    Hintzy, F; Cavagna, J; Horvais, N

    2015-12-01

    The purpose of this study was to investigate the subjective perception of overall footwear comfort over a prolonged running session. Ten runners performed two similar sessions consisting of a 13-km trail run (5 laps of 2.6 km) as fast as possible. Overall footwear comfort was evaluated before running and at the end of each lap with a 150-mm visual analog scale; speed, heart rate and rating of perceived exertion were also recorded. The results showed that both overall footwear comfort and speed decreased consistently during the run session, and significantly after 44 min of running (i.e., the 3rd lap). It could be hypothesized that the deterioration of overall footwear comfort is explained by mechanical and energetic parameter changes with time and/or fatigue occurring at the whole-body, foot and footwear levels. These results justify the use of a prolonged running test for evaluating running footwear comfort.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mao, Shoudi; He, Jiansen; Yang, Liping

    The impact of an overtaking fast shock on a magnetic cloud (MC) is a pivotal process in CME–CME (CME: coronal mass ejection) interactions and CME–SIR (SIR: stream interaction region) interactions. An MC with a strong and rotating magnetic field is usually deemed a crucial part of a CME. To study the impact of a fast shock on an MC, we perform a 2.5-dimensional numerical magnetohydrodynamic simulation. Two cases are run in this study: one without and one with the impact of a fast shock. In the former case, the MC expands gradually from its initial state and drives a relatively slow magnetic reconnection with the ambient magnetic field. Analysis of the forces near the core of the MC as a whole body indicates that the solar gravity is quite small compared to the Lorentz force and the pressure gradient force. In the second run, a fast shock propagates, relative to the background plasma, at a speed twice the perpendicular fast magnetosonic speed, and catches up with and overtakes the MC. Due to the penetration of the fast shock, the MC is highly compressed and heated, with the temperature growth rate enhanced by a factor of about 10 and the velocity increased to about half of the shock speed. The magnetic reconnection with the ambient magnetic field is also sped up, by a factor of two to four in reconnection rate, as a result of the enhanced density of the current sheet, which is squeezed by the forward motion of the shocked MC.

  18. Progress in Machine Learning Studies for the CMS Computing Infrastructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bonacorsi, Daniele; Kuznetsov, Valentin; Magini, Nicolo

    Here, computing systems for the LHC experiments were developed together with Grids worldwide. While a complete description of the original Grid-based infrastructure and services for the LHC experiments and their recent evolution can be found elsewhere, it is worth mentioning here the scale of the computing resources needed to fulfill the needs of the LHC experiments in Run-1 and Run-2 so far.

  19. Progress in Machine Learning Studies for the CMS Computing Infrastructure

    DOE PAGES

    Bonacorsi, Daniele; Kuznetsov, Valentin; Magini, Nicolo; ...

    2017-12-06

    Here, computing systems for the LHC experiments were developed together with Grids worldwide. While a complete description of the original Grid-based infrastructure and services for the LHC experiments and their recent evolution can be found elsewhere, it is worth mentioning here the scale of the computing resources needed to fulfill the needs of the LHC experiments in Run-1 and Run-2 so far.

  20. Setting Up the JBrowse Genome Browser

    PubMed Central

    Skinner, Mitchell E; Holmes, Ian H

    2010-01-01

    JBrowse is a web-based tool for visualizing genomic data. Unlike most other web-based genome browsers, JBrowse exploits the capabilities of the user's web browser to make scrolling and zooming fast and smooth. It supports the browsers used by almost all internet users, and is relatively simple to install. JBrowse can utilize multiple types of data in a variety of common genomic data formats, including genomic feature data in bioperl databases, GFF files, and BED files, and quantitative data in wiggle files. This unit describes how to obtain the JBrowse software, set it up on a Linux or Mac OS X computer running as a web server and incorporate genome annotation data from multiple sources into JBrowse. After completing the protocols described in this unit, the reader will have a web site that other users can visit to browse the genomic data. PMID:21154710

  1. Hera - The HEASARC's New Data Analysis Service

    NASA Technical Reports Server (NTRS)

    Pence, William

    2006-01-01

    Hera is the new computer service provided by the HEASARC at the NASA Goddard Space Flight Center that enables qualified student and professional astronomical researchers to immediately begin analyzing scientific data from high-energy astrophysics missions. All the resources needed for the data analysis are freely provided by Hera, including:
      * the latest version of the hundreds of scientific analysis programs in the HEASARC's HEASOFT package, as well as most of the programs in the Chandra CIAO package and the XMM-Newton SAS package;
      * high-speed access to the terabytes of data in the HEASARC's high-energy astrophysics Browse data archive;
      * a cluster of fast Linux workstations to run the software;
      * ample local disk space to temporarily store the data and results.
    Some of the many features and different modes of using Hera are illustrated in this poster presentation.

  2. An efficient hybrid method for stochastic reaction-diffusion biochemical systems with delay

    NASA Astrophysics Data System (ADS)

    Sayyidmousavi, Alireza; Ilie, Silvana

    2017-12-01

    Many chemical reactions, such as gene transcription and translation in living cells, need a certain time to finish once they are initiated. Simulating stochastic models of reaction-diffusion systems with delay can be computationally expensive. In the present paper, a novel hybrid algorithm is proposed to accelerate the stochastic simulation of delayed reaction-diffusion systems. The delayed reactions may be of consuming or non-consuming delay type. The algorithm is designed for moderately stiff systems in which the events can be partitioned into slow and fast subsets according to their propensities. The proposed algorithm is applied to three benchmark problems and the results are compared with those of the delayed Inhomogeneous Stochastic Simulation Algorithm. The numerical results show that the new hybrid algorithm achieves considerable speed-up in the run time and very good accuracy.
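
    For orientation, a minimal direct-method stochastic simulation algorithm (SSA) in Python, the exact but slow baseline that hybrid slow/fast schemes such as the one above accelerate; delays and the slow/fast partition are omitted, and the interface is invented for the example.

        import numpy as np

        def ssa(x0, stoich, propensities, t_end, rng=None):
            # Gillespie direct method: draw the waiting time from the
            # total propensity, then pick which reaction fires.
            rng = rng or np.random.default_rng()
            t, x = 0.0, np.asarray(x0, dtype=float)
            while t < t_end:
                a = np.array([f(x) for f in propensities])
                a0 = a.sum()
                if a0 == 0.0:
                    break
                t += rng.exponential(1.0 / a0)
                x = x + stoich[rng.choice(len(a), p=a / a0)]
            return t, x

        # Example: A -> B at rate 2.0*[A], starting from 100 A molecules.
        print(ssa([100, 0], np.array([[-1, 1]]), [lambda x: 2.0 * x[0]], 5.0))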

  3. Data transmission protocol for Pi-of-the-Sky cameras

    NASA Astrophysics Data System (ADS)

    Uzycki, J.; Kasprowicz, G.; Mankiewicz, M.; Nawrocki, K.; Sitek, P.; Sokolowski, M.; Sulej, R.; Tlaczala, W.

    2006-10-01

    The large amount of data collected by automatic astronomical cameras has to be transferred to fast computers in a reliable way. The method chosen should ensure data streaming in both directions, but in a nonsymmetrical way. The Ethernet interface is a very good choice because of its popularity and proven performance; however, it requires a TCP/IP stack implementation in devices like cameras for full compliance with existing networks and operating systems. This paper describes the NUDP protocol, which was designed as a supplement to the standard UDP protocol and can be used as a simple network protocol. NUDP does not need a TCP implementation and makes it possible to run an Ethernet network with simple devices based on microcontroller and/or FPGA chips. The data transmission idea was created especially for the "Pi of the Sky" project.
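
    A toy Python illustration of the kind of raw-datagram exchange a lightweight UDP-based protocol builds on; the port number and framing are invented for the example and are not the actual NUDP design, which adds its own reliability layer on top.

        import socket

        PORT = 5005  # hypothetical port, not from the NUDP specification

        def send_frame(payload: bytes, host: str = "127.0.0.1") -> None:
            # A real protocol would prepend sequence numbers and expect
            # acknowledgements; a bare datagram has no delivery guarantee.
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
                s.sendto(payload, (host, PORT))

        def recv_frame():
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
                s.bind(("", PORT))
                return s.recvfrom(65535)  # (data, sender address)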

  4. Multitasking the code ARC3D. [for computational fluid dynamics

    NASA Technical Reports Server (NTRS)

    Barton, John T.; Hsiung, Christopher C.

    1986-01-01

    The CRAY multitasking system was developed in order to utilize all four processors and sharply reduce the wall clock run time. This paper describes the techniques used to modify the computational fluid dynamics code ARC3D for this run and analyzes the achieved speedup. The ARC3D code solves either the Euler or thin-layer N-S equations using an implicit approximate factorization scheme. Results indicate that multitask processing can be used to achieve wall clock speedup factors of over three times, depending on the nature of the program code being used. Multitasking appears to be particularly advantageous for large-memory problems running on multiple CPU computers.
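
    The reported wall-clock speedup of over three on four processors is consistent with Amdahl's law if roughly nine-tenths of the work ran in parallel; a quick check (the 0.89 fraction is an assumed value for illustration, not a measured one):

        def amdahl_speedup(parallel_fraction, n_procs):
            # Amdahl's law: the serial remainder bounds the speedup.
            return 1.0 / ((1.0 - parallel_fraction)
                          + parallel_fraction / n_procs)

        print(amdahl_speedup(0.89, 4))  # ~3.0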

  5. Study of the mapping of Navier-Stokes algorithms onto multiple-instruction/multiple-data-stream computers

    NASA Technical Reports Server (NTRS)

    Eberhardt, D. S.; Baganoff, D.; Stevens, K.

    1984-01-01

    Implicit approximate-factored algorithms have certain properties that are suitable for parallel processing. A particular computational fluid dynamics (CFD) code, using this algorithm, is mapped onto a multiple-instruction/multiple-data-stream (MIMD) computer architecture. An explanation of this mapping procedure is presented, as well as some of the difficulties encountered when trying to run the code concurrently. Timing results are given for runs on the Ames Research Center's MIMD test facility which consists of two VAX 11/780's with a common MA780 multi-ported memory. Speedups exceeding 1.9 for characteristic CFD runs were indicated by the timing results.

  6. BOKASUN: A fast and precise numerical program to calculate the Master Integrals of the two-loop sunrise diagrams

    NASA Astrophysics Data System (ADS)

    Caffo, Michele; Czyż, Henryk; Gunia, Michał; Remiddi, Ettore

    2009-03-01

    We present the program BOKASUN for fast and precise evaluation of the Master Integrals of the two-loop self-mass sunrise diagram for arbitrary values of the internal masses and the external four-momentum. We use a combination of two methods: a Bernoulli accelerated series expansion and a Runge-Kutta numerical solution of a system of linear differential equations.
    Program summary:
      Program title: BOKASUN
      Catalogue identifier: AECG_v1_0
      Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECG_v1_0.html
      Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
      Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
      No. of lines in distributed program, including test data, etc.: 9404
      No. of bytes in distributed program, including test data, etc.: 104 123
      Distribution format: tar.gz
      Programming language: FORTRAN77
      Computer: Any computer with a Fortran compiler accepting the FORTRAN77 standard. Tested on various PCs with LINUX.
      Operating system: LINUX
      RAM: 120 kbytes
      Classification: 4.4
      Nature of problem: Any integral arising in the evaluation of the two-loop sunrise Feynman diagram can be expressed in terms of a given set of Master Integrals, which should be calculated numerically. The program provides a fast and precise evaluation method of the Master Integrals for arbitrary (but not vanishing) masses and arbitrary values of the external momentum.
      Solution method: The integrals depend on three internal masses and the external momentum squared p. The method is a combination of an accelerated expansion in 1/p in its (pretty large!) region of fast convergence and of a Runge-Kutta numerical solution of a system of linear differential equations.
      Running time: To obtain 4 Master Integrals on a PC with a 2 GHz processor it takes 3 μs for the series expansion with pre-calculated coefficients, 80 μs for the series expansion without pre-calculated coefficients, and from a few seconds up to a few minutes for the Runge-Kutta method (depending on the required accuracy and the values of the physical parameters).
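
    For reference, a generic classical fourth-order Runge-Kutta step of the kind the second method relies on, applied here to an arbitrary linear system y' = A y rather than the actual master-integral equations:

        import numpy as np

        def rk4_step(f, t, y, h):
            # One classical RK4 step for y' = f(t, y).
            k1 = f(t, y)
            k2 = f(t + h / 2, y + h / 2 * k1)
            k3 = f(t + h / 2, y + h / 2 * k2)
            k4 = f(t + h, y + h * k3)
            return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

        A = np.array([[0.0, 1.0], [-1.0, 0.0]])  # illustrative system only
        y = np.array([1.0, 0.0])
        for i in range(100):
            y = rk4_step(lambda t, v: A @ v, 0.01 * i, y, 0.01)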

  7. CLOUDCLOUD: general-purpose instrument monitoring and data managing software

    NASA Astrophysics Data System (ADS)

    Dias, António; Amorim, António; Tomé, António

    2016-04-01

    An effective experiment depends on the ability to store and deliver data and information to all participant parties, regardless of their degree of involvement in the specific parts that make the experiment a whole. Having fast, efficient and ubiquitous access to data will increase visibility and discussion, such that the outcome will have already been reviewed several times, strengthening the conclusions. The CLOUD project aims at providing users with a general-purpose data acquisition, management and instrument monitoring platform that is fast, easy to use, lightweight and accessible to all participants of an experiment. This work is now implemented in the CLOUD experiment at CERN and will be fully integrated with the experiment as of 2016. Despite being used in an experiment of the scale of CLOUD, this software can also be used in experiments or monitoring stations of any size, from single computers to large networks of computers, to monitor any sort of instrument output without influencing the individual instrument's DAQ. Instrument data and metadata are stored and accessed via a specially designed database architecture, and any type of instrument output is accepted by our continuously growing parsing application. Multiple databases can be used to separate different data-taking periods, or a single database can be used if, for instance, an experiment is continuous. A simple web-based application gives the user total control over the monitored instruments and their data, allowing data visualization and download, upload of processed data, and the ability to edit existing instruments or add new instruments to the experiment. When in a network, new computers are immediately recognized and added to the system, and are able to monitor instruments connected to them. Automatic computer integration is achieved by a locally running Python-based parsing agent that communicates with a main server application, guaranteeing that all instruments assigned to that computer are monitored with parsing intervals as fast as milliseconds. This software (server + agents + interface + database) comes in easy, ready-to-use packages that can be installed on any operating system, including Android and iOS. It is ideal for use in modular experiments or monitoring stations with large variability in instruments and measuring methods, or in large collaborations where data require homogenization in order to be effectively transmitted to all involved parties. This work presents the software and provides a performance comparison with previously used monitoring systems in the CLOUD experiment at CERN.

  8. No positive influence of ingesting chia seed oil on human running performance.

    PubMed

    Nieman, David C; Gillitt, Nicholas D; Meaney, Mary Pat; Dew, Dustin A

    2015-05-15

    Runners (n = 24) reported to the laboratory in an overnight-fasted state at 8:00 am on two occasions separated by at least two weeks. After providing a blood sample at 8:00 am, subjects ingested 0.5 liters of flavored water alone or 0.5 liters of water with 7 kcal kg(-1) chia seed oil (random order), provided another blood sample at 8:30 am, and then started running to exhaustion (~70% VO2max). Additional blood samples were collected immediately post- and 1 h post-exercise. Despite elevations in plasma alpha-linolenic acid (ALA) during the chia seed oil (337%) versus water trial (35%) (70.8 ± 8.6, 20.3 ± 1.8 μg mL(-1), respectively, p < 0.001), run time to exhaustion did not differ between trials (1.86 ± 0.10, 1.91 ± 0.13 h, p = 0.577, respectively). No trial differences were found for respiratory exchange ratio (RER) (0.92 ± 0.01), oxygen consumption, ventilation, ratings of perceived exertion (RPE), or plasma glucose and blood lactate. Significant post-run increases were measured for total leukocyte counts, plasma cortisol, and plasma cytokines (interleukin-6 (IL-6), interleukin-8 (IL-8), interleukin-10 (IL-10), and tumor necrosis factor-α (TNF-α)), with no trial differences. Chia seed oil supplementation compared to water alone in overnight-fasted runners before and during prolonged, intensive running caused an elevation in plasma ALA, but did not enhance run time to exhaustion, alter RER, or counter elevations in cortisol and inflammatory outcome measures.

  9. Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing

    PubMed Central

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet, as well as the infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM), which is system software that monitors application behavior at run-time, analyzes the collected information, and optimizes cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation, as well as the underlying hardware through a performance counter, optimizing its computing configuration based on the analyzed data. PMID:22163811

  10. Design and development of a run-time monitor for multi-core architectures in cloud computing.

    PubMed

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet, as well as the infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM), which is system software that monitors application behavior at run-time, analyzes the collected information, and optimizes cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation, as well as the underlying hardware through a performance counter, optimizing its computing configuration based on the analyzed data.
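
    A single-process Python sketch of the library-instrumentation idea: wrap each monitored call, record its wall-clock time, and leave adaptation decisions to a separate analyzer. This only illustrates the concept; the RTM itself also reads hardware performance counters and acts across a cloud.

        import collections
        import functools
        import time

        stats = collections.defaultdict(list)

        def monitored(fn):
            # Record the duration of every call under the function's name.
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return fn(*args, **kwargs)
                finally:
                    stats[fn.__name__].append(time.perf_counter() - start)
            return wrapper

        @monitored
        def workload(n):
            return sum(i * i for i in range(n))

        workload(100_000)
        print(stats["workload"])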

  11. EnergyPlus Run Time Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hong, Tianzhen; Buhl, Fred; Haves, Philip

    2008-09-20

    EnergyPlus is a new-generation building performance simulation program offering many new modeling capabilities and more accurate performance calculations that integrate building components in sub-hourly time steps. However, EnergyPlus runs much slower than the current generation of simulation programs, which has become a major barrier to its widespread adoption by the industry. This paper analyzed EnergyPlus run time from several perspectives to identify the key issues and challenges in speeding up EnergyPlus: studying the historical trends of EnergyPlus run time based on the advancement of computers and code improvements to EnergyPlus, comparing EnergyPlus with DOE-2 to understand and quantify the run time differences, identifying key simulation settings and model features that have significant impacts on run time, and performing code profiling to identify which EnergyPlus subroutines consume the most run time. This paper provides recommendations for improving EnergyPlus run time from the modeler's perspective and on adequate computing platforms. Suggestions for software code and architecture changes to improve EnergyPlus run time, based on the code profiling results, are also discussed.
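
    Code profiling of the kind described can be illustrated with Python's built-in profiler on a stand-in workload (EnergyPlus itself is not Python; this only shows the technique of ranking subroutines by cumulative time):

        import cProfile
        import pstats

        def simulate():
            # Stand-in for a building-simulation time-step loop.
            return sum(sum(k ** 0.5 for k in range(100))
                       for step in range(10_000))

        profiler = cProfile.Profile()
        profiler.enable()
        simulate()
        profiler.disable()
        pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)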

  12. Efficient and Scalable Cross-Matching of (Very) Large Catalogs

    NASA Astrophysics Data System (ADS)

    Pineau, F.-X.; Boch, T.; Derriere, S.

    2011-07-01

    Whether it be for building multi-wavelength datasets from independent surveys, studying changes in object luminosities, or detecting moving objects (stellar proper motions, asteroids), cross-catalog matching is a technique widely used in astronomy. The need for efficient, reliable and scalable cross-catalog matching is becoming even more pressing with forthcoming projects that will produce huge catalogs in which astronomers will dig for rare objects, perform statistical analysis and classification, or carry out real-time transient detection. We have developed a formalism and the corresponding technical framework to address the challenge of fast cross-catalog matching. Our formalism supports more than simple nearest-neighbor search, and handles elliptical positional errors. Scalability is improved by partitioning the sky using the HEALPix scheme and processing each sky cell independently. The use of multi-threaded two-dimensional kd-trees adapted to managing equatorial coordinates enables efficient neighbor search. The whole process can run on a single computer, but could also use clusters of machines to cross-match future very large surveys such as GAIA or LSST in reasonable times. We already achieve performance where the 2MASS (~470M sources) and SDSS DR7 (~350M sources) catalogs can be matched on a single machine in less than 10 minutes. We aim at providing astronomers with a catalog cross-matching service, available on-line and leveraging the catalogs present in the VizieR database. This service will allow users both to access pre-computed cross-matches across some very large catalogs and to run customized cross-matching operations. It will also support VO protocols for synchronous or asynchronous queries.
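
    A minimal sketch of the nearest-neighbor core of such a service, using a SciPy kd-tree (the production system described above additionally handles elliptical errors, HEALPix partitioning and multi-threading; the function name and interface are invented):

        import numpy as np
        from scipy.spatial import cKDTree

        def crossmatch(ra1, dec1, ra2, dec2, radius_arcsec):
            # Work on the unit sphere so Euclidean (chord) distance is
            # monotonic in angular separation.
            def xyz(ra, dec):
                ra, dec = np.radians(ra), np.radians(dec)
                return np.column_stack([np.cos(dec) * np.cos(ra),
                                        np.cos(dec) * np.sin(ra),
                                        np.sin(dec)])
            tree = cKDTree(xyz(ra2, dec2))
            # Chord length equivalent to the angular match radius.
            r = 2.0 * np.sin(np.radians(radius_arcsec / 3600.0) / 2.0)
            dist, idx = tree.query(xyz(ra1, dec1), distance_upper_bound=r)
            matched = np.isfinite(dist)
            return np.flatnonzero(matched), idx[matched]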

  13. Application of Nearly Linear Solvers to Electric Power System Computation

    NASA Astrophysics Data System (ADS)

    Grant, Lisa L.

    To meet the future needs of the electric power system, improvements need to be made in power system algorithms, simulation, and modeling, specifically to achieve run times that are useful to industry. If power system time-domain simulations could run in real time, then system operators would have the situational awareness to implement online control and avoid cascading failures, significantly improving power system reliability. Several power system applications rely on the solution of a very large linear system, and as the demands on power systems continue to grow, there is greater computational complexity involved in solving these large linear systems within reasonable time. This project expands on current work in fast linear solvers, developed for solving symmetric and diagonally dominant linear systems, to produce power-system-specific methods with nearly linear run times. The work explores a new theoretical method based on ideas in graph theory and combinatorics. The technique builds a chain of progressively smaller approximate systems with preconditioners based on the system's low-stretch spanning tree. The method is compared to traditional linear solvers and shown to reduce the time and iterations required for an accurate solution, especially as the system size increases. A simulation validation is performed, comparing the solution capabilities of the chain method to LU factorization, which is the standard linear solver for power flow. The chain method was successfully demonstrated to produce accurate solutions for power flow simulation on a number of IEEE test cases, and a discussion of how to further improve the method's speed and accuracy is included.
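
    A minimal sketch of the kind of preconditioned iterative solve this line of work accelerates, using SciPy's conjugate gradient on a symmetric diagonally dominant test matrix; a simple Jacobi preconditioner stands in for the low-stretch spanning-tree preconditioners described above, which standard libraries do not provide.

        import numpy as np
        from scipy.sparse import diags
        from scipy.sparse.linalg import LinearOperator, cg

        n = 1000
        # 1-D Laplacian plus a diagonal shift: symmetric and diagonally
        # dominant, the class of matrices these solvers target.
        A = diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
        b = np.ones(n)

        d = A.diagonal()
        M = LinearOperator((n, n), matvec=lambda v: v / d)  # Jacobi

        x, info = cg(A, b, M=M)
        print(info)  # 0 means the iteration converged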

  14. Community detection using preference networks

    NASA Astrophysics Data System (ADS)

    Tasgin, Mursel; Bingol, Haluk O.

    2018-04-01

    Community detection is the task of identifying clusters or groups of nodes in a network, where nodes within the same group are more connected with each other than with nodes in different groups. It has practical uses in identifying similar functions or roles of nodes in many biological, social and computer networks. With the availability of very large networks in recent years, the performance and scalability of community detection algorithms have become crucial: if the time complexity of an algorithm is high, it cannot run on large networks. In this paper, we propose a new community detection algorithm, which takes a local approach and is able to run on large networks. Its method is simple and effective: given a network, the algorithm constructs a preference network of nodes in which each node has a single outgoing edge pointing to the node it prefers to share a community with. In such a preference network, each connected component is a community. Selection of the preferred node is performed using similarity-based metrics. We use two alternatives for this purpose, both of which can be calculated within the 1-neighborhood of a node: the number of common neighbors of the selecting node and each of its neighbors, and the spread capability of the neighbors around the selecting node, calculated with the gossip algorithm of Lind et al. Our algorithm is tested on both computer-generated LFR networks and real-life networks with ground-truth community structure, and it identifies communities accurately and quickly. It is local, scalable and suitable for distributed execution on large networks.
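
    A compact sketch of the common-neighbors variant described above, written with networkx; tie-breaking and the gossip-based alternative metric are omitted, and the function name is invented.

        import networkx as nx

        def preference_communities(G):
            # Each node points at the neighbor with which it shares the
            # most common neighbors; connected components of the resulting
            # preference graph are taken as communities.
            pref = nx.Graph()
            pref.add_nodes_from(G)
            for u in G:
                nbrs = list(G[u])
                if nbrs:
                    best = max(nbrs, key=lambda v: len(
                        list(nx.common_neighbors(G, u, v))))
                    pref.add_edge(u, best)
            return list(nx.connected_components(pref))

        print(preference_communities(nx.karate_club_graph()))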

  15. The Effect of Driver Rise-Time on Pinch Current and its Impact on Plasma Focus Performance and Neutron Yield

    NASA Astrophysics Data System (ADS)

    Sears, Jason; Schmidt, Andrea; Link, Anthony; Welch, Dale

    2016-10-01

    Experiments have suggested that dense plasma focus (DPF) neutron yield increases with faster drivers [Decker NIMP 1986]. Using the particle-in-cell code LSP [Schmidt PRL 2012], we reproduce this trend in a kJ DPF [Ellsworth 2014], and demonstrate how driver rise time is coupled to neutron output. We implement a 2-D model of the plasma focus including self-consistent circuit-driven boundary conditions. Driver capacitance and voltage are varied to modify the current rise time, and anode length is adjusted so that run-in coincides with the peak current. We observe during run down that magnetohydrodynamic (MHD) instabilities of the sheath shed blobs of plasma that remain in the inter-electrode gap during run in. This trailing plasma later acts as a low-inductance restrike path that shunts current from the pinch during maximum compression. While the MHD growth rate increases slightly with driver speed, the shorter anode of the fast driver allows fewer e-foldings and hence reduces the trailing mass between electrodes. As a result, the fast driver postpones parasitic restrikes and maintains peak current through the pinch during maximum compression. The fast driver pinch therefore achieves best simultaneity between its ion beam and peak target density, which maximizes neutron production. Prepared by LLNL under Contract DE-AC52-07NA27344.

  16. Compressed quantum computation using a remote five-qubit quantum computer

    NASA Astrophysics Data System (ADS)

    Hebenstreit, M.; Alsina, D.; Latorre, J. I.; Kraus, B.

    2017-05-01

    The notion of compressed quantum computation is employed to simulate the Ising interaction of a one-dimensional chain consisting of n qubits using the universal IBM cloud quantum computer running on log2(n) qubits. The external field parameter that controls the quantum phase transition of this model translates into particular settings of the quantum gates that generate the circuit. We measure the magnetization, which displays the quantum phase transition, on a two-qubit system, which simulates a four-qubit Ising chain, and show its agreement with the theoretical prediction within a certain error. We also discuss the relevant point of how to assess errors when using a cloud quantum computer with a limited number of runs. As a solution, we propose to use validating circuits, that is, to run independent controlled quantum circuits of similar complexity to the circuit of interest.

  17. Quality of Care as an Emergent Phenomenon out of a Small-World Network of Relational Actors.

    PubMed

    Fiorini, Rodolfo; De Giacomo, Piero; Marconi, Pier Luigi; L'Abate, Luciano

    2014-01-01

    In a healthcare decision support system, the development and evaluation of effective "quality of care" (QOC) indicators in simulation-based training are a key feature in developing resilient and antifragile organization scenarios. Is it possible to conceive of QOC not only as the result of a voluntary and rational decision, imposed or not, but also as an overall system "emergent phenomenon" arising out of a small-world network of relational synthetic actors endowed with their own personality profiles to simulate human behaviour (for short, called "subjects")? In order to answer this question and to observe phenomena of real emergence, we would need computational models of high complexity, with heavy computational load and extensive computational time. Nevertheless, the intrinsic self-reflexive functional logical closure of De Giacomo's Elementary Pragmatic Model (EPM) makes it possible to run simulation examples quickly and effectively to classify the outcomes that grow out of a small-world network of relational subjects. It is therefore possible to learn how much strategic systemic interventions can induce context conditions that facilitate QOC, improving the effectiveness of specific actions that might otherwise be paradoxically counterproductive. Early results are encouraging enough to use EPM as a basic building block for designing a more powerful Evolutive Elementary Pragmatic Model (E2PM), a computational model of real emergence able to cope with ontological uncertainty at the system level.

  18. Reduced Design Load Basis for Ultimate Blade Loads Estimation in Multidisciplinary Design Optimization Frameworks

    NASA Astrophysics Data System (ADS)

    Pavese, Christian; Tibaldi, Carlo; Larsen, Torben J.; Kim, Taeseong; Thomsen, Kenneth

    2016-09-01

    The aim is to provide a fast and reliable approach to estimating ultimate blade loads for a multidisciplinary design optimization (MDO) framework. For blade design purposes, the standards require a large number of computationally expensive simulations, which cannot be efficiently run at each cost-function evaluation of an MDO process. This work describes a method that allows the calculation of blade load envelopes to be integrated inside an MDO loop. Ultimate blade load envelopes are calculated for a baseline design and for a design obtained after one iteration of an MDO. These envelopes are computed for a full standard design load basis (DLB) and for a deterministic reduced DLB. Ultimate loads extracted from the two DLBs for each of the two blade designs are compared and analyzed. Although the reduced DLB supplies ultimate loads of different magnitude, the shapes of the estimated envelopes are similar to those computed using the full DLB. This observation is used to propose a scheme that is computationally cheap and can be integrated inside an MDO framework, providing a sufficiently reliable estimate of the ultimate blade loading. The latter aspect is of key importance when design variables implementing passive control methodologies are included in the formulation of the optimization problem. An MDO of a 10 MW wind turbine blade is presented as an applied case study to show the efficacy of the reduced DLB concept.

  19. Approximate, computationally efficient online learning in Bayesian spiking neurons.

    PubMed

    Kuhlmann, Levin; Hauser-Raspe, Michael; Manton, Jonathan H; Grayden, David B; Tapson, Jonathan; van Schaik, André

    2014-03-01

    Bayesian spiking neurons (BSNs) provide a probabilistic interpretation of how neurons perform inference and learning. Online learning in BSNs typically involves parameter estimation based on maximum-likelihood expectation-maximization (ML-EM) which is computationally slow and limits the potential of studying networks of BSNs. An online learning algorithm, fast learning (FL), is presented that is more computationally efficient than the benchmark ML-EM for a fixed number of time steps as the number of inputs to a BSN increases (e.g., 16.5 times faster run times for 20 inputs). Although ML-EM appears to converge 2.0 to 3.6 times faster than FL, the computational cost of ML-EM means that ML-EM takes longer to simulate to convergence than FL. FL also provides reasonable convergence performance that is robust to initialization of parameter estimates that are far from the true parameter values. However, parameter estimation depends on the range of true parameter values. Nevertheless, for a physiologically meaningful range of parameter values, FL gives very good average estimation accuracy, despite its approximate nature. The FL algorithm therefore provides an efficient tool, complementary to ML-EM, for exploring BSN networks in more detail in order to better understand their biological relevance. Moreover, the simplicity of the FL algorithm means it can be easily implemented in neuromorphic VLSI such that one can take advantage of the energy-efficient spike coding of BSNs.

  20. Fast two-stream method for computing diurnal-mean actinic flux in vertically inhomogeneous atmospheres

    NASA Technical Reports Server (NTRS)

    Filyushkin, V. V.; Madronich, S.; Brasseur, G. P.; Petropavlovskikh, I. V.

    1994-01-01

Based on a derivation of the two-stream daytime-mean equations of radiative flux transfer, a method is suggested for computing daytime-mean actinic fluxes in an absorbing, scattering, vertically inhomogeneous atmosphere. The method applies direct daytime integration to the particular solutions of the two-stream approximations or to the source functions, and it is valid for an averaging period of any duration. Its merit is that the multiple-scattering computation is carried out only once for the whole averaging period, and it can be implemented with a number of widely used two-stream approximations. The method agrees with results obtained from 200-point multiple-scattering calculations. It was also tested in runs with a 1-km cloud layer of optical depth 10, as well as with an aerosol background. Comparison of the results obtained for a cloud subdivided into 20 layers with those for a one-layer cloud with the same optical parameters showed that direct integration of the particular solutions possesses 'analytical' accuracy. In the case of source-function interpolation, the actinic fluxes calculated above the one-layer and 20-layer clouds agreed within 1%-1.5%, while below the cloud they may differ by up to 5% in the worst case. Ways of enhancing the accuracy (in a 'two-stream sense') and the computational efficiency of the method are discussed.

  1. Self-Scheduling Parallel Methods for Multiple Serial Codes with Application to WOPWOP

    NASA Technical Reports Server (NTRS)

    Long, Lyle N.; Brentner, Kenneth S.

    2000-01-01

    This paper presents a scheme for efficiently running a large number of serial jobs on parallel computers. Two examples are given of computer programs that run relatively quickly, but often they must be run numerous times to obtain all the results needed. It is very common in science and engineering to have codes that are not massive computing challenges in themselves, but due to the number of instances that must be run, they do become large-scale computing problems. The two examples given here represent common problems in aerospace engineering: aerodynamic panel methods and aeroacoustic integral methods. The first example simply solves many systems of linear equations. This is representative of an aerodynamic panel code where someone would like to solve for numerous angles of attack. The complete code for this first example is included in the appendix so that it can be readily used by others as a template. The second example is an aeroacoustics code (WOPWOP) that solves the Ffowcs Williams Hawkings equation to predict the far-field sound due to rotating blades. In this example, one quite often needs to compute the sound at numerous observer locations, hence parallelization is utilized to automate the noise computation for a large number of observers.
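
    The self-scheduling pattern described above is easy to reproduce with standard tools. The following minimal Python sketch (an illustration, not the authors' code, which is given in their appendix) farms out many independent linear solves, one per hypothetical angle of attack, to a pool of workers that pick up new systems as they become free:

      import numpy as np
      from multiprocessing import Pool

      def solve_case(seed):
          # One small, independent serial job: build and solve a dense
          # linear system (a stand-in for one angle-of-attack case).
          rng = np.random.default_rng(seed)
          a = rng.standard_normal((200, 200))
          b = rng.standard_normal(200)
          return seed, np.linalg.solve(a, b)

      if __name__ == "__main__":
          cases = range(64)  # 64 independent "angles of attack"
          with Pool() as pool:
              # imap_unordered hands the next case to whichever worker is
              # idle, which is the essence of self-scheduling.
              for case_id, x in pool.imap_unordered(solve_case, cases):
                  print(case_id, x[:3])

    Because each solve is independent, the pool stays busy regardless of how unevenly the individual jobs are sized.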

  2. BarraCUDA - a fast short read sequence aligner using graphics processing units

    PubMed Central

    2012-01-01

Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General-purpose computing on graphics processing units (GPGPU) extracts computing power from the hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of its massive parallelism. As a result, BarraCUDA offers an order-of-magnitude boost in alignment throughput compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate alignment throughput. Conclusions BarraCUDA is designed to take advantage of GPU parallelism to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline so that the wider scientific community can benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497

  3. Investigation of Response Amplitude Operators for Floating Offshore Wind Turbines: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ramachandran, G. K. V.; Robertson, A.; Jonkman, J. M.

This paper examines the consistency between response amplitude operators (RAOs) computed with WAMIT, a linear frequency-domain tool, and RAOs derived from time-domain computations based on white-noise wave excitation using FAST, a nonlinear aero-hydro-servo-elastic tool. The RAO comparison is first made for a rigid floating wind turbine without wind excitation. The investigation is then extended to examine how these RAOs change for a flexible and operational wind turbine. The RAOs are computed for below-rated, rated, and above-rated wind conditions. The method is applied to a floating wind system composed of the OC3-Hywind spar buoy and the NREL 5-MW wind turbine. The responses are compared between FAST and WAMIT to verify the FAST model and to understand the influence of structural flexibility, aerodynamic damping, control actions, and waves on the system responses. The results show that, based on the RAO computation procedure implemented, the WAMIT- and FAST-computed RAOs are similar (as expected) for a rigid turbine subjected to waves only. However, WAMIT is unable to model the excitation from a flexible turbine. Further, the presence of aerodynamic damping decreased the platform surge and pitch responses, as computed by both WAMIT and FAST, when wind was included. Additionally, the influence of gyroscopic excitation increased the yaw response, which was captured by both WAMIT and FAST.

  4. Velocity, safety, or both? How do balance and strength of goal conflicts affect drivers' behaviour, feelings and physiological responses?

    PubMed

    Schmidt-Daffy, Martin; Brandenburg, Stefan; Beliavski, Alina

    2013-06-01

Motivational models of driving behaviour agree that choice of speed is modulated by drivers' goals. Whilst it is accepted that some goals favour fast driving and others favour safe driving, little is known about the interplay of these conflicting goals. In the present study, two aspects of this interplay are investigated: the balance of conflict and the strength of conflict. Thirty-two participants completed several simulated driving runs in which fast driving was rewarded with a monetary gain if the end of the track was reached. However, unpredictably, some runs ended with the appearance of a deer. In these runs, fast driving was punished with a monetary loss. The ratio between the magnitudes of gains and losses varied in order to manipulate the balance of conflict. The absolute magnitudes of both gains and losses altered the strength of conflict. Participants drove slower, reported an increase in anxiety-related feelings, and showed indications of physiological arousal if there was more money at stake. In contrast, only marginal effects of varying the ratio between gains and losses were observed. Results confirm that the strength of a safety-velocity conflict is an important determinant of drivers' behaviour, feelings, and physiological responses. The lack of evidence for the balance of conflict playing a role suggests that in each condition, participants subjectively weighted the loss higher than the gain (loss aversion). It is concluded that the interplay of the subjective values that drivers attribute to objective incentives for fast and safe driving is a promising field for future research. Incorporating this knowledge into motivational theories of driving behaviour might improve their contribution to the design of adequate road safety measures.

  5. Counterfactual quantum computation through quantum interrogation

    NASA Astrophysics Data System (ADS)

    Hosten, Onur; Rakher, Matthew T.; Barreiro, Julio T.; Peters, Nicholas A.; Kwiat, Paul G.

    2006-02-01

    The logic underlying the coherent nature of quantum information processing often deviates from intuitive reasoning, leading to surprising effects. Counterfactual computation constitutes a striking example: the potential outcome of a quantum computation can be inferred, even if the computer is not run. Relying on similar arguments to interaction-free measurements (or quantum interrogation), counterfactual computation is accomplished by putting the computer in a superposition of `running' and `not running' states, and then interfering the two histories. Conditional on the as-yet-unknown outcome of the computation, it is sometimes possible to counterfactually infer information about the solution. Here we demonstrate counterfactual computation, implementing Grover's search algorithm with an all-optical approach. It was believed that the overall probability of such counterfactual inference is intrinsically limited, so that it could not perform better on average than random guesses. However, using a novel `chained' version of the quantum Zeno effect, we show how to boost the counterfactual inference probability to unity, thereby beating the random guessing limit. Our methods are general and apply to any physical system, as illustrated by a discussion of trapped-ion systems. Finally, we briefly show that, in certain circumstances, counterfactual computation can eliminate errors induced by decoherence.
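
    The reported boost toward unit inference probability can be illustrated with a toy calculation. In a standard quantum interrogation scheme, N weak interrogation cycles each rotate the photon polarization by π/2N, and the probability that the photon is never absorbed scales as cos^(2N)(π/2N), approaching 1 as N grows. The sketch below evaluates only this textbook scaling; it is not a simulation of the authors' chained-Zeno optical setup:

      import math

      def survival_probability(n_cycles: int) -> float:
          # Probability that the photon survives all n weak interrogation
          # cycles, each rotating its polarization by pi / (2 n).
          theta = math.pi / (2 * n_cycles)
          return math.cos(theta) ** (2 * n_cycles)

      for n in (1, 5, 25, 125, 625):
          print(n, round(survival_probability(n), 4))
      # The survival probability tends to 1 as n grows, which is the
      # quantum Zeno mechanism exploited to beat the random-guessing limit.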

  6. Smartphones as image processing systems for prosthetic vision.

    PubMed

    Zapf, Marc P; Matteucci, Paul B; Lovell, Nigel H; Suaning, Gregg J

    2013-01-01

The feasibility of implants for prosthetic vision has been demonstrated by research and commercial organizations. In most devices, an essential forerunner to the internal stimulation circuit is an external electronics solution for capturing, processing, and relaying image information, as well as for extracting useful features from the scene surrounding the patient. The capabilities and multitude of image processing algorithms that a device can perform in real time play a major part in the final quality of the prosthetic vision. It is therefore optimal to use powerful hardware while avoiding bulky, cumbersome solutions. Recent publications have reported on portable single-board computers fast enough for computationally intensive image processing. Following the rapid evolution of commercial, ultra-portable ARM (Advanced RISC Machine) mobile devices, the authors investigated the feasibility of modern smartphones running complex face detection as external processing devices for vision implants. The role of dedicated graphics processors in speeding up computation was evaluated while performing a demanding noise reduction algorithm (image denoising). The time required for face detection was found to decrease by 95% from 2.5-year-old devices to recent ones. In denoising, graphics acceleration played a major role, speeding up the computation by a factor of 18. These results demonstrate that the technology has matured sufficiently to be considered a valid external electronics platform for visual prosthetic research.

  7. Evaluation of a micro-scale wind model's performance over realistic building clusters using wind tunnel experiments

    NASA Astrophysics Data System (ADS)

    Zhang, Ning; Du, Yunsong; Miao, Shiguang; Fang, Xiaoyi

    2016-08-01

The simulation performance over complex building clusters of a wind simulation model (the Wind Information Field Fast Analysis model, WIFFA) within a micro-scale air pollutant dispersion model system (the Urban Microscale Air Pollution dispersion Simulation model, UMAPS) is evaluated using various wind tunnel experimental data, including the CEDVAL (Compilation of Experimental Data for Validation of Micro-Scale Dispersion Models) wind tunnel experiment data and the NJU-FZ experiment data (Nanjing University-Fang Zhuang neighborhood wind tunnel experiment data). The results show that the wind model reproduces the vortexes triggered by urban buildings well, and that the flow patterns in urban street canyons and building clusters can also be represented. Given the complex shapes and distributions of the buildings, deviations from the measurements are usually caused by the simplification of the building shapes and by the determination of the key zone sizes. The computational efficiencies of different cases are also discussed in this paper. The model has a high computational efficiency compared to traditional numerical models that solve the Navier-Stokes equations, and can produce very high-resolution (1-5 m) wind fields for a complex neighborhood-scale urban building canopy (~1 km × 1 km) in less than 3 min when run on a personal computer.

  8. Quantum transport and nanoplasmonics with carbon nanorings - using HPC in computational nanoscience

    NASA Astrophysics Data System (ADS)

    Jack, Mark A.

    2011-10-01

The central theme of this talk is the theoretical study of toroidal carbon nanostructures as a new form of metamaterial. The interference of ring-generated electromagnetic radiation in a regular array of nanorings driven by an incoming polarized wave front may lead to fascinating new optoelectronics applications. The tight-binding method is used to model charge transport in a carbon nanotorus: all transport observables can be derived from the Green's function of the device region in a non-equilibrium Green's function algorithm. We have calculated densities of states D(E) and transmissivities T(E) between two metallic leads under a small voltage bias. Electron-phonon coupling is included for low-energy phonon modes of armchair and zigzag nanorings, with atomic displacements determined by a collaborator's finite-element-based code. A numerically fast and stable algorithm has been developed using parallel linear algebra matrix routines (PETSc) with MPI parallelism to reach significant speed-up. Production runs are planned on the NSF XSEDE network. This project was supported in part by a 2010 NSF TeraGrid Fellowship and the Sunshine State Education and Research Computing Alliance (SSERCA). Two summer students were supported as 2010 and 2011 NCSI/Shodor Petascale Computing undergraduate interns. In collaboration with Leon W. Durivage, Adam Byrd, and Mario Encinosa.

  9. New methodology for fast prediction of wheel wear evolution

    NASA Astrophysics Data System (ADS)

    Apezetxea, I. S.; Perez, X.; Casanueva, C.; Alonso, A.

    2017-07-01

In railway applications, wear prediction at the wheel-rail interface is fundamental to studying problems such as wheel lifespan and the evolution of vehicle dynamic characteristics over time. However, one of the principal drawbacks of the existing methodologies for calculating wear evolution is their computational cost. This paper proposes a new wear prediction methodology with a reduced computational cost, based on two main steps. The first is to replace calculations over the whole network with the calculation of contact conditions at certain characteristic points, from whose results the wheel wear evolution can be inferred. The second is to replace the dynamic (time-integration) calculation with a quasi-static calculation, that is, solving the quasi-static state of the vehicle at a given point, which amounts to neglecting the acceleration terms in the dynamic equations. These simplifications yield a significant reduction in computational cost while maintaining an acceptable level of accuracy (errors on the order of 5-10%). Several case studies are analysed in the paper to assess the proposed methodology. The results obtained in the case studies support the conclusion that the proposed methodology is valid for an arbitrary vehicle running over an arbitrary track layout.

  10. Post-processing of seismic parameter data based on valid seismic event determination

    DOEpatents

    McEvilly, Thomas V.

    1985-01-01

An automated seismic processing system and method are disclosed, including an array of CMOS microprocessors for unattended, battery-powered processing of a multi-station network. According to a characterizing feature of the invention, each channel of the network is independently operable to automatically detect events, measure times and amplitudes, and compute and fit fast Fourier transforms (FFTs) for both P- and S-waves on analog seismic data after it has been sampled at a given rate. The measured parameter data from each channel are then reviewed for event validity by a central controlling microprocessor and, if determined by preset criteria to constitute a valid event, the parameter data are passed to an analysis computer for calculation of hypocenter location, running b-values, source parameters, event count, P-wave polarities, moment-tensor inversion, and Vp/Vs ratios. The in-field, real-time analysis of data maximizes the efficiency of microearthquake surveys, allowing flexibility in experimental procedures with a minimum of traditional labor-intensive postprocessing. A unique consequence of the system is that none of the original data (i.e., the sensor analog output signals) are necessarily saved after computation; rather, the numerical parameters generated by the automatic analysis are the sole output of the automated seismic processor.
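
    The per-channel spectral step amounts to windowing the sampled trace around a picked arrival and taking its Fourier transform. A minimal sketch with assumed names and parameters (the patent publishes no code):

      import numpy as np

      def wave_spectrum(trace, pick_index, window_len, sample_rate_hz):
          # Window the trace around the picked P- or S-wave arrival and
          # return the amplitude spectrum from a fast Fourier transform.
          window = trace[pick_index:pick_index + window_len]
          amplitudes = np.abs(np.fft.rfft(window))
          freqs = np.fft.rfftfreq(window.size, d=1.0 / sample_rate_hz)
          return freqs, amplitudes

      # Example: a synthetic trace sampled at 100 Hz with a pick at sample 500.
      trace = np.random.default_rng(0).standard_normal(2000)
      freqs, amps = wave_spectrum(trace, pick_index=500, window_len=256,
                                  sample_rate_hz=100.0)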

  11. 4273π: Bioinformatics education on low cost ARM hardware

    PubMed Central

    2013-01-01

    Background Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. Results We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012–2013. Conclusions 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost. PMID:23937194

  12. 4273π: bioinformatics education on low cost ARM hardware.

    PubMed

    Barker, Daniel; Ferrier, David Ek; Holland, Peter Wh; Mitchell, John Bo; Plaisier, Heleen; Ritchie, Michael G; Smart, Steven D

    2013-08-12

    Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012-2013. 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost.

  13. A Randomized Field Trial of the Fast ForWord Language Computer-Based Training Program

    ERIC Educational Resources Information Center

    Borman, Geoffrey D.; Benson, James G.; Overman, Laura

    2009-01-01

    This article describes an independent assessment of the Fast ForWord Language computer-based training program developed by Scientific Learning Corporation. Previous laboratory research involving children with language-based learning impairments showed strong effects on their abilities to recognize brief and fast sequences of nonspeech and speech…

  14. Dynamic performance of a suspended reinforced concrete footbridge under pedestrian movements

    NASA Astrophysics Data System (ADS)

    Drygala, I.; Dulinska, J.; Kondrat, K.

    2018-02-01

This paper presents a dynamic analysis of a suspended reinforced concrete footbridge over a national road in southern Poland. Firstly, the mode shapes and natural frequencies of vibration of the structure were calculated. The results of the numerical modal investigation showed that the natural frequencies of the structure coincide with the stepping frequencies of humans in motion (walking fast or running). Hence, to assess the comfort standards, the dynamic response of the footbridge to a running pedestrian had to be calculated. Secondly, the dynamic response of the footbridge was calculated considering two models of the dynamic forces produced by a single running pedestrian: a 'sine' and a 'half-sine' model. The accelerations and displacements obtained for the 'half-sine' model were up to 20% greater than those obtained for the 'sine' model. The 'sine' model is appropriate only for walking users of the walkway, because their motion is continuous in nature. For running users this assumption does not hold, since the forces produced by a running pedestrian are discontinuous; in this case the 'half-sine' model proved more suitable. Finally, the comfort conditions for the footbridge were evaluated. The analysis showed that the vertical comfort criteria were not exceeded for a single user running or walking fast.

  15. Statistical fingerprinting for malware detection and classification

    DOEpatents

    Prowell, Stacy J.; Rathgeb, Christopher T.

    2015-09-15

A system detects malware in a computing architecture with an unknown pedigree. The system includes a first computing device having a known pedigree and operating free of malware. The first computing device executes a series of instrumented functions that, when executed, provide a statistical baseline representative of the time a known software application takes to run on a device of known pedigree. A second computing device executes a second series of instrumented functions that, when executed, provides the actual time the known software application takes to run on the second computing device. The system detects malware when there is a difference in execution times between the first and second computing devices.
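
    The idea can be pictured in a few lines: build a statistical baseline from repeated runs of an instrumented function on the trusted machine, then flag the suspect machine when its run time falls outside that baseline. The function names and the threshold below are illustrative, not taken from the patent:

      import statistics
      import time

      def timed_run(func):
          # Wall-clock duration of one instrumented function execution.
          start = time.perf_counter()
          func()
          return time.perf_counter() - start

      def build_baseline(func, runs=50):
          samples = [timed_run(func) for _ in range(runs)]
          return statistics.mean(samples), statistics.stdev(samples)

      def looks_anomalous(observed, mean, stdev, k=4.0):
          # Flag a run deviating from the trusted baseline by more than
          # k standard deviations (threshold chosen for illustration).
          return abs(observed - mean) > k * stdev

      def workload():
          sum(i * i for i in range(100_000))

      mean, stdev = build_baseline(workload)   # on the known-pedigree device
      print(looks_anomalous(timed_run(workload), mean, stdev))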

  16. GPU-Accelerated Voxelwise Hepatic Perfusion Quantification

    PubMed Central

    Wang, H; Cao, Y

    2012-01-01

Voxelwise quantification of hepatic perfusion parameters from dynamic contrast-enhanced (DCE) imaging contributes greatly to the assessment of liver function in response to radiation therapy. However, the efficiency of estimating hepatic perfusion parameters voxel by voxel in the whole liver using a dual-input single-compartment model requires substantial improvement for routine clinical application. In this paper, we utilize the parallel computation power of a graphics processing unit (GPU) to accelerate the computation while maintaining the same accuracy as the conventional method. Using CUDA, the hepatic perfusion computations over multiple voxels are run across the GPU blocks concurrently but independently. At each voxel, the non-linear least-squares fitting of the time series of the liver DCE data to the compartmental model is distributed to multiple threads in a block, and the computations for different time points are performed simultaneously and synchronously. An efficient fast Fourier transform within a block is also developed for the convolution computation in the model. The GPU computations of the voxel-by-voxel hepatic perfusion images are compared with those of the CPU using simulated DCE data and experimental DCE MR images from patients. The computation speed is improved by 30 times using an NVIDIA Tesla C2050 GPU compared to a 2.67 GHz Intel Xeon CPU processor. To obtain liver perfusion maps with 626,400 voxels in a patient's liver, the GPU-accelerated voxelwise computation takes 0.9 min, compared to 110 min with the CPU, while the perfusion parameters obtained by the two methods differ by less than 10^-6. The method will be useful for generating liver perfusion images in clinical settings. PMID:22892645
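
    What makes the problem GPU-friendly is that every voxel's fit is independent. The CPU-only Python sketch below shows the same per-voxel independence, with a made-up single-exponential uptake model standing in for the dual-input single-compartment model:

      import numpy as np
      from concurrent.futures import ProcessPoolExecutor
      from scipy.optimize import curve_fit

      T = np.linspace(0.0, 60.0, 50)   # acquisition times (s), illustrative

      def model(t, k, amp):
          # Placeholder uptake curve; the paper fits a dual-input
          # single-compartment model instead.
          return amp * (1.0 - np.exp(-k * t))

      def fit_voxel(curve):
          params, _ = curve_fit(model, T, curve, p0=(0.1, 1.0))
          return params

      if __name__ == "__main__":
          rng = np.random.default_rng(1)
          voxels = [model(T, 0.05, 1.2) + 0.01 * rng.standard_normal(T.size)
                    for _ in range(1000)]
          with ProcessPoolExecutor() as pool:
              results = list(pool.map(fit_voxel, voxels))  # one fit per voxel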

  17. HEP Computing Tools, Grid and Supercomputers for Genome Sequencing Studies

    NASA Astrophysics Data System (ADS)

    De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Novikov, A.; Poyda, A.; Tertychnyy, I.; Wenaus, T.

    2017-10-01

PanDA, the Production and Distributed Analysis workload management system, was developed to address the data processing and analysis challenges of the ATLAS experiment at the LHC. Recently, PanDA has been extended to run HEP scientific applications on Leadership Class Facilities and supercomputers. The success of projects using PanDA beyond HEP and the Grid has drawn attention from other compute-intensive sciences, such as bioinformatics. Recent advances in Next Generation Genome Sequencing (NGS) technology have led to increasing streams of sequencing data that need to be processed, analysed, and made available to bioinformaticians worldwide. Analysis of genome sequencing data with the popular PALEOMIX software pipeline can take a month even on a powerful computing resource. In this paper we describe the adaptation of the PALEOMIX pipeline to run on a distributed computing environment powered by PanDA. To run the pipeline, we split the input files into chunks, which are processed separately on different nodes as independent PALEOMIX inputs, and finally merge the output files; this closely mirrors how ATLAS processes and simulates data. We dramatically decreased the total wall time thanks to automated job (re)submission and brokering within PanDA. Using software tools initially developed for HEP and the Grid can thus reduce the payload execution time for mammoth DNA samples from weeks to days.
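
    The chunking strategy itself is generic and can be sketched independently of PanDA: split the input into pieces, run each piece as its own job, and merge the per-chunk outputs. All helper names below are illustrative:

      from concurrent.futures import ProcessPoolExecutor

      def split_into_chunks(records, chunk_size):
          return [records[i:i + chunk_size]
                  for i in range(0, len(records), chunk_size)]

      def run_pipeline_on_chunk(chunk):
          # Stand-in for one PALEOMIX job on one input chunk; here the
          # "processing" is just upper-casing each record.
          return [record.upper() for record in chunk]

      def merge_outputs(chunk_outputs):
          merged = []
          for output in chunk_outputs:
              merged.extend(output)
          return merged

      if __name__ == "__main__":
          reads = [f"read{i}" for i in range(10_000)]
          chunks = split_into_chunks(reads, 1000)
          with ProcessPoolExecutor() as pool:      # one "job" per chunk
              outputs = list(pool.map(run_pipeline_on_chunk, chunks))
          result = merge_outputs(outputs)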

  18. Visual saliency-based fast intracoding algorithm for high efficiency video coding

    NASA Astrophysics Data System (ADS)

    Zhou, Xin; Shi, Guangming; Zhou, Wei; Duan, Zhemin

    2017-01-01

Intraprediction has been significantly improved in high efficiency video coding (HEVC) over H.264/AVC, with a quad-tree-based coding unit (CU) structure from size 64×64 down to 8×8 and more prediction modes. However, these techniques cause a dramatic increase in computational complexity. An intracoding algorithm is proposed that consists of a perceptual fast CU size decision algorithm and a fast intraprediction mode decision algorithm. First, based on visual saliency detection, an adaptive and fast CU size decision method is proposed to alleviate intraencoding complexity. Furthermore, a fast intraprediction mode decision algorithm with a step-halving rough mode decision method and an early mode pruning algorithm is presented to selectively check the potential modes and effectively reduce the complexity of computation. Experimental results show that the proposed fast method reduces the computational complexity of the current HM to about 57% in encoding time, with only a 0.37% increase in BD rate. Meanwhile, the proposed fast algorithm has reasonable peak signal-to-noise ratio losses and nearly the same subjective perceptual quality.
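
    The CU size decision can be caricatured in a few lines: compute the mean saliency of a block and let the encoder skip the expensive small-CU searches in visually unimportant regions. The thresholds and the saliency source here are assumptions, not the paper's tuned values:

      import numpy as np

      def allowed_cu_sizes(saliency_block, low=0.2, high=0.6):
          # Map the mean visual saliency (0..1) of a 64x64 region to the
          # CU sizes worth searching: low-saliency regions keep large CUs,
          # high-saliency regions allow splitting all the way to 8x8.
          s = float(np.mean(saliency_block))
          if s < low:
              return [64]                # depth 0 only
          if s < high:
              return [64, 32, 16]        # skip the 8x8 search
          return [64, 32, 16, 8]         # full quad-tree search

      saliency = np.random.default_rng(2).random((64, 64))
      print(allowed_cu_sizes(saliency))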

  19. Predictors of race time in male Ironman triathletes: physical characteristics, training, or prerace experience?

    PubMed

    Knechtle, Beat; Wirth, Andrea; Rosemann, Thomas

    2010-10-01

The aim of the present study was to assess whether physical characteristics, training, or prerace experience were related to performance in recreational male Ironman triathletes using bi- and multivariate analysis. The 83 male recreational triathletes who volunteered to participate in the study (M age 41.5 yr., SD = 8.9) had a mean body height of 1.80 m (SD = 0.06), mean body mass of 77.3 kg (SD = 8.9), and mean Body Mass Index of 23.7 kg/m2 (SD = 2.1) at the 2009 IRONMAN SWITZERLAND competition. Running speed during training, personal best marathon time, and personal best time in an Olympic distance triathlon were related to Ironman race time; together, these three variables explained 64% of the variance in Ironman race time. Personal best marathon time was significantly and positively related to the run split time in the Ironman race. Faster running in training and fast personal best times in both the marathon and the Olympic distance triathlon were associated with a fast Ironman race time.

  20. a Fast and Flexible Method for Meta-Map Building for Icp Based Slam

    NASA Astrophysics Data System (ADS)

    Kurian, A.; Morin, K. W.

    2016-06-01

Recent developments in LiDAR sensors make mobile mapping fast and cost-effective. These sensors generate a large amount of data, which in turn improves the coverage and detail of the map. Due to the limited range of the sensor, a series of scans must be collected to build the entire map of the environment. With good GNSS coverage, building a map is a well-addressed problem; but in an indoor environment GNSS reception is limited, and an inertial solution, if available, can quickly diverge. In such situations, simultaneous localization and mapping (SLAM) is used to generate a navigation solution and a map concurrently. SLAM using point clouds poses a number of computational challenges even with modern hardware, due to the sheer amount of data. In this paper, we propose two strategies for minimizing the cost of computation and storage when a 3D point cloud is used for navigation and real-time map building. We have used the 3D point cloud generated by Leica Geosystems' Pegasus Backpack, which is equipped with Velodyne VLP-16 LiDAR scanners. To improve the speed of the conventional iterative closest point (ICP) algorithm, we propose a point cloud sub-sampling strategy that does not throw away any key features and yet significantly reduces the number of points that need to be processed and stored. To speed up the correspondence-finding step, a dual kd-tree and circular buffer architecture is proposed. We show that the proposed method can run in real time and has excellent navigation accuracy characteristics.
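
    The step that dominates ICP cost is the nearest-neighbour correspondence search, which is why the kd-tree and the subsampling matter. Below is a minimal point-to-point ICP iteration, with a single scipy kd-tree standing in for the paper's dual-tree and circular-buffer design:

      import numpy as np
      from scipy.spatial import cKDTree

      def icp_step(source, target_tree, target):
          # 1. Correspondences: nearest target point for each source point.
          _, idx = target_tree.query(source)
          matched = target[idx]
          # 2. Best rigid transform (Kabsch/SVD) aligning source to matched.
          src_c, tgt_c = source.mean(0), matched.mean(0)
          h = (source - src_c).T @ (matched - tgt_c)
          u, _, vt = np.linalg.svd(h)
          r = vt.T @ u.T
          if np.linalg.det(r) < 0:       # guard against reflections
              vt[-1] *= -1
              r = vt.T @ u.T
          t = tgt_c - r @ src_c
          return source @ r.T + t

      rng = np.random.default_rng(3)
      target = rng.random((5000, 3))
      source = target[::4] + 0.01        # subsampled, slightly offset scan
      tree = cKDTree(target)             # built once, reused every iteration
      for _ in range(10):
          source = icp_step(source, tree, target)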

  1. Running R Statistical Computing Environment Software on the Peregrine

    Science.gov Websites

R is a collaborative project for the development of new statistical methodologies and enjoys a large user base, offering natural language support but running in an English locale. The CRAN task view for High Performance Computing covers programming paradigms to better leverage modern HPC systems; please consult it for distribution details.

  2. Simplified Distributed Computing

    NASA Astrophysics Data System (ADS)

    Li, G. G.

    2006-05-01

Distributed computing ranges from high-performance parallel computing and grid computing to environments where the idle CPU cycles and storage space of numerous networked systems are harnessed to work together over the Internet. In this work we focus on building an easy and affordable solution for computationally intensive problems in scientific applications, based on existing technology and hardware resources. The system consists of a series of controllers. When a job request is detected by a monitor or initialized by an end user, the job manager launches the specific job handler for that job. The job handler pre-processes the job, partitions it into relatively independent tasks, and distributes the tasks into the processing queue. The task handler picks up the related tasks, processes them, and puts the results back into the processing queue. The job handler also monitors and examines the tasks and the results, and assembles the task results into the overall solution for the job request when all tasks for the job are finished. A resource manager configures and monitors all participating nodes. A distributed agent is deployed on all participating nodes to manage software downloads and report status. The processing queue is the key to the success of this distributed system. We use BEA's Weblogic JMS queue in our implementation: it guarantees message delivery and has message-priority and retry features, so tasks never get lost. The entire system is built on J2EE technology and can be deployed on heterogeneous platforms. It can handle algorithms and applications developed in any language on any platform. J2EE adaptors are provided to connect existing applications to the system, so that applications and algorithms running on Unix, Linux, and Windows can all work together. The system is easy and fast to develop because it is based on well-adopted industry technology, and it is highly scalable and heterogeneous. It is an open system: any number and type of machines can join to provide computational power. This asynchronous, message-based system can achieve response times on the order of a second. For efficiency, communication between distributed tasks is usually done at the start and end of the tasks, but intermediate task status can also be provided.
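
    The job-handler/task-handler split maps naturally onto a message queue. In place of the Weblogic JMS queue, the sketch below uses Python's standard thread-safe queue to show the same pattern: a job is partitioned into tasks, idle workers pull tasks, and the results are assembled into the overall solution:

      import queue
      import threading

      task_queue = queue.Queue()
      result_queue = queue.Queue()

      def task_handler():
          # Workers pull tasks as they become free; the queue delivers
          # each task to exactly one handler.
          while True:
              task = task_queue.get()
              if task is None:
                  break
              task_id, payload = task
              result_queue.put((task_id, payload ** 2))  # toy "processing"
              task_queue.task_done()

      def run_job(n_tasks=100, n_workers=4):
          workers = [threading.Thread(target=task_handler)
                     for _ in range(n_workers)]
          for w in workers:
              w.start()
          for i in range(n_tasks):           # partition the job into tasks
              task_queue.put((i, i))
          task_queue.join()                  # wait until all are processed
          for _ in workers:
              task_queue.put(None)           # poison pills stop the workers
          results = {}
          while not result_queue.empty():    # assemble the overall solution
              task_id, value = result_queue.get()
              results[task_id] = value
          return results

      print(len(run_job()))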

  3. Uncertainty quantification for environmental models

    USGS Publications Warehouse

    Hill, Mary C.; Lu, Dan; Kavetski, Dmitri; Clark, Martyn P.; Ye, Ming

    2012-01-01

Environmental models are used to evaluate the fate of fertilizers in agricultural settings (including soil denitrification), the degradation of hydrocarbons at spill sites, and water supply for people and ecosystems in small to large basins and cities—to mention but a few applications of these models. They also play a role in understanding and diagnosing potential environmental impacts of global climate change. The models are typically mildly to extremely nonlinear. The persistent demand for enhanced dynamics and resolution to improve model realism [17] means that lengthy individual model execution times will remain common, notwithstanding continued enhancements in computer power. In addition, high-dimensional parameter spaces are often defined, which increases the number of model runs required to quantify uncertainty [2]. Some environmental modeling projects have access to extensive funding and computational resources; many do not. The many recent studies of uncertainty quantification in environmental model predictions have focused on uncertainties related to data error and sparsity of data, expert judgment expressed mathematically through prior information, poorly known parameter values, and model structure (see, for example, [1,7,9,10,13,18]). Approaches for quantifying uncertainty include frequentist (potentially with prior information [7,9]), Bayesian [13,18,19], and likelihood-based. A few of the numerous methods, including some sensitivity and inverse methods with consequences for understanding and quantifying uncertainty, are as follows: Bayesian hierarchical modeling and Bayesian model averaging; single-objective optimization with error-based weighting [7] and multi-objective optimization [3]; methods based on local derivatives [2,7,10]; screening methods like OAT (one at a time) and the method of Morris [14]; FAST (Fourier amplitude sensitivity testing) [14]; the Sobol' method [14]; randomized maximum likelihood [10]; and Markov chain Monte Carlo (MCMC) [10]. There are also bootstrapping and cross-validation approaches. Sometimes analyses are conducted using surrogate models [12]. The availability of so many options can be confusing. Categorizing methods based on fundamental questions assists in communicating the essential results of uncertainty analyses to stakeholders. Such questions can focus on model adequacy (e.g., How well does the model reproduce observed system characteristics and dynamics?) and sensitivity analysis (e.g., What parameters can be estimated with available data? What observations are important to parameters and predictions? What parameters are important to predictions?), as well as on the uncertainty quantification itself (e.g., How accurate and precise are the predictions?). The methods can also be classified by the number of model runs required: few (10s to 1000s) or many (10,000s to 1,000,000s). Of the methods listed above, the most computationally frugal are generally those based on local derivatives; MCMC methods tend to be among the most computationally demanding. Surrogate models (emulators) do not necessarily produce computational frugality, because many runs of the full model are generally needed to create a meaningful surrogate model. With this categorization, we can, in general, address all the fundamental questions mentioned above using either computationally frugal or demanding methods.
Model development and analysis can thus be conducted consistently using either computationally frugal or demanding methods; alternatively, different fundamental questions can be addressed using methods that require different levels of effort. Based on this perspective, we pose the question: can computationally frugal methods be useful companions to computationally demanding methods? The reliability of computationally frugal methods generally depends on the model being reasonably linear, which usually means smooth nonlinearities and the assumption of Gaussian errors; both tend to be more valid for more linear models.
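
    As a concrete instance of a computationally frugal method, local derivative-based sensitivities need only one extra model run per parameter. A generic one-at-a-time finite-difference sketch, with a toy function standing in for an expensive environmental model:

      import numpy as np

      def local_sensitivities(model, params, rel_step=1e-3):
          # Forward differences: len(params) + 1 model runs in total,
          # frugal next to the 10,000s of runs MCMC may require.
          base = model(params)
          sens = np.empty(len(params))
          for i, p in enumerate(params):
              step = rel_step * max(abs(p), 1e-12)
              perturbed = params.copy()
              perturbed[i] += step
              sens[i] = (model(perturbed) - base) / step
          return sens

      def toy_model(theta):
          # Placeholder for an expensive model execution.
          return theta[0] ** 2 + np.sin(theta[1]) + 0.5 * theta[2]

      print(local_sensitivities(toy_model, np.array([1.0, 0.3, 2.0])))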

  4. High Resolution Nature Runs and the Big Data Challenge

    NASA Technical Reports Server (NTRS)

    Webster, W. Phillip; Duffy, Daniel Q.

    2015-01-01

NASA's Global Modeling and Assimilation Office at Goddard Space Flight Center is undertaking a series of very computationally intensive nature runs and a downscaled reanalysis. The nature runs use GEOS-5 as an atmospheric general circulation model (AGCM), while the reanalysis uses GEOS-5 in data assimilation mode. This paper presents computational challenges from three runs, two of which are AGCM runs and one a downscaled reanalysis using the full DAS. The nature runs are being completed at two surface grid resolutions, 7 and 3 kilometers, each with 72 vertical levels. The 7 km run spanned 2 years (2005-2006) and produced 4 PB of data, while the 3 km run will span one year and generate 4 PB of data. The downscaled reanalysis (MERRA-II, the Modern-Era Retrospective analysis for Research and Applications) will cover 15 years and generate 1 PB of data. In our efforts to address the big data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS), a specialization of the concept of business process-as-a-service that is an evolving extension of IaaS, PaaS, and SaaS enabled by cloud computing. In this presentation, we describe two projects that demonstrate this shift. MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS: MERRA/AS enables MapReduce analytics over the MERRA reanalysis data collection by bringing together high-performance computing, scalable data management, and a domain-specific climate data services API. NASA's High-Performance Science Cloud (HPSC) is an example of the type of compute-storage fabric required to support CAaaS: the HPSC comprises a high-speed InfiniBand network, high-performance file systems and object storage, and virtual system environments specific to data-intensive science applications. These technologies provide a new tier in the data and analytic services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and to new mobility-driven applications and modes of work. In our experience, CAaaS lowers the barriers and risk to organizational change, fosters innovation and experimentation, and provides the agility required to meet our customers' increasing and changing needs.

  5. Improved programs for DNA and protein sequence analysis on the IBM personal computer and other standard computer systems.

    PubMed Central

    Mount, D W; Conrad, B

    1986-01-01

    We have previously described programs for a variety of types of sequence analysis (1-4). These programs have now been integrated into a single package. They are written in the standard C programming language and run on virtually any computer system with a C compiler, such as the IBM/PC and other computers running under the MS/DOS and UNIX operating systems. The programs are widely distributed and may be obtained from the authors as described below. PMID:3753780

  6. Why not make a PC cluster of your own? 5. AppleSeed: A Parallel Macintosh Cluster for Scientific Computing

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor K.; Dauger, Dean E.

    We have constructed a parallel cluster consisting of Apple Macintosh G4 computers running both Classic Mac OS as well as the Unix-based Mac OS X, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. Unlike other Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.

  7. NONMEMory: a run management tool for NONMEM.

    PubMed

    Wilkins, Justin J

    2005-06-01

    NONMEM is an extremely powerful tool for nonlinear mixed-effect modelling and simulation of pharmacokinetic and pharmacodynamic data. However, it is a console-based application whose output does not lend itself to rapid interpretation or efficient management. NONMEMory has been created to be a comprehensive project manager for NONMEM, providing detailed summary, comparison and overview of the runs comprising a given project, including the display of output data, simple post-run processing, fast diagnostic plots and run output management, complementary to other available modelling aids. Analysis time ought not to be spent on trivial tasks, and NONMEMory's role is to eliminate these as far as possible by increasing the efficiency of the modelling process. NONMEMory is freely available from http://www.uct.ac.za/depts/pha/nonmemory.php.

  8. Processing, Cataloguing and Distribution of Uas Images in Near Real Time

    NASA Astrophysics Data System (ADS)

    Runkel, I.

    2013-08-01

Why are UAS generating such hype? UAS make data capture flexible, fast, and easy. For many applications this is more important than a perfect photogrammetric aerial image block. To ensure that the advantage of fast data capture holds up to the end of the processing chain, all intermediate steps, such as data processing and data dissemination to the customer, need to be flexible and fast as well. GEOSYSTEMS has established the whole processing workflow as a server/client solution, which is the focus of this presentation. Depending on the image acquisition system, the image data can be downlinked during the flight to the data processing computer, or it is stored on a mobile device and hooked up to the data processing computer after the flight campaign. The image project manager reads the data from the device and georeferences the images according to the position data. The metadata is converted into an ISO-conformant format, and subsequently all georeferenced images are catalogued in the raster data management system ERDAS APOLLO. APOLLO provides the data, i.e. the images, as OGC-conformant services to the customer. Within seconds the UAV images are ready to use for GIS applications, image processing, or direct interpretation via web applications - wherever you want. The whole processing chain is built in a generic manner and can be adapted to a multitude of applications. The UAV images can be processed and catalogued as single orthoimages or as an image mosaic. Furthermore, image data from various cameras can be fused. Using WPS (web processing services), image enhancement and image analysis workflows, such as change detection layers, can be calculated and provided to the image analysts. The WPS processing runs directly on the raster data management server; the image analyst needs no data and no software on a local computer. This workflow has proven to be fast, stable, and accurate. It is designed to support time-critical applications for security demands: the images can be checked and interpreted in near real time. For sensitive areas it provides the possibility to inform remote decision makers or interpretation experts and give them situational awareness, wherever they are. For monitoring and inspection tasks it speeds up data capture and data interpretation. The fully automated workflow of data pre-processing, georeferencing, cataloguing, and dissemination in near real time was developed based on the Intergraph products ERDAS IMAGINE, ERDAS APOLLO, and GEOSYSTEMS METAmorph!IT. It is offered as an adaptable solution by GEOSYSTEMS GmbH.

  9. XPOSE: the Exxon Nuclear revised LEOPARD

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skogen, F.B.

    1975-04-01

The main differences between the XPOSE and LEOPARD codes, which are used to generate fast and thermal neutron spectra and cross sections, are presented. The models used for the fast and thermal spectrum calculations are described, as well as the depletion calculations considering the U-238 chain, the U-235 chain, xenon and samarium, fission products, and boron-10. A detailed description of the input required to run XPOSE and a description of the output are included. (FS)

  10. Adaptive responses of GLUT-4 and citrate synthase in fast-twitch muscle of voluntary running rats

    NASA Technical Reports Server (NTRS)

    Henriksen, E. J.; Halseth, A. E.

    1995-01-01

    Glucose transporter (GLUT-4) protein, hexokinase, and citrate synthase (proteins involved in oxidative energy production from blood glucose catabolism) increase in response to chronically elevated neuromuscular activity. It is currently unclear whether these proteins increase in a coordinated manner in response to this stimulus. Therefore, voluntary wheel running (WR) was used to chronically overload the fast-twitch rat plantaris muscle and the myocardium, and the early time courses of adaptative responses of GLUT-4 protein and the activities of hexokinase and citrate synthase were characterized and compared. Plantaris hexokinase activity increased 51% after just 1 wk of WR, whereas GLUT-4 and citrate synthase were increased by 51 and 40%, respectively, only after 2 wk of WR. All three variables remained comparably elevated (+50-64%) through 4 wk of WR. Despite the overload of the myocardium with this protocol, no substantial elevations in these variables were observed. These findings are consistent with a coordinated upregulation of GLUT-4 and citrate synthase in the fast-twitch plantaris, but not in the myocardium, in response to this increased neuromuscular activity. Regulation of hexokinase in fast-twitch muscle appears to be uncoupled from regulation of GLUT-4 and citrate synthase, as increases in the former are detectable well before increases in the latter.

  11. A GPU OpenCL based cross-platform Monte Carlo dose calculation engine (goMC)

    NASA Astrophysics Data System (ADS)

    Tian, Zhen; Shi, Feng; Folkerts, Michael; Qin, Nan; Jiang, Steve B.; Jia, Xun

    2015-09-01

Monte Carlo (MC) simulation has been recognized as the most accurate dose calculation method for radiotherapy. However, the extremely long computation time impedes its clinical application. Recently, much effort has been made to realize fast MC dose calculation on graphics processing units (GPUs). However, most GPU-based MC dose engines have been developed under NVIDIA's CUDA environment. This limits code portability to other platforms, hindering the introduction of GPU-based MC simulations into clinical practice. The objective of this paper is to develop a GPU OpenCL based cross-platform MC dose engine named goMC with coupled photon-electron simulation for external photon and electron radiotherapy in the MeV energy range. Compared to our previously developed GPU-based MC code named gDPM (Jia et al 2012 Phys. Med. Biol. 57 7783-97), goMC has two major differences. First, it was developed under the OpenCL environment for high code portability and hence can be run not only on different GPU cards but also on CPU platforms. Second, we adopted the electron transport model used in the EGSnrc MC package and PENELOPE's random hinge method in our new dose engine, instead of the dose planning method employed in gDPM. Dose distributions were calculated for a 15 MeV electron beam and a 6 MV photon beam in a homogeneous water phantom, a water-bone-lung-water slab phantom, and a half-slab phantom. Satisfactory agreement between the two MC dose engines goMC and gDPM was observed in all cases. The average dose differences in the regions that received a dose higher than 10% of the maximum dose were 0.48-0.53% for the electron beam cases and 0.15-0.17% for the photon beam cases. In terms of efficiency, goMC was ~4-16% slower than gDPM when running on the same NVIDIA TITAN card for all the cases we tested, due to both the different electron transport models and the different development environments. The code portability of our new dose engine goMC was validated by successfully running it on a variety of computing devices, including an NVIDIA GPU card, two AMD GPU cards, and an Intel CPU processor. Computational efficiency among these platforms was compared.
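
    The property that makes MC dose calculation map well onto GPU threads is that each particle history is independent. The drastically simplified Python sketch below samples photon interaction depths in a 1D water phantom and tallies local energy deposition; it stands in for none of goMC's actual physics (coupled photon-electron transport, random hinge), and the attenuation coefficient is illustrative only:

      import numpy as np

      MU = 0.07                     # illustrative attenuation coefficient (1/mm)
      N_PHOTONS = 100_000
      depth_bins = np.zeros(300)    # 1 mm bins, 30 cm water phantom

      rng = np.random.default_rng(4)
      for _ in range(N_PHOTONS):
          # Sample the free path from the exponential attenuation law and
          # deposit the photon's energy locally (no scattering, no electrons).
          depth_mm = rng.exponential(1.0 / MU)
          b = int(depth_mm)
          if b < depth_bins.size:
              depth_bins[b] += 1.0

      dose = depth_bins / N_PHOTONS   # normalized depth-dose profile

    Since every history touches only its own random stream and a shared tally, the loop parallelizes naturally across GPU threads, which is the structure both gDPM and goMC exploit.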

  12. Tsunami hazard assessment in El Salvador, Central America, from seismic sources through flooding numerical models

    NASA Astrophysics Data System (ADS)

    Álvarez-Gómez, J. A.; Aniel-Quiroga, Í.; Gutiérrez-Gutiérrez, O. Q.; Larreynaga, J.; González, M.; Castro, M.; Gavidia, F.; Aguirre-Ayerbe, I.; González-Riancho, P.; Carreño, E.

    2013-05-01

El Salvador is the smallest and most densely populated country in Central America; its coast is approximately 320 km long, with 29 municipalities and more than 700 000 inhabitants. In El Salvador there have been 15 recorded tsunamis between 1859 and 2012, 3 of them causing damage and hundreds of victims. Hazard assessment is commonly based on propagation numerical models for earthquake-generated tsunamis and can be approached through both probabilistic and deterministic methods. A deterministic approximation has been applied in this study, as it provides essential information for coastal planning and management. The objective of the research was twofold: on the one hand, the characterization of the threat over the entire coast of El Salvador, and on the other, the computation of flooding maps for the three main localities of the Salvadorian coast. For the latter we developed high-resolution flooding models. For the former, due to the extension of the coastal area, we computed maximum elevation maps, and from the near-shore elevation we estimated the run-up and the flooded area using empirical relations. We have considered local sources located in the Middle America Trench, characterized seismotectonically, and distant sources in the rest of the Pacific Basin, using historical and recent earthquakes and tsunamis. We used a hybrid finite-difference/finite-volume numerical model in this work, based on the linear and non-linear shallow water equations, to simulate a total of 24 earthquake-generated tsunami scenarios. On the western Salvadorian coast, run-up values higher than 5 m are common, while in the eastern area, approximately from La Libertad to the Gulf of Fonseca, the run-up values are lower. The areas most exposed to flooding are the lowlands in the Lempa River delta and the Barra de Santiago Western Plains. The results of the empirical approximation used for the whole country are similar to those obtained with the high-resolution numerical modelling, making it a good and fast approximation for obtaining preliminary tsunami hazard estimations. In Acajutla and La Libertad, both important tourism centres under active development, flooding depths between 2 and 4 m are frequent, accompanied by high and very high person-instability hazard. Inside the Gulf of Fonseca the impact of the waves is almost negligible.

  13. Tsunami hazard assessment in El Salvador, Central America, from seismic sources through flooding numerical models.

    NASA Astrophysics Data System (ADS)

    Álvarez-Gómez, J. A.; Aniel-Quiroga, Í.; Gutiérrez-Gutiérrez, O. Q.; Larreynaga, J.; González, M.; Castro, M.; Gavidia, F.; Aguirre-Ayerbe, I.; González-Riancho, P.; Carreño, E.

    2013-11-01

El Salvador is the smallest and most densely populated country in Central America; its coast has an approximate length of 320 km, 29 municipalities and more than 700 000 inhabitants. In El Salvador there were 15 recorded tsunamis between 1859 and 2012, 3 of them causing damage and resulting in hundreds of victims. Hazard assessment is commonly based on propagation numerical models for earthquake-generated tsunamis and can be approached through both probabilistic and deterministic methods. A deterministic approximation has been applied in this study as it provides essential information for coastal planning and management. The objective of the research was twofold: on the one hand the characterization of the threat over the entire coast of El Salvador, and on the other the computation of flooding maps for the three main localities of the Salvadorian coast. For the latter we developed high-resolution flooding models. For the former, due to the extension of the coastal area, we computed maximum elevation maps, and from the elevation in the near shore we computed an estimation of the run-up and the flooded area using empirical relations. We have considered local sources located in the Middle America Trench, characterized seismotectonically, and distant sources in the rest of the Pacific Basin, using historical and recent earthquakes and tsunamis. We used a hybrid finite differences-finite volumes numerical model in this work, based on the linear and non-linear shallow water equations, to simulate a total of 24 earthquake-generated tsunami scenarios. Our results show that at the western Salvadorian coast, run-up values higher than 5 m are common, while in the eastern area, approximately from La Libertad to the Gulf of Fonseca, the run-up values are lower. The areas most exposed to flooding are the lowlands in the Lempa River delta and the Barra de Santiago Western Plains. The results of the empirical approximation used for the whole country are similar to the results obtained with the high-resolution numerical modelling, making it a good and fast approximation for obtaining preliminary tsunami hazard estimations. In Acajutla and La Libertad, both important tourism centres under active development, flooding depths between 2 and 4 m are frequent, accompanied by high and very high person-instability hazard. Inside the Gulf of Fonseca the impact of the waves is almost negligible.

  14. A GPU OpenCL based cross-platform Monte Carlo dose calculation engine (goMC).

    PubMed

    Tian, Zhen; Shi, Feng; Folkerts, Michael; Qin, Nan; Jiang, Steve B; Jia, Xun

    2015-10-07

Monte Carlo (MC) simulation has been recognized as the most accurate dose calculation method for radiotherapy. However, the extremely long computation time impedes its clinical application. Recently, much effort has been made to realize fast MC dose calculation on graphics processing units (GPUs). However, most GPU-based MC dose engines have been developed under NVIDIA's CUDA environment. This limits code portability to other platforms, hindering the introduction of GPU-based MC simulations into clinical practice. The objective of this paper is to develop a GPU OpenCL based cross-platform MC dose engine named goMC with coupled photon-electron simulation for external photon and electron radiotherapy in the MeV energy range. Compared to our previously developed GPU-based MC code named gDPM (Jia et al 2012 Phys. Med. Biol. 57 7783-97), goMC has two major differences. First, it was developed under the OpenCL environment for high code portability and hence can be run not only on different GPU cards but also on CPU platforms. Second, we adopted the electron transport model used in the EGSnrc MC package and PENELOPE's random hinge method in our new dose engine, instead of the dose planning method employed in gDPM. Dose distributions were calculated for a 15 MeV electron beam and a 6 MV photon beam in a homogeneous water phantom, a water-bone-lung-water slab phantom, and a half-slab phantom. Satisfactory agreement between the two MC dose engines goMC and gDPM was observed in all cases. The average dose differences in the regions that received a dose higher than 10% of the maximum dose were 0.48-0.53% for the electron beam cases and 0.15-0.17% for the photon beam cases. In terms of efficiency, goMC was ~4-16% slower than gDPM when running on the same NVIDIA TITAN card for all the cases we tested, due to both the different electron transport models and the different development environments. The code portability of our new dose engine goMC was validated by successfully running it on a variety of computing devices, including an NVIDIA GPU card, two AMD GPU cards, and an Intel CPU processor. Computational efficiency among these platforms was compared.

  15. Third Congress on Information System Science and Technology

    DTIC Science & Technology

    1968-04-01

    versions of the same compiler. The "fast compile-slow execute" and the "slow compile-fast execute" gimmick is the greatest hoax ever perpetrated on the... fast such natural language analysis and translation can be accomplished. If the fairly superficial syntactic analysis of a sentence which is...two kinds of computer: a fast computer with large immediate access and bulk memory for rear echelon and large installation employment, and a

  16. OCTGRAV: Sparse Octree Gravitational N-body Code on Graphics Processing Units

    NASA Astrophysics Data System (ADS)

    Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

    2010-10-01

    Octgrav is a very fast tree-code which runs on massively parallel Graphics Processing Units (GPUs) with the NVIDIA CUDA architecture. The algorithms are based on parallel-scan and sort methods. The tree construction and calculation of multipole moments is carried out on the host CPU, while the force calculation, which consists of tree walks and evaluation of interaction lists, is carried out on the GPU. In this way, a sustained performance of about 100 GFLOP/s and data transfer rates of about 50 GB/s are achieved. It takes about a second to compute forces on a million particles with an opening angle of θ ≈ 0.5. To test the performance and feasibility, we implemented the algorithms in CUDA in the form of a gravitational tree-code which completely runs on the GPU. The tree construction and traverse algorithms are portable to many-core devices which have support for the CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree construction and shows a performance improvement of more than a factor of 20 overall, resulting in a processing rate of more than 2.8 million particles per second. The code has a convenient user interface and is freely available for use.
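
    A hedged sketch of the opening-angle test that tree-codes of this family use (the function below is illustrative Python, not Octgrav source): a cell is accepted as a single multipole source when the ratio of its size to its distance falls below θ.

    ```python
    import numpy as np

    def accept_cell(cell_center, cell_size, particle, theta=0.5):
        """Barnes-Hut style multipole acceptance criterion (illustrative):
        a tree cell may stand in for all of its particles if it subtends
        an angle smaller than the opening angle theta."""
        d = np.linalg.norm(particle - cell_center)
        return cell_size / d < theta   # small ratio -> cell is "far enough"

    # Smaller theta opens more cells: higher accuracy, more interactions.
    print(accept_cell(np.zeros(3), 1.0, np.array([4.0, 0.0, 0.0]), theta=0.5))
    ```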

  17. Real-time plasma control based on the ISTTOK tomography diagnostic

    NASA Astrophysics Data System (ADS)

    Carvalho, P. J.; Carvalho, B. B.; Neto, A.; Coelho, R.; Fernandes, H.; Sousa, J.; Varandas, C.; Chávez-Alarcón, E.; Herrera-Velázquez, J. J. E.

    2008-10-01

    The presently available processing power in generic processing units (GPUs) combined with state-of-the-art programmable logic devices benefits the implementation of complex, real-time driven, data processing algorithms for plasma diagnostics. A tomographic reconstruction diagnostic has been developed for the ISTTOK tokamak, based on three linear pinhole cameras each with ten lines of sight. The plasma emissivity in a poloidal cross section is computed locally on a submillisecond time scale, using a Fourier-Bessel algorithm, allowing the use of the output signals for active plasma position control. The data acquisition and reconstruction (DAR) system is based on ATCA technology and consists of one acquisition board with integrated field programmable gate array (FPGA) capabilities and a dual-core Pentium module running real-time application interface (RTAI) Linux. In this paper, the DAR real-time firmware/software implementation is presented, based on (i) front-end digital processing in the FPGA; (ii) a device driver specially developed for the board which enables streaming data acquisition to the host GPU; and (iii) a fast reconstruction algorithm running in Linux RTAI. This system behaves as a module of the central ISTTOK control and data acquisition system (FIRESIGNAL). Preliminary results of the above experimental setup are presented and a performance benchmarking against the magnetic coil diagnostic is shown.

  18. Leveraging FPGAs for Accelerating Short Read Alignment.

    PubMed

    Arram, James; Kaplan, Thomas; Luk, Wayne; Jiang, Peiyong

    2017-01-01

    One of the key challenges facing genomics today is how to efficiently analyze the massive amounts of data produced by next-generation sequencing platforms. With general-purpose computing systems struggling to address this challenge, specialized processors such as the Field-Programmable Gate Array (FPGA) are receiving growing interest. The means by which to leverage this technology for accelerating genomic data analysis is however largely unexplored. In this paper, we present a runtime reconfigurable architecture for accelerating short read alignment using FPGAs. This architecture exploits the reconfigurability of FPGAs to allow the development of fast yet flexible alignment designs. We apply this architecture to develop an alignment design which supports exact and approximate alignment with up to two mismatches. Our design is based on the FM-index, with optimizations to improve the alignment performance. In particular, the n-step FM-index, index oversampling, a seed-and-compare stage, and bi-directional backtracking are included. Our design is implemented and evaluated on a 1U Maxeler MPC-X2000 dataflow node with eight Altera Stratix-V FPGAs. Measurements show that our design is 28 times faster than Bowtie2 running with 16 threads on dual Intel Xeon E5-2640 CPUs, and nine times faster than Soap3-dp running on an NVIDIA Tesla C2070 GPU.
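
    The FM-index backward search at the heart of such designs can be shown compactly. The toy Python below (illustrative only; the paper's hardware design adds n-step indexing, index oversampling and backtracking) builds a BWT-based index and counts exact occurrences of a pattern.

    ```python
    def bwt_index(text):
        """Build a toy FM-index (BWT, C array, occurrence table) for `text`.
        Real designs store compressed rank structures instead of full tables."""
        text += "$"
        sa = sorted(range(len(text)), key=lambda i: text[i:])
        bwt = "".join(text[i - 1] for i in sa)
        alphabet = sorted(set(text))
        # C[c] = number of characters in text strictly smaller than c
        C = {c: sum(ch < c for ch in text) for c in alphabet}
        # occ[c][i] = occurrences of c in bwt[:i]
        occ = {c: [0] for c in alphabet}
        for ch in bwt:
            for c in alphabet:
                occ[c].append(occ[c][-1] + (ch == c))
        return C, occ

    def backward_search(pattern, C, occ):
        """Count exact matches by scanning the pattern right to left,
        maintaining a half-open suffix-array interval [lo, hi)."""
        lo, hi = 0, len(occ["$"]) - 1
        for c in reversed(pattern):
            if c not in C:
                return 0
            lo = C[c] + occ[c][lo]
            hi = C[c] + occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo

    C, occ = bwt_index("banana")
    print(backward_search("ana", C, occ))   # -> 2
    ```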

  19. User's instructions for the cardiovascular Walters model

    NASA Technical Reports Server (NTRS)

    Croston, R. C.

    1973-01-01

    The model is a combined, steady-state cardiovascular and thermal model. It was originally developed for interactive use, but was converted to batch-mode simulation for the Sigma 3 computer. Its purpose is to compute steady-state circulatory and thermal variables in response to exercise work loads and environmental factors. During a computer simulation run, several selected variables are printed at each time step. End conditions are also printed at the completion of the run.

  20. A Quantum Computing Approach to Model Checking for Advanced Manufacturing Problems

    DTIC Science & Technology

    2014-07-01

    amount of time. In summary, the tool we developed succeeded in allowing us to produce good solutions for optimization problems that did not fit... We compared the value of the objective obtained in each run with the known optimal value, and used this information to compute the probability of... success for each given instance. Then we used this information to compute the expected number of repetitions (or runs) needed to obtain the optimal

  1. Running climate model on a commercial cloud computing environment: A case study using Community Earth System Model (CESM) on Amazon AWS

    NASA Astrophysics Data System (ADS)

    Chen, Xiuhong; Huang, Xianglei; Jiao, Chaoyi; Flanner, Mark G.; Raeker, Todd; Palen, Brock

    2017-01-01

    The suites of numerical models used for simulating the climate of our planet are usually run on dedicated high-performance computing (HPC) resources. This study investigates an alternative to the usual approach, i.e. carrying out climate model simulations in a commercially available cloud computing environment. We test the performance and reliability of running the CESM (Community Earth System Model), a flagship climate model in the United States developed by the National Center for Atmospheric Research (NCAR), on Amazon Web Services (AWS) EC2, the cloud computing environment of Amazon.com, Inc. StarCluster is used to create a virtual computing cluster on AWS EC2 for the CESM simulations. The wall-clock time for one year of CESM simulation on the AWS EC2 virtual cluster is comparable to the time spent on the same simulation on a local dedicated high-performance computing cluster with InfiniBand connections. The CESM simulation can be efficiently scaled with the number of CPU cores on the AWS EC2 virtual cluster environment up to 64 cores. For the standard configuration of the CESM at a spatial resolution of 1.9° latitude by 2.5° longitude, increasing the number of cores from 16 to 64 reduces the wall-clock running time by more than 50% and the scaling is nearly linear. Beyond 64 cores, the communication latency starts to outweigh the benefit of distributed computing and the parallel speedup plateaus.
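
    The scaling arithmetic behind statements like these is simple to reproduce. The snippet below uses hypothetical wall-clock times (the paper's exact figures are not quoted here) to compute speedup and parallel efficiency relative to a 16-core baseline.

    ```python
    # Hypothetical hours per simulated model year, shaped to mimic the
    # reported behaviour: near-linear scaling to 64 cores, then a plateau.
    times = {16: 10.0, 32: 5.3, 64: 2.7, 128: 2.6}

    for cores, t in times.items():
        speedup = times[16] / t
        efficiency = speedup / (cores / 16)   # relative to the 16-core run
        print(f"{cores:4d} cores: speedup x{speedup:.2f}, efficiency {efficiency:.0%}")
    ```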

  2. Scilab software package for the study of dynamical systems

    NASA Astrophysics Data System (ADS)

    Bordeianu, C. C.; Beşliu, C.; Jipa, Al.; Felea, D.; Grossu, I. V.

    2008-05-01

    This work presents a new software package for the study of chaotic flows and maps. The codes were written using Scilab, a software package for numerical computations providing a powerful open computing environment for engineering and scientific applications. It was found that Scilab provides various functions for ordinary differential equation solving, Fast Fourier Transform, autocorrelation, and excellent 2D and 3D graphical capabilities. The chaotic behaviors of the nonlinear dynamical systems were analyzed using phase-space maps, autocorrelation functions, power spectra, Lyapunov exponents and Kolmogorov-Sinai entropy. Various well known examples are implemented, with the capability for users to insert their own ODEs. Program summary: Program title: Chaos Catalogue identifier: AEAP_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEAP_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 885 No. of bytes in distributed program, including test data, etc.: 5925 Distribution format: tar.gz Programming language: Scilab 3.1.1 Computer: PC-compatible running Scilab on MS Windows or Linux Operating system: Windows XP, Linux RAM: below 100 Megabytes Classification: 6.2 Nature of problem: Any physical model containing linear or nonlinear ordinary differential equations (ODEs). Solution method: Numerical solving of ordinary differential equations. The chaotic behavior of the nonlinear dynamical system is analyzed using Poincaré sections, phase-space maps, autocorrelation functions, power spectra, Lyapunov exponents and Kolmogorov-Sinai entropies. Restrictions: The package routines are normally able to handle ODE systems of high orders (up to order twelve and possibly higher), depending on the nature of the problem. Running time: 10 to 20 seconds for problems that do not involve Lyapunov exponents calculation; 60 to 1000 seconds for problems that involve high-order ODEs and Lyapunov exponents calculation.
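
    As a flavor of the quantities such a package computes, here is a minimal Python/NumPy stand-in (not part of the Scilab distribution) that estimates the largest Lyapunov exponent of the logistic map; a positive value signals chaos.

    ```python
    import numpy as np

    def lyapunov_logistic(r, n_iter=100_000, n_discard=1_000):
        """Largest Lyapunov exponent of the logistic map x -> r*x*(1-x),
        estimated as the orbit average of log|f'(x)| = log|r*(1-2x)|."""
        x = 0.1234                    # arbitrary seed away from fixed points
        total = 0.0
        for i in range(n_iter + n_discard):
            x = r * x * (1.0 - x)
            if i >= n_discard:        # skip the transient
                total += np.log(abs(r * (1.0 - 2.0 * x)))
        return total / n_iter

    print(lyapunov_logistic(3.2))   # negative: periodic regime
    print(lyapunov_logistic(4.0))   # ~log 2 = 0.693...: fully chaotic
    ```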

  3. C-State: an interactive web app for simultaneous multi-gene visualization and comparative epigenetic pattern search.

    PubMed

    Sowpati, Divya Tej; Srivastava, Surabhi; Dhawan, Jyotsna; Mishra, Rakesh K

    2017-09-13

    Comparative epigenomic analysis across multiple genes presents a bottleneck for bench biologists working with NGS data. Despite the development of standardized peak analysis algorithms, the identification of novel epigenetic patterns and their visualization across gene subsets remains a challenge. We developed a fast and interactive web app, C-State (Chromatin-State), to query and plot chromatin landscapes across multiple loci and cell types. C-State has an interactive, JavaScript-based graphical user interface and runs locally in modern web browsers that are pre-installed on all computers, thus eliminating the need for cumbersome data transfer, pre-processing and prior programming knowledge. C-State is unique in its ability to extract and analyze multi-gene epigenetic information. It allows for powerful GUI-based pattern searching and visualization. We include a case study to demonstrate its potential for identifying user-defined epigenetic trends in the context of gene expression profiles.

  4. Pathgroups, a dynamic data structure for genome reconstruction problems.

    PubMed

    Zheng, Chunfang

    2010-07-01

    Ancestral gene order reconstruction problems, including the median problem, quartet construction, small phylogeny, guided genome halving and genome aliquoting, are NP-hard. Available heuristics dedicated to each of these problems are computationally costly for even small instances. We present a data structure enabling rapid heuristic solution to all these ancestral genome reconstruction problems. A generic greedy algorithm with look-ahead based on an automatically generated priority system suffices for all the problems using this data structure. The efficiency of the algorithm is due to fast updating of the structure during run time and to the simplicity of the priority scheme. We illustrate with the first rapid algorithm for quartet construction and apply this to a set of yeast genomes to corroborate a recent gene sequence-based phylogeny. Availability: http://albuquerque.bioinformatics.uottawa.ca/pathgroup/Quartet.html Contact: chunfang313@gmail.com Supplementary data are available at Bioinformatics online.

  5. Thriving on Chaos: The Development of a Surgical Information System

    PubMed Central

    Olund, Steven R.

    1988-01-01

    Hospitals present unique challenges to the computer industry, generating a greater quantity and variety of data than nearly any other enterprise. This is complicated by the fact that a hospital is not one homogeneous organization, but a bundle of semi-independent groups with unique data requirements. Therefore hospital information systems must be fast, flexible, reliable, easy to use and maintain, and cost-effective. The Surgical Information System at Rush Presbyterian-St. Luke's Medical Center, Chicago is such a system. It uses a Sequent Balance 21000 multi-processor superminicomputer, running industry standard tools such as the Unix operating system, a 4th generation programming language (4GL), and Structured Query Language (SQL) relational database management software. This treatise illustrates a comprehensive yet generic approach which can be applied to almost any clinical situation where access to patient data is required by a variety of medical professionals.

  6. Designing the optimal shutter sequences for the flutter shutter imaging method

    NASA Astrophysics Data System (ADS)

    Jelinek, Jan

    2010-04-01

    Acquiring iris or face images of moving subjects at larger distances using a flash to prevent the motion blur quickly runs into eye safety concerns as the acquisition distance is increased. For that reason the flutter shutter method recently proposed by Raskar et al. has generated considerable interest in the biometrics community. The paper concerns the design of shutter sequences that produce the best images. The number of possible sequences grows exponentially in both the subject's motion velocity and desired exposure value, with the majority of them being useless. Because the exact solution leads to an intractable mixed integer programming problem, we propose an approximate solution based on pre-screening the sequences according to the distribution of roots in their Fourier transform. A very fast algorithm utilizing the Jury's criterion allows the testing to be done without explicitly computing the roots, making the approach practical for moderately long sequences.
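
    The frequency-domain intuition behind the screening can be sketched briefly: a useful shutter code keeps its Fourier magnitude bounded away from zero so that deconvolution stays well conditioned. The Python below is a simplified stand-in for the paper's root-distribution test; all parameters are illustrative.

    ```python
    import numpy as np

    def sequence_score(seq, n_freq=256):
        """Crude frequency-domain screen for a binary flutter-shutter code:
        deblurring divides by the code's spectrum, so a larger minimum
        spectral magnitude means a better-conditioned deconvolution."""
        spectrum = np.abs(np.fft.rfft(seq, n=n_freq))
        return spectrum.min()

    box = np.ones(32)                                    # ordinary open shutter
    rng = np.random.default_rng(0)
    flutter = rng.integers(0, 2, size=32).astype(float)  # random open/close code

    print(sequence_score(box))      # near zero: some frequencies are destroyed
    print(sequence_score(flutter))  # typically bounded well away from zero
    ```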

  7. Design of rapid prototype of UAV line-of-sight stabilized control system

    NASA Astrophysics Data System (ADS)

    Huang, Gang; Zhao, Liting; Li, Yinlong; Yu, Fei; Lin, Zhe

    2018-01-01

    The line-of-sight (LOS) stabilized platform is the most important technology of a UAV (unmanned aerial vehicle), as it can reduce the effect of aircraft vibration and maneuvering on imaging quality. According to the requirements of the LOS stabilization system (a combined inertial and optical-mechanical method) and the UAV's structure, a rapid prototype was designed based on an industrial computer, using Peripheral Component Interconnect (PCI) and Windows RTX to exchange information. The paper describes the control structure and the circuit system, including the inertial stabilization control circuit with gyro and voice-coil-motor drive circuit, the optical-mechanical stabilization control circuit with fast-steering-mirror (FSM) drive circuit and image-deviation acquisition system, the outer-frame rotary follower, and the information-exchange system on the PC. Test results show that the stabilization accuracy reaches 5 μrad, proving the effectiveness of the combined line-of-sight stabilization control system; the real-time rapid prototype runs stably.

  8. A Global Repository for Planet-Sized Experiments and Observations

    NASA Technical Reports Server (NTRS)

    Williams, Dean; Balaji, V.; Cinquini, Luca; Denvil, Sebastien; Duffy, Daniel; Evans, Ben; Ferraro, Robert D.; Hansen, Rose; Lautenschlager, Michael; Trenham, Claire

    2016-01-01

    Working across U.S. federal agencies, international agencies, and multiple worldwide data centers, and spanning seven international network organizations, the Earth System Grid Federation (ESGF) allows users to access, analyze, and visualize data using a globally federated collection of networks, computers, and software. Its architecture employs a system of geographically distributed peer nodes that are independently administered yet united by common federation protocols and application programming interfaces (APIs). The full ESGF infrastructure has now been adopted by multiple Earth science projects and allows access to petabytes of geophysical data, including the Coupled Model Intercomparison Project (CMIP) output used by the Intergovernmental Panel on Climate Change assessment reports. Data served by ESGF not only include model output (i.e., CMIP simulation runs) but also include observational data from satellites and instruments, reanalyses, and generated images. Metadata summarize basic information about the data for fast and easy data discovery.

  9. k-merSNP discovery: Software for alignment-and reference-free scalable SNP discovery, phylogenetics, and annotation for hundreds of microbial genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs or raw, unassembled reads. The method is fast to compute, finding SNPs and building a SNP phylogeny in minutes to hours, depending on the size and diversity of the input sequences. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle many gigabases of sequence in a single run. The algorithm is based on k-mer analysis.
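
    The core idea of k-mer-based SNP discovery can be sketched in a few lines. The toy Python below is purely illustrative (the published tool is far more elaborate and scales to gigabases): it indexes k-mers by their flanking context and reports contexts whose middle base differs between genomes.

    ```python
    from collections import defaultdict

    def kmer_snps(genomes, k=25):
        """Toy alignment- and reference-free SNP finder: group k-mers by
        their left/right flanks and flag flanks whose middle base varies."""
        assert k % 2 == 1
        half = k // 2
        by_context = defaultdict(set)   # (left, right) -> {(middle base, genome)}
        for name, seq in genomes.items():
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                by_context[(kmer[:half], kmer[half + 1:])].add((kmer[half], name))
        for (left, right), mids in by_context.items():
            if len({b for b, _ in mids}) > 1:   # same context, different base
                yield left, sorted(mids), right

    genomes = {"g1": "ACGTTGCAGT", "g2": "ACGTAGCAGT"}   # differ at one base
    for left, mids, right in kmer_snps(genomes, k=5):
        print(left, mids, right)   # -> GT [('A', 'g2'), ('T', 'g1')] GC
    ```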

  10. Induction motor broken rotor bar fault location detection through envelope analysis of start-up current using Hilbert transform

    NASA Astrophysics Data System (ADS)

    Abd-el-Malek, Mina; Abdelsalam, Ahmed K.; Hassan, Ola E.

    2017-09-01

    Robustness, low running cost and reduced maintenance have led Induction Motors (IMs) to penetrate deeply into industrial drive systems. Broken rotor bars (BRBs) are an important fault that needs to be assessed early to minimize maintenance cost and labor time. The majority of recent BRB fault diagnostic techniques focus on differentiating between a healthy and a faulty rotor cage. In this paper, a new technique is proposed for detecting the location of the broken bar in the rotor. The proposed technique relies on monitoring certain statistical parameters estimated from the analysis of the start-up stator current envelope. The envelope of the signal is obtained using the Hilbert Transform (HT). The proposed technique offers a non-invasive, computationally fast and accurate location diagnostic process. Various simulation scenarios are presented that validate the effectiveness of the proposed technique.
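
    A minimal sketch of the envelope-extraction step, assuming a synthetic stator-current signal and using SciPy's Hilbert transform (the specific statistical parameters monitored by the paper are not reproduced here):

    ```python
    import numpy as np
    from scipy.signal import hilbert

    fs = 1_000                                   # sample rate, Hz (assumed)
    t = np.arange(0, 2.0, 1 / fs)
    # Stand-in for a start-up stator current: 50 Hz carrier with a decaying
    # low-frequency modulation, loosely mimicking a broken-bar signature.
    current = (1 + 0.3 * np.exp(-t) * np.sin(2 * np.pi * 2 * t)) \
              * np.sin(2 * np.pi * 50 * t)

    analytic = hilbert(current)                  # analytic signal x + j*H{x}
    envelope = np.abs(analytic)                  # instantaneous amplitude
    # Statistics of this envelope (mean, variance, skewness, ...) are the
    # kind of quantities such a diagnostic technique would monitor.
    print(envelope.mean(), envelope.var())
    ```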

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gardner, Shea; Slezak, Tom

    With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs. The method is fast to compute, finding SNPs and building a SNP phylogeny in seconds to hours. We use it to identify thousands of putative SNPs from all publicly available Filoviridae, Poxviridae, foot-and-mouth disease virus, Bacillus, and Escherichia coli genomes and plasmids. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle as input hundreds of gigabases of sequence in a single run. The algorithm is based on k-mer analysis using a suffix array, so we call it saSNP.

  12. Vectorized Rebinning Algorithm for Fast Data Down-Sampling

    NASA Technical Reports Server (NTRS)

    Dean, Bruce; Aronstein, David; Smith, Jeffrey

    2013-01-01

    A vectorized rebinning (down-sampling) algorithm, applicable to N-dimensional data sets, has been developed that offers a significant reduction in computer run time when compared to conventional rebinning algorithms. For clarity, a two-dimensional version of the algorithm is discussed to illustrate some specific details of the algorithm content, and using the language of image processing, 2D data will be referred to as "images," and each value in an image as a "pixel." The new approach is fully vectorized, i.e., the down-sampling procedure is done as a single step over all image rows, and then as a single step over all image columns. Data rebinning (or down-sampling) is a procedure that uses a discretely sampled N-dimensional data set to create a representation of the same data, but with fewer discrete samples. Such data down-sampling is fundamental to digital signal processing, e.g., for data compression applications.
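
    In NumPy terms, the vectorized idea can be sketched as a single reshape-and-mean over rows and columns (an illustrative 2D analogue, not the original code):

    ```python
    import numpy as np

    def rebin2d(image, factor):
        """Vectorized down-sampling of a 2D array by an integer factor:
        one reshape groups each factor x factor block, one mean collapses
        the row and column block axes - no per-pixel loops."""
        rows, cols = image.shape
        assert rows % factor == 0 and cols % factor == 0
        return image.reshape(rows // factor, factor,
                             cols // factor, factor).mean(axis=(1, 3))

    img = np.arange(16, dtype=float).reshape(4, 4)
    print(rebin2d(img, 2))   # 2x2 output; each pixel is a 2x2 block mean
    ```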

  13. Distance-based over-segmentation for single-frame RGB-D images

    NASA Astrophysics Data System (ADS)

    Fang, Zhuoqun; Wu, Chengdong; Chen, Dongyue; Jia, Tong; Yu, Xiaosheng; Zhang, Shihong; Qi, Erzhao

    2017-11-01

    Over-segmentation, known as super-pixels, is a widely used preprocessing step in segmentation algorithms. An over-segmentation algorithm segments an image into regions of perceptually similar pixels, but performs badly on indoor scenes when based on the color image alone. Fortunately, RGB-D images can improve performance on images of indoor scenes. In order to segment RGB-D images into super-pixels effectively, we propose a novel algorithm, DBOS (Distance-Based Over-Segmentation), which realizes full coverage of the image by super-pixels. DBOS fills the holes in depth images to fully utilize the depth information, and applies a SLIC-like framework for fast running. Additionally, depth features such as plane projection distance are extracted to compute the distance measure at the core of SLIC-like frameworks. Experiments on RGB-D images of the NYU Depth V2 dataset demonstrate that DBOS outperforms state-of-the-art methods in quality while maintaining comparable speeds.

  14. XRootD popularity on hadoop clusters

    NASA Astrophysics Data System (ADS)

    Meoni, Marco; Boccali, Tommaso; Magini, Nicolò; Menichetti, Luca; Giordano, Domenico; CMS Collaboration

    2017-10-01

    Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures in order to improve the throughput and the efficiency of such monitoring. A large set of operational data - user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs - is being injected into a readily available Hadoop cluster, via several data streamers. The collected metadata is further organized by running fast arbitrary queries; this offers the ability to test several Map&Reduce-based frameworks and measure the system speed-up when compared to the original database infrastructure. By leveraging a quality Hadoop data store and enabling an analytics framework on top, it is possible to design a mining platform to predict dataset popularity and discover patterns and correlations.

  15. Design and Development of a Portable WiFi enabled BIA device

    NASA Astrophysics Data System (ADS)

    Križaj, D.; Baloh, M.; Brajkovič, R.; Žagar, T.

    2013-04-01

    A bioimpedance (BIA) device for evaluation of sarcopenia - age-related muscle mass loss - is designed, developed and evaluated. The requirements were a lightweight design, flexible and user-configurable measurement protocols, and a WiFi interface for remote device control, full internet integration, and fast development and use of measurement protocols. The current design is based on a microcontroller with integrated AD/DA converters. The prototype system was assembled, and its operation and connectivity to different handheld devices and laptop computers were successfully tested. The designed BIA device can be accessed using TCP sockets, and once the connection is established the data transfer runs at the specified speed. The accuracy of the current prototype is about 5% for the impedance modulus and 5 deg. for the phase at frequencies below 20 kHz, with an unfiltered excitation signal and no additional amplifiers employed.
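
    Client-side access via TCP sockets might look like the following Python sketch; the address, port and command bytes are hypothetical, since the record does not specify the device's actual protocol.

    ```python
    import socket

    # Hypothetical address, port and command - placeholders only.
    DEVICE_ADDR = ("192.168.1.50", 5025)

    with socket.create_connection(DEVICE_ADDR, timeout=5.0) as sock:
        sock.sendall(b"MEASURE 5000\n")   # e.g. request a 5 kHz measurement
        reply = sock.recv(4096)           # impedance modulus and phase
        print(reply.decode(errors="replace"))
    ```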

  16. Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression

    PubMed Central

    Wiedenhoeft, John; Brugel, Eric; Schliep, Alexander

    2016-01-01

    By integrating Haar wavelets with Hidden Markov Models, we achieve drastically reduced running times for Bayesian inference using Forward-Backward Gibbs sampling. We show that this improves detection of genomic copy number variants (CNV) in array CGH experiments compared to the state-of-the-art, including standard Gibbs sampling. The method concentrates computational effort on chromosomal segments which are difficult to call, by dynamically and adaptively recomputing consecutive blocks of observations likely to share a copy number. This makes routine diagnostic use and re-analysis of legacy data collections feasible; to this end, we also propose an effective automatic prior. An open source software implementation of our method is available at http://schlieplab.org/Software/HaMMLET/ (DOI: 10.5281/zenodo.46262). This paper was selected for oral presentation at RECOMB 2016, and an abstract is published in the conference proceedings. PMID:27177143

  17. Hybrid massively parallel fast sweeping method for static Hamilton-Jacobi equations

    NASA Astrophysics Data System (ADS)

    Detrixhe, Miles; Gibou, Frédéric

    2016-10-01

    The fast sweeping method is a popular algorithm for solving a variety of static Hamilton-Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
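
    For reference, a minimal serial fast sweeping solver for the 2D eikonal equation |∇u| = f (a special Hamilton-Jacobi case) looks as follows; this is an illustrative baseline, not the paper's hybrid parallel algorithm.

    ```python
    import numpy as np

    def fast_sweep_eikonal(f, h, source, n_sweeps=4):
        """Gauss-Seidel fast sweeping for |grad u| = f on a uniform 2D grid:
        repeatedly sweep the grid in four alternating orderings so that
        characteristics from every direction are captured."""
        ny, nx = f.shape
        u = np.full((ny, nx), 1e10)
        u[source] = 0.0
        for _ in range(n_sweeps):
            for sy, sx in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
                for i in range(ny)[::sy]:
                    for j in range(nx)[::sx]:
                        if (i, j) == source:
                            continue
                        a = min(u[max(i-1, 0), j], u[min(i+1, ny-1), j])
                        b = min(u[i, max(j-1, 0)], u[i, min(j+1, nx-1)])
                        fh = f[i, j] * h
                        if abs(a - b) >= fh:    # update from one side only
                            unew = min(a, b) + fh
                        else:                   # two-sided quadratic update
                            unew = 0.5 * (a + b + np.sqrt(2*fh*fh - (a-b)**2))
                        u[i, j] = min(u[i, j], unew)
        return u

    # Distance field from the grid center for f = 1.
    u = fast_sweep_eikonal(np.ones((50, 50)), h=1.0, source=(25, 25))
    ```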

  18. High-throughput sequence alignment using Graphics Processing Units

    PubMed Central

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-01-01

    Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356

  19. Running Batch Jobs on Peregrine | High-Performance Computing | NREL

    Science.gov Websites

    Using Resource Feature to Request Different Node Types: Peregrine has several types of compute nodes; a resource feature can be requested to avoid incompatibility and get the job running. More information about requesting different node types on Peregrine is available. Queues: In order to meet the needs of different types of jobs, nodes on Peregrine are available through several queues.

  20. Host-Nation Operations: Soldier Training on Governance (HOST-G) Training Support Package

    DTIC Science & Technology

    2011-07-01

    restricted this webpage from running scripts or ActiveX controls that could access your computer. Click here for options…” • If this occurs, select that...scripts and ActiveX controls can be useful, but active content might also harm your computer. Are you sure you want to let this file run active

  1. 24 CFR 15.110 - What fees will HUD charge?

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... duplicating machinery. The computer run time includes the cost of operating a central processing unit for that... Applies. (6) Computer run time (includes only mainframe search time not printing) The direct cost of... estimated fee is more than $250.00 or you have a history of failing to pay FOIA fees to HUD in a timely...

  2. Pyrolaser Operating System

    NASA Technical Reports Server (NTRS)

    Roberts, Floyd E., III

    1994-01-01

    Software provides for control and acquisition of data from an optical pyrometer. There are six individual programs in the PYROLASER package, providing a quick and easy way to set up, control, and program the standard Pyrolaser. Temperature and emissivity measurements are either collected as if the Pyrolaser were in manual operating mode, or displayed on real-time strip charts and stored in standard spreadsheet format for post-test analysis. A shell is supplied so that test-specific macros can easily be added to the system. Written using LabVIEW software for use on Macintosh-series computers running System 6.0.3 or later, Sun Sparc-series computers running OpenWindows 3.0 or MIT's X Window System (X11R4 or X11R5), and IBM PC or compatible computers running Microsoft Windows 3.1 or later.

  3. Analyses of requirements for computer control and data processing experiment subsystems. Volume 2: ATM experiment S-056 image data processing system software development

    NASA Technical Reports Server (NTRS)

    1972-01-01

    The IDAPS (Image Data Processing System) is a user-oriented, computer-based, language and control system, which provides a framework or standard for implementing image data processing applications, simplifies set-up of image processing runs so that the system may be used without a working knowledge of computer programming or operation, streamlines operation of the image processing facility, and allows multiple applications to be run in sequence without operator interaction. The control system loads the operators, interprets the input, constructs the necessary parameters for each application, and calls the application. The overlay feature of the IBSYS loader (IBLDR) provides the means of running multiple operators which would otherwise overflow core storage.

  4. Identification of Program Signatures from Cloud Computing System Telemetry Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nichols, Nicole M.; Greaves, Mark T.; Smith, William P.

    Malicious cloud computing activity can take many forms, including running unauthorized programs in a virtual environment. Detection of these malicious activities while preserving the privacy of the user is an important research challenge. Prior work has shown the potential viability of using cloud service billing metrics as a mechanism for proxy identification of malicious programs. Previously this novel detection method has been evaluated in a synthetic and isolated computational environment. In this paper we demonstrate the ability of billing metrics to identify programs in an active cloud computing environment, including multiple virtual machines running on the same hypervisor. The open-source cloud computing platform OpenStack is used for private cloud management at Pacific Northwest National Laboratory. OpenStack provides a billing tool (Ceilometer) to collect system telemetry measurements. We identify four different programs running on four virtual machines under the same cloud user account. Programs were identified with up to 95% accuracy. This accuracy is dependent on the distinctiveness of telemetry measurements for the specific programs we tested. Future work will examine the scalability of this approach for a larger selection of programs to better understand the uniqueness needed to identify a program. Additionally, future work should address the separation of signatures when multiple programs are running on the same virtual machine.

  5. Providing Assistive Technology Applications as a Service Through Cloud Computing.

    PubMed

    Mulfari, Davide; Celesti, Antonio; Villari, Massimo; Puliafito, Antonio

    2015-01-01

    Users with disabilities interact with Personal Computers (PCs) using Assistive Technology (AT) software solutions. Such applications run on a PC that a person with a disability commonly uses. However, the configuration of AT applications is not trivial at all, especially whenever the user needs to work on a PC that does not allow him/her to rely on his/her AT tools (e.g., at work, at university, in an Internet point). In this paper, we discuss how cloud computing provides a valid technological solution to enhance such a scenario. With the emergence of cloud computing, many applications are executed on top of virtual machines (VMs). Virtualization allows us to achieve a software implementation of a real computer able to execute a standard operating system and any kind of application. In this paper we propose to build personalized VMs running AT programs and settings. By using remote desktop technology, our solution enables users to control their customized virtual desktop environment by means of an HTML5-based web interface running on any computer equipped with a browser, wherever they are.

  6. Increasing the speed of medical image processing in MatLab®

    PubMed Central

    Bister, M; Yap, CS; Ng, KH; Tok, CH

    2007-01-01

    MatLab® has often been considered an excellent environment for fast algorithm development but is generally perceived as slow and hence not fit for routine medical image processing, where large data sets are now common, e.g., high-resolution CT image sets with typically hundreds of 512x512 slices. Yet, with proper programming practices – vectorization, pre-allocation and specialization – applications in MatLab® can run as fast as in the C language. In this article, this point is illustrated with fast implementations of bilinear interpolation, watershed segmentation and volume rendering. PMID:21614269
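
    The same practices translate directly to other array languages. A NumPy rendering of the vectorization point (illustrative, not from the article) contrasts an explicit double loop with the equivalent one-line array expression:

    ```python
    import numpy as np

    img = np.random.rand(512, 512)

    # Loop-based version: pre-allocation avoids growing the array, but the
    # per-pixel Python loop is still slow.
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = 1.0 - img[i, j]

    # Vectorized equivalent: one array expression, orders of magnitude faster.
    out_vec = 1.0 - img
    assert np.allclose(out, out_vec)
    ```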

  7. Development of a GPU Compatible Version of the Fast Radiation Code RRTMG

    NASA Astrophysics Data System (ADS)

    Iacono, M. J.; Mlawer, E. J.; Berthiaume, D.; Cady-Pereira, K. E.; Suarez, M.; Oreopoulos, L.; Lee, D.

    2012-12-01

    The absorption of solar radiation and emission/absorption of thermal radiation are crucial components of the physics that drive Earth's climate and weather. Therefore, accurate radiative transfer calculations are necessary for realistic climate and weather simulations. Efficient radiation codes have been developed for this purpose, but their accuracy requirements still necessitate that as much as 30% of the computational time of a GCM is spent computing radiative fluxes and heating rates. The overall computational expense constitutes a limitation on a GCM's predictive ability if it becomes an impediment to adding new physics to or increasing the spatial and/or vertical resolution of the model. The emergence of Graphics Processing Unit (GPU) technology, which will allow the parallel computation of multiple independent radiative calculations in a GCM, will lead to a fundamental change in the competition between accuracy and speed. Processing time previously consumed by radiative transfer will now be available for the modeling of other processes, such as physics parameterizations, without any sacrifice in the accuracy of the radiative transfer. Furthermore, fast radiation calculations can be performed much more frequently and will allow the modeling of radiative effects of rapid changes in the atmosphere. The fast radiation code RRTMG, developed at Atmospheric and Environmental Research (AER), is utilized operationally in many dynamical models throughout the world. We will present the results from the first stage of an effort to create a version of the RRTMG radiation code designed to run efficiently in a GPU environment. This effort will focus on the RRTMG implementation in GEOS-5. RRTMG has an internal pseudo-spectral vector of length of order 100 that, when combined with the much greater length of the global horizontal grid vector from which the radiation code is called in GEOS-5, makes RRTMG/GEOS-5 particularly suited to achieving a significant speed improvement through GPU technology. This large number of independent cases will allow us to take full advantage of the computational power of the latest GPUs, ensuring that all thread cores in the GPU remain active, a key criterion for obtaining significant speedup. The CUDA (Compute Unified Device Architecture) Fortran compiler developed by PGI and Nvidia will allow us to construct this parallel implementation on the GPU while remaining in the Fortran language. This implementation will scale very well across various CUDA-supported GPUs such as the recently released Fermi Nvidia cards. We will present the computational speed improvements of the GPU-compatible code relative to the standard CPU-based RRTMG with respect to a very large and diverse suite of atmospheric profiles. This suite will also be utilized to demonstrate the minimal impact of the code restructuring on the accuracy of radiation calculations. The GPU-compatible version of RRTMG will be directly applicable to future versions of GEOS-5, but it is also likely to provide significant associated benefits for other GCMs that employ RRTMG.

  8. Parallel and pipeline computation of fast unitary transforms

    NASA Technical Reports Server (NTRS)

    Fino, B. J.; Algazi, V. R.

    1975-01-01

    The letter discusses the parallel and pipeline organization of fast-unitary-transform algorithms such as the fast Fourier transform, and points out the efficiency of a combined parallel-pipeline processor for a transform such as the Haar transform, in which 2^n - 1 hardware "butterflies" generate a transform of order 2^n every computation cycle.
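
    For concreteness, the butterfly structure can be written down directly. A short Python sketch of the (ordered) Haar transform, where each stage forms scaled pairwise sums and differences:

    ```python
    import numpy as np

    def haar(x):
        """Haar transform via repeated butterfly stages (sketch): each stage
        forms scaled sums and differences of adjacent pairs - the
        'butterflies' that map naturally onto parallel hardware."""
        x = np.asarray(x, dtype=float)
        assert len(x) & (len(x) - 1) == 0      # length must be a power of two
        out = []
        while len(x) > 1:
            s = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # smooth (sum) channel
            d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail (difference) channel
            out.append(d)
            x = s
        out.append(x)
        return np.concatenate(out[::-1])

    print(haar([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]))
    ```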

  9. Fast algorithm for computing complex number-theoretic transforms

    NASA Technical Reports Server (NTRS)

    Reed, I. S.; Liu, K. Y.; Truong, T. K.

    1977-01-01

    A high-radix FFT algorithm for computing transforms over GF(q^2), where q is a Mersenne prime, is developed to implement fast circular convolutions. This new algorithm requires substantially fewer multiplications than the conventional FFT.

  10. MATH77 - A LIBRARY OF MATHEMATICAL SUBPROGRAMS FOR FORTRAN 77, RELEASE 4.0

    NASA Technical Reports Server (NTRS)

    Lawson, C. L.

    1994-01-01

    MATH77 is a high quality library of ANSI FORTRAN 77 subprograms implementing contemporary algorithms for the basic computational processes of science and engineering. The portability of MATH77 meets the needs of present-day scientists and engineers who typically use a variety of computing environments. Release 4.0 of MATH77 contains 454 user-callable and 136 lower-level subprograms. Usage of the user-callable subprograms is described in 69 sections of the 416-page users' manual. The topics covered by MATH77 are indicated by the following list of chapter titles in the users' manual: Mathematical Functions, Pseudo-random Number Generation, Linear Systems of Equations and Linear Least Squares, Matrix Eigenvalues and Eigenvectors, Matrix Vector Utilities, Nonlinear Equation Solving, Curve Fitting, Table Look-Up and Interpolation, Definite Integrals (Quadrature), Ordinary Differential Equations, Minimization, Polynomial Rootfinding, Finite Fourier Transforms, Special Arithmetic, Sorting, Library Utilities, Character-based Graphics, and Statistics. Besides subprograms that are adaptations of public domain software, MATH77 contains a number of unique packages developed by the authors of MATH77. Instances of the latter type include (1) adaptive quadrature, allowing for exceptional generality in multidimensional cases, (2) the ordinary differential equations solver used in spacecraft trajectory computation for JPL missions, (3) univariate and multivariate table look-up and interpolation, allowing for "ragged" tables, and providing error estimates, and (4) univariate and multivariate derivative-propagation arithmetic. MATH77 release 4.0 is a subroutine library which has been carefully designed to be usable on any computer system that supports the full ANSI standard FORTRAN 77 language. It has been successfully implemented on a CRAY Y/MP computer running UNICOS, a UNISYS 1100 computer running EXEC 8, a DEC VAX series computer running VMS, a Sun4 series computer running SunOS, a Hewlett-Packard 720 computer running HP-UX, a Macintosh computer running MacOS, and an IBM PC compatible computer running MS-DOS. Accompanying the library is a set of 196 "demo" drivers that exercise all of the user-callable subprograms. The FORTRAN source code for MATH77 comprises 109K lines of code in 375 files with a total size of 4.5Mb. The demo drivers comprise 11K lines of code totaling 418Kb. Forty-four percent of the lines of the library code and 29% of those in the demo code are comment lines. The standard distribution medium for MATH77 is a .25 inch streaming magnetic tape cartridge in UNIX tar format. It is also available on a 9-track 1600 BPI magnetic tape in VAX BACKUP format and a TK50 tape cartridge in VAX BACKUP format. An electronic copy of the documentation is included on the distribution media. Previous releases of MATH77 have been used over a number of years in a variety of JPL applications. MATH77 Release 4.0 was completed in 1992. MATH77 is a copyrighted work with all copyright vested in NASA.

  11. Computer Simulation of Great Lakes-St. Lawrence Seaway Icebreaker Requirements.

    DTIC Science & Technology

    1980-01-01

    (Snippet consists only of report table-of-contents entries: results of Runs No. 1-3 for the Taconite and Oil Can task commands, and predicted icebreaker fleets by home port and period; no abstract text was recovered.)

  12. Fast sweeping methods for hyperbolic systems of conservation laws at steady state II

    NASA Astrophysics Data System (ADS)

    Engquist, Björn; Froese, Brittany D.; Tsai, Yen-Hsi Richard

    2015-04-01

    The idea of using fast sweeping methods for solving stationary systems of conservation laws has previously been proposed for efficiently computing solutions with sharp shocks. We further develop these methods to allow for a more challenging class of problems including problems with sonic points, shocks originating in the interior of the domain, rarefaction waves, and two-dimensional systems. We show that fast sweeping methods can produce higher-order accuracy. Computational results validate the claims of accuracy, sharp shock curves, and optimal computational efficiency.

  13. The rid-redundant procedure in C-Prolog

    NASA Technical Reports Server (NTRS)

    Chen, Huo-Yan; Wah, Benjamin W.

    1987-01-01

    C-Prolog can conveniently be used for logical inference on knowledge bases. However, as with many search methods using backward chaining, a large amount of redundant computation may be produced in recursive calls. To overcome this problem, the "rid-redundant" procedure was designed to eliminate all redundant computations when running multi-recursive procedures. Experimental results obtained for C-Prolog on the Vax 11/780 computer show an order of magnitude improvement in the running time and solvable problem size.
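
    The pathology, and the cure, can be shown outside Prolog as well. A Python miniature (an analogy, not the rid-redundant procedure itself): a doubly recursive definition recomputes the same subgoals exponentially often until results are cached.

    ```python
    from functools import lru_cache

    def fib_naive(n):
        # Recomputes overlapping subproblems: exponentially many calls.
        return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

    @lru_cache(maxsize=None)
    def fib_cached(n):
        # Caching "rids" the redundant computation: linearly many calls.
        return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)

    # fib_naive(30) makes ~1.6 million calls; fib_cached(30) makes 31.
    print(fib_naive(30), fib_cached(30))
    ```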

  14. An Upgrade of the Aeroheating Software ''MINIVER''

    NASA Technical Reports Server (NTRS)

    Louderback, Pierce

    2013-01-01

    Detailed computational modeling: CFD is often used to create and execute computational domains. Complexity increases when moving from 2D to 3D geometries. Computational time increases as finer grids are used (for accuracy). A strong tool, but it takes time to set up and run. MINIVER: Uses theoretical and empirical correlations. Orders of magnitude faster to set up and run. Not as accurate as CFD, but gives reasonable estimations. MINIVER's drawbacks: Rigid command-line interface. Lackluster, unorganized documentation. No central control; multiple versions exist and have diverged.

  15. A Functional Description of the Geophysical Data Acquisition System

    DTIC Science & Technology

    1990-08-10

    less than 50 SPS nor greater than 250 SPS 3.0 SENSORS/TRANSDUCERS 3.1 CHAPTER OVERVIEW Most of the research supported by GDAS has primarily involved two...signal for the computer. The SRUN signal from the computer is fed to a retriggerable oneshot multivibrator on the board. SRUN consists of a pulse train...that is present when the computer is running. The oneshot output drives the RUN lamp on the front panel. Finally, one pin on the board edge connector is

  16. Network support for system initiated checkpoints

    DOEpatents

    Chen, Dong; Heidelberger, Philip

    2013-01-29

    A system, method and computer program product for supporting system-initiated checkpoints in parallel computing systems. The system and method generate selective control signals to perform checkpointing of system-related data in the presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity.

  17. FQC Dashboard: integrates FastQC results into a web-based, interactive, and extensible FASTQ quality control tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, Joseph; Pirrung, Meg; McCue, Lee Ann

    FQC is software that facilitates quality control of FASTQ files by carrying out a QC protocol using FastQC, parsing results, and aggregating quality metrics into an interactive dashboard designed to richly summarize individual sequencing runs. The dashboard groups samples in dropdowns for navigation among the data sets, utilizes human-readable configuration files to manipulate the pages and tabs, and is extensible with CSV data.

  18. FQC Dashboard: integrates FastQC results into a web-based, interactive, and extensible FASTQ quality control tool

    DOE PAGES

    Brown, Joseph; Pirrung, Meg; McCue, Lee Ann

    2017-06-09

    FQC is software that facilitates quality control of FASTQ files by carrying out a QC protocol using FastQC, parsing results, and aggregating quality metrics into an interactive dashboard designed to richly summarize individual sequencing runs. The dashboard groups samples in dropdowns for navigation among the data sets, utilizes human-readable configuration files to manipulate the pages and tabs, and is extensible with CSV data.

  19. Convergence properties of simple genetic algorithms

    NASA Technical Reports Server (NTRS)

    Bethke, A. D.; Zeigler, B. P.; Strauss, D. M.

    1974-01-01

    The essential parameters determining the behaviour of genetic algorithms were investigated. Computer runs were made while systematically varying the parameter values. Results based on the progress curves obtained from these runs are presented along with results based on the variability of the population as the run progresses.

  20. Modeling Subsurface Reactive Flows Using Leadership-Class Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mills, Richard T; Hammond, Glenn; Lichtner, Peter

    2009-01-01

    We describe our experiences running PFLOTRAN - a code for simulation of coupled hydro-thermal-chemical processes in variably saturated, non-isothermal, porous media - on leadership-class supercomputers, including initial experiences running on the petaflop incarnation of Jaguar, the Cray XT5 at the National Center for Computational Sciences at Oak Ridge National Laboratory. PFLOTRAN utilizes fully implicit time-stepping and is built on top of the Portable, Extensible Toolkit for Scientific Computation (PETSc). We discuss some of the hurdles to 'at scale' performance with PFLOTRAN and the progress we have made in overcoming them on leadership-class computer architectures.

  1. Molecular Diagnostics for the Study of Hypersonic Flows

    DTIC Science & Technology

    2000-04-01

    (Snippet garbled by two-column PDF extraction; recoverable fragments refer to electron-beam flow diagnostics at the F4 high-enthalpy wind tunnel [21], an image acquired 90 ms after run start, and figure captions - Figure 4: schematic diagram of the discharge and grounded electrode; Figure 5: typical F4 run, flow at 90 ms, convection imaged 5 μs after beam emission; Figure 6: velocity profile at 90 ms - plus a note that absorption and refraction are classical phenomena and that χ(2) is the second-order...)

  2. Non-exchangeability of running vs. other exercise in their association with adiposity, and its implications for public health recommendations.

    PubMed

    Williams, Paul T

    2012-01-01

    Current physical activity recommendations assume that different activities can be exchanged to produce the same weight-control benefits so long as total energy expended remains the same (exchangeability premise). To this end, they recommend calculating energy expenditure as the product of the time spent performing each activity and the activity's metabolic equivalents (MET), which may be summed to achieve target levels. The validity of the exchangeability premise was assessed using data from the National Runners' Health Study. Physical activity dose was compared to body mass index (BMI) and body circumferences in 33,374 runners who reported usual distance run and pace, and usual times spent running and other exercises per week. MET hours per day (METhr/d) from running was computed from: a) time and intensity, and b) reported distance run (1.02 MET·hours per km). When computed from time and intensity, the declines (slope ± SE) per METhr/d were significantly greater (P < 10^-15) for running than non-running exercise for BMI (slopes ± SE, male: -0.12 ± 0.00 vs. 0.00 ± 0.00; female: -0.12 ± 0.00 vs. -0.01 ± 0.01 kg/m^2 per METhr/d) and waist circumference (male: -0.28 ± 0.01 vs. -0.07 ± 0.01; female: -0.31 ± 0.01 vs. -0.05 ± 0.01 cm per METhr/d). Reported METhr/d of running was 38% to 43% greater when calculated from time and intensity than from distance. Moreover, the declines per METhr/d run were significantly greater when estimated from reported distance for BMI (males: -0.29 ± 0.01; females: -0.27 ± 0.01 kg/m^2 per METhr/d) and waist circumference (males: -0.67 ± 0.02; females: -0.69 ± 0.02 cm per METhr/d) than when computed from time and intensity (cited above). The exchangeability premise was not supported for running vs. non-running exercise. Moreover, distance-based running prescriptions may provide better weight control than time-based prescriptions for running or other activities. Additional longitudinal studies and randomized clinical trials are required to verify these results prospectively.
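
    A worked example of the two dose computations contrasted above (all inputs hypothetical except the 1.02 MET·hours per km factor taken from the abstract):

    ```python
    # Two ways to estimate a runner's daily energy-expenditure dose.
    met_value = 9.8            # assumed METs for ~10 km/h running
    reported_hours_wk = 5.5    # hypothetical self-reported weekly running time
    reported_km_wk = 40.0      # hypothetical self-reported weekly distance

    met_hr_day_time = met_value * reported_hours_wk / 7.0   # time x intensity
    met_hr_day_dist = 1.02 * reported_km_wk / 7.0           # distance-based

    print(f"time-based:     {met_hr_day_time:.2f} METhr/d")
    print(f"distance-based: {met_hr_day_dist:.2f} METhr/d")
    # With these inputs the time-based estimate is ~32% larger, the same
    # direction as the 38-43% gap reported in the study.
    ```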

  3. A PICKSC Science Gateway for enabling the common plasma physicist to run kinetic software

    NASA Astrophysics Data System (ADS)

    Hu, Q.; Winjum, B. J.; Zonca, A.; Youn, C.; Tsung, F. S.; Mori, W. B.

    2017-10-01

    Computer simulations offer tremendous opportunities for studying plasmas, ranging from simulations for students that illuminate fundamental educational concepts to research-level simulations that advance scientific knowledge. Nevertheless, there is a significant hurdle to using simulation tools. Users must navigate codes and software libraries, determine how to wrangle output into meaningful plots, and oftentimes confront a significant cyberinfrastructure with powerful computational resources. Science gateways offer a Web-based environment to run simulations without needing to learn or manage the underlying software and computing cyberinfrastructure. We discuss our progress on creating a Science Gateway for the Particle-in-Cell and Kinetic Simulation Software Center that enables users to easily run and analyze kinetic simulations with our software. We envision that this technology could benefit a wide range of plasma physicists, both in the use of our simulation tools as well as in its adaptation for running other plasma simulation software. Supported by NSF under Grant ACI-1339893 and by the UCLA Institute for Digital Research and Education.

  4. Sub-second pencil beam dose calculation on GPU for adaptive proton therapy.

    PubMed

    da Silva, Joakim; Ansorge, Richard; Jena, Rajesh

    2015-06-21

    Although proton therapy delivered using scanned pencil beams has the potential to produce better dose conformity than conventional radiotherapy, the created dose distributions are more sensitive to anatomical changes and patient motion. Therefore, the introduction of adaptive treatment techniques where the dose can be monitored as it is being delivered is highly desirable. We present a GPU-based dose calculation engine relying on the widely used pencil beam algorithm, developed for on-line dose calculation. The calculation engine was implemented from scratch, with each step of the algorithm parallelized and adapted to run efficiently on the GPU architecture. To ensure fast calculation, it employs several application-specific modifications and simplifications, and a fast scatter-based implementation of the computationally expensive kernel superposition step. The calculation time for a skull base treatment plan using two beam directions was 0.22 s on an Nvidia Tesla K40 GPU, whereas a test case of a cubic target in water from the literature took 0.14 s to calculate. The accuracy of the patient dose distributions was assessed by calculating the γ-index with respect to a gold standard Monte Carlo simulation. The passing rates were 99.2% and 96.7%, respectively, for the 3%/3 mm and 2%/2 mm criteria, matching those produced by a clinical treatment planning system.
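
    The γ-index evaluation mentioned at the end can be illustrated with a simplified 1D global implementation (illustrative only; clinical γ analysis is done in 3D with more careful normalization and interpolation):

    ```python
    import numpy as np

    def gamma_pass_rate(dose_eval, dose_ref, x, dd=0.03, dta=3.0):
        """Simplified 1D global gamma index: for each reference point, find
        the minimum combined dose-difference / distance-to-agreement metric
        over all evaluated points. dd is a fraction of the reference maximum,
        dta is in mm; a point passes if min gamma <= 1."""
        d_norm = dd * dose_ref.max()
        passed = 0
        for xi, di in zip(x, dose_ref):
            gamma_sq = ((dose_eval - di) / d_norm) ** 2 + ((x - xi) / dta) ** 2
            if np.sqrt(gamma_sq.min()) <= 1.0:
                passed += 1
        return passed / len(dose_ref)

    x = np.linspace(0, 100, 201)                  # positions in mm
    ref = np.exp(-((x - 50) / 15) ** 2)           # toy reference profile
    ev = np.exp(-((x - 50.5) / 15) ** 2) * 1.01   # slightly shifted and scaled
    print(gamma_pass_rate(ev, ref, x, dd=0.03, dta=3.0))
    ```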

  5. Creating a Parallel Version of VisIt for Microsoft Windows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitlock, B J; Biagas, K S; Rawson, P L

    2011-12-07

    VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers, from modest desktops up to massively parallel clusters. VisIt comprises a set of cooperating programs. All programs can be run locally or in client/server mode, in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing power is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPUs has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores, and in many cases 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.
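
    A minimal sketch of the data-parallel pattern described above, using mpi4py rather than VisIt's actual C++ engine code: each rank owns a subset of the data, filters it, and the result is reduced to one rank.

    ```python
    # Run with e.g.: mpiexec -n 4 python mpi_filter.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each MPI process "reads" its own subset of the scientific data.
    local_data = np.random.default_rng(rank).normal(size=100_000)

    # Each process filters its subset (here: a simple threshold filter).
    local_hot = (local_data > 2.0).sum()

    # Coordinate via MPI to assemble a global result on rank 0.
    total_hot = comm.reduce(local_hot, op=MPI.SUM, root=0)
    if rank == 0:
        print(f"{total_hot} of {size * 100_000} samples above threshold")
    ```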

  6. g_contacts: Fast contact search in bio-molecular ensemble data

    NASA Astrophysics Data System (ADS)

    Blau, Christian; Grubmüller, Helmut

    2013-12-01

    Short-range interatomic interactions govern many bio-molecular processes. Therefore, identifying close interaction partners in ensemble data is an essential task in structural biology and computational biophysics. A contact search can be cast as a typical range search problem for which efficient algorithms have been developed. However, none of those has yet been adapted to the context of macromolecular ensembles, particularly in a molecular dynamics (MD) framework. Here a set-decomposition algorithm is implemented which detects all contacting atoms or residues in at most O(N log N) run-time, in contrast to the O(N²) complexity of a brute-force approach. Catalogue identifier: AEQA_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEQA_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 8945. No. of bytes in distributed program, including test data, etc.: 981604. Distribution format: tar.gz. Programming language: C99. Computer: PC. Operating system: Linux. RAM: ≈ size of input frame. Classification: 3, 4.14. External routines: Gromacs 4.6 [1]. Nature of problem: finding atoms or residues that are closer to one another than a given cut-off. Solution method: excluding distant atoms from distance calculations by decomposing the given set of atoms into disjoint subsets. Running time: ≤ O(N log N). References: [1] S. Pronk, S. Pall, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess and E. Lindahl, Gromacs 4.5: a high-throughput and highly parallel open source molecular simulation toolkit, Bioinformatics 29 (7) (2013).
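
    The following Python sketch illustrates the general idea of excluding distant atoms by spatial decomposition (here a simple cell list, giving near-linear expected time); it is not g_contacts' exact set-decomposition algorithm, and the coordinates are synthetic.

    ```python
    import numpy as np
    from collections import defaultdict
    from itertools import product

    def contacts(coords: np.ndarray, cutoff: float):
        """All atom pairs closer than cutoff, via cell-list decomposition."""
        cells = defaultdict(list)
        for i, p in enumerate(coords):                 # bin atoms into cubic cells
            cells[tuple((p // cutoff).astype(int))].append(i)
        pairs = []
        for cell, members in cells.items():
            for off in product((-1, 0, 1), repeat=3):  # 27 neighbouring cells
                for j in cells.get(tuple(np.add(cell, off)), []):
                    for i in members:
                        if i < j and np.linalg.norm(coords[i] - coords[j]) < cutoff:
                            pairs.append((i, j))
        return pairs

    frame = np.random.default_rng(0).uniform(0, 10.0, size=(1000, 3))  # toy frame
    print(len(contacts(frame, cutoff=0.5)), "contact pairs")
    ```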

  7. Grape RNA-Seq analysis pipeline environment

    PubMed Central

    Knowles, David G.; Röder, Maik; Merkel, Angelika; Guigó, Roderic

    2013-01-01

    Motivation: The avalanche of data arriving since the development of NGS technologies has prompted the need for fast, accurate and easily automated bioinformatic tools capable of dealing with massive datasets. Among the most productive applications of NGS technologies is the sequencing of cellular RNA, known as RNA-Seq. Although RNA-Seq provides a similar or superior dynamic range to microarrays at similar or lower cost, the lack of standard and user-friendly pipelines is a bottleneck preventing RNA-Seq from becoming the standard for transcriptome analysis. Results: In this work we present a pipeline for processing and analyzing RNA-Seq data that we have named Grape (Grape RNA-Seq Analysis Pipeline Environment). Grape supports raw sequencing reads produced by a variety of technologies, either in FASTA or FASTQ format, or as prealigned reads in SAM/BAM format. A minimal Grape configuration consists of the file location of the raw sequencing reads, the genome of the species and the corresponding gene and transcript annotation. Grape first runs a set of quality control steps, and then aligns the reads to the genome, a step that is omitted for prealigned read formats. Grape next estimates gene and transcript expression levels, calculates exon inclusion levels and identifies novel transcripts. Grape can be run on a single computer or in parallel on a computer cluster. It is distributed with specific mapping and quantification tools, but given its modular design, any tool supporting popular data interchange formats can be integrated. Availability: Grape can be obtained from the Bioinformatics and Genomics website at: http://big.crg.cat/services/grape. Contact: david.gonzalez@crg.eu or roderic.guigo@crg.eu PMID:23329413

  8. Centralized Monitoring of the Microsoft Windows-based computers of the LHC Experiment Control Systems

    NASA Astrophysics Data System (ADS)

    Varela Rodriguez, F.

    2011-12-01

    The control system of each of the four major Experiments at the CERN Large Hadron Collider (LHC) is distributed over up to 160 computers running either Linux or Microsoft Windows. A quick response to abnormal situations of the computer infrastructure is crucial to maximize the physics usage. For this reason, a tool was developed to supervise, identify errors and troubleshoot such a large system. Although monitoring of the performance of the Linux computers and their processes has been available since the first versions of the tool, it is only recently that the software package has been extended to provide similar functionality for the nodes running Microsoft Windows, as this platform is the most commonly used in the LHC detector control systems. In this paper, the architecture and the functionality of the Windows Management Instrumentation (WMI) client developed to provide centralized monitoring of the nodes running different flavours of the Microsoft platform, as well as the interface to the SCADA software of the control systems, are presented. The tool is currently being commissioned by the Experiments and it has already proven to be very efficient in optimizing the running systems and in detecting misbehaving processes or nodes.
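
    As a generic illustration (not the paper's actual tool), a WMI query of this kind can be issued from Python with the third-party wmi package; the 500 MB threshold and the remote-node connection shown in the comment are assumptions.

    ```python
    import wmi  # third-party package: pip install wmi (Windows only)

    c = wmi.WMI()  # local node; wmi.WMI(computer="node01") would query a remote one
    for proc in c.Win32_Process():
        mem_mb = int(proc.WorkingSetSize or 0) / 1e6
        if mem_mb > 500:  # assumed threshold for flagging a heavy process
            print(f"PID {proc.ProcessId:>6}  {mem_mb:8.1f} MB  {proc.Name}")
    ```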

  9. Computational steering of GEM based detector simulations

    NASA Astrophysics Data System (ADS)

    Sheharyar, Ali; Bouhali, Othmane

    2017-10-01

    Gas-based detector R&D relies heavily on full simulation of detectors and their optimization before final prototypes can be built and tested. These simulations, in particular those with complex scenarios such as high detector voltages or gases with larger gains, are computationally intensive and may take several days or weeks to complete. Such long-running simulations usually run on high-performance computers in batch mode. If the results lead to unexpected behavior, the simulation might be rerun with different parameters. However, the simulations (or jobs) may have to wait in a queue until they get a chance to run again, because the supercomputer is a shared resource that maintains a queue of other user programs as well and executes them as time and priorities permit. This can result in inefficient resource utilization and an increase in the turnaround time of the scientific experiment. To overcome this issue, monitoring the behavior of a simulation while it is running (live) is essential. In this work, we employ the computational steering technique by coupling the detector simulations with a visualization package named VisIt to enable exploration of the live data as it is produced by the simulation.
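
    A minimal steering skeleton, assuming a simple file-based command channel rather than the actual VisIt coupling: the simulation loop periodically polls for new parameters and exports its live state for a visualization tool to pick up.

    ```python
    import json, os

    params = {"voltage": 400.0, "gain": 1.0e4}   # assumed steerable parameters

    def poll_steering(path="steer.json"):
        """Pick up parameter changes dropped in by the user while the job runs."""
        if os.path.exists(path):
            with open(path) as f:
                params.update(json.load(f))
            os.remove(path)

    for step in range(100_000):
        # ... advance the detector simulation one step with the current params ...
        if step % 1000 == 0:
            poll_steering()                       # apply live parameter changes
            with open("live_state.json", "w") as f:
                json.dump({"step": step, **params}, f)  # state for the visualizer
    ```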

  10. Closha: bioinformatics workflow system for the analysis of massive sequencing data.

    PubMed

    Ko, GunHwan; Kim, Pan-Gyu; Yoon, Jongcheol; Han, Gukhee; Park, Seong-Jin; Song, Wangho; Lee, Byungwook

    2018-02-19

    While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag-and-drop functionality and to modify the parameters of pipeline tools. Users can also import Galaxy pipelines into Closha. Closha is a hybrid system that enables users to run both traditional analysis programs and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit large amounts of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner, and provides a user-friendly interface that lets genomic scientists derive accurate results from NGS platform data. The Closha cloud server is freely available for use at http://closha.kobic.re.kr/ .

  11. CERN openlab: Engaging industry for innovation in the LHC Run 3-4 R&D programme

    NASA Astrophysics Data System (ADS)

    Girone, M.; Purcell, A.; Di Meglio, A.; Rademakers, F.; Gunne, K.; Pachou, M.; Pavlou, S.

    2017-10-01

    LHC Run 3 and Run 4 represent an unprecedented challenge for HEP computing in terms of both data volume and complexity. New approaches are needed for how data is collected and filtered, processed, moved, stored and analysed if these challenges are to be met within a realistic budget. To develop innovative techniques, we are fostering relationships with industry leaders. CERN openlab is a unique resource for public-private partnership between CERN and leading Information and Communication Technology (ICT) companies. Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide HEP community. In 2015, CERN openlab started its phase V with a strong focus on tackling the upcoming LHC challenges. Several R&D programs are ongoing in the areas of data acquisition, networks and connectivity, data storage architectures, computing provisioning, computing platforms and code optimisation, and data analytics. This paper gives an overview of the various innovative technologies that are currently being explored by CERN openlab V and discusses the long-term strategies pursued by the LHC communities, with the help of industry, in closing the technological gap in processing and storage needs expected in Run 3 and Run 4.

  12. A note on parallel and pipeline computation of fast unitary transforms

    NASA Technical Reports Server (NTRS)

    Fino, B. J.; Algazi, V. R.

    1974-01-01

    The parallel and pipeline organization of fast unitary transform algorithms, such as the fast Fourier transform, is discussed. The efficiency of a combined parallel-pipeline processor is pointed out for a transform such as the Haar transform, in which 2^(n-1) hardware butterflies generate a transform of order 2^n every computation cycle.
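
    A software sketch of the butterfly structure in Python: each stage applies (a, b) -> ((a+b)/sqrt(2), (a-b)/sqrt(2)) to pairs and recurses on the sums, so a transform of order 2^n uses 2^n - 1 butterflies in total (2^(n-1) per cycle in the pipelined hardware described above).

    ```python
    import numpy as np

    def haar(x: np.ndarray) -> np.ndarray:
        """Unitary Haar transform of a length-2**n vector via butterfly stages."""
        x = x.astype(float)
        out, s = [], x
        while len(s) > 1:
            a, b = s[0::2], s[1::2]
            out.append((a - b) / np.sqrt(2))   # detail coefficients of this stage
            s = (a + b) / np.sqrt(2)           # sums feed the next, shorter stage
        return np.concatenate([s] + out[::-1])

    x = np.arange(8.0)
    X = haar(x)
    print(np.allclose(np.linalg.norm(X), np.linalg.norm(x)))  # unitary: True
    ```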

  13. Memoized Symbolic Execution

    NASA Technical Reports Server (NTRS)

    Yang, Guowei; Pasareanu, Corina S.; Khurshid, Sarfraz

    2012-01-01

    This paper introduces memoized symbolic execution (Memoise), a novel approach for more efficient application of forward symbolic execution, which is a well-studied technique for systematic exploration of program behaviors based on bounded execution paths. Our key insight is that application of symbolic execution often requires several successive runs of the technique on largely similar underlying problems, e.g., running it once to check a program to find a bug, fixing the bug, and running it again to check the modified program. Memoise introduces a trie-based data structure that stores the key elements of a run of symbolic execution. Maintenance of the trie during successive runs allows re-use of previously computed results of symbolic execution without the need for re-computing them as is traditionally done. Experiments using our prototype embodiment of Memoise show the benefits it holds in various standard scenarios of using symbolic execution, e.g., with iterative deepening of exploration depth, to perform regression analysis, or to enhance coverage.
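
    A toy sketch of the trie idea, with hypothetical structure and names: paths are keyed by their sequences of branch decisions, so a later run can reuse a stored result instead of re-exploring the prefix.

    ```python
    class TrieNode:
        def __init__(self):
            self.children = {}    # branch decision (True/False) -> TrieNode
            self.result = None    # memoized outcome of exploring this path

    class PathTrie:
        def __init__(self):
            self.root = TrieNode()

        def lookup(self, decisions):
            node = self.root
            for d in decisions:
                if d not in node.children:
                    return None
                node = node.children[d]
            return node.result

        def store(self, decisions, result):
            node = self.root
            for d in decisions:
                node = node.children.setdefault(d, TrieNode())
            node.result = result

    trie = PathTrie()
    trie.store((True, False, True), "path feasible, no bug")
    print(trie.lookup((True, False, True)))   # reused on the next run
    print(trie.lookup((True, True)))          # None -> must explore
    ```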

  14. Computing the Power-Density Spectrum for an Engineering Model

    NASA Technical Reports Server (NTRS)

    Dunn, H. J.

    1982-01-01

    A computer program calculates the power-density spectrum (PDS) from a data base generated by the Advanced Continuous Simulation Language (ACSL), using an algorithm that employs the fast Fourier transform (FFT) to calculate the PDS of a variable. This is accomplished by first estimating the autocovariance function of the variable and then taking the FFT of the smoothed autocovariance function to obtain the PDS. The fast-Fourier-transform technique conserves computer resources.
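
    A minimal numpy sketch of the described algorithm (Blackman-Tukey style): estimate the autocovariance, smooth it with a lag window, and FFT it to obtain the PDS; the window choice and lag length are illustrative assumptions.

    ```python
    import numpy as np

    def pds(x: np.ndarray, max_lag: int, dt: float = 1.0):
        """PDS via FFT of a windowed autocovariance estimate."""
        x = x - x.mean()
        n = len(x)
        # biased autocovariance estimate for lags 0..max_lag
        acov = np.array([(x[:n - k] * x[k:]).sum() / n for k in range(max_lag + 1)])
        acov *= np.hanning(2 * max_lag + 1)[max_lag:]      # smoothing lag window
        # symmetric extension, then FFT -> power-density spectrum
        sym = np.concatenate([acov, acov[-2:0:-1]])
        psd = np.abs(np.fft.rfft(sym)) * dt
        freqs = np.fft.rfftfreq(len(sym), dt)
        return freqs, psd

    t = np.arange(4096) * 0.01
    x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.default_rng(1).normal(size=t.size)
    f, p = pds(x, max_lag=256, dt=0.01)
    print(f"peak near {f[p.argmax()]:.2f} Hz")             # ~5 Hz
    ```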

  15. Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Detrixhe, Miles, E-mail: mdetrixhe@engineering.ucsb.edu; University of California Santa Barbara, Santa Barbara, CA, 93106; Gibou, Frédéric, E-mail: fgibou@engineering.ucsb.edu

    The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine- and coarse-grained components of the algorithm take advantage of the heterogeneous computer architecture common in high-performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence and parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
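
    For reference, a serial Python sketch of the underlying fast sweeping iteration for the eikonal equation |grad u| = 1 (four alternating sweep orderings with Godunov upwind updates); the paper's contribution, the hybrid parallelization of such sweeps, is not shown here.

    ```python
    import numpy as np

    def fast_sweep(u, h, f=1.0, n_iter=4):
        """Alternating-direction Gauss-Seidel sweeps with Godunov updates."""
        n, m = u.shape
        orders = [(range(n), range(m)), (range(n - 1, -1, -1), range(m)),
                  (range(n), range(m - 1, -1, -1)),
                  (range(n - 1, -1, -1), range(m - 1, -1, -1))]
        for _ in range(n_iter):
            for ii, jj in orders:
                for i in ii:
                    for j in jj:
                        a = min(u[max(i - 1, 0), j], u[min(i + 1, n - 1), j])
                        b = min(u[i, max(j - 1, 0)], u[i, min(j + 1, m - 1)])
                        if abs(a - b) >= f * h:            # one-sided update
                            cand = min(a, b) + f * h
                        else:                              # two-sided quadratic update
                            cand = (a + b + np.sqrt(2 * f * f * h * h - (a - b) ** 2)) / 2
                        u[i, j] = min(u[i, j], cand)
        return u

    n = 101
    u = np.full((n, n), 1e10)
    u[n // 2, n // 2] = 0.0                    # point source at the center
    u = fast_sweep(u, h=1.0 / (n - 1))
    print(f"corner distance error vs exact: {abs(u[0, 0] - np.sqrt(0.5)):.3f}")
    ```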

  16. Simple, efficient allocation of modelling runs on heterogeneous clusters with MPI

    USGS Publications Warehouse

    Donato, David I.

    2017-01-01

    In scientific modelling and computation, the choice of an appropriate method for allocating tasks for parallel processing depends on the computational setting and on the nature of the computation. The allocation of independent but similar computational tasks, such as modelling runs or Monte Carlo trials, among the nodes of a heterogeneous computational cluster is a special case that has not been specifically evaluated previously. A simulation study shows that a method of on-demand (that is, worker-initiated) pulling from a bag of tasks in this case leads to reliably short makespans for computational jobs despite heterogeneity both within and between cluster nodes. A simple reference implementation in the C programming language with the Message Passing Interface (MPI) is provided.
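
    A minimal mpi4py sketch of the worker-initiated ("pull") allocation evaluated in the paper (the reference implementation itself is in C): rank 0 holds the bag of tasks and idle workers request the next task, so fast and slow nodes of a heterogeneous cluster stay busy.

    ```python
    # Run with e.g.: mpiexec -n 4 python bag_of_tasks.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    TASK, STOP = 0, 1

    if rank == 0:                                  # master: serve the bag of tasks
        bag = list(range(100))                     # e.g. 100 modelling runs
        stopped = 0
        while stopped < comm.Get_size() - 1:
            w = comm.recv(source=MPI.ANY_SOURCE)   # a worker asks for work
            if bag:
                comm.send(bag.pop(), dest=w, tag=TASK)
            else:
                comm.send(None, dest=w, tag=STOP)
                stopped += 1
    else:                                          # worker: pull until bag is empty
        status = MPI.Status()
        n_done = 0
        while True:
            comm.send(rank, dest=0)                # request the next task
            task = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
            if status.Get_tag() == STOP:
                break
            _ = task ** 2                          # stand-in for one modelling run
            n_done += 1
        print(f"worker {rank} completed {n_done} tasks")
    ```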

  17. Evaluating the Efficacy of the Cloud for Cluster Computation

    NASA Technical Reports Server (NTRS)

    Knight, David; Shams, Khawaja; Chang, George; Soderstrom, Tom

    2012-01-01

    Computing requirements vary by industry, and it follows that NASA and other research organizations have computing demands that fall outside the mainstream. While cloud computing made rapid inroads for tasks such as powering web applications, performance issues on highly distributed tasks hindered early adoption for scientific computation. One venture to address this problem is Nebula, NASA's homegrown cloud project tasked with delivering science-quality cloud computing resources. However, another industry development is Amazon's high-performance computing (HPC) instances on Elastic Cloud Compute (EC2), which promise improved performance for cluster computation. This paper presents results from a series of benchmarks run on Amazon EC2 and discusses the efficacy of current commercial cloud technology for running scientific applications across a cluster. In particular, a 240-core cluster of cloud instances achieved 2 TFLOPS on High-Performance Linpack (HPL) at 70% of theoretical computational performance. The cluster's local network also demonstrated sub-100 μs inter-process latency with sustained inter-node throughput in excess of 8 Gbps. Beyond HPL, a real-world Hadoop image processing task from NASA's Lunar Mapping and Modeling Project (LMMP) was run on a 29-instance cluster to process lunar and Martian surface images with sizes on the order of tens of gigapixels. These results demonstrate that while not a rival of dedicated supercomputing clusters, commercial cloud technology is now a feasible option for moderately demanding scientific workloads.

  18. Computational electromagnetics: the physics of smooth versus oscillatory fields.

    PubMed

    Chew, W C

    2004-03-15

    This paper starts by discussing the difference in the physics between solutions to Laplace's equation (static) and to Maxwell's equations for dynamic problems (the Helmholtz equation). Their differing physical characters are illustrated by how the two fields convey information away from their source point. The paper elucidates how these differing physical characters affect the use of the Laplacian and Helmholtz fields in imaging. They also affect the design of fast computational algorithms for electromagnetic scattering problems. Specifically, a comparison is made between fast algorithms developed using wavelets, the simple fast multipole method, and the multi-level fast multipole algorithm for electrodynamics. The impact of the physical character of the dynamic field on the parallelization of the multi-level fast multipole algorithm is also discussed. The relationship of the diagonalization of translators to group theory is presented. Finally, future areas of research for computational electromagnetics are described.

  19. High-performance software-only H.261 video compression on PC

    NASA Astrophysics Data System (ADS)

    Kasperovich, Leonid

    1996-03-01

    This paper describes an implementation of a software H.261 codec for the PC that takes advantage of the fast computational algorithms for DCT-based video compression presented by the author at the February 1995 SPIE/IS&T meeting. The motivation for developing the H.261 prototype system is to demonstrate the feasibility of a real-time, software-only videoconferencing solution that operates across a wide range of network bandwidths, frame rates, and input video resolutions. As network bandwidths increase, video of higher frame rate and resolution may be transmitted, which in turn requires a software codec able to compress pictures of CIF (352 × 288) resolution at up to 30 frames/s. Running on a 133 MHz Pentium PC, the codec presented is capable of compressing video in CIF format at 21-23 frames/s. This result is comparable to known hardware-based H.261 solutions, but it does not require any specific hardware. The methods used to achieve high performance and the program optimization technique for the Pentium microprocessor are presented, along with a performance profile showing the actual contribution of the different encoding/decoding stages to the overall computational process.

  20. Computationally inexpensive approach for pitch control of offshore wind turbine on barge floating platform.

    PubMed

    Zuo, Shan; Song, Y D; Wang, Lei; Song, Qing-wang

    2013-01-01

    The offshore floating wind turbine (OFWT) has gained increasing attention during the past decade because of high-quality offshore wind power and the complex load environment. The control system is a tradeoff between power tracking and fatigue load reduction in the above-rated wind speed region. In view of the external disturbances and uncertain system parameters of OFWTs due to the proximity to load centers and strong wave coupling, this paper proposes a computationally inexpensive robust adaptive control approach with memory-based compensation for blade pitch control. The method is tested and compared with a baseline controller and a conventional individual blade pitch controller, with the "NREL offshore 5 MW baseline wind turbine" mounted on a barge platform and simulated in FAST and Matlab/Simulink, operating in the above-rated condition. It is shown that the advanced control approach is not only robust to complex wind and wave disturbances but also adaptive to varying and uncertain system parameters. The simulation results demonstrate that the proposed method performs better in reducing power fluctuations, fatigue loads and platform vibration compared with conventional individual blade pitch control.

  1. XS: a FASTQ read simulator.

    PubMed

    Pratas, Diogo; Pinho, Armando J; Rodrigues, João M O S

    2014-01-16

    The emerging next-generation sequencing (NGS) technologies are bringing, besides huge amounts of data, an avalanche of new specialized tools (for analysis, compression, alignment, among others) and large public and private network infrastructures. Hence there is a direct need for specific simulation tools for testing and benchmarking, such as a flexible and portable FASTQ read simulator that does not need a reference sequence, yet is correctly prepared to produce approximately the same characteristics as real data. We present XS, a FASTQ read simulation tool that is flexible, portable (it does not need a reference sequence) and tunable in terms of sequence complexity. It has several running modes, depending on the time and memory available, and is aimed at testing computing infrastructures, namely cloud computing of large-scale projects, and testing FASTQ compression algorithms. Moreover, XS offers the possibility of simulating the three main FASTQ components individually (headers, DNA sequences and quality scores). XS provides an efficient and convenient method for fast simulation of FASTQ files, such as those from Ion Torrent (currently uncovered by other simulators), Roche-454, Illumina and ABI-SOLiD sequencing machines. This tool is publicly available at http://bioinformatics.ua.pt/software/xs/.
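
    A minimal reference-free FASTQ generator in the spirit of XS (illustrative only; XS's actual sequence-complexity and quality-score models are richer):

    ```python
    import random

    def fastq_reads(n_reads=3, length=50, seed=42):
        """Yield simulated FASTQ records: header, sequence, separator, qualities."""
        rng = random.Random(seed)
        for i in range(n_reads):
            seq = "".join(rng.choice("ACGT") for _ in range(length))
            # Phred+33 qualities drifting lower toward the read's 3' end
            qual = "".join(chr(33 + max(2, 40 - j // 2 + rng.randint(-3, 3)))
                           for j in range(length))
            yield f"@sim_read_{i}\n{seq}\n+\n{qual}"

    print("\n".join(fastq_reads()))
    ```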

  2. Early MIMD experience on the CRAY X-MP

    NASA Astrophysics Data System (ADS)

    Rhoades, Clifford E.; Stevens, K. G.

    1985-07-01

    This paper describes some early experience with converting four physics simulation programs to the CRAY X-MP, a current Multiple Instruction, Multiple Data (MIMD) computer consisting of two processors, each with an architecture similar to that of the CRAY-1. As a multi-processor, the CRAY X-MP together with the high-speed Solid-state Storage Device (SSD) is an ideal machine upon which to study MIMD algorithms for solving the equations of mathematical physics, because it is fast enough to run real problems. The computer programs used in this study are all FORTRAN versions of original production codes. They range in sophistication from a one-dimensional numerical simulation of collisionless plasma, to a two-dimensional hydrodynamics code with heat flow, to a couple of three-dimensional fluid dynamics codes with varying degrees of viscous modeling. Early research with a dual-processor configuration has shown speed-ups ranging from 1.55 to 1.98. It has been observed that a few simple extensions to FORTRAN allow a typical programmer to achieve a remarkable level of efficiency. These extensions involve the concept of memory local to a concurrent subprogram and memory common to all concurrent subprograms.

  3. The photoisomerization of a peptidic derivative of azobenzene: A nonadiabatic dynamics simulation of a supramolecular system

    NASA Astrophysics Data System (ADS)

    Ciminelli, Cosimo; Granucci, Giovanni; Persico, Maurizio

    2008-06-01

    The aim of this work is to investigate the mechanism of photoisomerization of an azobenzenic chromophore in a supramolecular environment, where the primary photochemical act produces important changes in the whole system. We have chosen a derivative of azobenzene with two cyclopeptides attached in the para positions, linked by hydrogen bonds when the chromophore is in the cis geometry. We have run computational simulations of the cis → trans photoisomerization of this azobenzene derivative by means of a surface hopping method. The potential energy surfaces and nonadiabatic couplings are computed "on the fly" with a hybrid QM/MM strategy, in which the quantum mechanical subsystem is treated semiempirically. The simulations show that the photoisomerization is fast (about 200 fs) and occurs with high quantum yields, as in free azobenzene. However, the two cyclopeptides are not promptly separated, and the breaking of the hydrogen bonds requires longer times (at least several picoseconds), with the intervention of the solvent molecules (water). As a consequence, the resulting trans-azobenzene is severely distorted, and we show how its approach to the equilibrium geometry could be monitored by time-resolved absorption spectroscopy.

  4. Computationally Inexpensive Approach for Pitch Control of Offshore Wind Turbine on Barge Floating Platform

    PubMed Central

    Zuo, Shan; Song, Y. D.; Wang, Lei; Song, Qing-wang

    2013-01-01

    The offshore floating wind turbine (OFWT) has gained increasing attention during the past decade because of high-quality offshore wind power and the complex load environment. The control system is a tradeoff between power tracking and fatigue load reduction in the above-rated wind speed region. In view of the external disturbances and uncertain system parameters of OFWTs due to the proximity to load centers and strong wave coupling, this paper proposes a computationally inexpensive robust adaptive control approach with memory-based compensation for blade pitch control. The method is tested and compared with a baseline controller and a conventional individual blade pitch controller, with the “NREL offshore 5 MW baseline wind turbine” mounted on a barge platform and simulated in FAST and Matlab/Simulink, operating in the above-rated condition. It is shown that the advanced control approach is not only robust to complex wind and wave disturbances but also adaptive to varying and uncertain system parameters. The simulation results demonstrate that the proposed method performs better in reducing power fluctuations, fatigue loads and platform vibration compared with conventional individual blade pitch control. PMID:24453834

  5. Maximal Neighbor Similarity Reveals Real Communities in Networks

    PubMed Central

    Žalik, Krista Rizman

    2015-01-01

    An important problem in the analysis of network data is the detection of groups of densely interconnected nodes, also called modules or communities. Community structure reveals the functions and organization of networks. Currently used algorithms for community detection in large-scale real-world networks are computationally expensive, require a priori information such as the number or sizes of communities, or are not able to give the same resulting partition in multiple runs. In this paper we investigate a simple and fast algorithm that uses the network structure alone and requires neither optimization of a pre-defined objective function nor information about the number of communities. We propose a bottom-up community detection algorithm in which, starting from communities consisting of adjacent pairs of nodes and their maximally similar neighbors, we find real communities. We show that the overall advantage of the proposed algorithm compared to other community detection algorithms is its simple nature, low computational cost and very high accuracy in detecting communities of different sizes, also in networks with blurred modularity structure consisting of poorly separated communities. All communities identified by the proposed method for the Facebook network and the E. coli transcriptional regulatory network have strong structural and functional coherence. PMID:26680448
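
    The seeding step can be illustrated with a short Python sketch, assuming a Jaccard neighbourhood similarity and a toy graph; the paper's full merge procedure is more involved.

    ```python
    from collections import defaultdict

    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def similarity(u, v):
        """Jaccard similarity of closed neighbourhoods."""
        nu, nv = adj[u] | {u}, adj[v] | {v}
        return len(nu & nv) / len(nu | nv)

    # attach each node to its maximally similar neighbour -> community seeds
    best = {u: max(adj[u], key=lambda v: similarity(u, v)) for u in adj}
    print(best)
    ```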

  6. High skill in low-frequency climate response through fluctuation dissipation theorems despite structural instability.

    PubMed

    Majda, Andrew J; Abramov, Rafail; Gershgorin, Boris

    2010-01-12

    Climate change science focuses on predicting the coarse-grained, planetary-scale, longtime changes in the climate system due to either changes in external forcing or internal variability, such as the impact of increased carbon dioxide. The predictions of climate change science are carried out through comprehensive computational atmospheric and oceanic simulation models, which necessarily parameterize physical features such as clouds, sea ice cover, etc. Recently, it has been suggested that there is irreducible imprecision in such climate models that manifests itself as structural instability in climate statistics and which can significantly hamper the skill of computer models for climate change. A systematic approach to deal with this irreducible imprecision is advocated through algorithms based on the Fluctuation Dissipation Theorem (FDT). There are important practical and computational advantages for climate change science when a skillful FDT algorithm is established. The FDT response operator can be utilized directly for multiple climate change scenarios and multiple changes in forcing and other parameters, such as damping, as well as for inverse modelling, without the need to run the complex climate model in each individual case. The high skill of FDT in predicting climate change, despite structural instability, is developed in an unambiguous fashion using mathematical theory as guidelines in three different test models: a generic class of analytical models mimicking the dynamical core of the computer climate models, reduced stochastic models for low-frequency variability, and models with a significant new type of irreducible imprecision involving many fast, unstable modes.
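
    A toy demonstration of the FDT idea on a one-dimensional Ornstein-Uhlenbeck process (an assumption for illustration; real FDT algorithms act on high-dimensional climate statistics): the response operator estimated from an unperturbed run predicts the mean response to a constant forcing without any perturbed model run.

    ```python
    import numpy as np

    g, s, dt, f = 0.5, 1.0, 0.01, 0.2        # damping, noise, time step, forcing
    rng = np.random.default_rng(0)

    # one long *unperturbed* run of dx = -g*x dt + s dW
    n = 400_000
    x = np.empty(n)
    x[0] = 0.0
    noise = rng.normal(scale=s * np.sqrt(dt), size=n - 1)
    for i in range(n - 1):
        x[i + 1] = x[i] - g * x[i] * dt + noise[i]

    # quasi-Gaussian FDT: R = integral of C(t)/C(0) dt from unperturbed statistics
    step = 50                                 # sample the correlation every 50 lags
    lags = range(0, int(20 / g / dt), step)
    c0 = x.var()
    corr = np.array([np.mean(x[:n - k] * x[k:]) for k in lags]) / c0
    R = ((corr[:-1] + corr[1:]) / 2).sum() * step * dt   # trapezoidal integral

    print(f"FDT-predicted mean response: {f * R:.3f}")   # should be close to ...
    print(f"exact mean response (f/g):   {f / g:.3f}")   # ... 0.400
    ```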

  7. Conjugate Compressible Fluid Flow and Heat Transfer in Ducts

    NASA Technical Reports Server (NTRS)

    Cross, M. F.

    2011-01-01

    A computational approach to modeling transient, compressible fluid flow with heat transfer in long, narrow ducts is presented. The primary application of the model is for analyzing fluid flow and heat transfer in solid propellant rocket motor nozzle joints during motor start-up, but the approach is relevant to a wide range of analyses involving rapid pressurization and filling of ducts. Fluid flow is modeled through solution of the spatially one-dimensional, transient Euler equations. Source terms are included in the governing equations to account for the effects of wall friction and heat transfer. The equation solver is fully-implicit, thus providing greater flexibility than an explicit solver. This approach allows for resolution of pressure wave effects on the flow as well as for fast calculation of the steady-state solution when a quasi-steady approach is sufficient. Solution of the one-dimensional Euler equations with source terms significantly reduces computational run times compared to general purpose computational fluid dynamics packages solving the Navier-Stokes equations with resolved boundary layers. In addition, conjugate heat transfer is more readily implemented using the approach described in this paper than with most general purpose computational fluid dynamics packages. The compressible flow code has been integrated with a transient heat transfer solver to analyze heat transfer between the fluid and surrounding structure. Conjugate fluid flow and heat transfer solutions are presented. The author is unaware of any previous work available in the open literature which uses the same approach described in this paper.

  8. Methods for performing fast discrete curvelet transforms of data

    DOEpatents

    Candes, Emmanuel; Donoho, David; Demanet, Laurent

    2010-11-23

    Fast digital implementations of the second generation curvelet transform for use in data processing are disclosed. One such digital transformation is based on unequally-spaced fast Fourier transforms (USFFT) while another is based on the wrapping of specially selected Fourier samples. Both digital transformations return a table of digital curvelet coefficients indexed by a scale parameter, an orientation parameter, and a spatial location parameter. Both implementations are fast in the sense that they run in about O(n² log n) flops for n-by-n Cartesian arrays, or about O(N log N) flops for Cartesian arrays of size N = n³; in addition, they are also invertible, with rapid inversion algorithms of about the same complexity.

  9. Controlling Laboratory Processes From A Personal Computer

    NASA Technical Reports Server (NTRS)

    Will, H.; Mackin, M. A.

    1991-01-01

    A computer program provides natural-language process control from an IBM PC or compatible computer. It sets up a process-control system that either runs without an operator or is run by workers with limited programming skills. The package includes three smaller programs. Two of them, written in FORTRAN 77, record data and control research processes. The third program, written in Pascal, generates the FORTRAN subroutines used by the other two programs to identify user commands with device-driving routines written by the user. Also included is a set of input data allowing the user to define commands to be executed by the computer. The program requires a personal computer operating under MS-DOS with suitable hardware interfaces to all controlled devices, as well as a FORTRAN 77 compiler and device drivers written by the user.

  10. Effects of Ramadan Fasting on Body Composition, Aerobic Performance and Lactate, Heart Rate and Perceptual Responses in Young Soccer Players

    PubMed Central

    Güvenç, Alpay

    2011-01-01

    The purpose of this study was to examine the effects of Ramadan fasting on body composition, aerobic exercise performance and blood lactate, heart rate and perceived exertion in regularly trained young soccer players. Sixteen male soccer players participated in this study. Mean age, stature, body mass and training age of the players were 17.4±1.2 years, 175.4±3.6 cm, 69.6±4.3 kg and 5.1±1.3 years, respectively. During the Ramadan period, all subjects voluntarily chose to follow the fasting guidelines and abstained from eating and drinking from sunrise to sunset. Body composition, hydration status, dietary intake and sleep duration were assessed on four occasions: before Ramadan, at the beginning of Ramadan, at the end of Ramadan and 2 weeks after the end of Ramadan. On each occasion, aerobic exercise performance and blood lactate, heart rate and rating of perceived exertion responses of players were also determined during an incremental running test. Repeated-measures ANOVA revealed that body mass, percentage of body fat, fat-free mass, hydration status, daily sleeping time and daily energy and macronutrient intake of players did not vary significantly throughout the study period (p>0.05). However, players experienced a small but significant decrease in skinfold thicknesses over the course of the study (p<0.05). Although ratings of perceived exertion at submaximal workloads increased during Ramadan (p<0.05), blood lactate and heart rate responses had decreased by the end of Ramadan (p<0.05). In line with these changes, peak running performance and running velocity at anaerobic threshold also improved by the end of Ramadan (p<0.05). Improvements in aerobic exercise performance with time were probably due to the effects of the pre-season training program that was performed after the break of the fast (Iftar) during the month of Ramadan. The results of the present study suggest that if the regular training regimen, body fluid balance, daily energy intake and sleep duration are maintained as before Ramadan, Ramadan fasting does not have detrimental effects on aerobic exercise performance or body composition in young soccer players. PMID:23486092

  11. Effects of ramadan fasting on body composition, aerobic performance and lactate, heart rate and perceptual responses in young soccer players.

    PubMed

    Güvenç, Alpay

    2011-09-01

    The purpose of this study was to examine the effects of Ramadan fasting on body composition, aerobic exercise performance and blood lactate, heart rate and perceived exertion in regularly trained young soccer players. Sixteen male soccer players participated in this study. Mean age, stature, body mass and training age of the players were 17.4±1.2 years, 175.4±3.6 cm, 69.6±4.3 kg and 5.1±1.3 years, respectively. During the Ramadan period, all subjects voluntarily chose to follow the fasting guidelines and abstained from eating and drinking from sunrise to sunset. Body composition, hydration status, dietary intake and sleep duration were assessed on four occasions: before Ramadan, at the beginning of Ramadan, at the end of Ramadan and 2 weeks after the end of Ramadan. On each occasion, aerobic exercise performance and blood lactate, heart rate and rating of perceived exertion responses of players were also determined during an incremental running test. Repeated-measures ANOVA revealed that body mass, percentage of body fat, fat-free mass, hydration status, daily sleeping time and daily energy and macronutrient intake of players did not vary significantly throughout the study period (p>0.05). However, players experienced a small but significant decrease in skinfold thicknesses over the course of the study (p<0.05). Although ratings of perceived exertion at submaximal workloads increased during Ramadan (p<0.05), blood lactate and heart rate responses had decreased by the end of Ramadan (p<0.05). In line with these changes, peak running performance and running velocity at anaerobic threshold also improved by the end of Ramadan (p<0.05). Improvements in aerobic exercise performance with time were probably due to the effects of the pre-season training program that was performed after the break of the fast (Iftar) during the month of Ramadan. The results of the present study suggest that if the regular training regimen, body fluid balance, daily energy intake and sleep duration are maintained as before Ramadan, Ramadan fasting does not have detrimental effects on aerobic exercise performance or body composition in young soccer players.

  12. A description of shock attenuation for children running.

    PubMed

    Mercer, John A; Dufek, Janet S; Mangus, Brent C; Rubley, Mack D; Bhanot, Kunal; Aldridge, Jennifer M

    2010-01-01

    A growing number of children are participating in organized sport activities, resulting in a concomitant increase in lower extremity injuries. Little is known about the impact generated when children are running or how this impact is attenuated in child runners. To describe shock attenuation characteristics for children running at different speeds on a treadmill and at a single speed over ground. Prospective cohort study. Biomechanics laboratory. Eleven boys (age = 10.5 ± 0.9 years, height = 143.7 ± 8.3 cm, mass = 39.4 ± 10.9 kg) and 7 girls (age = 9.9 ± 1.1 years, height = 136.2 ± 7.7 cm, mass = 35.1 ± 9.6 kg) participated. Participants completed 4 running conditions, including 3 treadmill (TM) running speeds (preferred, fast [0.5 m/s more than preferred], and slow [0.5 m/s less than preferred]) and 1 overground (OG) running speed. We measured leg peak impact acceleration (LgPk), head peak impact acceleration (HdPk), and shock attenuation (ratio of LgPk to HdPk). Shock attenuation (F(2,16) = 4.80, P = .01) was influenced by the interaction of speed and sex. Shock attenuation increased across speeds (slow, preferred, fast) for boys (P < .05) but not for girls (P > .05). Both LgPk (F(1,16) = 5.04, P = .04) and HdPk (F(1,16) = 6.04, P = .03) were different across speeds, and both were greater for girls than for boys. None of the dependent variables were influenced by the interaction of setting (TM, OG) and sex (P ≥ .05). Shock attenuation (F(1,16) = 33.51, P < .001) and LgPk (F(1,16) = 31.54, P < .001) were different between TM and OG, and each was greater when running OG than on the TM, regardless of sex. Shock attenuation was between 66% and 76% for children running under a variety of conditions. Girls had greater peak impact accelerations at the leg and head levels than boys but achieved similar shock attenuation. We do not know how these shock attenuation characteristics are related to overuse injuries.

  13. FAST - A multiprocessed environment for visualization of computational fluid dynamics

    NASA Technical Reports Server (NTRS)

    Bancroft, Gordon V.; Merritt, Fergus J.; Plessel, Todd C.; Kelaita, Paul G.; Mccabe, R. Kevin

    1991-01-01

    The paper presents the Flow Analysis Software Toolset (FAST) for use in fluid-mechanics analysis. The design criteria for FAST, including minimization of the data path in the computational fluid dynamics (CFD) process, a consistent user interface, an extensible software architecture, modularization, and the isolation of three-dimensional tasks from the application programmer, are outlined. Each separate process communicates through the FAST Hub, while other modules such as FAST Central, NAS file input, CFD calculator, surface extractor and renderer, titler, tracer, and isolev may work together to generate the scene. An interprocess communication package making it possible for FAST to operate as a modular environment, in which resources can be shared among different machines as well as a single host, is discussed.

  14. A fast collocation method for a variable-coefficient nonlocal diffusion model

    NASA Astrophysics Data System (ADS)

    Wang, Che; Wang, Hong

    2017-02-01

    We develop a fast collocation scheme for a variable-coefficient nonlocal diffusion model, for which a numerical discretization would yield a dense stiffness matrix. The development of the fast method is achieved by carefully handling the variable coefficients appearing inside the singular integral operator and exploiting the structure of the dense stiffness matrix. The resulting fast method reduces the computational work from O(N³), required by a commonly used direct solver, to O(N log N) per iteration, and the memory requirement from O(N²) to O(N). Furthermore, the fast method reduces the computational work of assembling the stiffness matrix from O(N²) to O(N). Numerical results are presented to show the utility of the fast method.
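
    The core structural trick behind such O(N log N) matrix-vector products can be sketched for the constant-coefficient (Toeplitz) case: the matrix embeds in a circulant one, so multiplication becomes FFT-diagonal-IFFT. The paper's method additionally handles the variable coefficients; this sketch shows only the generic technique.

    ```python
    import numpy as np

    def toeplitz_matvec(col, row, x):
        """y = T x where T[i, j] = col[i-j] for i >= j and row[j-i] for j >= i
        (col[0] == row[0]), in O(N log N) via circulant embedding."""
        n = len(x)
        # first column of the 2n-periodic circulant embedding of T
        c = np.concatenate([col, [0.0], row[:0:-1]])
        y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x, 2 * n))
        return y[:n].real

    rng = np.random.default_rng(0)
    n = 512
    col = 1.0 / (1.0 + np.arange(n))      # e.g. a decaying interaction kernel
    row = col.copy()                      # symmetric Toeplitz in this example
    x = rng.normal(size=n)

    # dense reference for verification
    T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
                  for i in range(n)])
    print(np.allclose(T @ x, toeplitz_matvec(col, row, x)))  # True
    ```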

  15. WinHPC System Programming | High-Performance Computing | NREL

    Science.gov Websites

    Describes how to build and run MPI (Message Passing Interface) applications on NREL's WinHPC system, using the MS-MPI header (mpi.h) and library (msmpi.lib) with the Intel C++ compiler build environment.

  16. Prostate Cancer Research Training Program

    DTIC Science & Technology

    2011-05-01

    and the University of Iowa. These include exercise facilities (running, tennis, basketball, volleyball, handball /racquetball, weights, biking, and...swimming), local beaches , and museums (art, natural history, and sports). In addition, there are a large number of restaurants ranging from fast

  17. Using the Time-Correlated Induced Fission Method to Simultaneously Measure the 235U Content and the Burnable Poison Content in LWR Fuel Assemblies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Root, M. A.; Menlove, H. O.; Lanza, R. C.

    The uranium neutron coincidence collar uses thermal neutron interrogation to verify the 235U mass in low-enriched uranium (LEU) fuel assemblies in fuel fabrication facilities. Burnable poisons are commonly added to nuclear fuel to increase the lifetime of the fuel. The high thermal neutron absorption by these poisons reduces the active neutron signal produced by the fuel. Burnable poison correction factors or fast-mode runs with Cd liners can help compensate for this effect, but the correction factors rely on operator declarations of burnable poison content, and fast-mode runs are time-consuming. Finally, this paper describes a new analysis method to measure the 235U mass and burnable poison content in LEU nuclear fuel simultaneously in a timely manner, without requiring additional hardware.

  18. Using the Time-Correlated Induced Fission Method to Simultaneously Measure the 235U Content and the Burnable Poison Content in LWR Fuel Assemblies

    DOE PAGES

    Root, M. A.; Menlove, H. O.; Lanza, R. C.; ...

    2018-03-21

    The uranium neutron coincidence collar uses thermal neutron interrogation to verify the 235U mass in low-enriched uranium (LEU) fuel assemblies in fuel fabrication facilities. Burnable poisons are commonly added to nuclear fuel to increase the lifetime of the fuel. The high thermal neutron absorption by these poisons reduces the active neutron signal produced by the fuel. Burnable poison correction factors or fast-mode runs with Cd liners can help compensate for this effect, but the correction factors rely on operator declarations of burnable poison content, and fast-mode runs are time-consuming. Finally, this paper describes a new analysis method to measure the 235U mass and burnable poison content in LEU nuclear fuel simultaneously in a timely manner, without requiring additional hardware.

  19. Running key mapping in a quantum stream cipher by the Yuen 2000 protocol

    NASA Astrophysics Data System (ADS)

    Shimizu, Tetsuya; Hirota, Osamu; Nagasako, Yuki

    2008-03-01

    A quantum stream cipher based on the Yuen 2000 protocol (the so-called Y00 protocol or αη scheme), in which the running key is generated by a linear feedback shift register from a short key, is very attractive for implementing secure 40 Gbit/s optical data transmission, which is expected in next-generation networks. However, a basic model of the Y00 protocol with a very short key needs a careful design against fast correlation attacks, as pointed out by Donnet et al. This Brief Report clarifies the effectiveness of an irregular mapping between the running key and the physical signals in the driver that selects the M-ary basis in the transmitter, and gives a design method. Consequently, a quantum stream cipher by the Y00 protocol with our mapping has immunity against the proposed fast correlation attacks on a basic model of the Y00 protocol even if the key is very short.
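
    A hypothetical Python sketch of the idea (names and parameters are illustrative, not the Y00 specification): a short-key LFSR produces the running key, and a keyed, non-monotone permutation maps each running-key chunk to one of M bases instead of using the chunk value directly.

    ```python
    import random

    M_BITS = 8                                # log2(M); M = 256 bases (assumed)

    def lfsr_bits(state, taps=(16, 14, 13, 11), width=16):
        """Fibonacci LFSR over GF(2); yields one running-key bit per step."""
        while True:
            fb = 0
            for t in taps:
                fb ^= (state >> (width - t)) & 1
            state = ((state << 1) | fb) & ((1 << width) - 1)
            yield state & 1

    def irregular_map(seed):
        """Keyed pseudo-random permutation: running-key chunk -> basis index."""
        perm = list(range(1 << M_BITS))
        random.Random(seed).shuffle(perm)
        return perm

    key = 0xACE1                              # the short secret key (toy value)
    gen = lfsr_bits(key)
    basis_of = irregular_map(seed=key)
    for _ in range(4):
        chunk = 0
        for _ in range(M_BITS):
            chunk = (chunk << 1) | next(gen)  # next log2(M) running-key bits
        print(f"running-key chunk {chunk:3d} -> basis {basis_of[chunk]:3d}")
    ```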

  20. Computer-based testing of the modified essay question: the Singapore experience.

    PubMed

    Lim, Erle Chuen-Hian; Seet, Raymond Chee-Seong; Oh, Vernon M S; Chia, Boon-Lock; Aw, Marion; Quak, Seng-Hock; Ong, Benjamin K C

    2007-11-01

    The modified essay question (MEQ), featuring an evolving case scenario, tests a candidate's problem-solving and reasoning ability, rather than mere factual recall. Although it is traditionally conducted as a pen-and-paper examination, our university has run the MEQ using computer-based testing (CBT) since 2003. We describe our experience with running the MEQ examination using the IVLE, or integrated virtual learning environment (https://ivle.nus.edu.sg), provide a blueprint for universities intending to conduct computer-based testing of the MEQ, and detail how our MEQ examination has evolved since its inception. An MEQ committee, comprising specialists in key disciplines from the departments of Medicine and Paediatrics, was formed. We utilized the IVLE, developed for our university in 1998, as the online platform on which we ran the MEQ. We calculated the number of man-hours (academic and support staff) required to run the MEQ examination, using either a computer-based or pen-and-paper format. With the support of our university's information technology (IT) specialists, we have successfully run the MEQ examination online, twice a year, since 2003. Initially, we conducted the examination with short-answer questions only, but have since expanded the MEQ examination to include multiple-choice and extended matching questions. A total of 1268 man-hours was spent in preparing for, and running, the MEQ examination using CBT, compared to 236.5 man-hours to run it using a pen-and-paper format. Despite being more labour-intensive, our students and staff prefer CBT to the pen-and-paper format. The MEQ can be conducted using a computer-based testing scenario, which offers several advantages over a pen-and-paper format. We hope to increase the number of questions and incorporate audio and video files, featuring clinical vignettes, to the MEQ examination in the near future.
