large computational time: Topics by Science.gov

Sample records for large computational time

Coalescence computations for large samples drawn from populations of time-varying sizes

PubMed Central

Polanski, Andrzej; Szczesna, Agnieszka; Garbulowski, Mateusz; Kimmel, Marek

2017-01-01

We present new results concerning probability distributions of times in the coalescence tree and expected allele frequencies for coalescent with large sample size. The obtained results are based on computational methodologies, which involve combining coalescence time scale changes with techniques of integral transformations and using analytical formulae for infinite products. We show applications of the proposed methodologies for computing probability distributions of times in the coalescence tree and their limits, for evaluation of accuracy of approximate expressions for times in the coalescence tree and expected allele frequencies, and for analysis of large human mitochondrial DNA dataset. PMID:28170404
An evaluation of superminicomputers for thermal analysis

NASA Technical Reports Server (NTRS)

Storaasli, O. O.; Vidal, J. B.; Jones, G. K.

1962-01-01

The feasibility and cost effectiveness of solving thermal analysis problems on superminicomputers is demonstrated. Conventional thermal analysis and the changing computer environment, computer hardware and software used, six thermal analysis test problems, performance of superminicomputers (CPU time, accuracy, turnaround, and cost) and comparison with large computers are considered. Although the CPU times for superminicomputers were 15 to 30 times greater than the fastest mainframe computer, the minimum cost to obtain the solutions on superminicomputers was from 11 percent to 59 percent of the cost of mainframe solutions. The turnaround (elapsed) time is highly dependent on the computer load, but for large problems, superminicomputers produced results in less elapsed time than a typically loaded mainframe computer.
Advantages of Parallel Processing and the Effects of Communications Time

NASA Technical Reports Server (NTRS)

Eddy, Wesley M.; Allman, Mark

2000-01-01

Many computing tasks involve heavy mathematical calculations, or analyzing large amounts of data. These operations can take a long time to complete using only one computer. Networks such as the Internet provide many computers with the ability to communicate with each other. Parallel or distributed computing takes advantage of these networked computers by arranging them to work together on a problem, thereby reducing the time needed to obtain the solution. The drawback to using a network of computers to solve a problem is the time wasted in communicating between the various hosts. The application of distributed computing techniques to a space environment or to use over a satellite network would therefore be limited by the amount of time needed to send data across the network, which would typically take much longer than on a terrestrial network. This experiment shows how much faster a large job can be performed by adding more computers to the task, what role communications time plays in the total execution time, and the impact a long-delay network has on a distributed computing system.
Parallel computing method for simulating hydrological processesof large rivers under climate change

NASA Astrophysics Data System (ADS)

Wang, H.; Chen, Y.

2016-12-01

Climate change is one of the proverbial global environmental problems in the world.Climate change has altered the watershed hydrological processes in time and space distribution, especially in worldlarge rivers.Watershed hydrological process simulation based on physically based distributed hydrological model can could have better results compared with the lumped models.However, watershed hydrological process simulation includes large amount of calculations, especially in large rivers, thus needing huge computing resources that may not be steadily available for the researchers or at high expense, this seriously restricted the research and application. To solve this problem, the current parallel method are mostly parallel computing in space and time dimensions.They calculate the natural features orderly thatbased on distributed hydrological model by grid (unit, a basin) from upstream to downstream.This articleproposes ahigh-performancecomputing method of hydrological process simulation with high speedratio and parallel efficiency.It combinedthe runoff characteristics of time and space of distributed hydrological model withthe methods adopting distributed data storage, memory database, distributed computing, parallel computing based on computing power unit.The method has strong adaptability and extensibility,which means it canmake full use of the computing and storage resources under the condition of limited computing resources, and the computing efficiency can be improved linearly with the increase of computing resources .This method can satisfy the parallel computing requirements ofhydrological process simulation in small, medium and large rivers.
Large Eddy Simulation in the Computation of Jet Noise

NASA Technical Reports Server (NTRS)

Mankbadi, R. R.; Goldstein, M. E.; Povinelli, L. A.; Hayder, M. E.; Turkel, E.

1999-01-01

Noise can be predicted by solving Full (time-dependent) Compressible Navier-Stokes Equation (FCNSE) with computational domain. The fluctuating near field of the jet produces propagating pressure waves that produce far-field sound. The fluctuating flow field as a function of time is needed in order to calculate sound from first principles. Noise can be predicted by solving the full, time-dependent, compressible Navier-Stokes equations with the computational domain extended to far field - but this is not feasible as indicated above. At high Reynolds number of technological interest turbulence has large range of scales. Direct numerical simulations (DNS) can not capture the small scales of turbulence. The large scales are more efficient than the small scales in radiating sound. The emphasize is thus on calculating sound radiated by large scales.
Advanced computer architecture for large-scale real-time applications.

DOT National Transportation Integrated Search

1973-04-01

Air traffic control automation is identified as a crucial problem which provides a complex, real-time computer application environment. A novel computer architecture in the form of a pipeline associative processor is conceived to achieve greater perf...
Optimized Laplacian image sharpening algorithm based on graphic processing unit

NASA Astrophysics Data System (ADS)

Ma, Tinghuai; Li, Lu; Ji, Sai; Wang, Xin; Tian, Yuan; Al-Dhelaan, Abdullah; Al-Rodhaan, Mznah

2014-12-01

In classical Laplacian image sharpening, all pixels are processed one by one, which leads to large amount of computation. Traditional Laplacian sharpening processed on CPU is considerably time-consuming especially for those large pictures. In this paper, we propose a parallel implementation of Laplacian sharpening based on Compute Unified Device Architecture (CUDA), which is a computing platform of Graphic Processing Units (GPU), and analyze the impact of picture size on performance and the relationship between the processing time of between data transfer time and parallel computing time. Further, according to different features of different memory, an improved scheme of our method is developed, which exploits shared memory in GPU instead of global memory and further increases the efficiency. Experimental results prove that two novel algorithms outperform traditional consequentially method based on OpenCV in the aspect of computing speed.
An innovative computer design for modeling forest landscape change in very large spatial extents with fine resolutions

Treesearch

Jian Yang; Hong S. He; Stephen R. Shifley; Frank R. Thompson; Yangjian Zhang

2011-01-01

Although forest landscape models (FLMs) have benefited greatly from ongoing advances of computer technology and software engineering, computing capacity remains a bottleneck in the design and development of FLMs. Computer memory overhead and run time efficiency are primary limiting factors when applying forest landscape models to simulate large landscapes with fine...
Simultaneous analysis of large INTEGRAL/SPI1 datasets: Optimizing the computation of the solution and its variance using sparse matrix algorithms

NASA Astrophysics Data System (ADS)

Bouchet, L.; Amestoy, P.; Buttari, A.; Rouet, F.-H.; Chauvin, M.

2013-02-01

Nowadays, analyzing and reducing the ever larger astronomical datasets is becoming a crucial challenge, especially for long cumulated observation times. The INTEGRAL/SPI X/γ-ray spectrometer is an instrument for which it is essential to process many exposures at the same time in order to increase the low signal-to-noise ratio of the weakest sources. In this context, the conventional methods for data reduction are inefficient and sometimes not feasible at all. Processing several years of data simultaneously requires computing not only the solution of a large system of equations, but also the associated uncertainties. We aim at reducing the computation time and the memory usage. Since the SPI transfer function is sparse, we have used some popular methods for the solution of large sparse linear systems; we briefly review these methods. We use the Multifrontal Massively Parallel Solver (MUMPS) to compute the solution of the system of equations. We also need to compute the variance of the solution, which amounts to computing selected entries of the inverse of the sparse matrix corresponding to our linear system. This can be achieved through one of the latest features of the MUMPS software that has been partly motivated by this work. In this paper we provide a brief presentation of this feature and evaluate its effectiveness on astrophysical problems requiring the processing of large datasets simultaneously, such as the study of the entire emission of the Galaxy. We used these algorithms to solve the large sparse systems arising from SPI data processing and to obtain both their solutions and the associated variances. In conclusion, thanks to these newly developed tools, processing large datasets arising from SPI is now feasible with both a reasonable execution time and a low memory usage.
Vector computer memory bank contention

NASA Technical Reports Server (NTRS)

Bailey, D. H.

1985-01-01

A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
Large-scale frequency- and time-domain quantum entanglement over the optical frequency comb (Conference Presentation)

NASA Astrophysics Data System (ADS)

Pfister, Olivier

2017-05-01

When it comes to practical quantum computing, the two main challenges are circumventing decoherence (devastating quantum errors due to interactions with the environmental bath) and achieving scalability (as many qubits as needed for a real-life, game-changing computation). We show that using, in lieu of qubits, the "qumodes" represented by the resonant fields of the quantum optical frequency comb of an optical parametric oscillator allows one to create bona fide, large scale quantum computing processors, pre-entangled in a cluster state. We detail our recent demonstration of 60-qumode entanglement (out of an estimated 3000) and present an extension to combining this frequency-tagged with time-tagged entanglement, in order to generate an arbitrarily large, universal quantum computing processor.
Vector computer memory bank contention

NASA Technical Reports Server (NTRS)

Bailey, David H.

1987-01-01

A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
Efficient Transition Probability Computation for Continuous-Time Branching Processes via Compressed Sensing.

PubMed

Xu, Jason; Minin, Vladimir N

2015-07-01

Branching processes are a class of continuous-time Markov chains (CTMCs) with ubiquitous applications. A general difficulty in statistical inference under partially observed CTMC models arises in computing transition probabilities when the discrete state space is large or uncountable. Classical methods such as matrix exponentiation are infeasible for large or countably infinite state spaces, and sampling-based alternatives are computationally intensive, requiring integration over all possible hidden events. Recent work has successfully applied generating function techniques to computing transition probabilities for linear multi-type branching processes. While these techniques often require significantly fewer computations than matrix exponentiation, they also become prohibitive in applications with large populations. We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse. We demonstrate accurate and efficient transition probability computations in branching process models for blood cell formation and evolution of self-replicating transposable elements in bacterial genomes.
Efficient Transition Probability Computation for Continuous-Time Branching Processes via Compressed Sensing

PubMed Central

Xu, Jason; Minin, Vladimir N.

2016-01-01

Branching processes are a class of continuous-time Markov chains (CTMCs) with ubiquitous applications. A general difficulty in statistical inference under partially observed CTMC models arises in computing transition probabilities when the discrete state space is large or uncountable. Classical methods such as matrix exponentiation are infeasible for large or countably infinite state spaces, and sampling-based alternatives are computationally intensive, requiring integration over all possible hidden events. Recent work has successfully applied generating function techniques to computing transition probabilities for linear multi-type branching processes. While these techniques often require significantly fewer computations than matrix exponentiation, they also become prohibitive in applications with large populations. We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse. We demonstrate accurate and efficient transition probability computations in branching process models for blood cell formation and evolution of self-replicating transposable elements in bacterial genomes. PMID:26949377
The International Conference on Vector and Parallel Computing (2nd)

DTIC Science & Technology

1989-01-17

Computation of the SVD of Bidiagonal Matrices" ...................................... 11 " Lattice QCD -As a Large Scale Scientific Computation...vectorizcd for the IBM 3090 Vector Facility. In addition, elapsed times " Lattice QCD -As a Large Scale Scientific have been reduced by using 3090...benchmarked Lattice QCD on a large number ofcompu- come from the wavefront solver routine. This was exten- ters: CrayX-MP and Cray 2 (vector
Parametric Study of a YAV-8B Harrier in Ground Effect Using Time-Dependent Navier-Stokes Computations

NASA Technical Reports Server (NTRS)

Shishir, Pandya; Chaderjian, Neal; Ahmad, Jsaim; Kwak, Dochan (Technical Monitor)

2001-01-01

Flow simulations using the time-dependent Navier-Stokes equations remain a challenge for several reasons. Principal among them are the difficulty to accurately model complex flows, and the time needed to perform the computations. A parametric study of such complex problems is not considered practical due to the large cost associated with computing many time-dependent solutions. The computation time for each solution must be reduced in order to make a parametric study possible. With successful reduction of computation time, the issue of accuracy, and appropriateness of turbulence models will become more tractable.
DCL System Using Deep Learning Approaches for Land-Based or Ship-Based Real Time Recognition and Localization of Marine Mammals

DTIC Science & Technology

2015-09-30

Clark (2014), "Using High Performance Computing to Explore Large Complex Bioacoustic Soundscapes : Case Study for Right Whale Acoustics," Procedia...34Using High Performance Computing to Explore Large Complex Bioacoustic Soundscapes : Case Study for Right Whale Acoustics," Procedia Computer Science 20
Design for Run-Time Monitor on Cloud Computing

NASA Astrophysics Data System (ADS)

Kang, Mikyung; Kang, Dong-In; Yun, Mira; Park, Gyung-Leen; Lee, Junghoon

Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is the type of a parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring the system status change, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize resources on cloud computing. RTM monitors application software through library instrumentation as well as underlying hardware through performance counter optimizing its computing configuration based on the analyzed data.
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

PubMed

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments in the DNA and protein large scale data sets, which are more than 1GB files, showed that HAlign II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. THAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Optimisation of multiplet identifier processing on a PLAYSTATION® 3

NASA Astrophysics Data System (ADS)

Hattori, Masami; Mizuno, Takashi

2010-02-01

To enable high-performance computing (HPC) for applications with large datasets using a Sony® PLAYSTATION® 3 (PS3™) video game console, we configured a hybrid system consisting of a Windows® PC and a PS3™. To validate this system, we implemented the real-time multiplet identifier (RTMI) application, which identifies multiplets of microearthquakes in terms of the similarity of their waveforms. The cross-correlation computation, which is a core algorithm of the RTMI application, was optimised for the PS3™ platform, while the rest of the computation, including data input and output remained on the PC. With this configuration, the core part of the algorithm ran 69 times faster than the original program, accelerating total computation speed more than five times. As a result, the system processed up to 2100 total microseismic events, whereas the original implementation had a limit of 400 events. These results indicate that this system enables high-performance computing for large datasets using the PS3™, as long as data transfer time is negligible compared with computation time.

Computation of type curves for flow to partially penetrating wells in water-table aquifers

USGS Publications Warehouse

Moench, Allen F.

1993-01-01

Evaluation of Neuman's analytical solution for flow to a well in a homogeneous, anisotropic, water-table aquifer commonly requires large amounts of computation time and can produce inaccurate results for selected combinations of parameters. Large computation times occur because the integrand of a semi-infinite integral involves the summation of an infinite series. Each term of the series requires evaluation of the roots of equations, and the series itself is sometimes slowly convergent. Inaccuracies can result from lack of computer precision or from the use of improper methods of numerical integration. In this paper it is proposed to use a method of numerical inversion of the Laplace transform solution, provided by Neuman, to overcome these difficulties. The solution in Laplace space is simpler in form than the real-time solution; that is, the integrand of the semi-infinite integral does not involve an infinite series or the need to evaluate roots of equations. Because the integrand is evaluated rapidly, advanced methods of numerical integration can be used to improve accuracy with an overall reduction in computation time. The proposed method of computing type curves, for which a partially documented computer program (WTAQ1) was written, was found to reduce computation time by factors of 2 to 20 over the time needed to evaluate the closed-form, real-time solution.
The solution of large multi-dimensional Poisson problems

NASA Technical Reports Server (NTRS)

Stone, H. S.

1974-01-01

The Buneman algorithm for solving Poisson problems can be adapted to solve large Poisson problems on computers with a rotating drum memory so that the computation is done with very little time lost due to rotational latency of the drum.
Linear static structural and vibration analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Baddourah, M. A.; Storaasli, O. O.; Bostic, S. W.

1993-01-01

Parallel computers offer the oppurtunity to significantly reduce the computation time necessary to analyze large-scale aerospace structures. This paper presents algorithms developed for and implemented on massively-parallel computers hereafter referred to as Scalable High-Performance Computers (SHPC), for the most computationally intensive tasks involved in structural analysis, namely, generation and assembly of system matrices, solution of systems of equations and calculation of the eigenvalues and eigenvectors. Results on SHPC are presented for large-scale structural problems (i.e. models for High-Speed Civil Transport). The goal of this research is to develop a new, efficient technique which extends structural analysis to SHPC and makes large-scale structural analyses tractable.
An S N Algorithm for Modern Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, Randal Scott

2016-08-29

LANL discrete ordinates transport packages are required to perform large, computationally intensive time-dependent calculations on massively parallel architectures, where even a single such calculation may need many months to complete. While KBA methods scale out well to very large numbers of compute nodes, we are limited by practical constraints on the number of such nodes we can actually apply to any given calculation. Instead, we describe a modified KBA algorithm that allows realization of the reductions in solution time offered by both the current, and future, architectural changes within a compute node.
RICIS research

NASA Technical Reports Server (NTRS)

Mckay, Charles W.; Feagin, Terry; Bishop, Peter C.; Hallum, Cecil R.; Freedman, Glenn B.

1987-01-01

The principle focus of one of the RICIS (Research Institute for Computing and Information Systems) components is computer systems and software engineering in-the-large of the lifecycle of large, complex, distributed systems which: (1) evolve incrementally over a long time; (2) contain non-stop components; and (3) must simultaneously satisfy a prioritized balance of mission and safety critical requirements at run time. This focus is extremely important because of the contribution of the scaling direction problem to the current software crisis. The Computer Systems and Software Engineering (CSSE) component addresses the lifestyle issues of three environments: host, integration, and target.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.

PubMed

Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T

2017-01-01

Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.
Decomposition method for fast computation of gigapixel-sized Fresnel holograms on a graphics processing unit cluster.

PubMed

Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu

2018-04-20

A parallel computation method for large-size Fresnel computer-generated hologram (CGH) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGH from 2D object data. In this paper we extend the method to compute Fresnel CGH from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that favor small to large scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A two-times improvement in computation speed has been achieved compared to the conventional method, on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-times improvement in computation speed has been estimated utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.
An atomic orbital based real-time time-dependent density functional theory for computing electronic circular dichroism band spectra

DOE Office of Scientific and Technical Information (OSTI.GOV)

Goings, Joshua J.; Li, Xiaosong, E-mail: xsli@uw.edu

2016-06-21

One of the challenges of interpreting electronic circular dichroism (ECD) band spectra is that different states may have different rotatory strength signs, determined by their absolute configuration. If the states are closely spaced and opposite in sign, observed transitions may be washed out by nearby states, unlike absorption spectra where transitions are always positive additive. To accurately compute ECD bands, it is necessary to compute a large number of excited states, which may be prohibitively costly if one uses the linear-response time-dependent density functional theory (TDDFT) framework. Here we implement a real-time, atomic-orbital based TDDFT method for computing the entiremore » ECD spectrum simultaneously. The method is advantageous for large systems with a high density of states. In contrast to previous implementations based on real-space grids, the method is variational, independent of nuclear orientation, and does not rely on pseudopotential approximations, making it suitable for computation of chiroptical properties well into the X-ray regime.« less
Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing

PubMed Central

Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon

2011-01-01

Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data. PMID:22163811
Design and development of a run-time monitor for multi-core architectures in cloud computing.

PubMed

Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon

2011-01-01

Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data.
Particle behavior simulation in thermophoresis phenomena by direct simulation Monte Carlo method

NASA Astrophysics Data System (ADS)

Wada, Takao

2014-07-01

A particle motion considering thermophoretic force is simulated by using direct simulation Monte Carlo (DSMC) method. Thermophoresis phenomena, which occur for a particle size of 1 μm, are treated in this paper. The problem of thermophoresis simulation is computation time which is proportional to the collision frequency. Note that the time step interval becomes much small for the simulation considering the motion of large size particle. Thermophoretic forces calculated by DSMC method were reported, but the particle motion was not computed because of the small time step interval. In this paper, the molecule-particle collision model, which computes the collision between a particle and multi molecules in a collision event, is considered. The momentum transfer to the particle is computed with a collision weight factor, where the collision weight factor means the number of molecules colliding with a particle in a collision event. The large time step interval is adopted by considering the collision weight factor. Furthermore, the large time step interval is about million times longer than the conventional time step interval of the DSMC method when a particle size is 1 μm. Therefore, the computation time becomes about one-millionth. We simulate the graphite particle motion considering thermophoretic force by DSMC-Neutrals (Particle-PLUS neutral module) with above the collision weight factor, where DSMC-Neutrals is commercial software adopting DSMC method. The size and the shape of the particle are 1 μm and a sphere, respectively. The particle-particle collision is ignored. We compute the thermophoretic forces in Ar and H2 gases of a pressure range from 0.1 to 100 mTorr. The results agree well with Gallis' analytical results. Note that Gallis' analytical result for continuum limit is the same as Waldmann's result.
An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

NASA Astrophysics Data System (ADS)

Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

2018-02-01

De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significant reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
Efficiency analysis of numerical integrations for finite element substructure in real-time hybrid simulation

NASA Astrophysics Data System (ADS)

Wang, Jinting; Lu, Liqiao; Zhu, Fei

2018-01-01

Finite element (FE) is a powerful tool and has been applied by investigators to real-time hybrid simulations (RTHSs). This study focuses on the computational efficiency, including the computational time and accuracy, of numerical integrations in solving FE numerical substructure in RTHSs. First, sparse matrix storage schemes are adopted to decrease the computational time of FE numerical substructure. In this way, the task execution time (TET) decreases such that the scale of the numerical substructure model increases. Subsequently, several commonly used explicit numerical integration algorithms, including the central difference method (CDM), the Newmark explicit method, the Chang method and the Gui-λ method, are comprehensively compared to evaluate their computational time in solving FE numerical substructure. CDM is better than the other explicit integration algorithms when the damping matrix is diagonal, while the Gui-λ (λ = 4) method is advantageous when the damping matrix is non-diagonal. Finally, the effect of time delay on the computational accuracy of RTHSs is investigated by simulating structure-foundation systems. Simulation results show that the influences of time delay on the displacement response become obvious with the mass ratio increasing, and delay compensation methods may reduce the relative error of the displacement peak value to less than 5% even under the large time-step and large time delay.
Towards the computation of time-periodic inertial range dynamics

NASA Astrophysics Data System (ADS)

van Veen, L.; Vela-Martín, A.; Kawahara, G.

2018-04-01

We explore the possibility of computing simple invariant solutions, like travelling waves or periodic orbits, in Large Eddy Simulation (LES) on a periodic domain with constant external forcing. The absence of material boundaries and the simple forcing mechanism make this system a comparatively simple target for the study of turbulent dynamics through invariant solutions. We show, that in spite of the application of eddy viscosity the computations are still rather challenging and must be performed on GPU cards rather than conventional coupled CPUs. We investigate the onset of turbulence in this system by means of bifurcation analysis, and present a long-period, large-amplitude unstable periodic orbit that is filtered from a turbulent time series. Although this orbit is computed on a coarse grid, with only a small separation between the integral scale and the LES filter length, the periodic dynamics seem to capture a regeneration process of the large-scale vortices.
Tracking of large-scale structures in turbulent channel with direct numerical simulation of low Prandtl number passive scalar

NASA Astrophysics Data System (ADS)

Tiselj, Iztok

2014-12-01

Channel flow DNS (Direct Numerical Simulation) at friction Reynolds number 180 and with passive scalars of Prandtl numbers 1 and 0.01 was performed in various computational domains. The "normal" size domain was ˜2300 wall units long and ˜750 wall units wide; size taken from the similar DNS of Moser et al. The "large" computational domain, which is supposed to be sufficient to describe the largest structures of the turbulent flows was 3 times longer and 3 times wider than the "normal" domain. The "very large" domain was 6 times longer and 6 times wider than the "normal" domain. All simulations were performed with the same spatial and temporal resolution. Comparison of the standard and large computational domains shows the velocity field statistics (mean velocity, root-mean-square (RMS) fluctuations, and turbulent Reynolds stresses) that are within 1%-2%. Similar agreement is observed for Pr = 1 temperature fields and can be observed also for the mean temperature profiles at Pr = 0.01. These differences can be attributed to the statistical uncertainties of the DNS. However, second-order moments, i.e., RMS temperature fluctuations of standard and large computational domains at Pr = 0.01 show significant differences of up to 20%. Stronger temperature fluctuations in the "large" and "very large" domains confirm the existence of the large-scale structures. Their influence is more or less invisible in the main velocity field statistics or in the statistics of the temperature fields at Prandtl numbers around 1. However, these structures play visible role in the temperature fluctuations at low Prandtl number, where high temperature diffusivity effectively smears the small-scale structures in the thermal field and enhances the relative contribution of large-scales. These large thermal structures represent some kind of an echo of the large scale velocity structures: the highest temperature-velocity correlations are not observed between the instantaneous temperatures and instantaneous streamwise velocities, but between the instantaneous temperatures and velocities averaged over certain time interval.
Bibliographic Automation of Large Library Operations Using a Time-Sharing System: Phase I. Final Report.

ERIC Educational Resources Information Center

Epstein, A. H.; And Others

The first phase of an ongoing library automation project at Stanford University is described. Project BALLOTS (Bibliographic Automation of Large Library Operations Using a Time-Sharing System) seeks to automate the acquisition and cataloging functions of a large library using an on-line time-sharing computer. The main objectives are to control…
Online Operation Guidance of Computer System Used in Real-Time Distance Education Environment

ERIC Educational Resources Information Center

He, Aiguo

2011-01-01

Computer system is useful for improving real time and interactive distance education activities. Especially in the case that a large number of students participate in one distance lecture together and every student uses their own computer to share teaching materials or control discussions over the virtual classrooms. The problem is that within…
Job Management and Task Bundling

NASA Astrophysics Data System (ADS)

Berkowitz, Evan; Jansen, Gustav R.; McElvain, Kenneth; Walker-Loud, André

2018-03-01

High Performance Computing is often performed on scarce and shared computing resources. To ensure computers are used to their full capacity, administrators often incentivize large workloads that are not possible on smaller systems. Measurements in Lattice QCD frequently do not scale to machine-size workloads. By bundling tasks together we can create large jobs suitable for gigantic partitions. We discuss METAQ and mpi_jm, software developed to dynamically group computational tasks together, that can intelligently backfill to consume idle time without substantial changes to users' current workflows or executables.
Computing the universe: how large-scale simulations illuminate galaxies and dark energy

NASA Astrophysics Data System (ADS)

O'Shea, Brian

2015-04-01

High-performance and large-scale computing is absolutely to understanding astronomical objects such as stars, galaxies, and the cosmic web. This is because these are structures that operate on physical, temporal, and energy scales that cannot be reasonably approximated in the laboratory, and whose complexity and nonlinearity often defies analytic modeling. In this talk, I show how the growth of computing platforms over time has facilitated our understanding of astrophysical and cosmological phenomena, focusing primarily on galaxies and large-scale structure in the Universe.
Computation of steady nozzle flow by a time-dependent method

NASA Technical Reports Server (NTRS)

Cline, M. C.

1974-01-01

The equations of motion governing steady, inviscid flow are of a mixed type, that is, hyperbolic in the supersonic region and elliptic in the subsonic region. These mathematical difficulties may be removed by using the so-called time-dependent method, where the governing equations become hyperbolic everywhere. The steady-state solution may be obtained as the asymptotic solution for large time. The object of this research was to develop a production type computer program capable of solving converging, converging-diverging, and plug two-dimensional nozzle flows in computational times of 1 min or less on a CDC 6600 computer.

Networked Microcomputers--The Next Generation in College Computing.

ERIC Educational Resources Information Center

Harris, Albert L.

The evolution of computer hardware for college computing has mirrored the industry's growth. When computers were introduced into the educational environment, they had limited capacity and served one user at a time. Then came large mainframes with many terminals sharing the resource. Next, the use of computers in office automation emerged. As…
Fast distributed large-pixel-count hologram computation using a GPU cluster.

PubMed

Pan, Yuechao; Xu, Xuewu; Liang, Xinan

2013-09-10

Large-pixel-count holograms are one essential part for big size holographic three-dimensional (3D) display, but the generation of such holograms is computationally demanding. In order to address this issue, we have built a graphics processing unit (GPU) cluster with 32.5 Tflop/s computing power and implemented distributed hologram computation on it with speed improvement techniques, such as shared memory on GPU, GPU level adaptive load balancing, and node level load distribution. Using these speed improvement techniques on the GPU cluster, we have achieved 71.4 times computation speed increase for 186M-pixel holograms. Furthermore, we have used the approaches of diffraction limits and subdivision of holograms to overcome the GPU memory limit in computing large-pixel-count holograms. 745M-pixel and 1.80G-pixel holograms were computed in 343 and 3326 s, respectively, for more than 2 million object points with RGB colors. Color 3D objects with 1.02M points were successfully reconstructed from 186M-pixel hologram computed in 8.82 s with all the above three speed improvement techniques. It is shown that distributed hologram computation using a GPU cluster is a promising approach to increase the computation speed of large-pixel-count holograms for large size holographic display.
Faster than Real-Time Dynamic Simulation for Large-Size Power System with Detailed Dynamic Models using High-Performance Computing Platform

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huang, Renke; Jin, Shuangshuang; Chen, Yousu

This paper presents a faster-than-real-time dynamic simulation software package that is designed for large-size power system dynamic simulation. It was developed on the GridPACKTM high-performance computing (HPC) framework. The key features of the developed software package include (1) faster-than-real-time dynamic simulation for a WECC system (17,000 buses) with different types of detailed generator, controller, and relay dynamic models, (2) a decoupled parallel dynamic simulation algorithm with optimized computation architecture to better leverage HPC resources and technologies, (3) options for HPC-based linear and iterative solvers, (4) hidden HPC details, such as data communication and distribution, to enable development centered on mathematicalmore » models and algorithms rather than on computational details for power system researchers, and (5) easy integration of new dynamic models and related algorithms into the software package.« less
A statics-dynamics equivalence through the fluctuation–dissipation ratio provides a window into the spin-glass phase from nonequilibrium measurements

PubMed Central

Baity-Jesi, Marco; Calore, Enrico; Cruz, Andres; Fernandez, Luis Antonio; Gil-Narvión, José Miguel; Gordillo-Guerrero, Antonio; Iñiguez, David; Maiorano, Andrea; Marinari, Enzo; Martin-Mayor, Victor; Monforte-Garcia, Jorge; Muñoz Sudupe, Antonio; Navarro, Denis; Parisi, Giorgio; Perez-Gaviro, Sergio; Ricci-Tersenghi, Federico; Ruiz-Lorenzo, Juan Jesus; Schifano, Sebastiano Fabio; Tarancón, Alfonso; Tripiccione, Raffaele; Yllanes, David

2017-01-01

We have performed a very accurate computation of the nonequilibrium fluctuation–dissipation ratio for the 3D Edwards–Anderson Ising spin glass, by means of large-scale simulations on the special-purpose computers Janus and Janus II. This ratio (computed for finite times on very large, effectively infinite, systems) is compared with the equilibrium probability distribution of the spin overlap for finite sizes. Our main result is a quantitative statics-dynamics dictionary, which could allow the experimental exploration of important features of the spin-glass phase without requiring uncontrollable extrapolations to infinite times or system sizes. PMID:28174274
Three-dimensional time dependent computation of turbulent flow

NASA Technical Reports Server (NTRS)

Kwak, D.; Reynolds, W. C.; Ferziger, J. H.

1975-01-01

The three-dimensional, primitive equations of motion are solved numerically for the case of isotropic box turbulence and the distortion of homogeneous turbulence by irrotational plane strain at large Reynolds numbers. A Gaussian filter is applied to governing equations to define the large scale field. This gives rise to additional second order computed scale stresses (Leonard stresses). The residual stresses are simulated through an eddy viscosity. Uniform grids are used, with a fourth order differencing scheme in space and a second order Adams-Bashforth predictor for explicit time stepping. The results are compared to the experiments and statistical information extracted from the computer generated data.
Quantum rendering

NASA Astrophysics Data System (ADS)

Lanzagorta, Marco O.; Gomez, Richard B.; Uhlmann, Jeffrey K.

2003-08-01

In recent years, computer graphics has emerged as a critical component of the scientific and engineering process, and it is recognized as an important computer science research area. Computer graphics are extensively used for a variety of aerospace and defense training systems and by Hollywood's special effects companies. All these applications require the computer graphics systems to produce high quality renderings of extremely large data sets in short periods of time. Much research has been done in "classical computing" toward the development of efficient methods and techniques to reduce the rendering time required for large datasets. Quantum Computing's unique algorithmic features offer the possibility of speeding up some of the known rendering algorithms currently used in computer graphics. In this paper we discuss possible implementations of quantum rendering algorithms. In particular, we concentrate on the implementation of Grover's quantum search algorithm for Z-buffering, ray-tracing, radiosity, and scene management techniques. We also compare the theoretical performance between the classical and quantum versions of the algorithms.
Real Time Text Analysis

NASA Astrophysics Data System (ADS)

Senthilkumar, K.; Ruchika Mehra Vijayan, E.

2017-11-01

This paper aims to illustrate real time analysis of large scale data. For practical implementation we are performing sentiment analysis on live Twitter feeds for each individual tweet. To analyze sentiments we will train our data model on sentiWordNet, a polarity assigned wordNet sample by Princeton University. Our main objective will be to efficiency analyze large scale data on the fly using distributed computation. Apache Spark and Apache Hadoop eco system is used as distributed computation platform with Java as development language
FLAME: A platform for high performance computing of complex systems, applied for three case studies

DOE PAGES

Kiran, Mariam; Bicak, Mesude; Maleki-Dizaji, Saeedeh; ...

2011-01-01

FLAME allows complex models to be automatically parallelised on High Performance Computing (HPC) grids enabling large number of agents to be simulated over short periods of time. Modellers are hindered by complexities of porting models on parallel platforms and time taken to run large simulations on a single machine, which FLAME overcomes. Three case studies from different disciplines were modelled using FLAME, and are presented along with their performance results on a grid.
Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks

PubMed Central

Kaltenbacher, Barbara; Hasenauer, Jan

2017-01-01

Mechanistic mathematical modeling of biochemical reaction networks using ordinary differential equation (ODE) models has improved our understanding of small- and medium-scale biological processes. While the same should in principle hold for large- and genome-scale processes, the computational methods for the analysis of ODE models which describe hundreds or thousands of biochemical species and reactions are missing so far. While individual simulations are feasible, the inference of the model parameters from experimental data is computationally too intensive. In this manuscript, we evaluate adjoint sensitivity analysis for parameter estimation in large scale biochemical reaction networks. We present the approach for time-discrete measurement and compare it to state-of-the-art methods used in systems and computational biology. Our comparison reveals a significantly improved computational efficiency and a superior scalability of adjoint sensitivity analysis. The computational complexity is effectively independent of the number of parameters, enabling the analysis of large- and genome-scale models. Our study of a comprehensive kinetic model of ErbB signaling shows that parameter estimation using adjoint sensitivity analysis requires a fraction of the computation time of established methods. The proposed method will facilitate mechanistic modeling of genome-scale cellular processes, as required in the age of omics. PMID:28114351
Portals for Real-Time Earthquake Data and Forecasting: Challenge and Promise (Invited)

NASA Astrophysics Data System (ADS)

Rundle, J. B.; Holliday, J. R.; Graves, W. R.; Feltstykket, R.; Donnellan, A.; Glasscoe, M. T.

2013-12-01

Earthquake forecasts have been computed by a variety of countries world-wide for over two decades. For the most part, forecasts have been computed for insurance, reinsurance and underwriters of catastrophe bonds. However, recent events clearly demonstrate that mitigating personal risk is becoming the responsibility of individual members of the public. Open access to a variety of web-based forecasts, tools, utilities and information is therefore required. Portals for data and forecasts present particular challenges, and require the development of both apps and the client/server architecture to deliver the basic information in real time. The basic forecast model we consider is the Natural Time Weibull (NTW) method (JBR et al., Phys. Rev. E, 86, 021106, 2012). This model uses small earthquakes (';seismicity-based models') to forecast the occurrence of large earthquakes, via data-mining algorithms combined with the ANSS earthquake catalog. This method computes large earthquake probabilities using the number of small earthquakes that have occurred in a region since the last large earthquake. Localizing these forecasts in space so that global forecasts can be computed in real time presents special algorithmic challenges, which we describe in this talk. Using 25 years of data from the ANSS California-Nevada catalog of earthquakes, we compute real-time global forecasts at a grid scale of 0.1o. We analyze and monitor the performance of these models using the standard tests, which include the Reliability/Attributes and Receiver Operating Characteristic (ROC) tests. It is clear from much of the analysis that data quality is a major limitation on the accurate computation of earthquake probabilities. We discuss the challenges of serving up these datasets over the web on web-based platforms such as those at www.quakesim.org , www.e-decider.org , and www.openhazards.com.
Visual Analysis of Cloud Computing Performance Using Behavioral Lines.

PubMed

Muelder, Chris; Zhu, Biao; Chen, Wei; Zhang, Hongxin; Ma, Kwan-Liu

2016-02-29

Cloud computing is an essential technology to Big Data analytics and services. A cloud computing system is often comprised of a large number of parallel computing and storage devices. Monitoring the usage and performance of such a system is important for efficient operations, maintenance, and security. Tracing every application on a large cloud system is untenable due to scale and privacy issues. But profile data can be collected relatively efficiently by regularly sampling the state of the system, including properties such as CPU load, memory usage, network usage, and others, creating a set of multivariate time series for each system. Adequate tools for studying such large-scale, multidimensional data are lacking. In this paper, we present a visual based analysis approach to understanding and analyzing the performance and behavior of cloud computing systems. Our design is based on similarity measures and a layout method to portray the behavior of each compute node over time. When visualizing a large number of behavioral lines together, distinct patterns often appear suggesting particular types of performance bottleneck. The resulting system provides multiple linked views, which allow the user to interactively explore the data by examining the data or a selected subset at different levels of detail. Our case studies, which use datasets collected from two different cloud systems, show that this visual based approach is effective in identifying trends and anomalies of the systems.
High-performance computing in image registration

NASA Astrophysics Data System (ADS)

Zanin, Michele; Remondino, Fabio; Dalla Mura, Mauro

2012-10-01

Thanks to the recent technological advances, a large variety of image data is at our disposal with variable geometric, radiometric and temporal resolution. In many applications the processing of such images needs high performance computing techniques in order to deliver timely responses e.g. for rapid decisions or real-time actions. Thus, parallel or distributed computing methods, Digital Signal Processor (DSP) architectures, Graphical Processing Unit (GPU) programming and Field-Programmable Gate Array (FPGA) devices have become essential tools for the challenging issue of processing large amount of geo-data. The article focuses on the processing and registration of large datasets of terrestrial and aerial images for 3D reconstruction, diagnostic purposes and monitoring of the environment. For the image alignment procedure, sets of corresponding feature points need to be automatically extracted in order to successively compute the geometric transformation that aligns the data. The feature extraction and matching are ones of the most computationally demanding operations in the processing chain thus, a great degree of automation and speed is mandatory. The details of the implemented operations (named LARES) exploiting parallel architectures and GPU are thus presented. The innovative aspects of the implementation are (i) the effectiveness on a large variety of unorganized and complex datasets, (ii) capability to work with high-resolution images and (iii) the speed of the computations. Examples and comparisons with standard CPU processing are also reported and commented.
The Case for Modular Redundancy in Large-Scale High Performance Computing Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Engelmann, Christian; Ong, Hong Hoe; Scott, Stephen L

2009-01-01

Recent investigations into resilience of large-scale high-performance computing (HPC) systems showed a continuous trend of decreasing reliability and availability. Newly installed systems have a lower mean-time to failure (MTTF) and a higher mean-time to recover (MTTR) than their predecessors. Modular redundancy is being used in many mission critical systems today to provide for resilience, such as for aerospace and command \\& control systems. The primary argument against modular redundancy for resilience in HPC has always been that the capability of a HPC system, and respective return on investment, would be significantly reduced. We argue that modular redundancy can significantly increasemore » compute node availability as it removes the impact of scale from single compute node MTTR. We further argue that single compute nodes can be much less reliable, and therefore less expensive, and still be highly available, if their MTTR/MTTF ratio is maintained.« less
Large holographic displays for real-time applications

NASA Astrophysics Data System (ADS)

Schwerdtner, A.; Häussler, R.; Leister, N.

2008-02-01

Holography is generally accepted as the ultimate approach to display three-dimensional scenes or objects. Principally, the reconstruction of an object from a perfect hologram would appear indistinguishable from viewing the corresponding real-world object. Up to now two main obstacles have prevented large-screen Computer-Generated Holograms (CGH) from achieving a satisfactory laboratory prototype not to mention a marketable one. The reason is a small cell pitch CGH resulting in a huge number of hologram cells and a very high computational load for encoding the CGH. These seemingly inevitable technological hurdles for a long time have not been cleared limiting the use of holography to special applications, such as optical filtering, interference, beam forming, digital holography for capturing the 3-D shape of objects, and others. SeeReal Technologies has developed a new approach for real-time capable CGH using the socalled Tracked Viewing Windows technology to overcome these problems. The paper will show that today's state of the art reconfigurable Spatial Light Modulators (SLM), especially today's feasible LCD panels are suited for reconstructing large 3-D scenes which can be observed from large viewing angles. For this to achieve the original holographic concept of containing information from the entire scene in each part of the CGH has been abandoned. This substantially reduces the hologram resolution and thus the computational load by several orders of magnitude making thus real-time computation possible. A monochrome real-time prototype measuring 20 inches has been built and demonstrated at last year's SID conference and exhibition 2007 and at several other events.
Convolution of large 3D images on GPU and its decomposition

NASA Astrophysics Data System (ADS)

Karas, Pavel; Svoboda, David

2011-12-01

In this article, we propose a method for computing convolution of large 3D images. The convolution is performed in a frequency domain using a convolution theorem. The algorithm is accelerated on a graphic card by means of the CUDA parallel computing model. Convolution is decomposed in a frequency domain using the decimation in frequency algorithm. We pay attention to keeping our approach efficient in terms of both time and memory consumption and also in terms of memory transfers between CPU and GPU which have a significant inuence on overall computational time. We also study the implementation on multiple GPUs and compare the results between the multi-GPU and multi-CPU implementations.
Flexible structure control experiments using a real-time workstation for computer-aided control engineering

NASA Technical Reports Server (NTRS)

Stieber, Michael E.

1989-01-01

A Real-Time Workstation for Computer-Aided Control Engineering has been developed jointly by the Communications Research Centre (CRC) and Ruhr-Universitaet Bochum (RUB), West Germany. The system is presently used for the development and experimental verification of control techniques for large space systems with significant structural flexibility. The Real-Time Workstation essentially is an implementation of RUB's extensive Computer-Aided Control Engineering package KEDDC on an INTEL micro-computer running under the RMS real-time operating system. The portable system supports system identification, analysis, control design and simulation, as well as the immediate implementation and test of control systems. The Real-Time Workstation is currently being used by CRC to study control/structure interaction on a ground-based structure called DAISY, whose design was inspired by a reflector antenna. DAISY emulates the dynamics of a large flexible spacecraft with the following characteristics: rigid body modes, many clustered vibration modes with low frequencies and extremely low damping. The Real-Time Workstation was found to be a very powerful tool for experimental studies, supporting control design and simulation, and conducting and evaluating tests withn one integrated environment.
A Parallel Nonrigid Registration Algorithm Based on B-Spline for Medical Images.

PubMed

Du, Xiaogang; Dang, Jianwu; Wang, Yangping; Wang, Song; Lei, Tao

2016-01-01

The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role and is widely applied in medical image processing due to the good flexibility and robustness. However, it requires a tremendous amount of computing time to obtain more accurate registration results especially for a large amount of medical image data. To address the issue, a parallel nonrigid registration algorithm based on B-spline is proposed in this paper. First, the Logarithm Squared Difference (LSD) is considered as the similarity metric in the B-spline registration algorithm to improve registration precision. After that, we create a parallel computing strategy and lookup tables (LUTs) to reduce the complexity of the B-spline registration algorithm. As a result, the computing time of three time-consuming steps including B-splines interpolation, LSD computation, and the analytic gradient computation of LSD, is efficiently reduced, for the B-spline registration algorithm employs the Nonlinear Conjugate Gradient (NCG) optimization method. Experimental results of registration quality and execution efficiency on the large amount of medical images show that our algorithm achieves a better registration accuracy in terms of the differences between the best deformation fields and ground truth and a speedup of 17 times over the single-threaded CPU implementation due to the powerful parallel computing ability of Graphics Processing Unit (GPU).
Scaling predictive modeling in drug development with cloud computing.

PubMed

Moghadam, Behrooz Torabi; Alvarsson, Jonathan; Holm, Marcus; Eklund, Martin; Carlsson, Lars; Spjuth, Ola

2015-01-26

Growing data sets with increased time for analysis is hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on the Amazon Elastic Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compare with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investments makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.
Computing at the speed limit (supercomputers)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bernhard, R.

1982-07-01

The author discusses how unheralded efforts in the United States, mainly in universities, have removed major stumbling blocks to building cost-effective superfast computers for scientific and engineering applications within five years. These computers would have sustained speeds of billions of floating-point operations per second (flops), whereas with the fastest machines today the top sustained speed is only 25 million flops, with bursts to 160 megaflops. Cost-effective superfast machines can be built because of advances in very large-scale integration and the special software needed to program the new machines. VLSI greatly reduces the cost per unit of computing power. The developmentmore » of such computers would come at an opportune time. Although the US leads the world in large-scale computer technology, its supremacy is now threatened, not surprisingly, by the Japanese. Publicized reports indicate that the Japanese government is funding a cooperative effort by commercial computer manufacturers to develop superfast computers-about 1000 times faster than modern supercomputers. The US computer industry, by contrast, has balked at attempting to boost computer power so sharply because of the uncertain market for the machines and the failure of similar projects in the past to show significant results.« less
Computational time analysis of the numerical solution of 3D electrostatic Poisson's equation

NASA Astrophysics Data System (ADS)

Kamboh, Shakeel Ahmed; Labadin, Jane; Rigit, Andrew Ragai Henri; Ling, Tech Chaw; Amur, Khuda Bux; Chaudhary, Muhammad Tayyab

2015-05-01

3D Poisson's equation is solved numerically to simulate the electric potential in a prototype design of electrohydrodynamic (EHD) ion-drag micropump. Finite difference method (FDM) is employed to discretize the governing equation. The system of linear equations resulting from FDM is solved iteratively by using the sequential Jacobi (SJ) and sequential Gauss-Seidel (SGS) methods, simulation results are also compared to examine the difference between the results. The main objective was to analyze the computational time required by both the methods with respect to different grid sizes and parallelize the Jacobi method to reduce the computational time. In common, the SGS method is faster than the SJ method but the data parallelism of Jacobi method may produce good speedup over SGS method. In this study, the feasibility of using parallel Jacobi (PJ) method is attempted in relation to SGS method. MATLAB Parallel/Distributed computing environment is used and a parallel code for SJ method is implemented. It was found that for small grid size the SGS method remains dominant over SJ method and PJ method while for large grid size both the sequential methods may take nearly too much processing time to converge. Yet, the PJ method reduces computational time to some extent for large grid sizes.

Efficient Computation of Closed-loop Frequency Response for Large Order Flexible Systems

NASA Technical Reports Server (NTRS)

Maghami, Peiman G.; Giesy, Daniel P.

1997-01-01

An efficient and robust computational scheme is given for the calculation of the frequency response function of a large order, flexible system implemented with a linear, time invariant control system. Advantage is taken of the highly structured sparsity of the system matrix of the plant based on a model of the structure using normal mode coordinates. The computational time per frequency point of the new computational scheme is a linear function of system size, a significant improvement over traditional, full-matrix techniques whose computational times per frequency point range from quadratic to cubic functions of system size. This permits the practical frequency domain analysis of systems of much larger order than by traditional, full-matrix techniques. Formulations are given for both open and closed loop loop systems. Numerical examples are presented showing the advantages of the present formulation over traditional approaches, both in speed and in accuracy. Using a model with 703 structural modes, a speed-up of almost two orders of magnitude was observed while accuracy improved by up to 5 decimal places.
Megatux

DOE Office of Scientific and Technical Information (OSTI.GOV)

2012-09-25

The Megatux platform enables the emulation of large scale (multi-million node) distributed systems. In particular, it allows for the emulation of large-scale networks interconnecting a very large number of emulated computer systems. It does this by leveraging virtualization and associated technologies to allow hundreds of virtual computers to be hosted on a single moderately sized server or workstation. Virtualization technology provided by modern processors allows for multiple guest OSs to run at the same time, sharing the hardware resources. The Megatux platform can be deployed on a single PC, a small cluster of a few boxes or a large clustermore » of computers. With a modest cluster, the Megatux platform can emulate complex organizational networks. By using virtualization, we emulate the hardware, but run actual software enabling large scale without sacrificing fidelity.« less
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

PubMed

Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

2014-01-16

To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

PubMed Central

2014-01-01

Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926
Parallelization of Finite Element Analysis Codes Using Heterogeneous Distributed Computing

NASA Technical Reports Server (NTRS)

Ozguner, Fusun

1996-01-01

Performance gains in computer design are quickly consumed as users seek to analyze larger problems to a higher degree of accuracy. Innovative computational methods, such as parallel and distributed computing, seek to multiply the power of existing hardware technology to satisfy the computational demands of large applications. In the early stages of this project, experiments were performed using two large, coarse-grained applications, CSTEM and METCAN. These applications were parallelized on an Intel iPSC/860 hypercube. It was found that the overall speedup was very low, due to large, inherently sequential code segments present in the applications. The overall execution time T(sub par), of the application is dependent on these sequential segments. If these segments make up a significant fraction of the overall code, the application will have a poor speedup measure.
Interactive graphical computer-aided design system

NASA Technical Reports Server (NTRS)

Edge, T. M.

1975-01-01

System is used for design, layout, and modification of large-scale-integrated (LSI) metal-oxide semiconductor (MOS) arrays. System is structured around small computer which provides real-time support for graphics storage display unit with keyboard, slave display unit, hard copy unit, and graphics tablet for designer/computer interface.
An Optimization Code for Nonlinear Transient Problems of a Large Scale Multidisciplinary Mathematical Model

NASA Astrophysics Data System (ADS)

Takasaki, Koichi

This paper presents a program for the multidisciplinary optimization and identification problem of the nonlinear model of large aerospace vehicle structures. The program constructs the global matrix of the dynamic system in the time direction by the p-version finite element method (pFEM), and the basic matrix for each pFEM node in the time direction is described by a sparse matrix similarly to the static finite element problem. The algorithm used by the program does not require the Hessian matrix of the objective function and so has low memory requirements. It also has a relatively low computational cost, and is suited to parallel computation. The program was integrated as a solver module of the multidisciplinary analysis system CUMuLOUS (Computational Utility for Multidisciplinary Large scale Optimization of Undense System) which is under development by the Aerospace Research and Development Directorate (ARD) of the Japan Aerospace Exploration Agency (JAXA).
A Pipeline for Large Data Processing Using Regular Sampling for Unstructured Grids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berres, Anne Sabine; Adhinarayanan, Vignesh; Turton, Terece

2017-05-12

Large simulation data requires a lot of time and computational resources to compute, store, analyze, visualize, and run user studies. Today, the largest cost of a supercomputer is not hardware but maintenance, in particular energy consumption. Our goal is to balance energy consumption and cognitive value of visualizations of resulting data. This requires us to go through the entire processing pipeline, from simulation to user studies. To reduce the amount of resources, data can be sampled or compressed. While this adds more computation time, the computational overhead is negligible compared to the simulation time. We built a processing pipeline atmore » the example of regular sampling. The reasons for this choice are two-fold: using a simple example reduces unnecessary complexity as we know what to expect from the results. Furthermore, it provides a good baseline for future, more elaborate sampling methods. We measured time and energy for each test we did, and we conducted user studies in Amazon Mechanical Turk (AMT) for a range of different results we produced through sampling.« less
Parallel Rendering of Large Time-Varying Volume Data

NASA Technical Reports Server (NTRS)

Garbutt, Alexander E.

2005-01-01

Interactive visualization of large time-varying 3D volume datasets has been and still is a great challenge to the modem computational world. It stretches the limits of the memory capacity, the disk space, the network bandwidth and the CPU speed of a conventional computer. In this SURF project, we propose to develop a parallel volume rendering program on SGI's Prism, a cluster computer equipped with state-of-the-art graphic hardware. The proposed program combines both parallel computing and hardware rendering in order to achieve an interactive rendering rate. We use 3D texture mapping and a hardware shader to implement 3D volume rendering on each workstation. We use SGI's VisServer to enable remote rendering using Prism's graphic hardware. And last, we will integrate this new program with ParVox, a parallel distributed visualization system developed at JPL. At the end of the project, we Will demonstrate remote interactive visualization using this new hardware volume renderer on JPL's Prism System using a time-varying dataset from selected JPL applications.
Peregrine Queue Changes | High-Performance Computing | NREL

Science.gov Websites

that the best path is to disable the large queue and move the nodes from the "large" queue to jobs that request a large number of nodes. The large queue was disabled during the October System time
Computer Programs for Library Operations; Results of a Survey Conducted Between Fall 1971 and Spring 1972.

ERIC Educational Resources Information Center

Liberman, Eva; And Others

Many library operations involving large data banks lend themselves readily to computer operation. In setting up library computer programs, in changing or expanding programs, cost in programming and time delays could be substantially reduced if the programmers had access to library computer programs being used by other libraries, providing similar…
An Approach to Integrate a Space-Time GIS Data Model with High Performance Computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Dali; Zhao, Ziliang; Shaw, Shih-Lung

2011-01-01

In this paper, we describe an approach to integrate a Space-Time GIS data model on a high performance computing platform. The Space-Time GIS data model has been developed on a desktop computing environment. We use the Space-Time GIS data model to generate GIS module, which organizes a series of remote sensing data. We are in the process of porting the GIS module into an HPC environment, in which the GIS modules handle large dataset directly via parallel file system. Although it is an ongoing project, authors hope this effort can inspire further discussions on the integration of GIS on highmore » performance computing platforms.« less
Strategies for Large Scale Implementation of a Multiscale, Multiprocess Integrated Hydrologic Model

NASA Astrophysics Data System (ADS)

Kumar, M.; Duffy, C.

2006-05-01

Distributed models simulate hydrologic state variables in space and time while taking into account the heterogeneities in terrain, surface, subsurface properties and meteorological forcings. Computational cost and complexity associated with these model increases with its tendency to accurately simulate the large number of interacting physical processes at fine spatio-temporal resolution in a large basin. A hydrologic model run on a coarse spatial discretization of the watershed with limited number of physical processes needs lesser computational load. But this negatively affects the accuracy of model results and restricts physical realization of the problem. So it is imperative to have an integrated modeling strategy (a) which can be universally applied at various scales in order to study the tradeoffs between computational complexity (determined by spatio- temporal resolution), accuracy and predictive uncertainty in relation to various approximations of physical processes (b) which can be applied at adaptively different spatial scales in the same domain by taking into account the local heterogeneity of topography and hydrogeologic variables c) which is flexible enough to incorporate different number and approximation of process equations depending on model purpose and computational constraint. An efficient implementation of this strategy becomes all the more important for Great Salt Lake river basin which is relatively large (~89000 sq. km) and complex in terms of hydrologic and geomorphic conditions. Also the types and the time scales of hydrologic processes which are dominant in different parts of basin are different. Part of snow melt runoff generated in the Uinta Mountains infiltrates and contributes as base flow to the Great Salt Lake over a time scale of decades to centuries. The adaptive strategy helps capture the steep topographic and climatic gradient along the Wasatch front. Here we present the aforesaid modeling strategy along with an associated hydrologic modeling framework which facilitates a seamless, computationally efficient and accurate integration of the process model with the data model. The flexibility of this framework leads to implementation of multiscale, multiresolution, adaptive refinement/de-refinement and nested modeling simulations with least computational burden. However, performing these simulations and related calibration of these models over a large basin at higher spatio- temporal resolutions is computationally intensive and requires use of increasing computing power. With the advent of parallel processing architectures, high computing performance can be achieved by parallelization of existing serial integrated-hydrologic-model code. This translates to running the same model simulation on a network of large number of processors thereby reducing the time needed to obtain solution. The paper also discusses the implementation of the integrated model on parallel processors. Also will be discussed the mapping of the problem on multi-processor environment, method to incorporate coupling between hydrologic processes using interprocessor communication models, model data structure and parallel numerical algorithms to obtain high performance.
Operational flood control of a low-lying delta system using large time step Model Predictive Control

NASA Astrophysics Data System (ADS)

Tian, Xin; van Overloop, Peter-Jules; Negenborn, Rudy R.; van de Giesen, Nick

2015-01-01

The safety of low-lying deltas is threatened not only by riverine flooding but by storm-induced coastal flooding as well. For the purpose of flood control, these deltas are mostly protected in a man-made environment, where dikes, dams and other adjustable infrastructures, such as gates, barriers and pumps are widely constructed. Instead of always reinforcing and heightening these structures, it is worth considering making the most of the existing infrastructure to reduce the damage and manage the delta in an operational and overall way. In this study, an advanced real-time control approach, Model Predictive Control, is proposed to operate these structures in the Dutch delta system (the Rhine-Meuse delta). The application covers non-linearity in the dynamic behavior of the water system and the structures. To deal with the non-linearity, a linearization scheme is applied which directly uses the gate height instead of the structure flow as the control variable. Given the fact that MPC needs to compute control actions in real-time, we address issues regarding computational time. A new large time step scheme is proposed in order to save computation time, in which different control variables can have different control time steps. Simulation experiments demonstrate that Model Predictive Control with the large time step setting is able to control a delta system better and much more efficiently than the conventional operational schemes.
Using 3D infrared imaging to calibrate and refine computational fluid dynamic modeling for large computer and data centers

NASA Astrophysics Data System (ADS)

Stockton, Gregory R.

2011-05-01

Over the last 10 years, very large government, military, and commercial computer and data center operators have spent millions of dollars trying to optimally cool data centers as each rack has begun to consume as much as 10 times more power than just a few years ago. In fact, the maximum amount of data computation in a computer center is becoming limited by the amount of available power, space and cooling capacity at some data centers. Tens of millions of dollars and megawatts of power are being annually spent to keep data centers cool. The cooling and air flows dynamically change away from any predicted 3-D computational fluid dynamic modeling during construction and as time goes by, and the efficiency and effectiveness of the actual cooling rapidly departs even farther from predicted models. By using 3-D infrared (IR) thermal mapping and other techniques to calibrate and refine the computational fluid dynamic modeling and make appropriate corrections and repairs, the required power for data centers can be dramatically reduced which reduces costs and also improves reliability.
Vectorization of transport and diffusion computations on the CDC Cyber 205

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abu-Shumays, I.K.

1986-01-01

The development and testing of alternative numerical methods and computational algorithms specifically designed for the vectorization of transport and diffusion computations on a Control Data Corporation (CDC) Cyber 205 vector computer are described. Two solution methods for the discrete ordinates approximation to the transport equation are summarized and compared. Factors of 4 to 7 reduction in run times for certain large transport problems were achieved on a Cyber 205 as compared with run times on a CDC-7600. The solution of tridiagonal systems of linear equations, central to several efficient numerical methods for multidimensional diffusion computations and essential for fluid flowmore » and other physics and engineering problems, is also dealt with. Among the methods tested, a combined odd-even cyclic reduction and modified Cholesky factorization algorithm for solving linear symmetric positive definite tridiagonal systems is found to be the most effective for these systems on a Cyber 205. For large tridiagonal systems, computation with this algorithm is an order of magnitude faster on a Cyber 205 than computation with the best algorithm for tridiagonal systems on a CDC-7600.« less
Parallel Simulation of Unsteady Turbulent Flames

NASA Technical Reports Server (NTRS)

Menon, Suresh

1996-01-01

Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability, makes practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable.This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics and, therefore, within each grid cell, a significant amount of computations must be carried out before the large-scale (LES resolved) effects are incorporated. Therefore, this approach is uniquely suited for parallel processing and has been implemented on various systems such as: Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines is reported along with some characteristic results.
The Time on Task Effect in Reading and Problem Solving Is Moderated by Task Difficulty and Skill: Insights from a Computer-Based Large-Scale Assessment

ERIC Educational Resources Information Center

Goldhammer, Frank; Naumann, Johannes; Stelter, Annette; Tóth, Krisztina; Rölke, Heiko; Klieme, Eckhard

2014-01-01

Computer-based assessment can provide new insights into behavioral processes of task completion that cannot be uncovered by paper-based instruments. Time presents a major characteristic of the task completion process. Psychologically, time on task has 2 different interpretations, suggesting opposing associations with task outcome: Spending more…
Exploring Cloud Computing for Large-scale Scientific Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Guang; Han, Binh; Yin, Jian

This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on-demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often just requires a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amount of data in the terabyte or even petabyte range and require special high performance hardware with low latency connections to complete computation in a reasonable amount of time. To address thesemore » challenges, we build an infrastructure that can dynamically select high performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a system biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.« less
The performance of low-cost commercial cloud computing as an alternative in computational chemistry.

PubMed

Thackston, Russell; Fortenberry, Ryan C

2015-05-05

The growth of commercial cloud computing (CCC) as a viable means of computational infrastructure is largely unexplored for the purposes of quantum chemistry. In this work, the PSI4 suite of computational chemistry programs is installed on five different types of Amazon World Services CCC platforms. The performance for a set of electronically excited state single-point energies is compared between these CCC platforms and typical, "in-house" physical machines. Further considerations are made for the number of cores or virtual CPUs (vCPUs, for the CCC platforms), but no considerations are made for full parallelization of the program (even though parallelization of the BLAS library is implemented), complete high-performance computing cluster utilization, or steal time. Even with this most pessimistic view of the computations, CCC resources are shown to be more cost effective for significant numbers of typical quantum chemistry computations. Large numbers of large computations are still best utilized by more traditional means, but smaller-scale research may be more effectively undertaken through CCC services. © 2015 Wiley Periodicals, Inc.

Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy.

PubMed

Penas, David R; González, Patricia; Egea, Jose A; Doallo, Ramón; Banga, Julio R

2017-01-21

The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problems but the associated computational cost is very large. Moreover, many of these methods need the tuning of a number of adjustable search parameters, requiring a number of initial exploratory runs and therefore further increasing the computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse and fine-grained parallelism, and (iii) self-tuning strategies. The performance and robustness of saCeSS is illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, bakerés yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing very significant reduction of computation times with respect to several previous state of the art methods (from days to minutes, in several cases) even when only a small number of processors is used. The new parallel cooperative method presented here allows the solution of medium and large scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.
High-Resiliency and Auto-Scaling of Large-Scale Cloud Computing for OCO-2 L2 Full Physics Processing

NASA Astrophysics Data System (ADS)

Hua, H.; Manipon, G.; Starch, M.; Dang, L. B.; Southam, P.; Wilson, B. D.; Avis, C.; Chang, A.; Cheng, C.; Smyth, M.; McDuffie, J. L.; Ramirez, P.

2015-12-01

Next generation science data systems are needed to address the incoming flood of data from new missions such as SWOT and NISAR where data volumes and data throughput rates are order of magnitude larger than present day missions. Additionally, traditional means of procuring hardware on-premise are already limited due to facilities capacity constraints for these new missions. Existing missions, such as OCO-2, may also require high turn-around time for processing different science scenarios where on-premise and even traditional HPC computing environments may not meet the high processing needs. We present our experiences on deploying a hybrid-cloud computing science data system (HySDS) for the OCO-2 Science Computing Facility to support large-scale processing of their Level-2 full physics data products. We will explore optimization approaches to getting best performance out of hybrid-cloud computing as well as common issues that will arise when dealing with large-scale computing. Novel approaches were utilized to do processing on Amazon's spot market, which can potentially offer ~10X costs savings but with an unpredictable computing environment based on market forces. We will present how we enabled high-tolerance computing in order to achieve large-scale computing as well as operational cost savings.
Thermoelectric property measurements with computer controlled systems

NASA Technical Reports Server (NTRS)

Chmielewski, A. B.; Wood, C.

1984-01-01

A joint JPL-NASA program to develop an automated system to measure the thermoelectric properties of newly developed materials is described. Consideration is given to the difficulties created by signal drift in measurements of Hall voltage and the Large Delta T Seebeck coefficient. The benefits of a computerized system were examined with respect to error reduction and time savings for human operators. It is shown that the time required to measure Hall voltage can be reduced by a factor of 10 when a computer is used to fit a curve to the ratio of the measured signal and its standard deviation. The accuracy of measurements of the Large Delta T Seebeck coefficient and thermal diffusivity was also enhanced by the use of computers.
A Systematic Approach to Simulating Metabolism in Computational Toxicology. I. The Times Heuristic Modeling Framework

EPA Science Inventory

This paper presents a new system for automated 2D-3D migration of chemicals in large databases with conformer multiplication. The main advantages of this system are its straightforward performance, reasonable execution time, simplicity, and applicability to building large 3D che...
Fully integrated sub 100ps photon counting platform

NASA Astrophysics Data System (ADS)

Buckley, S. J.; Bellis, S. J.; Rosinger, P.; Jackson, J. C.

2007-02-01

Current state of the art high resolution counting modules, specifically designed for high timing resolution applications, are largely based on a computer card format. This has tended to result in a costly solution that is restricted to the computer it resides in. We describe a four channel timing module that interfaces to a computer via a USB port and operates with a resolution of less than 100 picoseconds. The core design of the system is an advanced field programmable gate array (FPGA) interfacing to a precision time interval measurement module, mass memory block and a high speed USB 2.0 serial data port. The FPGA design allows the module to operate in a number of modes allowing both continuous recording of photon events (time-tagging) and repetitive time binning. In time-tag mode the system reports, for each photon event, the high resolution time along with the chronological time (macro time) and the channel ID. The time-tags are uploaded in real time to a host computer via a high speed USB port allowing continuous storage to computer memory of up to 4 millions photons per second. In time-bin mode, binning is carried out with count rates up to 10 million photons per second. Each curve resides in a block of 128,000 time-bins each with a resolution programmable down to less than 100 picoseconds. Each bin has a limit of 65535 hits allowing autonomous curve recording until a bin reaches the maximum count or the system is commanded to halt. Due to the large memory storage, several curves/experiments can be stored in the system prior to uploading to the host computer for analysis. This makes this module ideal for integration into high timing resolution specific applications such as laser ranging and fluorescence lifetime imaging using techniques such as time correlated single photon counting (TCSPC).
The study on servo-control system in the large aperture telescope

NASA Astrophysics Data System (ADS)

Hu, Wei; Zhenchao, Zhang; Daxing, Wang

2008-08-01

Large astronomical telescope or extremely enormous astronomical telescope servo tracking technique will be one of crucial technology that must be solved in researching and manufacturing. To control technique feature of large astronomical telescope or extremely enormous astronomical telescope, this paper design a sort of large astronomical telescope servo tracking control system. This system composes a principal and subordinate distributed control system, host computer sends steering instruction and receive slave computer functional mode, slave computer accomplish control algorithm and execute real-time control. Large astronomical telescope servo control use direct drive machine, and adopt DSP technology to complete direct torque control algorithm, Such design can not only increase control system performance, but also greatly reduced volume and costs of control system, which has a significant occurrence. The system design scheme can be proved reasonably by calculating and simulating. This system can be applied to large astronomical telescope.
Multi-source Geospatial Data Analysis with Google Earth Engine

NASA Astrophysics Data System (ADS)

Erickson, T.

2014-12-01

The Google Earth Engine platform is a cloud computing environment for data analysis that combines a public data catalog with a large-scale computational facility optimized for parallel processing of geospatial data. The data catalog is a multi-petabyte archive of georeferenced datasets that include images from Earth observing satellite and airborne sensors (examples: USGS Landsat, NASA MODIS, USDA NAIP), weather and climate datasets, and digital elevation models. Earth Engine supports both a just-in-time computation model that enables real-time preview and debugging during algorithm development for open-ended data exploration, and a batch computation mode for applying algorithms over large spatial and temporal extents. The platform automatically handles many traditionally-onerous data management tasks, such as data format conversion, reprojection, and resampling, which facilitates writing algorithms that combine data from multiple sensors and/or models. Although the primary use of Earth Engine, to date, has been the analysis of large Earth observing satellite datasets, the computational platform is generally applicable to a wide variety of use cases that require large-scale geospatial data analyses. This presentation will focus on how Earth Engine facilitates the analysis of geospatial data streams that originate from multiple separate sources (and often communities) and how it enables collaboration during algorithm development and data exploration. The talk will highlight current projects/analyses that are enabled by this functionality.https://earthengine.google.org
Large Data at Small Universities: Astronomical processing using a computer classroom

NASA Astrophysics Data System (ADS)

Fuller, Nathaniel James; Clarkson, William I.; Fluharty, Bill; Belanger, Zach; Dage, Kristen

2016-06-01

The use of large computing clusters for astronomy research is becoming more commonplace as datasets expand, but access to these required resources is sometimes difficult for research groups working at smaller Universities. As an alternative to purchasing processing time on an off-site computing cluster, or purchasing dedicated hardware, we show how one can easily build a crude on-site cluster by utilizing idle cycles on instructional computers in computer-lab classrooms. Since these computers are maintained as part of the educational mission of the University, the resource impact on the investigator is generally low.By using open source Python routines, it is possible to have a large number of desktop computers working together via a local network to sort through large data sets. By running traditional analysis routines in an “embarrassingly parallel” manner, gains in speed are accomplished without requiring the investigator to learn how to write routines using highly specialized methodology. We demonstrate this concept here applied to 1. photometry of large-format images and 2. Statistical significance-tests for X-ray lightcurve analysis. In these scenarios, we see a speed-up factor which scales almost linearly with the number of cores in the cluster. Additionally, we show that the usage of the cluster does not severely limit performance for a local user, and indeed the processing can be performed while the computers are in use for classroom purposes.
An Improved TA-SVM Method Without Matrix Inversion and Its Fast Implementation for Nonstationary Datasets.

PubMed

Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong

2015-09-01

Recently, a time-adaptive support vector machine (TA-SVM) is proposed for handling nonstationary datasets. While attractive performance has been reported and the new classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally by using an elegant SVM formulation in an alternative kernel space, the coupling of subclassifiers brings in the computation of matrix inversion, thus resulting to suffer from high computational burden in large nonstationary dataset applications. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed using a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation, but also avoids the computation of matrix inversion. Thus, we can realize its fast version, that is, improved time-adaptive core vector machine (ITA-CVM) for large nonstationary datasets by using the CVM technique. ITA-CVM has the merit of asymptotic linear time complexity for large nonstationary datasets as well as inherits the advantage of TA-SVM. The effectiveness of the proposed classifiers ITA-SVM and ITA-CVM is also experimentally confirmed.
A Parallel Nonrigid Registration Algorithm Based on B-Spline for Medical Images

PubMed Central

Wang, Yangping; Wang, Song

2016-01-01

The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role and is widely applied in medical image processing due to the good flexibility and robustness. However, it requires a tremendous amount of computing time to obtain more accurate registration results especially for a large amount of medical image data. To address the issue, a parallel nonrigid registration algorithm based on B-spline is proposed in this paper. First, the Logarithm Squared Difference (LSD) is considered as the similarity metric in the B-spline registration algorithm to improve registration precision. After that, we create a parallel computing strategy and lookup tables (LUTs) to reduce the complexity of the B-spline registration algorithm. As a result, the computing time of three time-consuming steps including B-splines interpolation, LSD computation, and the analytic gradient computation of LSD, is efficiently reduced, for the B-spline registration algorithm employs the Nonlinear Conjugate Gradient (NCG) optimization method. Experimental results of registration quality and execution efficiency on the large amount of medical images show that our algorithm achieves a better registration accuracy in terms of the differences between the best deformation fields and ground truth and a speedup of 17 times over the single-threaded CPU implementation due to the powerful parallel computing ability of Graphics Processing Unit (GPU). PMID:28053653
Blading Design for Axial Turbomachines

DTIC Science & Technology

1989-05-01

three- dimensional, viscous computation systems appear to have a long development period ahead, in which fluid shear stress modeling and computation time ...and n directions and T is the shear stress , As a consequence the solution time is longer than for integral methods, dependent largely on thc accuracy of...distributions over airfoils is an adaptation of thin plate deflection theory from stress analysis. At the same time , it minimizes designer effort
Algorithms for Efficient Computation of Transfer Functions for Large Order Flexible Systems

NASA Technical Reports Server (NTRS)

Maghami, Peiman G.; Giesy, Daniel P.

1998-01-01

An efficient and robust computational scheme is given for the calculation of the frequency response function of a large order, flexible system implemented with a linear, time invariant control system. Advantage is taken of the highly structured sparsity of the system matrix of the plant based on a model of the structure using normal mode coordinates. The computational time per frequency point of the new computational scheme is a linear function of system size, a significant improvement over traditional, still-matrix techniques whose computational times per frequency point range from quadratic to cubic functions of system size. This permits the practical frequency domain analysis of systems of much larger order than by traditional, full-matrix techniques. Formulations are given for both open- and closed-loop systems. Numerical examples are presented showing the advantages of the present formulation over traditional approaches, both in speed and in accuracy. Using a model with 703 structural modes, the present method was up to two orders of magnitude faster than a traditional method. The present method generally showed good to excellent accuracy throughout the range of test frequencies, while traditional methods gave adequate accuracy for lower frequencies, but generally deteriorated in performance at higher frequencies with worst case errors being many orders of magnitude times the correct values.
Integral equation methods for computing likelihoods and their derivatives in the stochastic integrate-and-fire model.

PubMed

Paninski, Liam; Haith, Adrian; Szirtes, Gabor

2008-02-01

We recently introduced likelihood-based methods for fitting stochastic integrate-and-fire models to spike train data. The key component of this method involves the likelihood that the model will emit a spike at a given time t. Computing this likelihood is equivalent to computing a Markov first passage time density (the probability that the model voltage crosses threshold for the first time at time t). Here we detail an improved method for computing this likelihood, based on solving a certain integral equation. This integral equation method has several advantages over the techniques discussed in our previous work: in particular, the new method has fewer free parameters and is easily differentiable (for gradient computations). The new method is also easily adaptable for the case in which the model conductance, not just the input current, is time-varying. Finally, we describe how to incorporate large deviations approximations to very small likelihoods.
Cellular computational generalized neuron network for frequency situational intelligence in a multi-machine power system.

PubMed

Wei, Yawei; Venayagamoorthy, Ganesh Kumar

2017-09-01

To prevent large interconnected power system from a cascading failure, brownout or even blackout, grid operators require access to faster than real-time information to make appropriate just-in-time control decisions. However, the communication and computational system limitations of currently used supervisory control and data acquisition (SCADA) system can only deliver delayed information. However, the deployment of synchrophasor measurement devices makes it possible to capture and visualize, in near-real-time, grid operational data with extra granularity. In this paper, a cellular computational network (CCN) approach for frequency situational intelligence (FSI) in a power system is presented. The distributed and scalable computing unit of the CCN framework makes it particularly flexible for customization for a particular set of prediction requirements. Two soft-computing algorithms have been implemented in the CCN framework: a cellular generalized neuron network (CCGNN) and a cellular multi-layer perceptron network (CCMLPN), for purposes of providing multi-timescale frequency predictions, ranging from 16.67 ms to 2 s. These two developed CCGNN and CCMLPN systems were then implemented on two different scales of power systems, one of which installed a large photovoltaic plant. A real-time power system simulator at weather station within the Real-Time Power and Intelligent Systems (RTPIS) laboratory at Clemson, SC, was then used to derive typical FSI results. Copyright © 2017 Elsevier Ltd. All rights reserved.
Lanczos eigensolution method for high-performance computers

NASA Technical Reports Server (NTRS)

Bostic, Susan W.

1991-01-01

The theory, computational analysis, and applications are presented of a Lanczos algorithm on high performance computers. The computationally intensive steps of the algorithm are identified as: the matrix factorization, the forward/backward equation solution, and the matrix vector multiples. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as: variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer of 181.6 seconds was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degs of freedom was on the the Cray-YMP using an average of 3.63 processors.
GAP Noise Computation By The CE/SE Method

NASA Technical Reports Server (NTRS)

Loh, Ching Y.; Chang, Sin-Chung; Wang, Xiao Y.; Jorgenson, Philip C. E.

2001-01-01

A typical gap noise problem is considered in this paper using the new space-time conservation element and solution element (CE/SE) method. Implementation of the computation is straightforward. No turbulence model, LES (large eddy simulation) or a preset boundary layer profile is used, yet the computed frequency agrees well with the experimental one.
Computer Augmented Lectures (CAL): A New Teaching Technique for Chemistry.

ERIC Educational Resources Information Center

Masten, F. A.; And Others

A new technique described as computer augmented lectures (CAL) is being used at the University of Texas at Austin. It involves the integration of on-line, interactive, time sharing computer terminals and theater size video projectors for large screen display. This paper covers the basic concept, pedagogical techniques, experiments conducted,…
Computational procedure for finite difference solution of one-dimensional heat conduction problems reduces computer time

NASA Technical Reports Server (NTRS)

Iida, H. T.

1966-01-01

Computational procedure reduces the numerical effort whenever the method of finite differences is used to solve ablation problems for which the surface recession is large relative to the initial slab thickness. The number of numerical operations required for a given maximum space mesh size is reduced.
Optimization and large scale computation of an entropy-based moment closure

NASA Astrophysics Data System (ADS)

Kristopher Garrett, C.; Hauck, Cory; Hill, Judith

2015-12-01

We present computational advances and results in the implementation of an entropy-based moment closure, MN, in the context of linear kinetic equations, with an emphasis on heterogeneous and large-scale computing platforms. Entropy-based closures are known in several cases to yield more accurate results than closures based on standard spectral approximations, such as PN, but the computational cost is generally much higher and often prohibitive. Several optimizations are introduced to improve the performance of entropy-based algorithms over previous implementations. These optimizations include the use of GPU acceleration and the exploitation of the mathematical properties of spherical harmonics, which are used as test functions in the moment formulation. To test the emerging high-performance computing paradigm of communication bound simulations, we present timing results at the largest computational scales currently available. These results show, in particular, load balancing issues in scaling the MN algorithm that do not appear for the PN algorithm. We also observe that in weak scaling tests, the ratio in time to solution of MN to PN decreases.
Optimization and large scale computation of an entropy-based moment closure

DOE PAGES

Hauck, Cory D.; Hill, Judith C.; Garrett, C. Kristopher

2015-09-10

We present computational advances and results in the implementation of an entropy-based moment closure, M N, in the context of linear kinetic equations, with an emphasis on heterogeneous and large-scale computing platforms. Entropy-based closures are known in several cases to yield more accurate results than closures based on standard spectral approximations, such as P N, but the computational cost is generally much higher and often prohibitive. Several optimizations are introduced to improve the performance of entropy-based algorithms over previous implementations. These optimizations include the use of GPU acceleration and the exploitation of the mathematical properties of spherical harmonics, which aremore » used as test functions in the moment formulation. To test the emerging high-performance computing paradigm of communication bound simulations, we present timing results at the largest computational scales currently available. Lastly, these results show, in particular, load balancing issues in scaling the M N algorithm that do not appear for the P N algorithm. We also observe that in weak scaling tests, the ratio in time to solution of M N to P N decreases.« less

Computations on Wings With Full-Span Oscillating Control Surfaces Using Navier-Stokes Equations

NASA Technical Reports Server (NTRS)

Guruswamy, Guru P.

2013-01-01

A dual-level parallel procedure is presented for computing large databases to support aerospace vehicle design. This procedure has been developed as a single Unix script within the Parallel Batch Submission environment utilizing MPIexec and runs MPI based analysis software. It has been developed to provide a process for aerospace designers to generate data for large numbers of cases with the highest possible fidelity and reasonable wall clock time. A single job submission environment has been created to avoid keeping track of multiple jobs and the associated system administration overhead. The process has been demonstrated for computing large databases for the design of typical aerospace configurations, a launch vehicle and a rotorcraft.
Parallel and serial computing tools for testing single-locus and epistatic SNP effects of quantitative traits in genome-wide association studies

PubMed Central

Ma, Li; Runesha, H Birali; Dvorkin, Daniel; Garbe, John R; Da, Yang

2008-01-01

Background Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS. Results The EPISNPmpi and EPISNP computer programs were developed for testing single-locus and epistatic SNP effects on quantitative traits in GWAS, including tests of three single-locus effects for each SNP (SNP genotypic effect, additive and dominance effects) and five epistasis effects for each pair of SNPs (two-locus interaction, additive × additive, additive × dominance, dominance × additive, and dominance × dominance) based on the extended Kempthorne model. EPISNPmpi is the parallel computing program for epistasis testing in large scale GWAS and achieved excellent scalability for large scale analysis and portability for various parallel computing platforms. EPISNP is the serial computing program based on the EPISNPmpi code for epistasis testing in small scale GWAS using commonly available operating systems and computer hardware. Three serial computing utility programs were developed for graphical viewing of test results and epistasis networks, and for estimating CPU time and disk space requirements. Conclusion The EPISNPmpi parallel computing program provides an effective computing tool for epistasis testing in large scale GWAS, and the epiSNP serial computing programs are convenient tools for epistasis analysis in small scale GWAS using commonly available computer hardware. PMID:18644146
ACCURATE CHEMICAL MASTER EQUATION SOLUTION USING MULTI-FINITE BUFFERS

PubMed Central

Cao, Youfang; Terebus, Anna; Liang, Jie

2016-01-01

The discrete chemical master equation (dCME) provides a fundamental framework for studying stochasticity in mesoscopic networks. Because of the multi-scale nature of many networks where reaction rates have large disparity, directly solving dCMEs is intractable due to the exploding size of the state space. It is important to truncate the state space effectively with quantified errors, so accurate solutions can be computed. It is also important to know if all major probabilistic peaks have been computed. Here we introduce the Accurate CME (ACME) algorithm for obtaining direct solutions to dCMEs. With multi-finite buffers for reducing the state space by O(n!), exact steady-state and time-evolving network probability landscapes can be computed. We further describe a theoretical framework of aggregating microstates into a smaller number of macrostates by decomposing a network into independent aggregated birth and death processes, and give an a priori method for rapidly determining steady-state truncation errors. The maximal sizes of the finite buffers for a given error tolerance can also be pre-computed without costly trial solutions of dCMEs. We show exactly computed probability landscapes of three multi-scale networks, namely, a 6-node toggle switch, 11-node phage-lambda epigenetic circuit, and 16-node MAPK cascade network, the latter two with no known solutions. We also show how probabilities of rare events can be computed from first-passage times, another class of unsolved problems challenging for simulation-based techniques due to large separations in time scales. Overall, the ACME method enables accurate and efficient solutions of the dCME for a large class of networks. PMID:27761104
Large Scale Traffic Simulations

DOT National Transportation Integrated Search

1997-01-01

Large scale microscopic (i.e. vehicle-based) traffic simulations pose high demands on computation speed in at least two application areas: (i) real-time traffic forecasting, and (ii) long-term planning applications (where repeated "looping" between t...
Efficient Memory Access with NumPy Global Arrays using Local Memory Access

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daily, Jeffrey A.; Berghofer, Dan C.

This paper discusses the work completed working with Global Arrays of data on distributed multi-computer systems and improving their performance. The tasks completed were done at Pacific Northwest National Laboratory in the Science Undergrad Laboratory Internship program in the summer of 2013 for the Data Intensive Computing Group in the Fundamental and Computational Sciences DIrectorate. This work was done on the Global Arrays Toolkit developed by this group. This toolkit is an interface for programmers to more easily create arrays of data on networks of computers. This is useful because scientific computation is often done on large amounts of datamore » sometimes so large that individual computers cannot hold all of it. This data is held in array form and can best be processed on supercomputers which often consist of a network of individual computers doing their computation in parallel. One major challenge for this sort of programming is that operations on arrays on multiple computers is very complex and an interface is needed so that these arrays seem like they are on a single computer. This is what global arrays does. The work done here is to use more efficient operations on that data that requires less copying of data to be completed. This saves a lot of time because copying data on many different computers is time intensive. The way this challenge was solved is when data to be operated on with binary operations are on the same computer, they are not copied when they are accessed. When they are on separate computers, only one set is copied when accessed. This saves time because of less copying done although more data access operations were done.« less
The Application of Special Computing Techniques to Speed-Up Image Feature Extraction and Processing Techniques.

DTIC Science & Technology

1981-12-01

ocessors has led to the possibility of implementing a large number of image processing functions in near real time . ~CC~ jnro _ j:% UNLSSFE (b-.YC ASIIAINO...to the possibility of implementing a large number of image processing functions in near " real - time ," a result which is essential to establishing a...for example, and S) rapid image handling for near real - time in- teraction by a user at a display. For example, for a large resolution image, say
OpenMP parallelization of a gridded SWAT (SWATG)

NASA Astrophysics Data System (ADS)

Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

2017-12-01

Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates grid modeling scheme with different spatial representations also presents such problems. The time-consuming problem affects applications of very high resolution large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling based on the HRU level. Such parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km2 watershed with 1 CPU and a 15 thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management in addition to offering data fusion and model coupling ability.
Analysis of Computer Network Information Based on "Big Data"

NASA Astrophysics Data System (ADS)

Li, Tianli

2017-11-01

With the development of the current era, computer network and large data gradually become part of the people's life, people use the computer to provide convenience for their own life, but at the same time there are many network information problems has to pay attention. This paper analyzes the information security of computer network based on "big data" analysis, and puts forward some solutions.
Mira: Argonne's 10-petaflops supercomputer

ScienceCinema

Papka, Michael; Coghlan, Susan; Isaacs, Eric; Peters, Mark; Messina, Paul

2018-02-13

Mira, Argonne's petascale IBM Blue Gene/Q system, ushers in a new era of scientific supercomputing at the Argonne Leadership Computing Facility. An engineering marvel, the 10-petaflops supercomputer is capable of carrying out 10 quadrillion calculations per second. As a machine for open science, any researcher with a question that requires large-scale computing resources can submit a proposal for time on Mira, typically in allocations of millions of core-hours, to run programs for their experiments. This adds up to billions of hours of computing time per year.
Mira: Argonne's 10-petaflops supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Papka, Michael; Coghlan, Susan; Isaacs, Eric

2013-07-03

Mira, Argonne's petascale IBM Blue Gene/Q system, ushers in a new era of scientific supercomputing at the Argonne Leadership Computing Facility. An engineering marvel, the 10-petaflops supercomputer is capable of carrying out 10 quadrillion calculations per second. As a machine for open science, any researcher with a question that requires large-scale computing resources can submit a proposal for time on Mira, typically in allocations of millions of core-hours, to run programs for their experiments. This adds up to billions of hours of computing time per year.
Direct Methods for Predicting Movement Biomechanics Based Upon Optimal Control Theory with Implementation in OpenSim.

PubMed

Porsa, Sina; Lin, Yi-Chung; Pandy, Marcus G

2016-08-01

The aim of this study was to compare the computational performances of two direct methods for solving large-scale, nonlinear, optimal control problems in human movement. Direct shooting and direct collocation were implemented on an 8-segment, 48-muscle model of the body (24 muscles on each side) to compute the optimal control solution for maximum-height jumping. Both algorithms were executed on a freely-available musculoskeletal modeling platform called OpenSim. Direct collocation converged to essentially the same optimal solution up to 249 times faster than direct shooting when the same initial guess was assumed (3.4 h of CPU time for direct collocation vs. 35.3 days for direct shooting). The model predictions were in good agreement with the time histories of joint angles, ground reaction forces and muscle activation patterns measured for subjects jumping to their maximum achievable heights. Both methods converged to essentially the same solution when started from the same initial guess, but computation time was sensitive to the initial guess assumed. Direct collocation demonstrates exceptional computational performance and is well suited to performing predictive simulations of movement using large-scale musculoskeletal models.
A Review of High-Performance Computational Strategies for Modeling and Imaging of Electromagnetic Induction Data

NASA Astrophysics Data System (ADS)

Newman, Gregory A.

2014-01-01

Many geoscientific applications exploit electrostatic and electromagnetic fields to interrogate and map subsurface electrical resistivity—an important geophysical attribute for characterizing mineral, energy, and water resources. In complex three-dimensional geologies, where many of these resources remain to be found, resistivity mapping requires large-scale modeling and imaging capabilities, as well as the ability to treat significant data volumes, which can easily overwhelm single-core and modest multicore computing hardware. To treat such problems requires large-scale parallel computational resources, necessary for reducing the time to solution to a time frame acceptable to the exploration process. The recognition that significant parallel computing processes must be brought to bear on these problems gives rise to choices that must be made in parallel computing hardware and software. In this review, some of these choices are presented, along with the resulting trade-offs. We also discuss future trends in high-performance computing and the anticipated impact on electromagnetic (EM) geophysics. Topics discussed in this review article include a survey of parallel computing platforms, graphics processing units to multicore CPUs with a fast interconnect, along with effective parallel solvers and associated solver libraries effective for inductive EM modeling and imaging.
Global Static Indexing for Real-Time Exploration of Very Large Regular Grids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pascucci, V; Frank, R

2001-07-23

In this paper we introduce a new indexing scheme for progressive traversal and visualization of large regular grids. We demonstrate the potential of our approach by providing a tool that displays at interactive rates planar slices of scalar field data with very modest computing resources. We obtain unprecedented results both in terms of absolute performance and, more importantly, in terms of scalability. On a laptop computer we provide real time interaction with a 2048{sup 3} grid (8 Giga-nodes) using only 20MB of memory. On an SGI Onyx we slice interactively an 8192{sup 3} grid (1/2 tera-nodes) using only 60MB ofmore » memory. The scheme relies simply on the determination of an appropriate reordering of the rectilinear grid data and a progressive construction of the output slice. The reordering minimizes the amount of I/O performed during the out-of-core computation. The progressive and asynchronous computation of the output provides flexible quality/speed tradeoffs and a time-critical and interruptible user interface.« less
Time-Shifted Boundary Conditions Used for Navier-Stokes Aeroelastic Solver

NASA Technical Reports Server (NTRS)

Srivastava, Rakesh

1999-01-01

Under the Advanced Subsonic Technology (AST) Program, an aeroelastic analysis code (TURBO-AE) based on Navier-Stokes equations is currently under development at NASA Lewis Research Center s Machine Dynamics Branch. For a blade row, aeroelastic instability can occur in any of the possible interblade phase angles (IBPA s). Analyzing small IBPA s is very computationally expensive because a large number of blade passages must be simulated. To reduce the computational cost of these analyses, we used time shifted, or phase-lagged, boundary conditions in the TURBO-AE code. These conditions can be used to reduce the computational domain to a single blade passage by requiring the boundary conditions across the passage to be lagged depending on the IBPA being analyzed. The time-shifted boundary conditions currently implemented are based on the direct-store method. This method requires large amounts of data to be stored over a period of the oscillation cycle. On CRAY computers this is not a major problem because solid-state devices can be used for fast input and output to read and write the data onto a disk instead of storing it in core memory.
Changing from computing grid to knowledge grid in life-science grid.

PubMed

Talukdar, Veera; Konar, Amit; Datta, Ayan; Choudhury, Anamika Roy

2009-09-01

Grid computing has a great potential to become a standard cyber infrastructure for life sciences that often require high-performance computing and large data handling, which exceeds the computing capacity of a single institution. Grid computer applies the resources of many computers in a network to a single problem at the same time. It is useful to scientific problems that require a great number of computer processing cycles or access to a large amount of data.As biologists,we are constantly discovering millions of genes and genome features, which are assembled in a library and distributed on computers around the world.This means that new, innovative methods must be developed that exploit the re-sources available for extensive calculations - for example grid computing.This survey reviews the latest grid technologies from the viewpoints of computing grid, data grid and knowledge grid. Computing grid technologies have been matured enough to solve high-throughput real-world life scientific problems. Data grid technologies are strong candidates for realizing a "resourceome" for bioinformatics. Knowledge grids should be designed not only from sharing explicit knowledge on computers but also from community formulation for sharing tacit knowledge among a community. By extending the concept of grid from computing grid to knowledge grid, it is possible to make use of a grid as not only sharable computing resources, but also as time and place in which people work together, create knowledge, and share knowledge and experiences in a community.
Towards Scalable Graph Computation on Mobile Devices.

PubMed

Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng

2014-10-01

Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach.
Towards Scalable Graph Computation on Mobile Devices

PubMed Central

Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng

2015-01-01

Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach. PMID:25859564
Large Spatial Scale Ground Displacement Mapping through the P-SBAS Processing of Sentinel-1 Data on a Cloud Computing Environment

NASA Astrophysics Data System (ADS)

Casu, F.; Bonano, M.; de Luca, C.; Lanari, R.; Manunta, M.; Manzo, M.; Zinno, I.

2017-12-01

Since its launch in 2014, the Sentinel-1 (S1) constellation has played a key role on SAR data availability and dissemination all over the World. Indeed, the free and open access data policy adopted by the European Copernicus program together with the global coverage acquisition strategy, make the Sentinel constellation as a game changer in the Earth Observation scenario. Being the SAR data become ubiquitous, the technological and scientific challenge is focused on maximizing the exploitation of such huge data flow. In this direction, the use of innovative processing algorithms and distributed computing infrastructures, such as the Cloud Computing platforms, can play a crucial role. In this work we present a Cloud Computing solution for the advanced interferometric (DInSAR) processing chain based on the Parallel SBAS (P-SBAS) approach, aimed at processing S1 Interferometric Wide Swath (IWS) data for the generation of large spatial scale deformation time series in efficient, automatic and systematic way. Such a DInSAR chain ingests Sentinel 1 SLC images and carries out several processing steps, to finally compute deformation time series and mean deformation velocity maps. Different parallel strategies have been designed ad hoc for each processing step of the P-SBAS S1 chain, encompassing both multi-core and multi-node programming techniques, in order to maximize the computational efficiency achieved within a Cloud Computing environment and cut down the relevant processing times. The presented P-SBAS S1 processing chain has been implemented on the Amazon Web Services platform and a thorough analysis of the attained parallel performances has been performed to identify and overcome the major bottlenecks to the scalability. The presented approach is used to perform national-scale DInSAR analyses over Italy, involving the processing of more than 3000 S1 IWS images acquired from both ascending and descending orbits. Such an experiment confirms the big advantage of exploiting large computational and storage resources of Cloud Computing platforms for large scale DInSAR analysis. The presented Cloud Computing P-SBAS processing chain can be a precious tool in the perspective of developing operational services disposable for the EO scientific community related to hazard monitoring and risk prevention and mitigation.
GPU-accelerated element-free reverse-time migration with Gauss points partition

NASA Astrophysics Data System (ADS)

Zhou, Zhen; Jia, Xiaofeng; Qiang, Xiaodong

2018-06-01

An element-free method (EFM) has been demonstrated successfully in elasticity, heat conduction and fatigue crack growth problems. We present the theory of EFM and its numerical applications in seismic modelling and reverse time migration (RTM). Compared with the finite difference method and the finite element method, the EFM has unique advantages: (1) independence of grids in computation and (2) lower expense and more flexibility (because only the information of the nodes and the boundary of the concerned area is required). However, in EFM, due to improper computation and storage of some large sparse matrices, such as the mass matrix and the stiffness matrix, the method is difficult to apply to seismic modelling and RTM for a large velocity model. To solve the problem of storage and computation efficiency, we propose a concept of Gauss points partition and utilise the graphics processing unit to improve the computational efficiency. We employ the compressed sparse row format to compress the intermediate large sparse matrices and attempt to simplify the operations by solving the linear equations with CULA solver. To improve the computation efficiency further, we introduce the concept of the lumped mass matrix. Numerical experiments indicate that the proposed method is accurate and more efficient than the regular EFM.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

DOE PAGES

Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De; ...

2017-01-28

Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less

Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De

Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less
Dynamic displacement measurement of large-scale structures based on the Lucas-Kanade template tracking algorithm

NASA Astrophysics Data System (ADS)

Guo, Jie; Zhu, Chang`an

2016-01-01

The development of optics and computer technologies enables the application of the vision-based technique that uses digital cameras to the displacement measurement of large-scale structures. Compared with traditional contact measurements, vision-based technique allows for remote measurement, has a non-intrusive characteristic, and does not necessitate mass introduction. In this study, a high-speed camera system is developed to complete the displacement measurement in real time. The system consists of a high-speed camera and a notebook computer. The high-speed camera can capture images at a speed of hundreds of frames per second. To process the captured images in computer, the Lucas-Kanade template tracking algorithm in the field of computer vision is introduced. Additionally, a modified inverse compositional algorithm is proposed to reduce the computing time of the original algorithm and improve the efficiency further. The modified algorithm can rapidly accomplish one displacement extraction within 1 ms without having to install any pre-designed target panel onto the structures in advance. The accuracy and the efficiency of the system in the remote measurement of dynamic displacement are demonstrated in the experiments on motion platform and sound barrier on suspension viaduct. Experimental results show that the proposed algorithm can extract accurate displacement signal and accomplish the vibration measurement of large-scale structures.
Assessment of time-dependent density functional theory with the restricted excitation space approximation for excited state calculations of large systems

NASA Astrophysics Data System (ADS)

Hanson-Heine, Magnus W. D.; George, Michael W.; Besley, Nicholas A.

2018-06-01

The restricted excitation subspace approximation is explored as a basis to reduce the memory storage required in linear response time-dependent density functional theory (TDDFT) calculations within the Tamm-Dancoff approximation. It is shown that excluding the core orbitals and up to 70% of the virtual orbitals in the construction of the excitation subspace does not result in significant changes in computed UV/vis spectra for large molecules. The reduced size of the excitation subspace greatly reduces the size of the subspace vectors that need to be stored when using the Davidson procedure to determine the eigenvalues of the TDDFT equations. Furthermore, additional screening of the two-electron integrals in combination with a reduction in the size of the numerical integration grid used in the TDDFT calculation leads to significant computational savings. The use of these approximations represents a simple approach to extend TDDFT to the study of large systems and make the calculations increasingly tractable using modest computing resources.
A Parallel Sliding Region Algorithm to Make Agent-Based Modeling Possible for a Large-Scale Simulation: Modeling Hepatitis C Epidemics in Canada.

PubMed

Wong, William W L; Feng, Zeny Z; Thein, Hla-Hla

2016-11-01

Agent-based models (ABMs) are computer simulation models that define interactions among agents and simulate emergent behaviors that arise from the ensemble of local decisions. ABMs have been increasingly used to examine trends in infectious disease epidemiology. However, the main limitation of ABMs is the high computational cost for a large-scale simulation. To improve the computational efficiency for large-scale ABM simulations, we built a parallelizable sliding region algorithm (SRA) for ABM and compared it to a nonparallelizable ABM. We developed a complex agent network and performed two simulations to model hepatitis C epidemics based on the real demographic data from Saskatchewan, Canada. The first simulation used the SRA that processed on each postal code subregion subsequently. The second simulation processed the entire population simultaneously. It was concluded that the parallelizable SRA showed computational time saving with comparable results in a province-wide simulation. Using the same method, SRA can be generalized for performing a country-wide simulation. Thus, this parallel algorithm enables the possibility of using ABM for large-scale simulation with limited computational resources.
Two-Level Chebyshev Filter Based Complementary Subspace Method: Pushing the Envelope of Large-Scale Electronic Structure Calculations.

PubMed

Banerjee, Amartya S; Lin, Lin; Suryanarayana, Phanish; Yang, Chao; Pask, John E

2018-06-12

We describe a novel iterative strategy for Kohn-Sham density functional theory calculations aimed at large systems (>1,000 electrons), applicable to metals and insulators alike. In lieu of explicit diagonalization of the Kohn-Sham Hamiltonian on every self-consistent field (SCF) iteration, we employ a two-level Chebyshev polynomial filter based complementary subspace strategy to (1) compute a set of vectors that span the occupied subspace of the Hamiltonian; (2) reduce subspace diagonalization to just partially occupied states; and (3) obtain those states in an efficient, scalable manner via an inner Chebyshev filter iteration. By reducing the necessary computation to just partially occupied states and obtaining these through an inner Chebyshev iteration, our approach reduces the cost of large metallic calculations significantly, while eliminating subspace diagonalization for insulating systems altogether. We describe the implementation of the method within the framework of the discontinuous Galerkin (DG) electronic structure method and show that this results in a computational scheme that can effectively tackle bulk and nano systems containing tens of thousands of electrons, with chemical accuracy, within a few minutes or less of wall clock time per SCF iteration on large-scale computing platforms. We anticipate that our method will be instrumental in pushing the envelope of large-scale ab initio molecular dynamics. As a demonstration of this, we simulate a bulk silicon system containing 8,000 atoms at finite temperature, and obtain an average SCF step wall time of 51 s on 34,560 processors; thus allowing us to carry out 1.0 ps of ab initio molecular dynamics in approximately 28 h (of wall time).
An Interactive, Versatile, Three-Dimensional Display, Manipulation and Plotting System for Biomedical Research

ERIC Educational Resources Information Center

Feldmann, Richard J.; And Others

1972-01-01

Computer graphics provides a valuable tool for the representation and a better understanding of structures, both small and large. Accurate and rapid construction, manipulation, and plotting of structures, such as macromolecules as complex as hemoglobin, are performed by a collection of computer programs and a time-sharing computer. (21 references)…
Supercomputer optimizations for stochastic optimal control applications

NASA Technical Reports Server (NTRS)

Chung, Siu-Leung; Hanson, Floyd B.; Xu, Huihuang

1991-01-01

Supercomputer optimizations for a computational method of solving stochastic, multibody, dynamic programming problems are presented. The computational method is valid for a general class of optimal control problems that are nonlinear, multibody dynamical systems, perturbed by general Markov noise in continuous time, i.e., nonsmooth Gaussian as well as jump Poisson random white noise. Optimization techniques for vector multiprocessors or vectorizing supercomputers include advanced data structures, loop restructuring, loop collapsing, blocking, and compiler directives. These advanced computing techniques and superconducting hardware help alleviate Bellman's curse of dimensionality in dynamic programming computations, by permitting the solution of large multibody problems. Possible applications include lumped flight dynamics models for uncertain environments, such as large scale and background random aerospace fluctuations.
A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images

PubMed Central

Afshar, Yaser; Sbalzarini, Ivo F.

2016-01-01

Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collectively solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 1010 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments. PMID:27046144
A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images.

PubMed

Afshar, Yaser; Sbalzarini, Ivo F

2016-01-01

Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collectively solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10(10) pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.
Use of cloud computing in biomedicine.

PubMed

Sobeslav, Vladimir; Maresova, Petra; Krejcar, Ondrej; Franca, Tanos C C; Kuca, Kamil

2016-12-01

Nowadays, biomedicine is characterised by a growing need for processing of large amounts of data in real time. This leads to new requirements for information and communication technologies (ICT). Cloud computing offers a solution to these requirements and provides many advantages, such as cost savings, elasticity and scalability of using ICT. The aim of this paper is to explore the concept of cloud computing and the related use of this concept in the area of biomedicine. Authors offer a comprehensive analysis of the implementation of the cloud computing approach in biomedical research, decomposed into infrastructure, platform and service layer, and a recommendation for processing large amounts of data in biomedicine. Firstly, the paper describes the appropriate forms and technological solutions of cloud computing. Secondly, the high-end computing paradigm of cloud computing aspects is analysed. Finally, the potential and current use of applications in scientific research of this technology in biomedicine is discussed.
The evolution of computer monitoring of real time data during the Atlas Centaur launch countdown

NASA Technical Reports Server (NTRS)

Thomas, W. F.

1981-01-01

In the last decade, improvements in computer technology have provided new 'tools' for controlling and monitoring critical missile systems. In this connection, computers have gradually taken a large role in monitoring all flights and ground systems on the Atlas Centaur. The wide body Centaur which will be launched in the Space Shuttle Cargo Bay will use computers to an even greater extent. It is planned to use the wide body Centaur to boost the Galileo spacecraft toward Jupiter in 1985. The critical systems which must be monitored prior to liftoff are examined. Computers have now been programmed to monitor all critical parameters continuously. At this time, there are two separate computer systems used to monitor these parameters.
Time-Resolved C-Arm Computed Tomographic Angiography Derived From Computed Tomographic Perfusion Acquisition: New Capability for One-Stop-Shop Acute Ischemic Stroke Treatment in the Angiosuite.

PubMed

Yang, Pengfei; Niu, Kai; Wu, Yijing; Struffert, Tobias; Dorfler, Arnd; Schafer, Sebastian; Royalty, Kevin; Strother, Charles; Chen, Guang-Hong

2015-12-01

Multimodal imaging using cone beam C-arm computed tomography (CT) may shorten the delay from ictus to revascularization for acute ischemic stroke patients with a large vessel occlusion. Largely because of limited temporal resolution, reconstruction of time-resolved CT angiography (CTA) from these systems has not yielded satisfactory results. We evaluated the image quality and diagnostic value of time-resolved C-arm CTA reconstructed using novel image processing algorithms. Studies were done under an Institutional Review Board approved protocol. Postprocessing of data from 21 C-arm CT dynamic perfusion acquisitions from 17 patients with acute ischemic stroke were done to derive time-resolved C-arm CTA images. Two observers independently evaluated image quality and diagnostic content for each case. ICC and receiver-operating characteristic analysis were performed to evaluate interobserver agreement and diagnostic value of this novel imaging modality. Time-resolved C-arm CTA images were successfully generated from 20 data sets (95.2%, 20/21). Two observers agreed well that the image quality for large cerebral arteries was good but was more limited for small cerebral arteries (distal to M1, A1, and P1). receiver-operating characteristic curves demonstrated excellent diagnostic value for detecting large vessel occlusions (area under the curve=0.987-1). Time-resolved CTAs derived from C-arm CT perfusion acquisitions provide high quality images that allowed accurate diagnosis of large vessel occlusions. Although image quality of smaller arteries in this study was not optimal ongoing modifications of the postprocessing algorithm will likely remove this limitation. Adding time-resolved C-arm CTAs to the capabilities of the angiography suite further enhances its suitability as a one-stop shop for care for patients with acute ischemic stroke. © 2015 American Heart Association, Inc.
Polynomial complexity despite the fermionic sign

NASA Astrophysics Data System (ADS)

Rossi, R.; Prokof'ev, N.; Svistunov, B.; Van Houcke, K.; Werner, F.

2017-04-01

It is commonly believed that in unbiased quantum Monte Carlo approaches to fermionic many-body problems, the infamous sign problem generically implies prohibitively large computational times for obtaining thermodynamic-limit quantities. We point out that for convergent Feynman diagrammatic series evaluated with a recently introduced Monte Carlo algorithm (see Rossi R., arXiv:1612.05184), the computational time increases only polynomially with the inverse error on thermodynamic-limit quantities.
Web-Based Real Time Earthquake Forecasting and Personal Risk Management

NASA Astrophysics Data System (ADS)

Rundle, J. B.; Holliday, J. R.; Graves, W. R.; Turcotte, D. L.; Donnellan, A.

2012-12-01

Earthquake forecasts have been computed by a variety of countries and economies world-wide for over two decades. For the most part, forecasts have been computed for insurance, reinsurance and underwriters of catastrophe bonds. One example is the Working Group on California Earthquake Probabilities that has been responsible for the official California earthquake forecast since 1988. However, in a time of increasingly severe global financial constraints, we are now moving inexorably towards personal risk management, wherein mitigating risk is becoming the responsibility of individual members of the public. Under these circumstances, open access to a variety of web-based tools, utilities and information is a necessity. Here we describe a web-based system that has been operational since 2009 at www.openhazards.com and www.quakesim.org. Models for earthquake physics and forecasting require input data, along with model parameters. The models we consider are the Natural Time Weibull (NTW) model for regional earthquake forecasting, together with models for activation and quiescence. These models use small earthquakes ('seismicity-based models") to forecast the occurrence of large earthquakes, either through varying rates of small earthquake activity, or via an accumulation of this activity over time. These approaches use data-mining algorithms combined with the ANSS earthquake catalog. The basic idea is to compute large earthquake probabilities using the number of small earthquakes that have occurred in a region since the last large earthquake. Each of these approaches has computational challenges associated with computing forecast information in real time. Using 25 years of data from the ANSS California-Nevada catalog of earthquakes, we show that real-time forecasting is possible at a grid scale of 0.1o. We have analyzed the performance of these models using Reliability/Attributes and standard Receiver Operating Characteristic (ROC) tests. We show how the Reliability and ROC tests allow us to judge data completeness and estimate error. It is clear from much of the analysis that data quality is a major limitation on the accurate computation of earthquake probabilities. We discuss the challenges and pitfalls in serving up these datasets over the web.
Toward an optimal solver for time-spectral fluid-dynamic and aeroelastic solutions on unstructured meshes

NASA Astrophysics Data System (ADS)

Mundis, Nathan L.; Mavriplis, Dimitri J.

2017-09-01

The time-spectral method applied to the Euler and coupled aeroelastic equations theoretically offers significant computational savings for purely periodic problems when compared to standard time-implicit methods. However, attaining superior efficiency with time-spectral methods over traditional time-implicit methods hinges on the ability rapidly to solve the large non-linear system resulting from time-spectral discretizations which become larger and stiffer as more time instances are employed or the period of the flow becomes especially short (i.e. the maximum resolvable wave-number increases). In order to increase the efficiency of these solvers, and to improve robustness, particularly for large numbers of time instances, the Generalized Minimal Residual Method (GMRES) is used to solve the implicit linear system over all coupled time instances. The use of GMRES as the linear solver makes time-spectral methods more robust, allows them to be applied to a far greater subset of time-accurate problems, including those with a broad range of harmonic content, and vastly improves the efficiency of time-spectral methods. In previous work, a wave-number independent preconditioner that mitigates the increased stiffness of the time-spectral method when applied to problems with large resolvable wave numbers has been developed. This preconditioner, however, directly inverts a large matrix whose size increases in proportion to the number of time instances. As a result, the computational time of this method scales as the cube of the number of time instances. In the present work, this preconditioner has been reworked to take advantage of an approximate-factorization approach that effectively decouples the spatial and temporal systems. Once decoupled, the time-spectral matrix can be inverted in frequency space, where it has entries only on the main diagonal and therefore can be inverted quite efficiently. This new GMRES/preconditioner combination is shown to be over an order of magnitude more efficient than the previous wave-number independent preconditioner for problems with large numbers of time instances and/or large reduced frequencies.
Fast computation of an optimal controller for large-scale adaptive optics.

PubMed

Massioni, Paolo; Kulcsár, Caroline; Raynaud, Henri-François; Conan, Jean-Marc

2011-11-01

The linear quadratic Gaussian regulator provides the minimum-variance control solution for a linear time-invariant system. For adaptive optics (AO) applications, under the hypothesis of a deformable mirror with instantaneous response, such a controller boils down to a minimum-variance phase estimator (a Kalman filter) and a projection onto the mirror space. The Kalman filter gain can be computed by solving an algebraic Riccati matrix equation, whose computational complexity grows very quickly with the size of the telescope aperture. This "curse of dimensionality" makes the standard solvers for Riccati equations very slow in the case of extremely large telescopes. In this article, we propose a way of computing the Kalman gain for AO systems by means of an approximation that considers the turbulence phase screen as the cropped version of an infinite-size screen. We demonstrate the advantages of the methods for both off- and on-line computational time, and we evaluate its performance for classical AO as well as for wide-field tomographic AO with multiple natural guide stars. Simulation results are reported.
French Meteor Network for High Precision Orbits of Meteoroids

NASA Technical Reports Server (NTRS)

Atreya, P.; Vaubaillon, J.; Colas, F.; Bouley, S.; Gaillard, B.; Sauli, I.; Kwon, M. K.

2011-01-01

There is a lack of precise meteoroids orbit from video observations as most of the meteor stations use off-the-shelf CCD cameras. Few meteoroids orbit with precise semi-major axis are available using film photographic method. Precise orbits are necessary to compute the dust flux in the Earth s vicinity, and to estimate the ejection time of the meteoroids accurately by comparing them with the theoretical evolution model. We investigate the use of large CCD sensors to observe multi-station meteors and to compute precise orbit of these meteoroids. An ideal spatial and temporal resolution to get an accuracy to those similar of photographic plates are discussed. Various problems faced due to the use of large CCD, such as increasing the spatial and the temporal resolution at the same time and computational problems in finding the meteor position are illustrated.
Synopsis of a computer program designed to interface a personal computer with the fast data acquisition system of a time-of-flight mass spectrometer

NASA Technical Reports Server (NTRS)

Bechtel, R. D.; Mateos, M. A.; Lincoln, K. A.

1988-01-01

Briefly described are the essential features of a computer program designed to interface a personal computer with the fast, digital data acquisition system of a time-of-flight mass spectrometer. The instrumentation was developed to provide a time-resolved analysis of individual vapor pulses produced by the incidence of a pulsed laser beam on an ablative material. The high repetition rate spectrometer coupled to a fast transient recorder captures complete mass spectra every 20 to 35 microsecs, thereby providing the time resolution needed for the study of this sort of transient event. The program enables the computer to record the large amount of data generated by the system in short time intervals, and it provides the operator the immediate option of presenting the spectral data in several different formats. Furthermore, the system does this with a high degree of automation, including the tasks of mass labeling the spectra and logging pertinent instrumental parameters.
Computer use changes generalization of movement learning.

PubMed

Wei, Kunlin; Yan, Xiang; Kong, Gaiqing; Yin, Cong; Zhang, Fan; Wang, Qining; Kording, Konrad Paul

2014-01-06

Over the past few decades, one of the most salient lifestyle changes for us has been the use of computers. For many of us, manual interaction with a computer occupies a large portion of our working time. Through neural plasticity, this extensive movement training should change our representation of movements (e.g., [1-3]), just like search engines affect memory [4]. However, how computer use affects motor learning is largely understudied. Additionally, as virtually all participants in studies of perception and actions are computer users, a legitimate question is whether insights from these studies bear the signature of computer-use experience. We compared non-computer users with age- and education-matched computer users in standard motor learning experiments. We found that people learned equally fast but that non-computer users generalized significantly less across space, a difference negated by two weeks of intensive computer training. Our findings suggest that computer-use experience shaped our basic sensorimotor behaviors, and this influence should be considered whenever computer users are recruited as study participants. Copyright © 2014 Elsevier Ltd. All rights reserved.
Development of Computational Aeroacoustics Code for Jet Noise and Flow Prediction

NASA Astrophysics Data System (ADS)

Keith, Theo G., Jr.; Hixon, Duane R.

2002-07-01

Accurate prediction of jet fan and exhaust plume flow and noise generation and propagation is very important in developing advanced aircraft engines that will pass current and future noise regulations. In jet fan flows as well as exhaust plumes, two major sources of noise are present: large-scale, coherent instabilities and small-scale turbulent eddies. In previous work for the NASA Glenn Research Center, three strategies have been explored in an effort to computationally predict the noise radiation from supersonic jet exhaust plumes. In order from the least expensive computationally to the most expensive computationally, these are: 1) Linearized Euler equations (LEE). 2) Very Large Eddy Simulations (VLES). 3) Large Eddy Simulations (LES). The first method solves the linearized Euler equations (LEE). These equations are obtained by linearizing about a given mean flow and the neglecting viscous effects. In this way, the noise from large-scale instabilities can be found for a given mean flow. The linearized Euler equations are computationally inexpensive, and have produced good noise results for supersonic jets where the large-scale instability noise dominates, as well as for the tone noise from a jet engine blade row. However, these linear equations do not predict the absolute magnitude of the noise; instead, only the relative magnitude is predicted. Also, the predicted disturbances do not modify the mean flow, removing a physical mechanism by which the amplitude of the disturbance may be controlled. Recent research for isolated airfoils' indicates that this may not affect the solution greatly at low frequencies. The second method addresses some of the concerns raised by the LEE method. In this approach, called Very Large Eddy Simulation (VLES), the unsteady Reynolds averaged Navier-Stokes equations are solved directly using a high-accuracy computational aeroacoustics numerical scheme. With the addition of a two-equation turbulence model and the use of a relatively coarse grid, the numerical solution is effectively filtered into a directly calculated mean flow with the small-scale turbulence being modeled, and an unsteady large-scale component that is also being directly calculated. In this way, the unsteady disturbances are calculated in a nonlinear way, with a direct effect on the mean flow. This method is not as fast as the LEE approach, but does have many advantages to recommend it; however, like the LEE approach, only the effect of the largest unsteady structures will be captured. An initial calculation was performed on a supersonic jet exhaust plume, with promising results, but the calculation was hampered by the explicit time marching scheme that was employed. This explicit scheme required a very small time step to resolve the nozzle boundary layer, which caused a long run time. Current work is focused on testing a lower-order implicit time marching method to combat this problem.

Fast algorithms for computing phylogenetic divergence time.

PubMed

Crosby, Ralph W; Williams, Tiffani L

2017-12-06

The inference of species divergence time is a key step in most phylogenetic studies. Methods have been available for the last ten years to perform the inference, but the performance of the methods does not yet scale well to studies with hundreds of taxa and thousands of DNA base pairs. For example a study of 349 primate taxa was estimated to require over 9 months of processing time. In this work, we present a new algorithm, AncestralAge, that significantly improves the performance of the divergence time process. As part of AncestralAge, we demonstrate a new method for the computation of phylogenetic likelihood and our experiments show a 90% improvement in likelihood computation time on the aforementioned dataset of 349 primates taxa with over 60,000 DNA base pairs. Additionally, we show that our new method for the computation of the Bayesian prior on node ages reduces the running time for this computation on the 349 taxa dataset by 99%. Through the use of these new algorithms we open up the ability to perform divergence time inference on large phylogenetic studies.
Large scale cardiac modeling on the Blue Gene supercomputer.

PubMed

Reumann, Matthias; Fitch, Blake G; Rayshubskiy, Aleksandr; Keller, David U; Weiss, Daniel L; Seemann, Gunnar; Dössel, Olaf; Pitman, Michael C; Rice, John J

2008-01-01

Multi-scale, multi-physical heart models have not yet been able to include a high degree of accuracy and resolution with respect to model detail and spatial resolution due to computational limitations of current systems. We propose a framework to compute large scale cardiac models. Decomposition of anatomical data in segments to be distributed on a parallel computer is carried out by optimal recursive bisection (ORB). The algorithm takes into account a computational load parameter which has to be adjusted according to the cell models used. The diffusion term is realized by the monodomain equations. The anatomical data-set was given by both ventricles of the Visible Female data-set in a 0.2 mm resolution. Heterogeneous anisotropy was included in the computation. Model weights as input for the decomposition and load balancing were set to (a) 1 for tissue and 0 for non-tissue elements; (b) 10 for tissue and 1 for non-tissue elements. Scaling results for 512, 1024, 2048, 4096 and 8192 computational nodes were obtained for 10 ms simulation time. The simulations were carried out on an IBM Blue Gene/L parallel computer. A 1 s simulation was then carried out on 2048 nodes for the optimal model load. Load balances did not differ significantly across computational nodes even if the number of data elements distributed to each node differed greatly. Since the ORB algorithm did not take into account computational load due to communication cycles, the speedup is close to optimal for the computation time but not optimal overall due to the communication overhead. However, the simulation times were reduced form 87 minutes on 512 to 11 minutes on 8192 nodes. This work demonstrates that it is possible to run simulations of the presented detailed cardiac model within hours for the simulation of a heart beat.
Implicit method for the computation of unsteady flows on unstructured grids

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.; Mavriplis, D. J.

1995-01-01

An implicit method for the computation of unsteady flows on unstructured grids is presented. Following a finite difference approximation for the time derivative, the resulting nonlinear system of equations is solved at each time step by using an agglomeration multigrid procedure. The method allows for arbitrarily large time steps and is efficient in terms of computational effort and storage. Inviscid and viscous unsteady flows are computed to validate the procedure. The issue of the mass matrix which arises with vertex-centered finite volume schemes is addressed. The present formulation allows the mass matrix to be inverted indirectly. A mesh point movement and reconnection procedure is described that allows the grids to evolve with the motion of bodies. As an example of flow over bodies in relative motion, flow over a multi-element airfoil system undergoing deployment is computed.
Fast computation of quadrupole and hexadecapole approximations in microlensing with a single point-source evaluation

NASA Astrophysics Data System (ADS)

Cassan, Arnaud

2017-07-01

The exoplanet detection rate from gravitational microlensing has grown significantly in recent years thanks to a great enhancement of resources and improved observational strategy. Current observatories include ground-based wide-field and/or robotic world-wide networks of telescopes, as well as space-based observatories such as satellites Spitzer or Kepler/K2. This results in a large quantity of data to be processed and analysed, which is a challenge for modelling codes because of the complexity of the parameter space to be explored and the intensive computations required to evaluate the models. In this work, I present a method that allows to compute the quadrupole and hexadecapole approximations of the finite-source magnification with more efficiency than previously available codes, with routines about six times and four times faster, respectively. The quadrupole takes just about twice the time of a point-source evaluation, which advocates for generalizing its use to large portions of the light curves. The corresponding routines are available as open-source python codes.
An impulsive receptance technique for the time domain computation of the vibration of a whole aero-engine model with nonlinear bearings

NASA Astrophysics Data System (ADS)

Hai, Pham Minh; Bonello, Philip

2008-12-01

The direct study of the vibration of real engine structures with nonlinear bearings, particularly aero-engines, has been severely limited by the fact that current nonlinear computational techniques are not well-suited for complex large-order systems. This paper introduces a novel implicit "impulsive receptance method" (IRM) for the time domain analysis of such structures. The IRM's computational efficiency is largely immune to the number of modes used and dependent only on the number of nonlinear elements. This means that, apart from retaining numerical accuracy, a much more physically accurate solution is achievable within a short timeframe. Simulation tests on a realistically sized representative twin-spool aero-engine showed that the new method was around 40 times faster than a conventional implicit integration scheme. Preliminary results for a given rotor unbalance distribution revealed the varying degree of journal lift, orbit size and shape at the example engine's squeeze-film damper bearings, and the effect of end-sealing at these bearings.
Performance analysis of a large-grain dataflow scheduling paradigm

NASA Technical Reports Server (NTRS)

Young, Steven D.; Wills, Robert W.

1993-01-01

A paradigm for scheduling computations on a network of multiprocessors using large-grain data flow scheduling at run time is described and analyzed. The computations to be scheduled must follow a static flow graph, while the schedule itself will be dynamic (i.e., determined at run time). Many applications characterized by static flow exist, and they include real-time control and digital signal processing. With the advent of computer-aided software engineering (CASE) tools for capturing software designs in dataflow-like structures, macro-dataflow scheduling becomes increasingly attractive, if not necessary. For parallel implementations, using the macro-dataflow method allows the scheduling to be insulated from the application designer and enables the maximum utilization of available resources. Further, by allowing multitasking, processor utilizations can approach 100 percent while they maintain maximum speedup. Extensive simulation studies are performed on 4-, 8-, and 16-processor architectures that reflect the effects of communication delays, scheduling delays, algorithm class, and multitasking on performance and speedup gains.
Computer measurement of particle sizes in electron microscope images

NASA Technical Reports Server (NTRS)

Hall, E. L.; Thompson, W. B.; Varsi, G.; Gauldin, R.

1976-01-01

Computer image processing techniques have been applied to particle counting and sizing in electron microscope images. Distributions of particle sizes were computed for several images and compared to manually computed distributions. The results of these experiments indicate that automatic particle counting within a reasonable error and computer processing time is feasible. The significance of the results is that the tedious task of manually counting a large number of particles can be eliminated while still providing the scientist with accurate results.
On the Large-Scaling Issues of Cloud-based Applications for Earth Science Dat

NASA Astrophysics Data System (ADS)

Hua, H.

2016-12-01

Next generation science data systems are needed to address the incoming flood of data from new missions such as NASA's SWOT and NISAR where its SAR data volumes and data throughput rates are order of magnitude larger than present day missions. Existing missions, such as OCO-2, may also require high turn-around time for processing different science scenarios where on-premise and even traditional HPC computing environments may not meet the high processing needs. Additionally, traditional means of procuring hardware on-premise are already limited due to facilities capacity constraints for these new missions. Experiences have shown that to embrace efficient cloud computing approaches for large-scale science data systems requires more than just moving existing code to cloud environments. At large cloud scales, we need to deal with scaling and cost issues. We present our experiences on deploying multiple instances of our hybrid-cloud computing science data system (HySDS) to support large-scale processing of Earth Science data products. We will explore optimization approaches to getting best performance out of hybrid-cloud computing as well as common issues that will arise when dealing with large-scale computing. Novel approaches were utilized to do processing on Amazon's spot market, which can potentially offer 75%-90% costs savings but with an unpredictable computing environment based on market forces.
Stream-based Hebbian eigenfilter for real-time neuronal spike discrimination

PubMed Central

2012-01-01

Background Principal component analysis (PCA) has been widely employed for automatic neuronal spike sorting. Calculating principal components (PCs) is computationally expensive, and requires complex numerical operations and large memory resources. Substantial hardware resources are therefore needed for hardware implementations of PCA. General Hebbian algorithm (GHA) has been proposed for calculating PCs of neuronal spikes in our previous work, which eliminates the needs of computationally expensive covariance analysis and eigenvalue decomposition in conventional PCA algorithms. However, large memory resources are still inherently required for storing a large volume of aligned spikes for training PCs. The large size memory will consume large hardware resources and contribute significant power dissipation, which make GHA difficult to be implemented in portable or implantable multi-channel recording micro-systems. Method In this paper, we present a new algorithm for PCA-based spike sorting based on GHA, namely stream-based Hebbian eigenfilter, which eliminates the inherent memory requirements of GHA while keeping the accuracy of spike sorting by utilizing the pseudo-stationarity of neuronal spikes. Because of the reduction of large hardware storage requirements, the proposed algorithm can lead to ultra-low hardware resources and power consumption of hardware implementations, which is critical for the future multi-channel micro-systems. Both clinical and synthetic neural recording data sets were employed for evaluating the accuracy of the stream-based Hebbian eigenfilter. The performance of spike sorting using stream-based eigenfilter and the computational complexity of the eigenfilter were rigorously evaluated and compared with conventional PCA algorithms. Field programmable logic arrays (FPGAs) were employed to implement the proposed algorithm, evaluate the hardware implementations and demonstrate the reduction in both power consumption and hardware memories achieved by the streaming computing Results and discussion Results demonstrate that the stream-based eigenfilter can achieve the same accuracy and is 10 times more computationally efficient when compared with conventional PCA algorithms. Hardware evaluations show that 90.3% logic resources, 95.1% power consumption and 86.8% computing latency can be reduced by the stream-based eigenfilter when compared with PCA hardware. By utilizing the streaming method, 92% memory resources and 67% power consumption can be saved when compared with the direct implementation of GHA. Conclusion Stream-based Hebbian eigenfilter presents a novel approach to enable real-time spike sorting with reduced computational complexity and hardware costs. This new design can be further utilized for multi-channel neuro-physiological experiments or chronic implants. PMID:22490725
Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine.

PubMed

Bao, Shunxing; Weitendorf, Frederick D; Plassard, Andrew J; Huo, Yuankai; Gokhale, Aniruddha; Landman, Bennett A

2017-02-11

The field of big data is generally concerned with the scale of processing at which traditional computational paradigms break down. In medical imaging, traditional large scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit that is controlled by a job scheduler. Typically, a shared-storage network file system (NFS) is used to host imaging data. However, data transfer from storage to processing nodes can saturate network bandwidth when data is frequently uploaded/retrieved from the NFS, e.g., "short" processing times and/or "large" datasets. Recently, an alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. The benefits of using such a framework must be formally evaluated against a traditional approach to characterize the point at which simply "large scale" processing transitions into "big data" and necessitates alternative computational frameworks. The proposed Hadoop system was implemented on a production lab-cluster alongside a standard Sun Grid Engine (SGE). Theoretical models for wall-clock time and resource time for both approaches are introduced and validated. To provide real example data, three T1 image archives were retrieved from a university secure, shared web database and used to empirically assess computational performance under three configurations of cluster hardware (using 72, 109, or 209 CPU cores) with differing job lengths. Empirical results match the theoretical models. Based on these data, a comparative analysis is presented for when the Hadoop framework will be relevant and non-relevant for medical imaging.
Theoretical and empirical comparison of big data image processing with Apache Hadoop and Sun Grid Engine

NASA Astrophysics Data System (ADS)

Bao, Shunxing; Weitendorf, Frederick D.; Plassard, Andrew J.; Huo, Yuankai; Gokhale, Aniruddha; Landman, Bennett A.

2017-03-01

The field of big data is generally concerned with the scale of processing at which traditional computational paradigms break down. In medical imaging, traditional large scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit that is controlled by a job scheduler. Typically, a shared-storage network file system (NFS) is used to host imaging data. However, data transfer from storage to processing nodes can saturate network bandwidth when data is frequently uploaded/retrieved from the NFS, e.g., "short" processing times and/or "large" datasets. Recently, an alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. The benefits of using such a framework must be formally evaluated against a traditional approach to characterize the point at which simply "large scale" processing transitions into "big data" and necessitates alternative computational frameworks. The proposed Hadoop system was implemented on a production lab-cluster alongside a standard Sun Grid Engine (SGE). Theoretical models for wall-clock time and resource time for both approaches are introduced and validated. To provide real example data, three T1 image archives were retrieved from a university secure, shared web database and used to empirically assess computational performance under three configurations of cluster hardware (using 72, 109, or 209 CPU cores) with differing job lengths. Empirical results match the theoretical models. Based on these data, a comparative analysis is presented for when the Hadoop framework will be relevant and nonrelevant for medical imaging.
Universal adiabatic quantum computation via the space-time circuit-to-Hamiltonian construction.

PubMed

Gosset, David; Terhal, Barbara M; Vershynina, Anna

2015-04-10

We show how to perform universal adiabatic quantum computation using a Hamiltonian which describes a set of particles with local interactions on a two-dimensional grid. A single parameter in the Hamiltonian is adiabatically changed as a function of time to simulate the quantum circuit. We bound the eigenvalue gap above the unique ground state by mapping our model onto the ferromagnetic XXZ chain with kink boundary conditions; the gap of this spin chain was computed exactly by Koma and Nachtergaele using its q-deformed version of SU(2) symmetry. We also discuss a related time-independent Hamiltonian which was shown by Janzing to be capable of universal computation. We observe that in the limit of large system size, the time evolution is equivalent to the exactly solvable quantum walk on Young's lattice.
Real-Time Fourier Synthesis of Ensembles with Timbral Interpolation

NASA Astrophysics Data System (ADS)

Haken, Lippold

1990-01-01

In Fourier synthesis, natural musical sounds are produced by summing time-varying sinusoids. Sounds are analyzed to find the amplitude and frequency characteristics for their sinusoids; interpolation between the characteristics of several sounds is used to produce intermediate timbres. An ensemble can be synthesized by summing all the sinusoids for several sounds, but in practice it is difficult to perform such computations in real time. To solve this problem on inexpensive hardware, it is useful to take advantage of the masking effects of the auditory system. By avoiding the computations for perceptually unimportant sinusoids, and by employing other computation reduction techniques, a large ensemble may be synthesized in real time on the Platypus signal processor. Unlike existing computation reduction techniques, the techniques described in this thesis do not sacrifice independent fine control over the amplitude and frequency characteristics of each sinusoid.
Universal Adiabatic Quantum Computation via the Space-Time Circuit-to-Hamiltonian Construction

NASA Astrophysics Data System (ADS)

Gosset, David; Terhal, Barbara M.; Vershynina, Anna

2015-04-01

We show how to perform universal adiabatic quantum computation using a Hamiltonian which describes a set of particles with local interactions on a two-dimensional grid. A single parameter in the Hamiltonian is adiabatically changed as a function of time to simulate the quantum circuit. We bound the eigenvalue gap above the unique ground state by mapping our model onto the ferromagnetic X X Z chain with kink boundary conditions; the gap of this spin chain was computed exactly by Koma and Nachtergaele using its q -deformed version of SU(2) symmetry. We also discuss a related time-independent Hamiltonian which was shown by Janzing to be capable of universal computation. We observe that in the limit of large system size, the time evolution is equivalent to the exactly solvable quantum walk on Young's lattice.
Modeling and optimum time performance for concurrent processing

NASA Technical Reports Server (NTRS)

Mielke, Roland R.; Stoughton, John W.; Som, Sukhamoy

1988-01-01

The development of a new graph theoretic model for describing the relation between a decomposed algorithm and its execution in a data flow environment is presented. Called ATAMM, the model consists of a set of Petri net marked graphs useful for representing decision-free algorithms having large-grained, computationally complex primitive operations. Performance time measures which determine computing speed and throughput capacity are defined, and the ATAMM model is used to develop lower bounds for these times. A concurrent processing operating strategy for achieving optimum time performance is presented and illustrated by example.
Boundary conditions for simulating large SAW devices using ANSYS.

PubMed

Peng, Dasong; Yu, Fengqi; Hu, Jian; Li, Peng

2010-08-01

In this report, we propose improved substrate left and right boundary conditions for simulating SAW devices using ANSYS. Compared with the previous methods, the proposed method can greatly reduce computation time. Furthermore, the longer the distance from the first reflector to the last one, the more computation time can be reduced. To verify the proposed method, a design example is presented with device center frequency 971.14 MHz.
Numerical simulation of transonic compressor under circumferential inlet distortion and rotor/stator interference using harmonic balance method

NASA Astrophysics Data System (ADS)

Wang, Ziwei; Jiang, Xiong; Chen, Ti; Hao, Yan; Qiu, Min

2018-05-01

Simulating the unsteady flow of compressor under circumferential inlet distortion and rotor/stator interference would need full-annulus grid with a dual time method. This process is time consuming and needs a large amount of computational resources. Harmonic balance method simulates the unsteady flow in compressor on single passage grid with a series of steady simulations. This will largely increase the computational efficiency in comparison with the dual time method. However, most simulations with harmonic balance method are conducted on the flow under either circumferential inlet distortion or rotor/stator interference. Based on an in-house CFD code, the harmonic balance method is applied in the simulation of flow in the NASA Stage 35 under both circumferential inlet distortion and rotor/stator interference. As the unsteady flow is influenced by two different unsteady disturbances, it leads to the computational instability. The instability can be avoided by coupling the harmonic balance method with an optimizing algorithm. The computational result of harmonic balance method is compared with the result of full-annulus simulation. It denotes that, the harmonic balance method simulates the flow under circumferential inlet distortion and rotor/stator interference as precise as the full-annulus simulation with a speed-up of about 8 times.
Dynamic Identification for Control of Large Space Structures

NASA Technical Reports Server (NTRS)

Ibrahim, S. R.

1985-01-01

This is a compilation of reports by the one author on one subject. It consists of the following five journal articles: (1) A Parametric Study of the Ibrahim Time Domain Modal Identification Algorithm; (2) Large Modal Survey Testing Using the Ibrahim Time Domain Identification Technique; (3) Computation of Normal Modes from Identified Complex Modes; (4) Dynamic Modeling of Structural from Measured Complex Modes; and (5) Time Domain Quasi-Linear Identification of Nonlinear Dynamic Systems.
A Large number of fast cosmological simulations

NASA Astrophysics Data System (ADS)

Koda, Jun; Kazin, E.; Blake, C.

2014-01-01

Mock galaxy catalogs are essential tools to analyze large-scale structure data. Many independent realizations of mock catalogs are necessary to evaluate the uncertainties in the measurements. We perform 3600 cosmological simulations for the WiggleZ Dark Energy Survey to obtain the new improved Baron Acoustic Oscillation (BAO) cosmic distance measurements using the density field "reconstruction" technique. We use 1296^3 particles in a periodic box of 600/h Mpc on a side, which is the minimum requirement from the survey volume and observed galaxies. In order to perform such large number of simulations, we developed a parallel code using the COmoving Lagrangian Acceleration (COLA) method, which can simulate cosmological large-scale structure reasonably well with only 10 time steps. Our simulation is more than 100 times faster than conventional N-body simulations; one COLA simulation takes only 15 minutes with 216 computing cores. We have completed the 3600 simulations with a reasonable computation time of 200k core hours. We also present the results of the revised WiggleZ BAO distance measurement, which are significantly improved by the reconstruction technique.
GPU-accelerated computing for Lagrangian coherent structures of multi-body gravitational regimes

NASA Astrophysics Data System (ADS)

Lin, Mingpei; Xu, Ming; Fu, Xiaoyu

2017-04-01

Based on a well-established theoretical foundation, Lagrangian Coherent Structures (LCSs) have elicited widespread research on the intrinsic structures of dynamical systems in many fields, including the field of astrodynamics. Although the application of LCSs in dynamical problems seems straightforward theoretically, its associated computational cost is prohibitive. We propose a block decomposition algorithm developed on Compute Unified Device Architecture (CUDA) platform for the computation of the LCSs of multi-body gravitational regimes. In order to take advantage of GPU's outstanding computing properties, such as Shared Memory, Constant Memory, and Zero-Copy, the algorithm utilizes a block decomposition strategy to facilitate computation of finite-time Lyapunov exponent (FTLE) fields of arbitrary size and timespan. Simulation results demonstrate that this GPU-based algorithm can satisfy double-precision accuracy requirements and greatly decrease the time needed to calculate final results, increasing speed by approximately 13 times. Additionally, this algorithm can be generalized to various large-scale computing problems, such as particle filters, constellation design, and Monte-Carlo simulation.

Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers

NASA Technical Reports Server (NTRS)

Morgan, Philip E.

2004-01-01

This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
Computational modelling of oxygenation processes in enzymes and biomimetic model complexes.

PubMed

de Visser, Sam P; Quesne, Matthew G; Martin, Bodo; Comba, Peter; Ryde, Ulf

2014-01-11

With computational resources becoming more efficient and more powerful and at the same time cheaper, computational methods have become more and more popular for studies on biochemical and biomimetic systems. Although large efforts from the scientific community have gone into exploring the possibilities of computational methods for studies on large biochemical systems, such studies are not without pitfalls and often cannot be routinely done but require expert execution. In this review we summarize and highlight advances in computational methodology and its application to enzymatic and biomimetic model complexes. In particular, we emphasize on topical and state-of-the-art methodologies that are able to either reproduce experimental findings, e.g., spectroscopic parameters and rate constants, accurately or make predictions of short-lived intermediates and fast reaction processes in nature. Moreover, we give examples of processes where certain computational methods dramatically fail.
A computer-aided design system geared toward conceptual design in a research environment. [for hypersonic vehicles

NASA Technical Reports Server (NTRS)

STACK S. H.

1981-01-01

A computer-aided design system has recently been developed specifically for the small research group environment. The system is implemented on a Prime 400 minicomputer linked with a CDC 6600 computer. The goal was to assign the minicomputer specific tasks, such as data input and graphics, thereby reserving the large mainframe computer for time-consuming analysis codes. The basic structure of the design system consists of GEMPAK, a computer code that generates detailed configuration geometry from a minimum of input; interface programs that reformat GEMPAK geometry for input to the analysis codes; and utility programs that simplify computer access and data interpretation. The working system has had a large positive impact on the quantity and quality of research performed by the originating group. This paper describes the system, the major factors that contributed to its particular form, and presents examples of its application.
Domain decomposition in time for PDE-constrained optimization

DOE PAGES

Barker, Andrew T.; Stoll, Martin

2015-08-28

Here, PDE-constrained optimization problems have a wide range of applications, but they lead to very large and ill-conditioned linear systems, especially if the problems are time dependent. In this paper we outline an approach for dealing with such problems by decomposing them in time and applying an additive Schwarz preconditioner in time, so that we can take advantage of parallel computers to deal with the very large linear systems. We then illustrate the performance of our method on a variety of problems.
Computer-intensive simulation of solid-state NMR experiments using SIMPSON.

PubMed

Tošner, Zdeněk; Andersen, Rasmus; Stevensson, Baltzar; Edén, Mattias; Nielsen, Niels Chr; Vosegaard, Thomas

2014-09-01

Conducting large-scale solid-state NMR simulations requires fast computer software potentially in combination with efficient computational resources to complete within a reasonable time frame. Such simulations may involve large spin systems, multiple-parameter fitting of experimental spectra, or multiple-pulse experiment design using parameter scan, non-linear optimization, or optimal control procedures. To efficiently accommodate such simulations, we here present an improved version of the widely distributed open-source SIMPSON NMR simulation software package adapted to contemporary high performance hardware setups. The software is optimized for fast performance on standard stand-alone computers, multi-core processors, and large clusters of identical nodes. We describe the novel features for fast computation including internal matrix manipulations, propagator setups and acquisition strategies. For efficient calculation of powder averages, we implemented interpolation method of Alderman, Solum, and Grant, as well as recently introduced fast Wigner transform interpolation technique. The potential of the optimal control toolbox is greatly enhanced by higher precision gradients in combination with the efficient optimization algorithm known as limited memory Broyden-Fletcher-Goldfarb-Shanno. In addition, advanced parallelization can be used in all types of calculations, providing significant time reductions. SIMPSON is thus reflecting current knowledge in the field of numerical simulations of solid-state NMR experiments. The efficiency and novel features are demonstrated on the representative simulations. Copyright © 2014 Elsevier Inc. All rights reserved.
Inhomogeneous Radiation Boundary Conditions Simulating Incoming Acoustic Waves for Computational Aeroacoustics

NASA Technical Reports Server (NTRS)

Tam, Christopher K. W.; Fang, Jun; Kurbatskii, Konstantin A.

1996-01-01

A set of nonhomogeneous radiation and outflow conditions which automatically generate prescribed incoming acoustic or vorticity waves and, at the same time, are transparent to outgoing sound waves produced internally in a finite computation domain is proposed. This type of boundary condition is needed for the numerical solution of many exterior aeroacoustics problems. In computational aeroacoustics, the computation scheme must be as nondispersive ans nondissipative as possible. It must also support waves with wave speeds which are nearly the same as those of the original linearized Euler equations. To meet these requirements, a high-order/large-stencil scheme is necessary The proposed nonhomogeneous radiation and outflow boundary conditions are designed primarily for use in conjunction with such high-order/large-stencil finite difference schemes.
Discrete-time neural network for fast solving large linear L1 estimation problems and its application to image restoration.

PubMed

Xia, Youshen; Sun, Changyin; Zheng, Wei Xing

2012-05-01

There is growing interest in solving linear L1 estimation problems for sparsity of the solution and robustness against non-Gaussian noise. This paper proposes a discrete-time neural network which can calculate large linear L1 estimation problems fast. The proposed neural network has a fixed computational step length and is proved to be globally convergent to an optimal solution. Then, the proposed neural network is efficiently applied to image restoration. Numerical results show that the proposed neural network is not only efficient in solving degenerate problems resulting from the nonunique solutions of the linear L1 estimation problems but also needs much less computational time than the related algorithms in solving both linear L1 estimation and image restoration problems.
On the reliability of computed chaotic solutions of non-linear differential equations

NASA Astrophysics Data System (ADS)

Liao, Shijun

2009-08-01

A new concept, namely the critical predictable time Tc, is introduced to give a more precise description of computed chaotic solutions of non-linear differential equations: it is suggested that computed chaotic solutions are unreliable and doubtable when t > Tc. This provides us a strategy to detect reliable solution from a given computed result. In this way, the computational phenomena, such as computational chaos (CC), computational periodicity (CP) and computational prediction uncertainty, which are mainly based on long-term properties of computed time-series, can be completely avoided. Using this concept, the famous conclusion `accurate long-term prediction of chaos is impossible' should be replaced by a more precise conclusion that `accurate prediction of chaos beyond the critical predictable time Tc is impossible'. So, this concept also provides us a timescale to determine whether or not a particular time is long enough for a given non-linear dynamic system. Besides, the influence of data inaccuracy and various numerical schemes on the critical predictable time is investigated in details by using symbolic computation software as a tool. A reliable chaotic solution of Lorenz equation in a rather large interval 0 <= t < 1200 non-dimensional Lorenz time units is obtained for the first time. It is found that the precision of the initial condition and the computed data at each time step, which is mathematically necessary to get such a reliable chaotic solution in such a long time, is so high that it is physically impossible due to the Heisenberg uncertainty principle in quantum physics. This, however, provides us a so-called `precision paradox of chaos', which suggests that the prediction uncertainty of chaos is physically unavoidable, and that even the macroscopical phenomena might be essentially stochastic and thus could be described by probability more economically.
Seismic wavefield modeling based on time-domain symplectic and Fourier finite-difference method

NASA Astrophysics Data System (ADS)

Fang, Gang; Ba, Jing; Liu, Xin-xin; Zhu, Kun; Liu, Guo-Chang

2017-06-01

Seismic wavefield modeling is important for improving seismic data processing and interpretation. Calculations of wavefield propagation are sometimes not stable when forward modeling of seismic wave uses large time steps for long times. Based on the Hamiltonian expression of the acoustic wave equation, we propose a structure-preserving method for seismic wavefield modeling by applying the symplectic finite-difference method on time grids and the Fourier finite-difference method on space grids to solve the acoustic wave equation. The proposed method is called the symplectic Fourier finite-difference (symplectic FFD) method, and offers high computational accuracy and improves the computational stability. Using acoustic approximation, we extend the method to anisotropic media. We discuss the calculations in the symplectic FFD method for seismic wavefield modeling of isotropic and anisotropic media, and use the BP salt model and BP TTI model to test the proposed method. The numerical examples suggest that the proposed method can be used in seismic modeling of strongly variable velocities, offering high computational accuracy and low numerical dispersion. The symplectic FFD method overcomes the residual qSV wave of seismic modeling in anisotropic media and maintains the stability of the wavefield propagation for large time steps.
Interactive design and analysis of future large spacecraft concepts

NASA Technical Reports Server (NTRS)

Garrett, L. B.

1981-01-01

An interactive computer aided design program used to perform systems level design and analysis of large spacecraft concepts is presented. Emphasis is on rapid design, analysis of integrated spacecraft, and automatic spacecraft modeling for lattice structures. Capabilities and performance of multidiscipline applications modules, the executive and data management software, and graphics display features are reviewed. A single user at an interactive terminal create, design, analyze, and conduct parametric studies of Earth orbiting spacecraft with relative ease. Data generated in the design, analysis, and performance evaluation of an Earth-orbiting large diameter antenna satellite are used to illustrate current capabilities. Computer run time statistics for the individual modules quantify the speed at which modeling, analysis, and design evaluation of integrated spacecraft concepts is accomplished in a user interactive computing environment.
Computing and Visualizing Reachable Volumes for Maneuvering Satellites

NASA Astrophysics Data System (ADS)

Jiang, M.; de Vries, W.; Pertica, A.; Olivier, S.

2011-09-01

Detecting and predicting maneuvering satellites is an important problem for Space Situational Awareness. The spatial envelope of all possible locations within reach of such a maneuvering satellite is known as the Reachable Volume (RV). As soon as custody of a satellite is lost, calculating the RV and its subsequent time evolution is a critical component in the rapid recovery of the satellite. In this paper, we present a Monte Carlo approach to computing the RV for a given object. Essentially, our approach samples all possible trajectories by randomizing thrust-vectors, thrust magnitudes and time of burn. At any given instance, the distribution of the "point-cloud" of the virtual particles defines the RV. For short orbital time-scales, the temporal evolution of the point-cloud can result in complex, multi-reentrant manifolds. Visualization plays an important role in gaining insight and understanding into this complex and evolving manifold. In the second part of this paper, we focus on how to effectively visualize the large number of virtual trajectories and the computed RV. We present a real-time out-of-core rendering technique for visualizing the large number of virtual trajectories. We also examine different techniques for visualizing the computed volume of probability density distribution, including volume slicing, convex hull and isosurfacing. We compare and contrast these techniques in terms of computational cost and visualization effectiveness, and describe the main implementation issues encountered during our development process. Finally, we will present some of the results from our end-to-end system for computing and visualizing RVs using examples of maneuvering satellites.
Large Scale Flutter Data for Design of Rotating Blades Using Navier-Stokes Equations

NASA Technical Reports Server (NTRS)

Guruswamy, Guru P.

2012-01-01

A procedure to compute flutter boundaries of rotating blades is presented; a) Navier-Stokes equations. b) Frequency domain method compatible with industry practice. Procedure is initially validated: a) Unsteady loads with flapping wing experiment. b) Flutter boundary with fixed wing experiment. Large scale flutter computation is demonstrated for rotating blade: a) Single job submission script. b) Flutter boundary in 24 hour wall clock time with 100 cores. c) Linearly scalable with number of cores. Tested with 1000 cores that produced data in 25 hrs for 10 flutter boundaries. Further wall-clock speed-up is possible by performing parallel computations within each case.
Statistical Model Applied to NetFlow for Network Intrusion Detection

NASA Astrophysics Data System (ADS)

Proto, André; Alexandre, Leandro A.; Batista, Maira L.; Oliveira, Isabela L.; Cansian, Adriano M.

The computers and network services became presence guaranteed in several places. These characteristics resulted in the growth of illicit events and therefore the computers and networks security has become an essential point in any computing environment. Many methodologies were created to identify these events; however, with increasing of users and services on the Internet, many difficulties are found in trying to monitor a large network environment. This paper proposes a methodology for events detection in large-scale networks. The proposal approaches the anomaly detection using the NetFlow protocol, statistical methods and monitoring the environment in a best time for the application.
The Next Frontier in Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sarrao, John

2016-11-16

Exascale computing refers to computing systems capable of at least one exaflop or a billion calculations per second (1018). That is 50 times faster than the most powerful supercomputers being used today and represents a thousand-fold increase over the first petascale computer that came into operation in 2008. How we use these large-scale simulation resources is the key to solving some of today’s most pressing problems, including clean energy production, nuclear reactor lifetime extension and nuclear stockpile aging.
Opportunities for Breakthroughs in Large-Scale Computational Simulation and Design

NASA Technical Reports Server (NTRS)

Alexandrov, Natalia; Alter, Stephen J.; Atkins, Harold L.; Bey, Kim S.; Bibb, Karen L.; Biedron, Robert T.; Carpenter, Mark H.; Cheatwood, F. McNeil; Drummond, Philip J.; Gnoffo, Peter A.

2002-01-01

Opportunities for breakthroughs in the large-scale computational simulation and design of aerospace vehicles are presented. Computational fluid dynamics tools to be used within multidisciplinary analysis and design methods are emphasized. The opportunities stem from speedups and robustness improvements in the underlying unit operations associated with simulation (geometry modeling, grid generation, physical modeling, analysis, etc.). Further, an improved programming environment can synergistically integrate these unit operations to leverage the gains. The speedups result from reducing the problem setup time through geometry modeling and grid generation operations, and reducing the solution time through the operation counts associated with solving the discretized equations to a sufficient accuracy. The opportunities are addressed only at a general level here, but an extensive list of references containing further details is included. The opportunities discussed are being addressed through the Fast Adaptive Aerospace Tools (FAAST) element of the Advanced Systems Concept to Test (ASCoT) and the third Generation Reusable Launch Vehicles (RLV) projects at NASA Langley Research Center. The overall goal is to enable greater inroads into the design process with large-scale simulations.
A new parallel-vector finite element analysis software on distributed-memory computers

NASA Technical Reports Server (NTRS)

Qin, Jiangning; Nguyen, Duc T.

1993-01-01

A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed-memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. Block-skyline storage scheme along with vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

PubMed

Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

2016-01-01

Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.
Vectorization with SIMD extensions speeds up reconstruction in electron tomography.

PubMed

Agulleiro, J I; Garzón, E M; García, I; Fernández, J J

2010-06-01

Electron tomography allows structural studies of cellular structures at molecular detail. Large 3D reconstructions are needed to meet the resolution requirements. The processing time to compute these large volumes may be considerable and so, high performance computing techniques have been used traditionally. This work presents a vector approach to tomographic reconstruction that relies on the exploitation of the SIMD extensions available in modern processors in combination to other single processor optimization techniques. This approach succeeds in producing full resolution tomograms with an important reduction in processing time, as evaluated with the most common reconstruction algorithms, namely WBP and SIRT. The main advantage stems from the fact that this approach is to be run on standard computers without the need of specialized hardware, which facilitates the development, use and management of programs. Future trends in processor design open excellent opportunities for vector processing with processor's SIMD extensions in the field of 3D electron microscopy.
Physically-enhanced data visualisation: towards real time solution of Partial Differential Equations in 3D domains

NASA Astrophysics Data System (ADS)

Zlotnik, Sergio

2017-04-01

Information provided by visualisation environments can be largely increased if the data shown is combined with some relevant physical processes and the used is allowed to interact with those processes. This is particularly interesting in VR environments where the user has a deep interplay with the data. For example, a geological seismic line in a 3D "cave" shows information of the geological structure of the subsoil. The available information could be enhanced with the thermal state of the region under study, with water-flow patterns in porous rocks or with rock displacements under some stress conditions. The information added by the physical processes is usually the output of some numerical technique applied to solve a Partial Differential Equation (PDE) that describes the underlying physics. Many techniques are available to obtain numerical solutions of PDE (e.g. Finite Elements, Finite Volumes, Finite Differences, etc). Although, all these traditional techniques require very large computational resources (particularly in 3D), making them useless in a real time visualization environment -such as VR- because the time required to compute a solution is measured in minutes or even in hours. We present here a novel alternative for the resolution of PDE-based problems that is able to provide a 3D solutions for a very large family of problems in real time. That is, the solution is evaluated in a one thousands of a second, making the solver ideal to be embedded into VR environments. Based on Model Order Reduction ideas, the proposed technique divides the computational work in to a computationally intensive "offline" phase, that is run only once in a life time, and an "online" phase that allow the real time evaluation of any solution within a family of problems. Preliminary examples of real time solutions of complex PDE-based problems will be presented, including thermal problems, flow problems, wave problems and some simple coupled problems.
Full Quantum Dynamics Simulation of a Realistic Molecular System Using the Adaptive Time-Dependent Density Matrix Renormalization Group Method.

PubMed

Yao, Yao; Sun, Ke-Wei; Luo, Zhen; Ma, Haibo

2018-01-18

The accurate theoretical interpretation of ultrafast time-resolved spectroscopy experiments relies on full quantum dynamics simulations for the investigated system, which is nevertheless computationally prohibitive for realistic molecular systems with a large number of electronic and/or vibrational degrees of freedom. In this work, we propose a unitary transformation approach for realistic vibronic Hamiltonians, which can be coped with using the adaptive time-dependent density matrix renormalization group (t-DMRG) method to efficiently evolve the nonadiabatic dynamics of a large molecular system. We demonstrate the accuracy and efficiency of this approach with an example of simulating the exciton dissociation process within an oligothiophene/fullerene heterojunction, indicating that t-DMRG can be a promising method for full quantum dynamics simulation in large chemical systems. Moreover, it is also shown that the proper vibronic features in the ultrafast electronic process can be obtained by simulating the two-dimensional (2D) electronic spectrum by virtue of the high computational efficiency of the t-DMRG method.

Cloud computing for comparative genomics

PubMed Central

2010-01-01

Background Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. Conclusions The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems. PMID:20482786
Cloud computing for comparative genomics.

PubMed

Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A; Pivovarov, Rimma; Patil, Prasad; Tonellato, Peter J

2010-05-18

Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.
The Use of Computer Simulation Techniques in Educational Planning.

ERIC Educational Resources Information Center

Wilson, Charles Z.

Computer simulations provide powerful models for establishing goals, guidelines, and constraints in educational planning. They are dynamic models that allow planners to examine logical descriptions of organizational behavior over time as well as permitting consideration of the large and complex systems required to provide realistic descriptions of…
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.

PubMed

Meng, Bowen; Pratx, Guillem; Xing, Lei

2011-12-01

Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT∕CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldcamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT∕CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10(-7). Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT∕CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment

PubMed Central

Meng, Bowen; Pratx, Guillem; Xing, Lei

2011-01-01

Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldcamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10−7. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842
A real-time control system for the control of suspended interferometers based on hybrid computing techniques

NASA Astrophysics Data System (ADS)

Acernese, Fausto; Barone, Fabrizio; De Rosa, Rosario; Eleuteri, Antonio; Milano, Leopoldo; Pardi, Silvio; Ricciardi, Iolanda; Russo, Guido

2004-09-01

One of the main requirements of a digital system for the control of interferometric detectors of gravitational waves is the computing power, that is a direct consequence of the increasing complexity of the digital algorithms necessary for the control signals generation. For this specific task many specialized non standard real-time architectures have been developed, often very expensive and difficult to upgrade. On the other hand, such computing power is generally fully available for off-line applications on standard Pc based systems. Therefore, a possible and obvious solution may be provided by the integration of both the real-time and off-line architecture resulting in a hybrid control system architecture based on standards available components, trying to get both the advantages of the perfect data synchronization provided by the real-time systems and by the large computing power available on Pc based systems. Such integration may be provided by the implementation of the link between the two different architectures through the standard Ethernet network, whose data transfer speed is largely increasing in these years, using the TCP/IP, UDP and raw Ethernet protocols. In this paper we describe the architecture of an hybrid Ethernet based real-time control system prototype we implemented in Napoli, discussing its characteristics and performances. Finally we discuss a possible application to the real-time control of a suspended mass of the mode cleaner of the 3m prototype optical interferometer for gravitational wave detection (IDGW-3P) operational in Napoli.
A two steps solution approach to solving large nonlinear models: application to a problem of conjunctive use.

PubMed

Vieira, J; Cunha, M C

2011-01-01

This article describes a solution method of solving large nonlinear problems in two steps. The two steps solution approach takes advantage of handling smaller and simpler models and having better starting points to improve solution efficiency. The set of nonlinear constraints (named as complicating constraints) which makes the solution of the model rather complex and time consuming is eliminated from step one. The complicating constraints are added only in the second step so that a solution of the complete model is then found. The solution method is applied to a large-scale problem of conjunctive use of surface water and groundwater resources. The results obtained are compared with solutions determined with the direct solve of the complete model in one single step. In all examples the two steps solution approach allowed a significant reduction of the computation time. This potential gain of efficiency of the two steps solution approach can be extremely important for work in progress and it can be particularly useful for cases where the computation time would be a critical factor for having an optimized solution in due time.
Nonlinear relaxation algorithms for circuit simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saleh, R.A.

Circuit simulation is an important Computer-Aided Design (CAD) tool in the design of Integrated Circuits (IC). However, the standard techniques used in programs such as SPICE result in very long computer-run times when applied to large problems. In order to reduce the overall run time, a number of new approaches to circuit simulation were developed and are described. These methods are based on nonlinear relaxation techniques and exploit the relative inactivity of large circuits. Simple waveform-processing techniques are described to determine the maximum possible speed improvement that can be obtained by exploiting this property of large circuits. Three simulation algorithmsmore » are described, two of which are based on the Iterated Timing Analysis (ITA) method and a third based on the Waveform-Relaxation Newton (WRN) method. New programs that incorporate these techniques were developed and used to simulate a variety of industrial circuits. The results from these simulations are provided. The techniques are shown to be much faster than the standard approach. In addition, a number of parallel aspects of these algorithms are described, and a general space-time model of parallel-task scheduling is developed.« less
Bin recycling strategy for improving the histogram precision on GPU

NASA Astrophysics Data System (ADS)

Cárdenas-Montes, Miguel; Rodríguez-Vázquez, Juan José; Vega-Rodríguez, Miguel A.

2016-07-01

Histogram is an easily comprehensible way to present data and analyses. In the current scientific context with access to large volumes of data, the processing time for building histogram has dramatically increased. For this reason, parallel construction is necessary to alleviate the impact of the processing time in the analysis activities. In this scenario, GPU computing is becoming widely used for reducing until affordable levels the processing time of histogram construction. Associated to the increment of the processing time, the implementations are stressed on the bin-count accuracy. Accuracy aspects due to the particularities of the implementations are not usually taken into consideration when building histogram with very large data sets. In this work, a bin recycling strategy to create an accuracy-aware implementation for building histogram on GPU is presented. In order to evaluate the approach, this strategy was applied to the computation of the three-point angular correlation function, which is a relevant function in Cosmology for the study of the Large Scale Structure of Universe. As a consequence of the study a high-accuracy implementation for histogram construction on GPU is proposed.
Large-Scale Simulations of Plastic Neural Networks on Neuromorphic Hardware

PubMed Central

Knight, James C.; Tully, Philip J.; Kaplan, Bernhard A.; Lansner, Anders; Furber, Steve B.

2016-01-01

SpiNNaker is a digital, neuromorphic architecture designed for simulating large-scale spiking neural networks at speeds close to biological real-time. Rather than using bespoke analog or digital hardware, the basic computational unit of a SpiNNaker system is a general-purpose ARM processor, allowing it to be programmed to simulate a wide variety of neuron and synapse models. This flexibility is particularly valuable in the study of biological plasticity phenomena. A recently proposed learning rule based on the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm offers a generic framework for modeling the interaction of different plasticity mechanisms using spiking neurons. However, it can be computationally expensive to simulate large networks with BCPNN learning since it requires multiple state variables for each synapse, each of which needs to be updated every simulation time-step. We discuss the trade-offs in efficiency and accuracy involved in developing an event-based BCPNN implementation for SpiNNaker based on an analytical solution to the BCPNN equations, and detail the steps taken to fit this within the limited computational and memory resources of the SpiNNaker architecture. We demonstrate this learning rule by learning temporal sequences of neural activity within a recurrent attractor network which we simulate at scales of up to 2.0 × 104 neurons and 5.1 × 107 plastic synapses: the largest plastic neural network ever to be simulated on neuromorphic hardware. We also run a comparable simulation on a Cray XC-30 supercomputer system and find that, if it is to match the run-time of our SpiNNaker simulation, the super computer system uses approximately 45× more power. This suggests that cheaper, more power efficient neuromorphic systems are becoming useful discovery tools in the study of plasticity in large-scale brain models. PMID:27092061
Large-scale neuromorphic computing systems

NASA Astrophysics Data System (ADS)

Furber, Steve

2016-10-01

Neuromorphic computing covers a diverse range of approaches to information processing all of which demonstrate some degree of neurobiological inspiration that differentiates them from mainstream conventional computing systems. The philosophy behind neuromorphic computing has its origins in the seminal work carried out by Carver Mead at Caltech in the late 1980s. This early work influenced others to carry developments forward, and advances in VLSI technology supported steady growth in the scale and capability of neuromorphic devices. Recently, a number of large-scale neuromorphic projects have emerged, taking the approach to unprecedented scales and capabilities. These large-scale projects are associated with major new funding initiatives for brain-related research, creating a sense that the time and circumstances are right for progress in our understanding of information processing in the brain. In this review we present a brief history of neuromorphic engineering then focus on some of the principal current large-scale projects, their main features, how their approaches are complementary and distinct, their advantages and drawbacks, and highlight the sorts of capabilities that each can deliver to neural modellers.
Volunteered Cloud Computing for Disaster Management

NASA Astrophysics Data System (ADS)

Evans, J. D.; Hao, W.; Chettri, S. R.

2014-12-01

Disaster management relies increasingly on interpreting earth observations and running numerical models; which require significant computing capacity - usually on short notice and at irregular intervals. Peak computing demand during event detection, hazard assessment, or incident response may exceed agency budgets; however some of it can be met through volunteered computing, which distributes subtasks to participating computers via the Internet. This approach has enabled large projects in mathematics, basic science, and climate research to harness the slack computing capacity of thousands of desktop computers. This capacity is likely to diminish as desktops give way to battery-powered mobile devices (laptops, smartphones, tablets) in the consumer market; but as cloud computing becomes commonplace, it may offer significant slack capacity -- if its users are given an easy, trustworthy mechanism for participating. Such a "volunteered cloud computing" mechanism would also offer several advantages over traditional volunteered computing: tasks distributed within a cloud have fewer bandwidth limitations; granular billing mechanisms allow small slices of "interstitial" computing at no marginal cost; and virtual storage volumes allow in-depth, reversible machine reconfiguration. Volunteered cloud computing is especially suitable for "embarrassingly parallel" tasks, including ones requiring large data volumes: examples in disaster management include near-real-time image interpretation, pattern / trend detection, or large model ensembles. In the context of a major disaster, we estimate that cloud users (if suitably informed) might volunteer hundreds to thousands of CPU cores across a large provider such as Amazon Web Services. To explore this potential, we are building a volunteered cloud computing platform and targeting it to a disaster management context. Using a lightweight, fault-tolerant network protocol, this platform helps cloud users join parallel computing projects; automates reconfiguration of their virtual machines; ensures accountability for donated computing; and optimizes the use of "interstitial" computing. Initial applications include fire detection from multispectral satellite imagery and flood risk mapping through hydrological simulations.
1001 Ways to run AutoDock Vina for virtual screening

NASA Astrophysics Data System (ADS)

Jaghoori, Mohammad Mahdi; Bleijlevens, Boris; Olabarriaga, Silvia D.

2016-03-01

Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and set-up for their future large virtual screening experiments.
1001 Ways to run AutoDock Vina for virtual screening.

PubMed

Jaghoori, Mohammad Mahdi; Bleijlevens, Boris; Olabarriaga, Silvia D

2016-03-01

Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and set-up for their future large virtual screening experiments.
NASA's computer science research program

NASA Technical Reports Server (NTRS)

Larsen, R. L.

1983-01-01

Following a major assessment of NASA's computing technology needs, a new program of computer science research has been initiated by the Agency. The program includes work in concurrent processing, management of large scale scientific databases, software engineering, reliable computing, and artificial intelligence. The program is driven by applications requirements in computational fluid dynamics, image processing, sensor data management, real-time mission control and autonomous systems. It consists of university research, in-house NASA research, and NASA's Research Institute for Advanced Computer Science (RIACS) and Institute for Computer Applications in Science and Engineering (ICASE). The overall goal is to provide the technical foundation within NASA to exploit advancing computing technology in aerospace applications.
Aerodynamics and vortical structures in hovering fruitflies

NASA Astrophysics Data System (ADS)

Meng, Xue Guang; Sun, Mao

2015-03-01

We measure the wing kinematics and morphological parameters of seven freely hovering fruitflies and numerically compute the flows of the flapping wings. The computed mean lift approximately equals to the measured weight and the mean horizontal force is approximately zero, validating the computational model. Because of the very small relative velocity of the wing, the mean lift coefficient required to support the weight is rather large, around 1.8, and the Reynolds number of the wing is low, around 100. How such a large lift is produced at such a low Reynolds number is explained by combining the wing motion data, the computed vortical structures, and the theory of vorticity dynamics. It has been shown that two unsteady mechanisms are responsible for the high lift. One is referred as to "fast pitching-up rotation": at the start of an up- or downstroke when the wing has very small speed, it fast pitches down to a small angle of attack, and then, when its speed is higher, it fast pitches up to the angle it normally uses. When the wing pitches up while moving forward, large vorticity is produced and sheds at the trailing edge, and vorticity of opposite sign is produced near the leading edge and on the upper surface, resulting in a large time rate of change of the first moment of vorticity (or fluid impulse), hence a large aerodynamic force. The other is the well known "delayed stall" mechanism: in the mid-portion of the up- or downstroke the wing moves at large angle of attack (about 45 deg) and the leading-edge-vortex (LEV) moves with the wing; thus, the vortex ring, formed by the LEV, the tip vortices, and the starting vortex, expands in size continuously, producing a large time rate of change of fluid impulse or a large aerodynamic force.
Design and implementation of a UNIX based distributed computing system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Love, J.S.; Michael, M.W.

1994-12-31

We have designed, implemented, and are running a corporate-wide distributed processing batch queue on a large number of networked workstations using the UNIX{reg_sign} operating system. Atlas Wireline researchers and scientists have used the system for over a year. The large increase in available computer power has greatly reduced the time required for nuclear and electromagnetic tool modeling. Use of remote distributed computing has simultaneously reduced computation costs and increased usable computer time. The system integrates equipment from different manufacturers, using various CPU architectures, distinct operating system revisions, and even multiple processors per machine. Various differences between the machines have tomore » be accounted for in the master scheduler. These differences include shells, command sets, swap spaces, memory sizes, CPU sizes, and OS revision levels. Remote processing across a network must be performed in a manner that is seamless from the users` perspective. The system currently uses IBM RISC System/6000{reg_sign}, SPARCstation{sup TM}, HP9000s700, HP9000s800, and DEC Alpha AXP{sup TM} machines. Each CPU in the network has its own speed rating, allowed working hours, and workload parameters. The system if designed so that all of the computers in the network can be optimally scheduled without adversely impacting the primary users of the machines. The increase in the total usable computational capacity by means of distributed batch computing can change corporate computing strategy. The integration of disparate computer platforms eliminates the need to buy one type of computer for computations, another for graphics, and yet another for day-to-day operations. It might be possible, for example, to meet all research and engineering computing needs with existing networked computers.« less
Iterative-method performance evaluation for multiple vectors associated with a large-scale sparse matrix

NASA Astrophysics Data System (ADS)

Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo

2016-07-01

Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving them at the same time, that is, by solving for the product of the matrices. We implemented several iterative methods and compared their performance. The maximum performance on Sparc VIIIfx was 7.6 times higher than that of a naïve implementation. Finally, to deal with the different convergence processes of linear systems, we introduced a control method to eliminate the calculation of already converged vectors.
Experience in using commercial clouds in CMS

NASA Astrophysics Data System (ADS)

Bauerdick, L.; Bockelman, B.; Dykstra, D.; Fuess, S.; Garzoglio, G.; Girone, M.; Gutsche, O.; Holzman, B.; Hufnagel, D.; Kim, H.; Kennedy, R.; Mason, D.; Spentzouris, P.; Timm, S.; Tiradani, A.; Vaandering, E.; CMS Collaboration

2017-10-01

Historically high energy physics computing has been performed on large purpose-built computing systems. In the beginning there were single site computing facilities, which evolved into the Worldwide LHC Computing Grid (WLCG) used today. The vast majority of the WLCG resources are used for LHC computing and the resources are scheduled to be continuously used throughout the year. In the last several years there has been an explosion in capacity and capability of commercial and academic computing clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest amongst the cloud providers to demonstrate the capability to perform large scale scientific computing. In this presentation we will discuss results from the CMS experiment using the Fermilab HEPCloud Facility, which utilized both local Fermilab resources and Amazon Web Services (AWS). The goal was to work with AWS through a matching grant to demonstrate a sustained scale approximately equal to half of the worldwide processing resources available to CMS. We will discuss the planning and technical challenges involved in organizing the most IO intensive CMS workflows on a large-scale set of virtualized resource provisioned by the Fermilab HEPCloud. We will describe the data handling and data management challenges. Also, we will discuss the economic issues and cost and operational efficiency comparison to our dedicated resources. At the end we will consider the changes in the working model of HEP computing in a domain with the availability of large scale resources scheduled at peak times.
Experience in using commercial clouds in CMS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bauerdick, L.; Bockelman, B.; Dykstra, D.

Historically high energy physics computing has been performed on large purposebuilt computing systems. In the beginning there were single site computing facilities, which evolved into the Worldwide LHC Computing Grid (WLCG) used today. The vast majority of the WLCG resources are used for LHC computing and the resources are scheduled to be continuously used throughout the year. In the last several years there has been an explosion in capacity and capability of commercial and academic computing clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is amore » growing interest amongst the cloud providers to demonstrate the capability to perform large scale scientific computing. In this presentation we will discuss results from the CMS experiment using the Fermilab HEPCloud Facility, which utilized both local Fermilab resources and Amazon Web Services (AWS). The goal was to work with AWS through a matching grant to demonstrate a sustained scale approximately equal to half of the worldwide processing resources available to CMS. We will discuss the planning and technical challenges involved in organizing the most IO intensive CMS workflows on a large-scale set of virtualized resource provisioned by the Fermilab HEPCloud. We will describe the data handling and data management challenges. Also, we will discuss the economic issues and cost and operational efficiency comparison to our dedicated resources. At the end we will consider the changes in the working model of HEP computing in a domain with the availability of large scale resources scheduled at peak times.« less

Large Scale Document Inversion using a Multi-threaded Computing System

PubMed Central

Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

2018-01-01

Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. CCS Concepts •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations. PMID:29861701
Large Scale Document Inversion using a Multi-threaded Computing System.

PubMed

Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

2017-06-01

Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations.
OpenCL-based vicinity computation for 3D multiresolution mesh compression

NASA Astrophysics Data System (ADS)

Hachicha, Soumaya; Elkefi, Akram; Ben Amar, Chokri

2017-03-01

3D multiresolution mesh compression systems are still widely addressed in many domains. These systems are more and more requiring volumetric data to be processed in real-time. Therefore, the performance is becoming constrained by material resources usage and an overall reduction in the computational time. In this paper, our contribution entirely lies on computing, in real-time, triangles neighborhood of 3D progressive meshes for a robust compression algorithm based on the scan-based wavelet transform(WT) technique. The originality of this latter algorithm is to compute the WT with minimum memory usage by processing data as they are acquired. However, with large data, this technique is considered poor in term of computational complexity. For that, this work exploits the GPU to accelerate the computation using OpenCL as a heterogeneous programming language. Experiments demonstrate that, aside from the portability across various platforms and the flexibility guaranteed by the OpenCL-based implementation, this method can improve performance gain in speedup factor of 5 compared to the sequential CPU implementation.
Load Balancing Strategies for Multiphase Flows on Structured Grids

NASA Astrophysics Data System (ADS)

Olshefski, Kristopher; Owkes, Mark

2017-11-01

The computation time required to perform large simulations of complex systems is currently one of the leading bottlenecks of computational research. Parallelization allows multiple processing cores to perform calculations simultaneously and reduces computational times. However, load imbalances between processors waste computing resources as processors wait for others to complete imbalanced tasks. In multiphase flows, these imbalances arise due to the additional computational effort required at the gas-liquid interface. However, many current load balancing schemes are only designed for unstructured grid applications. The purpose of this research is to develop a load balancing strategy while maintaining the simplicity of a structured grid. Several approaches are investigated including brute force oversubscription, node oversubscription through Message Passing Interface (MPI) commands, and shared memory load balancing using OpenMP. Each of these strategies are tested with a simple one-dimensional model prior to implementation into the three-dimensional NGA code. Current results show load balancing will reduce computational time by at least 30%.
Restricted access Improved hydrogeophysical characterization and monitoring through parallel modeling and inversion of time-domain resistivity andinduced-polarization data

USGS Publications Warehouse

Johnson, Timothy C.; Versteeg, Roelof J.; Ward, Andy; Day-Lewis, Frederick D.; Revil, André

2010-01-01

Electrical geophysical methods have found wide use in the growing discipline of hydrogeophysics for characterizing the electrical properties of the subsurface and for monitoring subsurface processes in terms of the spatiotemporal changes in subsurface conductivity, chargeability, and source currents they govern. Presently, multichannel and multielectrode data collections systems can collect large data sets in relatively short periods of time. Practitioners, however, often are unable to fully utilize these large data sets and the information they contain because of standard desktop-computer processing limitations. These limitations can be addressed by utilizing the storage and processing capabilities of parallel computing environments. We have developed a parallel distributed-memory forward and inverse modeling algorithm for analyzing resistivity and time-domain induced polar-ization (IP) data. The primary components of the parallel computations include distributed computation of the pole solutions in forward mode, distributed storage and computation of the Jacobian matrix in inverse mode, and parallel execution of the inverse equation solver. We have tested the corresponding parallel code in three efforts: (1) resistivity characterization of the Hanford 300 Area Integrated Field Research Challenge site in Hanford, Washington, U.S.A., (2) resistivity characterization of a volcanic island in the southern Tyrrhenian Sea in Italy, and (3) resistivity and IP monitoring of biostimulation at a Superfund site in Brandywine, Maryland, U.S.A. Inverse analysis of each of these data sets would be limited or impossible in a standard serial computing environment, which underscores the need for parallel high-performance computing to fully utilize the potential of electrical geophysical methods in hydrogeophysical applications.
Accurate chemical master equation solution using multi-finite buffers

DOE PAGES

Cao, Youfang; Terebus, Anna; Liang, Jie

2016-06-29

Here, the discrete chemical master equation (dCME) provides a fundamental framework for studying stochasticity in mesoscopic networks. Because of the multiscale nature of many networks where reaction rates have a large disparity, directly solving dCMEs is intractable due to the exploding size of the state space. It is important to truncate the state space effectively with quantified errors, so accurate solutions can be computed. It is also important to know if all major probabilistic peaks have been computed. Here we introduce the accurate CME (ACME) algorithm for obtaining direct solutions to dCMEs. With multifinite buffers for reducing the state spacemore » by $O(n!)$, exact steady-state and time-evolving network probability landscapes can be computed. We further describe a theoretical framework of aggregating microstates into a smaller number of macrostates by decomposing a network into independent aggregated birth and death processes and give an a priori method for rapidly determining steady-state truncation errors. The maximal sizes of the finite buffers for a given error tolerance can also be precomputed without costly trial solutions of dCMEs. We show exactly computed probability landscapes of three multiscale networks, namely, a 6-node toggle switch, 11-node phage-lambda epigenetic circuit, and 16-node MAPK cascade network, the latter two with no known solutions. We also show how probabilities of rare events can be computed from first-passage times, another class of unsolved problems challenging for simulation-based techniques due to large separations in time scales. Overall, the ACME method enables accurate and efficient solutions of the dCME for a large class of networks.« less
Accurate chemical master equation solution using multi-finite buffers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cao, Youfang; Terebus, Anna; Liang, Jie

Here, the discrete chemical master equation (dCME) provides a fundamental framework for studying stochasticity in mesoscopic networks. Because of the multiscale nature of many networks where reaction rates have a large disparity, directly solving dCMEs is intractable due to the exploding size of the state space. It is important to truncate the state space effectively with quantified errors, so accurate solutions can be computed. It is also important to know if all major probabilistic peaks have been computed. Here we introduce the accurate CME (ACME) algorithm for obtaining direct solutions to dCMEs. With multifinite buffers for reducing the state spacemore » by $O(n!)$, exact steady-state and time-evolving network probability landscapes can be computed. We further describe a theoretical framework of aggregating microstates into a smaller number of macrostates by decomposing a network into independent aggregated birth and death processes and give an a priori method for rapidly determining steady-state truncation errors. The maximal sizes of the finite buffers for a given error tolerance can also be precomputed without costly trial solutions of dCMEs. We show exactly computed probability landscapes of three multiscale networks, namely, a 6-node toggle switch, 11-node phage-lambda epigenetic circuit, and 16-node MAPK cascade network, the latter two with no known solutions. We also show how probabilities of rare events can be computed from first-passage times, another class of unsolved problems challenging for simulation-based techniques due to large separations in time scales. Overall, the ACME method enables accurate and efficient solutions of the dCME for a large class of networks.« less
Mono and multi-objective optimization techniques applied to a large range of industrial test cases using Metamodel assisted Evolutionary Algorithms

NASA Astrophysics Data System (ADS)

Fourment, Lionel; Ducloux, Richard; Marie, Stéphane; Ejday, Mohsen; Monnereau, Dominique; Massé, Thomas; Montmitonnet, Pierre

2010-06-01

The use of material processing numerical simulation allows a strategy of trial and error to improve virtual processes without incurring material costs or interrupting production and therefore save a lot of money, but it requires user time to analyze the results, adjust the operating conditions and restart the simulation. Automatic optimization is the perfect complement to simulation. Evolutionary Algorithm coupled with metamodelling makes it possible to obtain industrially relevant results on a very large range of applications within a few tens of simulations and without any specific automatic optimization technique knowledge. Ten industrial partners have been selected to cover the different area of the mechanical forging industry and provide different examples of the forming simulation tools. It aims to demonstrate that it is possible to obtain industrially relevant results on a very large range of applications within a few tens of simulations and without any specific automatic optimization technique knowledge. The large computational time is handled by a metamodel approach. It allows interpolating the objective function on the entire parameter space by only knowing the exact function values at a reduced number of "master points". Two algorithms are used: an evolution strategy combined with a Kriging metamodel and a genetic algorithm combined with a Meshless Finite Difference Method. The later approach is extended to multi-objective optimization. The set of solutions, which corresponds to the best possible compromises between the different objectives, is then computed in the same way. The population based approach allows using the parallel capabilities of the utilized computer with a high efficiency. An optimization module, fully embedded within the Forge2009 IHM, makes possible to cover all the defined examples, and the use of new multi-core hardware to compute several simulations at the same time reduces the needed time dramatically. The presented examples demonstrate the method versatility. They include billet shape optimization of a common rail, the cogging of a bar and a wire drawing problem.
Thermal/structural modeling of a large scale in situ overtest experiment for defense high level waste at the Waste Isolation Pilot Plant Facility

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morgan, H.S.; Stone, C.M.; Krieg, R.D.

Several large scale in situ experiments in bedded salt formations are currently underway at the Waste Isolation Pilot Plant (WIPP) near Carlsbad, New Mexico, USA. In these experiments, the thermal and creep responses of salt around several different underground room configurations are being measured. Data from the tests are to be compared to thermal and structural responses predicted in pretest reference calculations. The purpose of these comparisons is to evaluate computational models developed from laboratory data prior to fielding of the in situ experiments. In this paper, the computational models used in the pretest reference calculation for one of themore » large scale tests, The Overtest for Defense High Level Waste, are described; and the pretest computed thermal and structural responses are compared to early data from the experiment. The comparisons indicate that computed and measured temperatures for the test agree to within ten percent but that measured deformation rates are between two and three times greater than corresponsing computed rates. 10 figs., 3 tabs.« less
Localization Algorithm Based on a Spring Model (LASM) for Large Scale Wireless Sensor Networks.

PubMed

Chen, Wanming; Mei, Tao; Meng, Max Q-H; Liang, Huawei; Liu, Yumei; Li, Yangming; Li, Shuai

2008-03-15

A navigation method for a lunar rover based on large scale wireless sensornetworks is proposed. To obtain high navigation accuracy and large exploration area, highnode localization accuracy and large network scale are required. However, thecomputational and communication complexity and time consumption are greatly increasedwith the increase of the network scales. A localization algorithm based on a spring model(LASM) method is proposed to reduce the computational complexity, while maintainingthe localization accuracy in large scale sensor networks. The algorithm simulates thedynamics of physical spring system to estimate the positions of nodes. The sensor nodesare set as particles with masses and connected with neighbor nodes by virtual springs. Thevirtual springs will force the particles move to the original positions, the node positionscorrespondingly, from the randomly set positions. Therefore, a blind node position can bedetermined from the LASM algorithm by calculating the related forces with the neighbornodes. The computational and communication complexity are O(1) for each node, since thenumber of the neighbor nodes does not increase proportionally with the network scale size.Three patches are proposed to avoid local optimization, kick out bad nodes and deal withnode variation. Simulation results show that the computational and communicationcomplexity are almost constant despite of the increase of the network scale size. The time consumption has also been proven to remain almost constant since the calculation steps arealmost unrelated with the network scale size.
On the study of control effectiveness and computational efficiency of reduced Saint-Venant model in model predictive control of open channel flow

NASA Astrophysics Data System (ADS)

Xu, M.; van Overloop, P. J.; van de Giesen, N. C.

2011-02-01

Model predictive control (MPC) of open channel flow is becoming an important tool in water management. The complexity of the prediction model has a large influence on the MPC application in terms of control effectiveness and computational efficiency. The Saint-Venant equations, called SV model in this paper, and the Integrator Delay (ID) model are either accurate but computationally costly, or simple but restricted to allowed flow changes. In this paper, a reduced Saint-Venant (RSV) model is developed through a model reduction technique, Proper Orthogonal Decomposition (POD), on the SV equations. The RSV model keeps the main flow dynamics and functions over a large flow range but is easier to implement in MPC. In the test case of a modeled canal reach, the number of states and disturbances in the RSV model is about 45 and 16 times less than the SV model, respectively. The computational time of MPC with the RSV model is significantly reduced, while the controller remains effective. Thus, the RSV model is a promising means to balance the control effectiveness and computational efficiency.
The Next Frontier in Computing

ScienceCinema

Sarrao, John

2018-06-13

Exascale computing refers to computing systems capable of at least one exaflop or a billion calculations per second (1018). That is 50 times faster than the most powerful supercomputers being used today and represents a thousand-fold increase over the first petascale computer that came into operation in 2008. How we use these large-scale simulation resources is the key to solving some of todayâs most pressing problems, including clean energy production, nuclear reactor lifetime extension and nuclear stockpile aging.
Complexity Optimization and High-Throughput Low-Latency Hardware Implementation of a Multi-Electrode Spike-Sorting Algorithm

PubMed Central

Dragas, Jelena; Jäckel, David; Hierlemann, Andreas; Franke, Felix

2017-01-01

Reliable real-time low-latency spike sorting with large data throughput is essential for studies of neural network dynamics and for brain-machine interfaces (BMIs), in which the stimulation of neural networks is based on the networks' most recent activity. However, the majority of existing multi-electrode spike-sorting algorithms are unsuited for processing high quantities of simultaneously recorded data. Recording from large neuronal networks using large high-density electrode sets (thousands of electrodes) imposes high demands on the data-processing hardware regarding computational complexity and data transmission bandwidth; this, in turn, entails demanding requirements in terms of chip area, memory resources and processing latency. This paper presents computational complexity optimization techniques, which facilitate the use of spike-sorting algorithms in large multi-electrode-based recording systems. The techniques are then applied to a previously published algorithm, on its own, unsuited for large electrode set recordings. Further, a real-time low-latency high-performance VLSI hardware architecture of the modified algorithm is presented, featuring a folded structure capable of processing the activity of hundreds of neurons simultaneously. The hardware is reconfigurable “on-the-fly” and adaptable to the nonstationarities of neuronal recordings. By transmitting exclusively spike time stamps and/or spike waveforms, its real-time processing offers the possibility of data bandwidth and data storage reduction. PMID:25415989
Complexity optimization and high-throughput low-latency hardware implementation of a multi-electrode spike-sorting algorithm.

PubMed

Dragas, Jelena; Jackel, David; Hierlemann, Andreas; Franke, Felix

2015-03-01

Reliable real-time low-latency spike sorting with large data throughput is essential for studies of neural network dynamics and for brain-machine interfaces (BMIs), in which the stimulation of neural networks is based on the networks' most recent activity. However, the majority of existing multi-electrode spike-sorting algorithms are unsuited for processing high quantities of simultaneously recorded data. Recording from large neuronal networks using large high-density electrode sets (thousands of electrodes) imposes high demands on the data-processing hardware regarding computational complexity and data transmission bandwidth; this, in turn, entails demanding requirements in terms of chip area, memory resources and processing latency. This paper presents computational complexity optimization techniques, which facilitate the use of spike-sorting algorithms in large multi-electrode-based recording systems. The techniques are then applied to a previously published algorithm, on its own, unsuited for large electrode set recordings. Further, a real-time low-latency high-performance VLSI hardware architecture of the modified algorithm is presented, featuring a folded structure capable of processing the activity of hundreds of neurons simultaneously. The hardware is reconfigurable “on-the-fly” and adaptable to the nonstationarities of neuronal recordings. By transmitting exclusively spike time stamps and/or spike waveforms, its real-time processing offers the possibility of data bandwidth and data storage reduction.
Analysis on the security of cloud computing

NASA Astrophysics Data System (ADS)

He, Zhonglin; He, Yuhua

2011-02-01

Cloud computing is a new technology, which is the fusion of computer technology and Internet development. It will lead the revolution of IT and information field. However, in cloud computing data and application software is stored at large data centers, and the management of data and service is not completely trustable, resulting in safety problems, which is the difficult point to improve the quality of cloud service. This paper briefly introduces the concept of cloud computing. Considering the characteristics of cloud computing, it constructs the security architecture of cloud computing. At the same time, with an eye toward the security threats cloud computing faces, several corresponding strategies are provided from the aspect of cloud computing users and service providers.
Extraction of drainage networks from large terrain datasets using high throughput computing

NASA Astrophysics Data System (ADS)

Gong, Jianya; Xie, Jibo

2009-02-01

Advanced digital photogrammetry and remote sensing technology produces large terrain datasets (LTD). How to process and use these LTD has become a big challenge for GIS users. Extracting drainage networks, which are basic for hydrological applications, from LTD is one of the typical applications of digital terrain analysis (DTA) in geographical information applications. Existing serial drainage algorithms cannot deal with large data volumes in a timely fashion, and few GIS platforms can process LTD beyond the GB size. High throughput computing (HTC), a distributed parallel computing mode, is proposed to improve the efficiency of drainage networks extraction from LTD. Drainage network extraction using HTC involves two key issues: (1) how to decompose the large DEM datasets into independent computing units and (2) how to merge the separate outputs into a final result. A new decomposition method is presented in which the large datasets are partitioned into independent computing units using natural watershed boundaries instead of using regular 1-dimensional (strip-wise) and 2-dimensional (block-wise) decomposition. Because the distribution of drainage networks is strongly related to watershed boundaries, the new decomposition method is more effective and natural. The method to extract natural watershed boundaries was improved by using multi-scale DEMs instead of single-scale DEMs. A HTC environment is employed to test the proposed methods with real datasets.
An Assessment of Artificial Compressibility and Pressure Projection Methods for Incompressible Flow Simulations

NASA Technical Reports Server (NTRS)

Kwak, Dochan; Kiris, C.; Smith, Charles A. (Technical Monitor)

1998-01-01

Performance of the two commonly used numerical procedures, one based on artificial compressibility method and the other pressure projection method, are compared. These formulations are selected primarily because they are designed for three-dimensional applications. The computational procedures are compared by obtaining steady state solutions of a wake vortex and unsteady solutions of a curved duct flow. For steady computations, artificial compressibility was very efficient in terms of computing time and robustness. For an unsteady flow which requires small physical time step, pressure projection method was found to be computationally more efficient than an artificial compressibility method. This comparison is intended to give some basis for selecting a method or a flow solution code for large three-dimensional applications where computing resources become a critical issue.
Recent Advances in X-ray Cone-beam Computed Laminography.

PubMed

O'Brien, Neil S; Boardman, Richard P; Sinclair, Ian; Blumensath, Thomas

2016-10-06

X-ray computed tomography is an established volume imaging technique used routinely in medical diagnosis, industrial non-destructive testing, and a wide range of scientific fields. Traditionally, computed tomography uses scanning geometries with a single axis of rotation together with reconstruction algorithms specifically designed for this setup. Recently there has however been increasing interest in more complex scanning geometries. These include so called X-ray computed laminography systems capable of imaging specimens with large lateral dimensions or large aspect ratios, neither of which are well suited to conventional CT scanning procedures. Developments throughout this field have thus been rapid, including the introduction of novel system trajectories, the application and refinement of various reconstruction methods, and the use of recently developed computational hardware and software techniques to accelerate reconstruction times. Here we examine the advances made in the last several years and consider their impact on the state of the art.
Efficient coarse simulation of a growing avascular tumor

PubMed Central

Kavousanakis, Michail E.; Liu, Ping; Boudouvis, Andreas G.; Lowengrub, John; Kevrekidis, Ioannis G.

2013-01-01

The subject of this work is the development and implementation of algorithms which accelerate the simulation of early stage tumor growth models. Among the different computational approaches used for the simulation of tumor progression, discrete stochastic models (e.g., cellular automata) have been widely used to describe processes occurring at the cell and subcell scales (e.g., cell-cell interactions and signaling processes). To describe macroscopic characteristics (e.g., morphology) of growing tumors, large numbers of interacting cells must be simulated. However, the high computational demands of stochastic models make the simulation of large-scale systems impractical. Alternatively, continuum models, which can describe behavior at the tumor scale, often rely on phenomenological assumptions in place of rigorous upscaling of microscopic models. This limits their predictive power. In this work, we circumvent the derivation of closed macroscopic equations for the growing cancer cell populations; instead, we construct, based on the so-called “equation-free” framework, a computational superstructure, which wraps around the individual-based cell-level simulator and accelerates the computations required for the study of the long-time behavior of systems involving many interacting cells. The microscopic model, e.g., a cellular automaton, which simulates the evolution of cancer cell populations, is executed for relatively short time intervals, at the end of which coarse-scale information is obtained. These coarse variables evolve on slower time scales than each individual cell in the population, enabling the application of forward projection schemes, which extrapolate their values at later times. This technique is referred to as coarse projective integration. Increasing the ratio of projection times to microscopic simulator execution times enhances the computational savings. Crucial accuracy issues arising for growing tumors with radial symmetry are addressed by applying the coarse projective integration scheme in a cotraveling (cogrowing) frame. As a proof of principle, we demonstrate that the application of this scheme yields highly accurate solutions, while preserving the computational savings of coarse projective integration. PMID:22587128
HPC on Competitive Cloud Resources

NASA Astrophysics Data System (ADS)

Bientinesi, Paolo; Iakymchuk, Roman; Napper, Jeff

Computing as a utility has reached the mainstream. Scientists can now easily rent time on large commercial clusters that can be expanded and reduced on-demand in real-time. However, current commercial cloud computing performance falls short of systems specifically designed for scientific applications. Scientific computing needs are quite different from those of the web applications that have been the focus of cloud computing vendors. In this chapter we demonstrate through empirical evaluation the computational efficiency of high-performance numerical applications in a commercial cloud environment when resources are shared under high contention. Using the Linpack benchmark as a case study, we show that cache utilization becomes highly unpredictable and similarly affects computation time. For some problems, not only is it more efficient to underutilize resources, but the solution can be reached sooner in realtime (wall-time). We also show that the smallest, cheapest (64-bit) instance on the studied environment is the best for price to performance ration. In light of the high-contention we witness, we believe that alternative definitions of efficiency for commercial cloud environments should be introduced where strong performance guarantees do not exist. Concepts like average, expected performance and execution time, expected cost to completion, and variance measures--traditionally ignored in the high-performance computing context--now should complement or even substitute the standard definitions of efficiency.

Simulation of Quantum Many-Body Dynamics for Generic Strongly-Interacting Systems

NASA Astrophysics Data System (ADS)

Meyer, Gregory; Machado, Francisco; Yao, Norman

2017-04-01

Recent experimental advances have enabled the bottom-up assembly of complex, strongly interacting quantum many-body systems from individual atoms, ions, molecules and photons. These advances open the door to studying dynamics in isolated quantum systems as well as the possibility of realizing novel out-of-equilibrium phases of matter. Numerical studies provide insight into these systems; however, computational time and memory usage limit common numerical methods such as exact diagonalization to relatively small Hilbert spaces of dimension 215 . Here we present progress toward a new software package for dynamical time evolution of large generic quantum systems on massively parallel computing architectures. By projecting large sparse Hamiltonians into a much smaller Krylov subspace, we are able to compute the evolution of strongly interacting systems with Hilbert space dimension nearing 230. We discuss and benchmark different design implementations, such as matrix-free methods and GPU based calculations, using both pre-thermal time crystals and the Sachdev-Ye-Kitaev model as examples. We also include a simple symbolic language to describe generic Hamiltonians, allowing simulation of diverse quantum systems without any modification of the underlying C and Fortran code.
Optimum element density studies for finite-element thermal analysis of hypersonic aircraft structures

NASA Technical Reports Server (NTRS)

Ko, William L.; Olona, Timothy; Muramoto, Kyle M.

1990-01-01

Different finite element models previously set up for thermal analysis of the space shuttle orbiter structure are discussed and their shortcomings identified. Element density criteria are established for the finite element thermal modelings of space shuttle orbiter-type large, hypersonic aircraft structures. These criteria are based on rigorous studies on solution accuracies using different finite element models having different element densities set up for one cell of the orbiter wing. Also, a method for optimization of the transient thermal analysis computer central processing unit (CPU) time is discussed. Based on the newly established element density criteria, the orbiter wing midspan segment was modeled for the examination of thermal analysis solution accuracies and the extent of computation CPU time requirements. The results showed that the distributions of the structural temperatures and the thermal stresses obtained from this wing segment model were satisfactory and the computation CPU time was at the acceptable level. The studies offered the hope that modeling the large, hypersonic aircraft structures using high-density elements for transient thermal analysis is possible if a CPU optimization technique was used.
Large-Amplitude, High-Rate Roll Oscillations of a 65 deg Delta Wing at High Incidence

NASA Technical Reports Server (NTRS)

Chaderjian, Neal M.; Schiff, Lewis B.

2000-01-01

The IAR/WL 65 deg delta wing experimental results provide both detail pressure measurements and a wide range of flow conditions covering from simple attached flow, through fully developed vortex and vortex burst flow, up to fully-stalled flow at very high incidence. Thus, the Computational Unsteady Aerodynamics researchers can use it at different level of validating the corresponding code. In this section a range of CFD results are provided for the 65 deg delta wing at selected flow conditions. The time-dependent, three-dimensional, Reynolds-averaged, Navier-Stokes (RANS) equations are used to numerically simulate the unsteady vertical flow. Two sting angles and two large- amplitude, high-rate, forced-roll motions and a damped free-to-roll motion are presented. The free-to-roll motion is computed by coupling the time-dependent RANS equations to the flight dynamic equation of motion. The computed results are compared with experimental pressures, forces, moments and roll angle time history. In addition, surface and off-surface flow particle streaks are also presented.
The adaption and use of research codes for performance assessment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liebetrau, A.M.

1987-05-01

Models of real-world phenomena are developed for many reasons. The models are usually, if not always, implemented in the form of a computer code. The characteristics of a code are determined largely by its intended use. Realizations or implementations of detailed mathematical models of complex physical and/or chemical processes are often referred to as research or scientific (RS) codes. Research codes typically require large amounts of computing time. One example of an RS code is a finite-element code for solving complex systems of differential equations that describe mass transfer through some geologic medium. Considerable computing time is required because computationsmore » are done at many points in time and/or space. Codes used to evaluate the overall performance of real-world physical systems are called performance assessment (PA) codes. Performance assessment codes are used to conduct simulated experiments involving systems that cannot be directly observed. Thus, PA codes usually involve repeated simulations of system performance in situations that preclude the use of conventional experimental and statistical methods. 3 figs.« less
Efficient Mining of Interesting Patterns in Large Biological Sequences

PubMed Central

Rashid, Md. Mamunur; Karim, Md. Rezaul; Jeong, Byeong-Soo

2012-01-01

Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time. PMID:23105928
Efficient mining of interesting patterns in large biological sequences.

PubMed

Rashid, Md Mamunur; Karim, Md Rezaul; Jeong, Byeong-Soo; Choi, Ho-Jin

2012-03-01

Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.
Fast implementation for compressive recovery of highly accelerated cardiac cine MRI using the balanced sparse model.

PubMed

Ting, Samuel T; Ahmad, Rizwan; Jin, Ning; Craft, Jason; Serafim da Silveira, Juliana; Xue, Hui; Simonetti, Orlando P

2017-04-01

Sparsity-promoting regularizers can enable stable recovery of highly undersampled magnetic resonance imaging (MRI), promising to improve the clinical utility of challenging applications. However, lengthy computation time limits the clinical use of these methods, especially for dynamic MRI with its large corpus of spatiotemporal data. Here, we present a holistic framework that utilizes the balanced sparse model for compressive sensing and parallel computing to reduce the computation time of cardiac MRI recovery methods. We propose a fast, iterative soft-thresholding method to solve the resulting ℓ1-regularized least squares problem. In addition, our approach utilizes a parallel computing environment that is fully integrated with the MRI acquisition software. The methodology is applied to two formulations of the multichannel MRI problem: image-based recovery and k-space-based recovery. Using measured MRI data, we show that, for a 224 × 144 image series with 48 frames, the proposed k-space-based approach achieves a mean reconstruction time of 2.35 min, a 24-fold improvement compared a reconstruction time of 55.5 min for the nonlinear conjugate gradient method, and the proposed image-based approach achieves a mean reconstruction time of 13.8 s. Our approach can be utilized to achieve fast reconstruction of large MRI datasets, thereby increasing the clinical utility of reconstruction techniques based on compressed sensing. Magn Reson Med 77:1505-1515, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
An interactive display system for large-scale 3D models

NASA Astrophysics Data System (ADS)

Liu, Zijian; Sun, Kun; Tao, Wenbing; Liu, Liman

2018-04-01

With the improvement of 3D reconstruction theory and the rapid development of computer hardware technology, the reconstructed 3D models are enlarging in scale and increasing in complexity. Models with tens of thousands of 3D points or triangular meshes are common in practical applications. Due to storage and computing power limitation, it is difficult to achieve real-time display and interaction with large scale 3D models for some common 3D display software, such as MeshLab. In this paper, we propose a display system for large-scale 3D scene models. We construct the LOD (Levels of Detail) model of the reconstructed 3D scene in advance, and then use an out-of-core view-dependent multi-resolution rendering scheme to realize the real-time display of the large-scale 3D model. With the proposed method, our display system is able to render in real time while roaming in the reconstructed scene and 3D camera poses can also be displayed. Furthermore, the memory consumption can be significantly decreased via internal and external memory exchange mechanism, so that it is possible to display a large scale reconstructed scene with over millions of 3D points or triangular meshes in a regular PC with only 4GB RAM.
Reducing computational costs in large scale 3D EIT by using a sparse Jacobian matrix with block-wise CGLS reconstruction.

PubMed

Yang, C L; Wei, H Y; Adler, A; Soleimani, M

2013-06-01

Electrical impedance tomography (EIT) is a fast and cost-effective technique to provide a tomographic conductivity image of a subject from boundary current-voltage data. This paper proposes a time and memory efficient method for solving a large scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. The 3D EIT system with a large number of measurement data can produce a large size of Jacobian matrix; this could cause difficulties in computer storage and the inversion process. One of challenges in 3D EIT is to decrease the reconstruction time and memory usage, at the same time retaining the image quality. Firstly, a sparse matrix reduction technique is proposed using thresholding to set very small values of the Jacobian matrix to zero. By adjusting the Jacobian matrix into a sparse format, the element with zeros would be eliminated, which results in a saving of memory requirement. Secondly, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. Sparse Jacobian with a block-wise CG enables the large scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction in reconstruction results.
Robust scalable stabilisability conditions for large-scale heterogeneous multi-agent systems with uncertain nonlinear interactions: towards a distributed computing architecture

NASA Astrophysics Data System (ADS)

Manfredi, Sabato

2016-06-01

Large-scale dynamic systems are becoming highly pervasive in their occurrence with applications ranging from system biology, environment monitoring, sensor networks, and power systems. They are characterised by high dimensionality, complexity, and uncertainty in the node dynamic/interactions that require more and more computational demanding methods for their analysis and control design, as well as the network size and node system/interaction complexity increase. Therefore, it is a challenging problem to find scalable computational method for distributed control design of large-scale networks. In this paper, we investigate the robust distributed stabilisation problem of large-scale nonlinear multi-agent systems (briefly MASs) composed of non-identical (heterogeneous) linear dynamical systems coupled by uncertain nonlinear time-varying interconnections. By employing Lyapunov stability theory and linear matrix inequality (LMI) technique, new conditions are given for the distributed control design of large-scale MASs that can be easily solved by the toolbox of MATLAB. The stabilisability of each node dynamic is a sufficient assumption to design a global stabilising distributed control. The proposed approach improves some of the existing LMI-based results on MAS by both overcoming their computational limits and extending the applicative scenario to large-scale nonlinear heterogeneous MASs. Additionally, the proposed LMI conditions are further reduced in terms of computational requirement in the case of weakly heterogeneous MASs, which is a common scenario in real application where the network nodes and links are affected by parameter uncertainties. One of the main advantages of the proposed approach is to allow to move from a centralised towards a distributed computing architecture so that the expensive computation workload spent to solve LMIs may be shared among processors located at the networked nodes, thus increasing the scalability of the approach than the network size. Finally, a numerical example shows the applicability of the proposed method and its advantage in terms of computational complexity when compared with the existing approaches.
Planetary-Scale Geospatial Data Analysis Techniques in Google's Earth Engine Platform (Invited)

NASA Astrophysics Data System (ADS)

Hancher, M.

2013-12-01

Geoscientists have more and more access to new tools for large-scale computing. With any tool, some tasks are easy and other tasks hard. It is natural to look to new computing platforms to increase the scale and efficiency of existing techniques, but there is a more exiting opportunity to discover and develop a new vocabulary of fundamental analysis idioms that are made easy and effective by these new tools. Google's Earth Engine platform is a cloud computing environment for earth data analysis that combines a public data catalog with a large-scale computational facility optimized for parallel processing of geospatial data. The data catalog includes a nearly complete archive of scenes from Landsat 4, 5, 7, and 8 that have been processed by the USGS, as well as a wide variety of other remotely-sensed and ancillary data products. Earth Engine supports a just-in-time computation model that enables real-time preview during algorithm development and debugging as well as during experimental data analysis and open-ended data exploration. Data processing operations are performed in parallel across many computers in Google's datacenters. The platform automatically handles many traditionally-onerous data management tasks, such as data format conversion, reprojection, resampling, and associating image metadata with pixel data. Early applications of Earth Engine have included the development of Google's global cloud-free fifteen-meter base map and global multi-decadal time-lapse animations, as well as numerous large and small experimental analyses by scientists from a range of academic, government, and non-governmental institutions, working in a wide variety of application areas including forestry, agriculture, urban mapping, and species habitat modeling. Patterns in the successes and failures of these early efforts have begun to emerge, sketching the outlines of a new set of simple and effective approaches to geospatial data analysis.
Lifestyle and health-related quality of life: a cross-sectional study among civil servants in China.

PubMed

Xu, Jun; Qiu, Jincai; Chen, Jie; Zou, Liai; Feng, Liyi; Lu, Yan; Wei, Qian; Zhang, Jinhua

2012-05-04

Health-related quality of life (HRQoL) has been increasingly acknowledged as a valid and appropriate indicator of public health and chronic morbidity. However, limited research was conducted among Chinese civil servants owing to the different lifestyle. The aim of the study was to evaluate the HRQoL among Chinese civil servants and to identify factors might be associated with their HRQoL. A cross-sectional study was conducted to investigate HRQoL of 15,000 civil servants in China using stratified random sampling methods. Independent-Samples t-Test, one-way ANOVA, and multiple stepwise regression were used to analyse the influencing factors and the HRQoL of the civil servants. A univariate analysis showed that there were significant differences among physical component summary (PCS), mental component summary (MCS), and TS between lifestyle factors, such as smoking, drinking alcohol, having breakfast, sleep time, physical exercise, work time, operating computers, and sedentariness (P < 0.05). Multiple stepwise regressions showed that there were significant differences among TS between lifestyle factors, such as breakfast, sleep time, physical exercise, operating computers, sedentariness, work time, and drinking (P < 0.05). In this study, using Short Form 36 items (SF-36), we assessed the association of HRQoL with lifestyle factors, including smoking, drinking alcohol, having breakfast, sleep time, physical exercise, work time, operating computers, and sedentariness in China. The performance of the questionnaire in the large-scale survey is satisfactory and provides a large picture of the HRQoL status in Chinese civil servants. Our results indicate that lifestyle factors such as smoking, drinking alcohol, having breakfast, sleep time, physical exercise, work time, operating computers, and sedentariness affect the HRQoL of civil servants in China.
Lifestyle and health-related quality of life: A cross-sectional study among civil servants in China

PubMed Central

2012-01-01

Background Health-related quality of life (HRQoL) has been increasingly acknowledged as a valid and appropriate indicator of public health and chronic morbidity. However, limited research was conducted among Chinese civil servants owing to the different lifestyle. The aim of the study was to evaluate the HRQoL among Chinese civil servants and to identify factors might be associated with their HRQoL. Methods A cross-sectional study was conducted to investigate HRQoL of 15,000 civil servants in China using stratified random sampling methods. Independent-Samples t-Test, one-way ANOVA, and multiple stepwise regression were used to analyse the influencing factors and the HRQoL of the civil servants. Results A univariate analysis showed that there were significant differences among physical component summary (PCS), mental component summary (MCS), and TS between lifestyle factors, such as smoking, drinking alcohol, having breakfast, sleep time, physical exercise, work time, operating computers, and sedentariness (P < 0.05). Multiple stepwise regressions showed that there were significant differences among TS between lifestyle factors, such as breakfast, sleep time, physical exercise, operating computers, sedentariness, work time, and drinking (P < 0.05). Conclusion In this study, using Short Form 36 items (SF-36), we assessed the association of HRQoL with lifestyle factors, including smoking, drinking alcohol, having breakfast, sleep time, physical exercise, work time, operating computers, and sedentariness in China. The performance of the questionnaire in the large-scale survey is satisfactory and provides a large picture of the HRQoL status in Chinese civil servants. Our results indicate that lifestyle factors such as smoking, drinking alcohol, having breakfast, sleep time, physical exercise, work time, operating computers, and sedentariness affect the HRQoL of civil servants in China. PMID:22559315
Breeding and Genetics Symposium: really big data: processing and analysis of very large data sets.

PubMed

Cole, J B; Newman, S; Foertter, F; Aguilar, I; Coffey, M

2012-03-01

Modern animal breeding data sets are large and getting larger, due in part to recent availability of high-density SNP arrays and cheap sequencing technology. High-performance computing methods for efficient data warehousing and analysis are under development. Financial and security considerations are important when using shared clusters. Sound software engineering practices are needed, and it is better to use existing solutions when possible. Storage requirements for genotypes are modest, although full-sequence data will require greater storage capacity. Storage requirements for intermediate and results files for genetic evaluations are much greater, particularly when multiple runs must be stored for research and validation studies. The greatest gains in accuracy from genomic selection have been realized for traits of low heritability, and there is increasing interest in new health and management traits. The collection of sufficient phenotypes to produce accurate evaluations may take many years, and high-reliability proofs for older bulls are needed to estimate marker effects. Data mining algorithms applied to large data sets may help identify unexpected relationships in the data, and improved visualization tools will provide insights. Genomic selection using large data requires a lot of computing power, particularly when large fractions of the population are genotyped. Theoretical improvements have made possible the inversion of large numerator relationship matrices, permitted the solving of large systems of equations, and produced fast algorithms for variance component estimation. Recent work shows that single-step approaches combining BLUP with a genomic relationship (G) matrix have similar computational requirements to traditional BLUP, and the limiting factor is the construction and inversion of G for many genotypes. A naïve algorithm for creating G for 14,000 individuals required almost 24 h to run, but custom libraries and parallel computing reduced that to 15 m. Large data sets also create challenges for the delivery of genetic evaluations that must be overcome in a way that does not disrupt the transition from conventional to genomic evaluations. Processing time is important, especially as real-time systems for on-farm decisions are developed. The ultimate value of these systems is to decrease time-to-results in research, increase accuracy in genomic evaluations, and accelerate rates of genetic improvement.
Adaptation of a program for nonlinear finite element analysis to the CDC STAR 100 computer

NASA Technical Reports Server (NTRS)

Pifko, A. B.; Ogilvie, P. L.

1978-01-01

The conversion of a nonlinear finite element program to the CDC STAR 100 pipeline computer is discussed. The program called DYCAST was developed for the crash simulation of structures. Initial results with the STAR 100 computer indicated that significant gains in computation time are possible for operations on gloval arrays. However, for element level computations that do not lend themselves easily to long vector processing, the STAR 100 was slower than comparable scalar computers. On this basis it is concluded that in order for pipeline computers to impact the economic feasibility of large nonlinear analyses it is absolutely essential that algorithms be devised to improve the efficiency of element level computations.
The potential benefits of photonics in the computing platform

NASA Astrophysics Data System (ADS)

Bautista, Jerry

2005-03-01

The increase in computational requirements for real-time image processing, complex computational fluid dynamics, very large scale data mining in the health industry/Internet, and predictive models for financial markets are driving computer architects to consider new paradigms that rely upon very high speed interconnects within and between computing elements. Further challenges result from reduced power requirements, reduced transmission latency, and greater interconnect density. Optical interconnects may solve many of these problems with the added benefit extended reach. In addition, photonic interconnects provide relative EMI immunity which is becoming an increasing issue with a greater dependence on wireless connectivity. However, to be truly functional, the optical interconnect mesh should be able to support arbitration, addressing, etc. completely in the optical domain with a BER that is more stringent than "traditional" communication requirements. Outlined are challenges in the advanced computing environment, some possible optical architectures and relevant platform technologies, as well roughly sizing these opportunities which are quite large relative to the more "traditional" optical markets.
Processing Shotgun Proteomics Data on the Amazon Cloud with the Trans-Proteomic Pipeline*

PubMed Central

Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W.; Moritz, Robert L.

2015-01-01

Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. PMID:25418363
Processing shotgun proteomics data on the Amazon cloud with the trans-proteomic pipeline.

PubMed

Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W; Moritz, Robert L

2015-02-01

Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Engineering computer graphics in gas turbine engine design, analysis and manufacture

NASA Technical Reports Server (NTRS)

Lopatka, R. S.

1975-01-01

A time-sharing and computer graphics facility designed to provide effective interactive tools to a large number of engineering users with varied requirements was described. The application of computer graphics displays at several levels of hardware complexity and capability is discussed, with examples of graphics systems tracing gas turbine product development, beginning with preliminary design through manufacture. Highlights of an operating system stylized for interactive engineering graphics is described.
Lambda Data Grid: Communications Architecture in Support of Grid Computing

DTIC Science & Technology

2006-12-21

number of paradigm shifts in the 20th century, including the growth of large geographically dispersed teams and the use of simulations and computational...get results. The work in this thesis automates the orchestration of networks with other resources, better utilizing all resources in a time efficient...domains, over transatlantic links in around minute. The main goal of this thesis is to build a new grid-computing paradigm that fully harnesses the

Rapid Calculation of Spacecraft Trajectories Using Efficient Taylor Series Integration

NASA Technical Reports Server (NTRS)

Scott, James R.; Martini, Michael C.

2011-01-01

A variable-order, variable-step Taylor series integration algorithm was implemented in NASA Glenn's SNAP (Spacecraft N-body Analysis Program) code. SNAP is a high-fidelity trajectory propagation program that can propagate the trajectory of a spacecraft about virtually any body in the solar system. The Taylor series algorithm's very high order accuracy and excellent stability properties lead to large reductions in computer time relative to the code's existing 8th order Runge-Kutta scheme. Head-to-head comparison on near-Earth, lunar, Mars, and Europa missions showed that Taylor series integration is 15.8 times faster than Runge- Kutta on average, and is more accurate. These speedups were obtained for calculations involving central body, other body, thrust, and drag forces. Similar speedups have been obtained for calculations that include J2 spherical harmonic for central body gravitation. The algorithm includes a step size selection method that directly calculates the step size and never requires a repeat step. High-order Taylor series integration algorithms have been shown to provide major reductions in computer time over conventional integration methods in numerous scientific applications. The objective here was to directly implement Taylor series integration in an existing trajectory analysis code and demonstrate that large reductions in computer time (order of magnitude) could be achieved while simultaneously maintaining high accuracy. This software greatly accelerates the calculation of spacecraft trajectories. At each time level, the spacecraft position, velocity, and mass are expanded in a high-order Taylor series whose coefficients are obtained through efficient differentiation arithmetic. This makes it possible to take very large time steps at minimal cost, resulting in large savings in computer time. The Taylor series algorithm is implemented primarily through three subroutines: (1) a driver routine that automatically introduces auxiliary variables and sets up initial conditions and integrates; (2) a routine that calculates system reduced derivatives using recurrence relations for quotients and products; and (3) a routine that determines the step size and sums the series. The order of accuracy used in a trajectory calculation is arbitrary and can be set by the user. The algorithm directly calculates the motion of other planetary bodies and does not require ephemeris files (except to start the calculation). The code also runs with Taylor series and Runge-Kutta used interchangeably for different phases of a mission.
A Lagrangian subgrid-scale model with dynamic estimation of Lagrangian time scale for large eddy simulation of complex flows

NASA Astrophysics Data System (ADS)

Verma, Aman; Mahesh, Krishnan

2012-08-01

The dynamic Lagrangian averaging approach for the dynamic Smagorinsky model for large eddy simulation is extended to an unstructured grid framework and applied to complex flows. The Lagrangian time scale is dynamically computed from the solution and does not need any adjustable parameter. The time scale used in the standard Lagrangian model contains an adjustable parameter θ. The dynamic time scale is computed based on a "surrogate-correlation" of the Germano-identity error (GIE). Also, a simple material derivative relation is used to approximate GIE at different events along a pathline instead of Lagrangian tracking or multi-linear interpolation. Previously, the time scale for homogeneous flows was computed by averaging along directions of homogeneity. The present work proposes modifications for inhomogeneous flows. This development allows the Lagrangian averaged dynamic model to be applied to inhomogeneous flows without any adjustable parameter. The proposed model is applied to LES of turbulent channel flow on unstructured zonal grids at various Reynolds numbers. Improvement is observed when compared to other averaging procedures for the dynamic Smagorinsky model, especially at coarse resolutions. The model is also applied to flow over a cylinder at two Reynolds numbers and good agreement with previous computations and experiments is obtained. Noticeable improvement is obtained using the proposed model over the standard Lagrangian model. The improvement is attributed to a physically consistent Lagrangian time scale. The model also shows good performance when applied to flow past a marine propeller in an off-design condition; it regularizes the eddy viscosity and adjusts locally to the dominant flow features.
Structural performance analysis and redesign

NASA Technical Reports Server (NTRS)

Whetstone, W. D.

1978-01-01

Program performs stress buckling and vibrational analysis of large, linear, finite-element systems in excess of 50,000 degrees of freedom. Cost, execution time, and storage requirements are kept reasonable through use of sparse matrix solution techniques, and other computational and data management procedures designed for problems of very large size.
Use of parallel computing for analyzing big data in EEG studies of ambiguous perception

NASA Astrophysics Data System (ADS)

Maksimenko, Vladimir A.; Grubov, Vadim V.; Kirsanov, Daniil V.

2018-02-01

Problem of interaction between human and machine systems through the neuro-interfaces (or brain-computer interfaces) is an urgent task which requires analysis of large amount of neurophysiological EEG data. In present paper we consider the methods of parallel computing as one of the most powerful tools for processing experimental data in real-time with respect to multichannel structure of EEG. In this context we demonstrate the application of parallel computing for the estimation of the spectral properties of multichannel EEG signals, associated with the visual perception. Using CUDA C library we run wavelet-based algorithm on GPUs and show possibility for detection of specific patterns in multichannel set of EEG data in real-time.
Efficient frequent pattern mining algorithm based on node sets in cloud computing environment

NASA Astrophysics Data System (ADS)

Billa, V. N. Vinay Kumar; Lakshmanna, K.; Rajesh, K.; Reddy, M. Praveen Kumar; Nagaraja, G.; Sudheer, K.

2017-11-01

The ultimate goal of Data Mining is to determine the hidden information which is useful in making decisions using the large databases collected by an organization. This Data Mining involves many tasks that are to be performed during the process. Mining frequent itemsets is the one of the most important tasks in case of transactional databases. These transactional databases contain the data in very large scale where the mining of these databases involves the consumption of physical memory and time in proportion to the size of the database. A frequent pattern mining algorithm is said to be efficient only if it consumes less memory and time to mine the frequent itemsets from the given large database. Having these points in mind in this thesis we proposed a system which mines frequent itemsets in an optimized way in terms of memory and time by using cloud computing as an important factor to make the process parallel and the application is provided as a service. A complete framework which uses a proven efficient algorithm called FIN algorithm. FIN algorithm works on Nodesets and POC (pre-order coding) tree. In order to evaluate the performance of the system we conduct the experiments to compare the efficiency of the same algorithm applied in a standalone manner and in cloud computing environment on a real time data set which is traffic accidents data set. The results show that the memory consumption and execution time taken for the process in the proposed system is much lesser than those of standalone system.
A rapid local singularity analysis algorithm with applications

NASA Astrophysics Data System (ADS)

Chen, Zhijun; Cheng, Qiuming; Agterberg, Frits

2015-04-01

The local singularity model developed by Cheng is fast gaining popularity in characterizing mineralization and detecting anomalies of geochemical, geophysical and remote sensing data. However in one of the conventional algorithms involving the moving average values with different scales is time-consuming especially while analyzing a large dataset. Summed area table (SAT), also called as integral image, is a fast algorithm used within the Viola-Jones object detection framework in computer vision area. Historically, the principle of SAT is well-known in the study of multi-dimensional probability distribution functions, namely in computing 2D (or ND) probabilities (area under the probability distribution) from the respective cumulative distribution functions. We introduce SAT and it's variation Rotated Summed Area Table in the isotropic, anisotropic or directional local singularity mapping in this study. Once computed using SAT, any one of the rectangular sum can be computed at any scale or location in constant time. The area for any rectangular region in the image can be computed by using only 4 array accesses in constant time independently of the size of the region; effectively reducing the time complexity from O(n) to O(1). New programs using Python, Julia, matlab and C++ are implemented respectively to satisfy different applications, especially to the big data analysis. Several large geochemical and remote sensing datasets are tested. A wide variety of scale changes (linear spacing or log spacing) for non-iterative or iterative approach are adopted to calculate the singularity index values and compare the results. The results indicate that the local singularity analysis with SAT is more robust and superior to traditional approach in identifying anomalies.
A Cloud-Based Infrastructure for Near-Real-Time Processing and Dissemination of NPP Data

NASA Astrophysics Data System (ADS)

Evans, J. D.; Valente, E. G.; Chettri, S. S.

2011-12-01

We are building a scalable cloud-based infrastructure for generating and disseminating near-real-time data products from a variety of geospatial and meteorological data sources, including the new National Polar-Orbiting Environmental Satellite System (NPOESS) Preparatory Project (NPP). Our approach relies on linking Direct Broadcast and other data streams to a suite of scientific algorithms coordinated by NASA's International Polar-Orbiter Processing Package (IPOPP). The resulting data products are directly accessible to a wide variety of end-user applications, via industry-standard protocols such as OGC Web Services, Unidata Local Data Manager, or OPeNDAP, using open source software components. The processing chain employs on-demand computing resources from Amazon.com's Elastic Compute Cloud and NASA's Nebula cloud services. Our current prototype targets short-term weather forecasting, in collaboration with NASA's Short-term Prediction Research and Transition (SPoRT) program and the National Weather Service. Direct Broadcast is especially crucial for NPP, whose current ground segment is unlikely to deliver data quickly enough for short-term weather forecasters and other near-real-time users. Direct Broadcast also allows full local control over data handling, from the receiving antenna to end-user applications: this provides opportunities to streamline processes for data ingest, processing, and dissemination, and thus to make interpreted data products (Environmental Data Records) available to practitioners within minutes of data capture at the sensor. Cloud computing lets us grow and shrink computing resources to meet large and rapid fluctuations in data availability (twice daily for polar orbiters) - and similarly large fluctuations in demand from our target (near-real-time) users. This offers a compelling business case for cloud computing: the processing or dissemination systems can grow arbitrarily large to sustain near-real time data access despite surges in data volumes or user demand, but that computing capacity (and hourly costs) can be dropped almost instantly once the surge passes. Cloud computing also allows low-risk experimentation with a variety of machine architectures (processor types; bandwidth, memory, and storage capacities, etc.) and of system configurations (including massively parallel computing patterns). Finally, our service-based approach (in which user applications invoke software processes on a Web-accessible server) facilitates access into datasets of arbitrary size and resolution, and allows users to request and receive tailored products on demand. To maximize the usefulness and impact of our technology, we have emphasized open, industry-standard software interfaces. We are also using and developing open source software to facilitate the widespread adoption of similar, derived, or interoperable systems for processing and serving near-real-time data from NPP and other sources.
A novel computational approach towards the certification of large-scale boson sampling

NASA Astrophysics Data System (ADS)

Huh, Joonsuk

Recent proposals of boson sampling and the corresponding experiments exhibit the possible disproof of extended Church-Turning Thesis. Furthermore, the application of boson sampling to molecular computation has been suggested theoretically. Till now, however, only small-scale experiments with a few photons have been successfully performed. The boson sampling experiments of 20-30 photons are expected to reveal the computational superiority of the quantum device. A novel theoretical proposal for the large-scale boson sampling using microwave photons is highly promising due to the deterministic photon sources and the scalability. Therefore, the certification protocol of large-scale boson sampling experiments should be presented to complete the exciting story. We propose, in this presentation, a computational protocol towards the certification of large-scale boson sampling. The correlations of paired photon modes and the time-dependent characteristic functional with its Fourier component can show the fingerprint of large-scale boson sampling. This work was supported by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education, Science and Technology(NRF-2015R1A6A3A04059773), the ICT R&D program of MSIP/IITP [2015-019, Fundamental Research Toward Secure Quantum Communication] and Mueunjae Institute for Chemistry (MIC) postdoctoral fellowship.
Citizens unite for computational immunology!

PubMed

Belden, Orrin S; Baker, Sarah Catherine; Baker, Brian M

2015-07-01

Recruiting volunteers who can provide computational time, programming expertise, or puzzle-solving talent has emerged as a powerful tool for biomedical research. Recent projects demonstrate the potential for such 'crowdsourcing' efforts in immunology. Tools for developing applications, new funding opportunities, and an eager public make crowdsourcing a serious option for creative solutions for computationally-challenging problems. Expanded uses of crowdsourcing in immunology will allow for more efficient large-scale data collection and analysis. It will also involve, inspire, educate, and engage the public in a variety of meaningful ways. The benefits are real - it is time to jump in! Copyright © 2015 Elsevier Ltd. All rights reserved.
Multitasking the code ARC3D. [for computational fluid dynamics

NASA Technical Reports Server (NTRS)

Barton, John T.; Hsiung, Christopher C.

1986-01-01

The CRAY multitasking system was developed in order to utilize all four processors and sharply reduce the wall clock run time. This paper describes the techniques used to modify the computational fluid dynamics code ARC3D for this run and analyzes the achieved speedup. The ARC3D code solves either the Euler or thin-layer N-S equations using an implicit approximate factorization scheme. Results indicate that multitask processing can be used to achieve wall clock speedup factors of over three times, depending on the nature of the program code being used. Multitasking appears to be particularly advantageous for large-memory problems running on multiple CPU computers.
An Examination of Parameters Affecting Large Eddy Simulations of Flow Past a Square Cylinder

NASA Technical Reports Server (NTRS)

Mankbadi, M. R.; Georgiadis, N. J.

2014-01-01

Separated flow over a bluff body is analyzed via large eddy simulations. The turbulent flow around a square cylinder features a variety of complex flow phenomena such as highly unsteady vortical structures, reverse flow in the near wall region, and wake turbulence. The formation of spanwise vortices is often times artificially suppressed in computations by either insufficient depth or a coarse spanwise resolution. As the resolution is refined and the domain extended, the artificial turbulent energy exchange between spanwise and streamwise turbulence is eliminated within the wake region. A parametric study is performed highlighting the effects of spanwise vortices where the spanwise computational domain's resolution and depth are varied. For Re=22,000, the mean and turbulent statistics computed from the numerical large eddy simulations (NLES) are in good agreement with experimental data. Von-Karman shedding is observed in the wake of the cylinder. Mesh independence is illustrated by comparing a mesh resolution of 2 million to 16 million. Sensitivities to time stepping were minimized and sampling frequency sensitivities were nonpresent. While increasing the spanwise depth and resolution can be costly, this practice was found to be necessary to eliminating the artificial turbulent energy exchange.
Geocomputation over Hybrid Computer Architecture and Systems: Prior Works and On-going Initiatives at UARK

NASA Astrophysics Data System (ADS)

Shi, X.

2015-12-01

As NSF indicated - "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and geosciences. With the exponential growth of geodata, the challenge of scalable and high performance computing for big data analytics become urgent because many research activities are constrained by the inability of software or tool that even could not complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and operational time frame. Many large-scale geospatial problems may be not processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) Architecture and Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions to employ massive parallelism and hardware resources to achieve scalability and high performance for data intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environment to achieve the capability for scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high-performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise to achieve scalability and high performance by exploiting task and data levels of parallelism that are not supported by the conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data as proved by our prior works, while the potential of such advanced infrastructure remains unexplored in this domain. Within this presentation, our prior and on-going initiatives will be summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs and MICs, to accelerate geocomputation in different applications.
Validity of questionnaire self‐reports on computer, mouse and keyboard usage during a four‐week period

PubMed Central

Mikkelsen, Sigurd; Vilstrup, Imogen; Lassen, Christina Funch; Kryger, Ann Isabel; Thomsen, Jane Frølund; Andersen, Johan Hviid

2007-01-01

Objective To examine the validity and potential biases in self‐reports of computer, mouse and keyboard usage times, compared with objective recordings. Methods A study population of 1211 people was asked in a questionnaire to estimate the average time they had worked with computer, mouse and keyboard during the past four working weeks. During the same period, a software program recorded these activities objectively. The study was part of a one‐year follow‐up study from 2000–1 of musculoskeletal outcomes among Danish computer workers. Results Self‐reports on computer, mouse and keyboard usage times were positively associated with objectively measured activity, but the validity was low. Self‐reports explained only between a quarter and a third of the variance of objectively measured activity, and were even lower for one measure (keyboard time). Self‐reports overestimated usage times. Overestimation was large at low levels and declined with increasing levels of objectively measured activity. Mouse usage time proportion was an exception with a near 1:1 relation. Variability in objectively measured activity, arm pain, gender and age influenced self‐reports in a systematic way, but the effects were modest and sometimes in different directions. Conclusion Self‐reported durations of computer activities are positively associated with objective measures but they are quite inaccurate. Studies using self‐reports to establish relations between computer work times and musculoskeletal pain could be biased and lead to falsely increased or decreased risk estimates. PMID:17387136
Validity of questionnaire self-reports on computer, mouse and keyboard usage during a four-week period.

PubMed

Mikkelsen, Sigurd; Vilstrup, Imogen; Lassen, Christina Funch; Kryger, Ann Isabel; Thomsen, Jane Frølund; Andersen, Johan Hviid

2007-08-01

To examine the validity and potential biases in self-reports of computer, mouse and keyboard usage times, compared with objective recordings. A study population of 1211 people was asked in a questionnaire to estimate the average time they had worked with computer, mouse and keyboard during the past four working weeks. During the same period, a software program recorded these activities objectively. The study was part of a one-year follow-up study from 2000-1 of musculoskeletal outcomes among Danish computer workers. Self-reports on computer, mouse and keyboard usage times were positively associated with objectively measured activity, but the validity was low. Self-reports explained only between a quarter and a third of the variance of objectively measured activity, and were even lower for one measure (keyboard time). Self-reports overestimated usage times. Overestimation was large at low levels and declined with increasing levels of objectively measured activity. Mouse usage time proportion was an exception with a near 1:1 relation. Variability in objectively measured activity, arm pain, gender and age influenced self-reports in a systematic way, but the effects were modest and sometimes in different directions. Self-reported durations of computer activities are positively associated with objective measures but they are quite inaccurate. Studies using self-reports to establish relations between computer work times and musculoskeletal pain could be biased and lead to falsely increased or decreased risk estimates.
Dynamic remapping of parallel computations with varying resource demands

NASA Technical Reports Server (NTRS)

Nicol, D. M.; Saltz, J. H.

1986-01-01

A large class of computational problems is characterized by frequent synchronization, and computational requirements which change as a function of time. When such a problem must be solved on a message passing multiprocessor machine, the combination of these characteristics lead to system performance which decreases in time. Performance can be improved with periodic redistribution of computational load; however, redistribution can exact a sometimes large delay cost. We study the issue of deciding when to invoke a global load remapping mechanism. Such a decision policy must effectively weigh the costs of remapping against the performance benefits. We treat this problem by constructing two analytic models which exhibit stochastically decreasing performance. One model is quite tractable; we are able to describe the optimal remapping algorithm, and the optimal decision policy governing when to invoke that algorithm. However, computational complexity prohibits the use of the optimal remapping decision policy. We then study the performance of a general remapping policy on both analytic models. This policy attempts to minimize a statistic W(n) which measures the system degradation (including the cost of remapping) per computation step over a period of n steps. We show that as a function of time, the expected value of W(n) has at most one minimum, and that when this minimum exists it defines the optimal fixed-interval remapping policy. Our decision policy appeals to this result by remapping when it estimates that W(n) is minimized. Our performance data suggests that this policy effectively finds the natural frequency of remapping. We also use the analytic models to express the relationship between performance and remapping cost, number of processors, and the computation's stochastic activity.
Conjugate-Gradient Algorithms For Dynamics Of Manipulators

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheid, Robert E.

1993-01-01

Algorithms for serial and parallel computation of forward dynamics of multiple-link robotic manipulators by conjugate-gradient method developed. Parallel algorithms have potential for speedup of computations on multiple linked, specialized processors implemented in very-large-scale integrated circuits. Such processors used to stimulate dynamics, possibly faster than in real time, for purposes of planning and control.
The Potentials of Using Cloud Computing in Schools: A Systematic Literature Review

ERIC Educational Resources Information Center

Hartmann, Simon Birk; Braae, Lotte Qulleq Nygaard; Pedersen, Sine; Khalid, Md. Saifuddin

2017-01-01

Cloud Computing (CC) refers to the physical structure of a communications network, where data is stored in large data centers and can be accessed anywhere, at any time, and from different devices. This systematic literature review identifies and categorizes the potential and barriers of cloud-based teaching in schools from an international…
A Survey of Navigational Computer Program in the Museum Setting.

ERIC Educational Resources Information Center

Sliman, Paula

Patron service is a high priority in the library setting and alleviating a large percentage of the directional questions will provide librarians with more time to help patrons more thoroughly than they are able to currently. Furthermore, in view of the current economic trend of downsizing, a navigational computer system program has the potential…
Finite-difference modeling with variable grid-size and adaptive time-step in porous media

NASA Astrophysics Data System (ADS)

Liu, Xinxin; Yin, Xingyao; Wu, Guochen

2014-04-01

Forward modeling of elastic wave propagation in porous media has great importance for understanding and interpreting the influences of rock properties on characteristics of seismic wavefield. However, the finite-difference forward-modeling method is usually implemented with global spatial grid-size and time-step; it consumes large amounts of computational cost when small-scaled oil/gas-bearing structures or large velocity-contrast exist underground. To overcome this handicap, combined with variable grid-size and time-step, this paper developed a staggered-grid finite-difference scheme for elastic wave modeling in porous media. Variable finite-difference coefficients and wavefield interpolation were used to realize the transition of wave propagation between regions of different grid-size. The accuracy and efficiency of the algorithm were shown by numerical examples. The proposed method is advanced with low computational cost in elastic wave simulation for heterogeneous oil/gas reservoirs.
A novel approach to estimate emissions from large transportation networks: Hierarchical clustering-based link-driving-schedules for EPA-MOVES using dynamic time warping measures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aziz, H. M. Abdul; Ukkusuri, Satish V.

We present that EPA-MOVES (Motor Vehicle Emission Simulator) is often integrated with traffic simulators to assess emission levels of large-scale urban networks with signalized intersections. High variations in speed profiles exist in the context of congested urban networks with signalized intersections. The traditional average-speed-based emission estimation technique with EPA-MOVES provides faster execution while underestimates the emissions in most cases because of ignoring the speed variation at congested networks with signalized intersections. In contrast, the atomic second-by-second speed profile (i.e., the trajectory of each vehicle)-based technique provides accurate emissions at the cost of excessive computational power and time. We addressed thismore » issue by developing a novel method to determine the link-driving-schedules (LDSs) for the EPA-MOVES tool. Our research developed a hierarchical clustering technique with dynamic time warping similarity measures (HC-DTW) to find the LDS for EPA-MOVES that is capable of producing emission estimates better than the average-speed-based technique with execution time faster than the atomic speed profile approach. We applied the HC-DTW on a sample data from a signalized corridor and found that HC-DTW can significantly reduce computational time without compromising the accuracy. The developed technique in this research can substantially contribute to the EPA-MOVES-based emission estimation process for large-scale urban transportation network by reducing the computational time with reasonably accurate estimates. This method is highly appropriate for transportation networks with higher variation in speed such as signalized intersections. Lastly, experimental results show error difference ranging from 2% to 8% for most pollutants except PM 10.« less

A novel approach to estimate emissions from large transportation networks: Hierarchical clustering-based link-driving-schedules for EPA-MOVES using dynamic time warping measures

DOE PAGES

Aziz, H. M. Abdul; Ukkusuri, Satish V.

2017-06-29

We present that EPA-MOVES (Motor Vehicle Emission Simulator) is often integrated with traffic simulators to assess emission levels of large-scale urban networks with signalized intersections. High variations in speed profiles exist in the context of congested urban networks with signalized intersections. The traditional average-speed-based emission estimation technique with EPA-MOVES provides faster execution while underestimates the emissions in most cases because of ignoring the speed variation at congested networks with signalized intersections. In contrast, the atomic second-by-second speed profile (i.e., the trajectory of each vehicle)-based technique provides accurate emissions at the cost of excessive computational power and time. We addressed thismore » issue by developing a novel method to determine the link-driving-schedules (LDSs) for the EPA-MOVES tool. Our research developed a hierarchical clustering technique with dynamic time warping similarity measures (HC-DTW) to find the LDS for EPA-MOVES that is capable of producing emission estimates better than the average-speed-based technique with execution time faster than the atomic speed profile approach. We applied the HC-DTW on a sample data from a signalized corridor and found that HC-DTW can significantly reduce computational time without compromising the accuracy. The developed technique in this research can substantially contribute to the EPA-MOVES-based emission estimation process for large-scale urban transportation network by reducing the computational time with reasonably accurate estimates. This method is highly appropriate for transportation networks with higher variation in speed such as signalized intersections. Lastly, experimental results show error difference ranging from 2% to 8% for most pollutants except PM 10.« less
Design of transonic airfoil sections using a similarity theory

NASA Technical Reports Server (NTRS)

Nixon, D.

1978-01-01

A study of the available methods for transonic airfoil and wing design indicates that the most powerful technique is the numerical optimization procedure. However, the computer time for this method is relatively large because of the amount of computation required in the searches during optimization. The optimization method requires that base and calibration solutions be computed to determine a minimum drag direction. The design space is then computationally searched in this direction; it is these searches that dominate the computation time. A recent similarity theory allows certain transonic flows to be calculated rapidly from the base and calibration solutions. In this paper the application of the similarity theory to design problems is examined with the object of at least partially eliminating the costly searches of the design optimization method. An example of an airfoil design is presented.
Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data.

PubMed

Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger

2017-01-01

Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.
Covering Resilience: A Recent Development for Binomial Checkpointing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Walther, Andrea; Narayanan, Sri Hari Krishna

In terms of computing time, adjoint methods offer a very attractive alternative to compute gradient information, required, e.g., for optimization purposes. However, together with this very favorable temporal complexity result comes a memory requirement that is in essence proportional with the operation count of the underlying function, e.g., if algorithmic differentiation is used to provide the adjoints. For this reason, checkpointing approaches in many variants have become popular. This paper analyzes an extension of the so-called binomial approach to cover also possible failures of the computing systems. Such a measure of precaution is of special interest for massive parallel simulationsmore » and adjoint calculations where the mean time between failure of the large scale computing system is smaller than the time needed to complete the calculation of the adjoint information. We describe the extensions of standard checkpointing approaches required for such resilience, provide a corresponding implementation and discuss first numerical results.« less
Rapid Modeling of and Response to Large Earthquakes Using Real-Time GPS Networks (Invited)

NASA Astrophysics Data System (ADS)

Crowell, B. W.; Bock, Y.; Squibb, M. B.

2010-12-01

Real-time GPS networks have the advantage of capturing motions throughout the entire earthquake cycle (interseismic, seismic, coseismic, postseismic), and because of this, are ideal for real-time monitoring of fault slip in the region. Real-time GPS networks provide the perfect supplement to seismic networks, which operate with lower noise and higher sampling rates than GPS networks, but only measure accelerations or velocities, putting them at a supreme disadvantage for ascertaining the full extent of slip during a large earthquake in real-time. Here we report on two examples of rapid modeling of recent large earthquakes near large regional real-time GPS networks. The first utilizes Japan’s GEONET consisting of about 1200 stations during the 2003 Mw 8.3 Tokachi-Oki earthquake about 100 km offshore Hokkaido Island and the second investigates the 2010 Mw 7.2 El Mayor-Cucapah earthquake recorded by more than 100 stations in the California Real Time Network. The principal components of strain were computed throughout the networks and utilized as a trigger to initiate earthquake modeling. Total displacement waveforms were then computed in a simulated real-time fashion using a real-time network adjustment algorithm that fixes a station far away from the rupture to obtain a stable reference frame. Initial peak ground displacement measurements can then be used to obtain an initial size through scaling relationships. Finally, a full coseismic model of the event can be run minutes after the event, given predefined fault geometries, allowing emergency first responders and researchers to pinpoint the regions of highest damage. Furthermore, we are also investigating using total displacement waveforms for real-time moment tensor inversions to look at spatiotemporal variations in slip.
Automatic detection and analysis of cell motility in phase-contrast time-lapse images using a combination of maximally stable extremal regions and Kalman filter approaches.

PubMed

Kaakinen, M; Huttunen, S; Paavolainen, L; Marjomäki, V; Heikkilä, J; Eklund, L

2014-01-01

Phase-contrast illumination is simple and most commonly used microscopic method to observe nonstained living cells. Automatic cell segmentation and motion analysis provide tools to analyze single cell motility in large cell populations. However, the challenge is to find a sophisticated method that is sufficiently accurate to generate reliable results, robust to function under the wide range of illumination conditions encountered in phase-contrast microscopy, and also computationally light for efficient analysis of large number of cells and image frames. To develop better automatic tools for analysis of low magnification phase-contrast images in time-lapse cell migration movies, we investigated the performance of cell segmentation method that is based on the intrinsic properties of maximally stable extremal regions (MSER). MSER was found to be reliable and effective in a wide range of experimental conditions. When compared to the commonly used segmentation approaches, MSER required negligible preoptimization steps thus dramatically reducing the computation time. To analyze cell migration characteristics in time-lapse movies, the MSER-based automatic cell detection was accompanied by a Kalman filter multiobject tracker that efficiently tracked individual cells even in confluent cell populations. This allowed quantitative cell motion analysis resulting in accurate measurements of the migration magnitude and direction of individual cells, as well as characteristics of collective migration of cell groups. Our results demonstrate that MSER accompanied by temporal data association is a powerful tool for accurate and reliable analysis of the dynamic behaviour of cells in phase-contrast image sequences. These techniques tolerate varying and nonoptimal imaging conditions and due to their relatively light computational requirements they should help to resolve problems in computationally demanding and often time-consuming large-scale dynamical analysis of cultured cells. © 2013 The Authors Journal of Microscopy © 2013 Royal Microscopical Society.
Library Information System Time-Sharing (LISTS) Project. Final Report.

ERIC Educational Resources Information Center

Black, Donald V.

The Library Information System Time-Sharing (LISTS) experiment was based on three innovations in data processing technology: (1) the advent of computer time-sharing on third-generation machines, (2) the development of general-purpose file-management software and (3) the introduction of large, library-oriented data bases. The main body of the…
Lagrangian velocity and acceleration correlations of large inertial particles in a closed turbulent flow

DOE Office of Scientific and Technical Information (OSTI.GOV)

Machicoane, Nathanaël; Volk, Romain

We investigate the response of large inertial particle to turbulent fluctuations in an inhomogeneous and anisotropic flow. We conduct a Lagrangian study using particles both heavier and lighter than the surrounding fluid, and whose diameters are comparable to the flow integral scale. Both velocity and acceleration correlation functions are analyzed to compute the Lagrangian integral time and the acceleration time scale of such particles. The knowledge of how size and density affect these time scales is crucial in understanding particle dynamics and may permit stochastic process modelization using two-time models (for instance, Sawford’s). As particles are tracked over long timesmore » in the quasi-totality of a closed flow, the mean flow influences their behaviour and also biases the velocity time statistics, in particular the velocity correlation functions. By using a method that allows for the computation of turbulent velocity trajectories, we can obtain unbiased Lagrangian integral time. This is particularly useful in accessing the scale separation for such particles and to comparing it to the case of fluid particles in a similar configuration.« less
MRUniNovo: an efficient tool for de novo peptide sequencing utilizing the hadoop distributed computing framework.

PubMed

Li, Chuang; Chen, Tao; He, Qiang; Zhu, Yunping; Li, Kenli

2017-03-15

Tandem mass spectrometry-based de novo peptide sequencing is a complex and time-consuming process. The current algorithms for de novo peptide sequencing cannot rapidly and thoroughly process large mass spectrometry datasets. In this paper, we propose MRUniNovo, a novel tool for parallel de novo peptide sequencing. MRUniNovo parallelizes UniNovo based on the Hadoop compute platform. Our experimental results demonstrate that MRUniNovo significantly reduces the computation time of de novo peptide sequencing without sacrificing the correctness and accuracy of the results, and thus can process very large datasets that UniNovo cannot. MRUniNovo is an open source software tool implemented in java. The source code and the parameter settings are available at http://bioinfo.hupo.org.cn/MRUniNovo/index.php. s131020002@hnu.edu.cn ; taochen1019@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

PubMed Central

Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

2016-01-01

Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905
Efficient path-based computations on pedigree graphs with compact encodings

PubMed Central

2012-01-01

A pedigree is a diagram of family relationships, and it is often used to determine the mode of inheritance (dominant, recessive, etc.) of genetic diseases. Along with rapidly growing knowledge of genetics and accumulation of genealogy information, pedigree data is becoming increasingly important. In large pedigree graphs, path-based methods for efficiently computing genealogical measurements, such as inbreeding and kinship coefficients of individuals, depend on efficient identification and processing of paths. In this paper, we propose a new compact path encoding scheme on large pedigrees, accompanied by an efficient algorithm for identifying paths. We demonstrate the utilization of our proposed method by applying it to the inbreeding coefficient computation. We present time and space complexity analysis, and also manifest the efficiency of our method for evaluating inbreeding coefficients as compared to previous methods by experimental results using pedigree graphs with real and synthetic data. Both theoretical and experimental results demonstrate that our method is more scalable and efficient than previous methods in terms of time and space requirements. PMID:22536898
GISpark: A Geospatial Distributed Computing Platform for Spatiotemporal Big Data

NASA Astrophysics Data System (ADS)

Wang, S.; Zhong, E.; Wang, E.; Zhong, Y.; Cai, W.; Li, S.; Gao, S.

2016-12-01

Geospatial data are growing exponentially because of the proliferation of cost effective and ubiquitous positioning technologies such as global remote-sensing satellites and location-based devices. Analyzing large amounts of geospatial data can provide great value for both industrial and scientific applications. Data- and compute- intensive characteristics inherent in geospatial big data increasingly pose great challenges to technologies of data storing, computing and analyzing. Such challenges require a scalable and efficient architecture that can store, query, analyze, and visualize large-scale spatiotemporal data. Therefore, we developed GISpark - a geospatial distributed computing platform for processing large-scale vector, raster and stream data. GISpark is constructed based on the latest virtualized computing infrastructures and distributed computing architecture. OpenStack and Docker are used to build multi-user hosting cloud computing infrastructure for GISpark. The virtual storage systems such as HDFS, Ceph, MongoDB are combined and adopted for spatiotemporal data storage management. Spark-based algorithm framework is developed for efficient parallel computing. Within this framework, SuperMap GIScript and various open-source GIS libraries can be integrated into GISpark. GISpark can also integrated with scientific computing environment (e.g., Anaconda), interactive computing web applications (e.g., Jupyter notebook), and machine learning tools (e.g., TensorFlow/Orange). The associated geospatial facilities of GISpark in conjunction with the scientific computing environment, exploratory spatial data analysis tools, temporal data management and analysis systems make up a powerful geospatial computing tool. GISpark not only provides spatiotemporal big data processing capacity in the geospatial field, but also provides spatiotemporal computational model and advanced geospatial visualization tools that deals with other domains related with spatial property. We tested the performance of the platform based on taxi trajectory analysis. Results suggested that GISpark achieves excellent run time performance in spatiotemporal big data applications.
Real-Time Symbol Extraction From Grey-Level Images

NASA Astrophysics Data System (ADS)

Massen, R.; Simnacher, M.; Rosch, J.; Herre, E.; Wuhrer, H. W.

1988-04-01

A VME-bus image pipeline processor for extracting vectorized contours from grey-level images in real-time is presented. This 3 Giga operation per second processor uses large kernel convolvers and new non-linear neighbourhood processing algorithms to compute true 1-pixel wide and noise-free contours without thresholding even from grey-level images with quite varying edge sharpness. The local edge orientation is used as an additional cue to compute a list of vectors describing the closed and open contours in real-time and to dump a CAD-like symbolic image description into a symbol memory at pixel clock rate.
ALMA Correlator Real-Time Data Processor

NASA Astrophysics Data System (ADS)

Pisano, J.; Amestica, R.; Perez, J.

2005-10-01

The design of a real-time Linux application utilizing Real-Time Application Interface (RTAI) to process real-time data from the radio astronomy correlator for the Atacama Large Millimeter Array (ALMA) is described. The correlator is a custom-built digital signal processor which computes the cross-correlation function of two digitized signal streams. ALMA will have 64 antennas with 2080 signal streams each with a sample rate of 4 giga-samples per second. The correlator's aggregate data output will be 1 gigabyte per second. The software is defined by hard deadlines with high input and processing data rates, while requiring interfaces to non real-time external computers. The designed computer system - the Correlator Data Processor or CDP, consists of a cluster of 17 SMP computers, 16 of which are compute nodes plus a master controller node all running real-time Linux kernels. Each compute node uses an RTAI kernel module to interface to a 32-bit parallel interface which accepts raw data at 64 megabytes per second in 1 megabyte chunks every 16 milliseconds. These data are transferred to tasks running on multiple CPUs in hard real-time using RTAI's LXRT facility to perform quantization corrections, data windowing, FFTs, and phase corrections for a processing rate of approximately 1 GFLOPS. Highly accurate timing signals are distributed to all seventeen computer nodes in order to synchronize them to other time-dependent devices in the observatory array. RTAI kernel tasks interface to the timing signals providing sub-millisecond timing resolution. The CDP interfaces, via the master node, to other computer systems on an external intra-net for command and control, data storage, and further data (image) processing. The master node accesses these external systems utilizing ALMA Common Software (ACS), a CORBA-based client-server software infrastructure providing logging, monitoring, data delivery, and intra-computer function invocation. The software is being developed in tandem with the correlator hardware which presents software engineering challenges as the hardware evolves. The current status of this project and future goals are also presented.
MUSIDH, multiple use of simulated demographic histories, a novel method to reduce computation time in microsimulation models of infectious diseases.

PubMed

Fischer, E A J; De Vlas, S J; Richardus, J H; Habbema, J D F

2008-09-01

Microsimulation of infectious diseases requires simulation of many life histories of interacting individuals. In particular, relatively rare infections such as leprosy need to be studied in very large populations. Computation time increases disproportionally with the size of the simulated population. We present a novel method, MUSIDH, an acronym for multiple use of simulated demographic histories, to reduce computation time. Demographic history refers to the processes of birth, death and all other demographic events that should be unrelated to the natural course of an infection, thus non-fatal infections. MUSIDH attaches a fixed number of infection histories to each demographic history, and these infection histories interact as if being the infection history of separate individuals. With two examples, mumps and leprosy, we show that the method can give a factor 50 reduction in computation time at the cost of a small loss in precision. The largest reductions are obtained for rare infections with complex demographic histories.
Extending the length and time scales of Gram-Schmidt Lyapunov vector computations

NASA Astrophysics Data System (ADS)

Costa, Anthony B.; Green, Jason R.

2013-08-01

Lyapunov vectors have found growing interest recently due to their ability to characterize systems out of thermodynamic equilibrium. The computation of orthogonal Gram-Schmidt vectors requires multiplication and QR decomposition of large matrices, which grow as N2 (with the particle count). This expense has limited such calculations to relatively small systems and short time scales. Here, we detail two implementations of an algorithm for computing Gram-Schmidt vectors. The first is a distributed-memory message-passing method using Scalapack. The second uses the newly-released MAGMA library for GPUs. We compare the performance of both codes for Lennard-Jones fluids from N=100 to 1300 between Intel Nahalem/Infiniband DDR and NVIDIA C2050 architectures. To our best knowledge, these are the largest systems for which the Gram-Schmidt Lyapunov vectors have been computed, and the first time their calculation has been GPU-accelerated. We conclude that Lyapunov vector calculations can be significantly extended in length and time by leveraging the power of GPU-accelerated linear algebra.
Neural Network Training by Integration of Adjoint Systems of Equations Forward in Time

NASA Technical Reports Server (NTRS)

Toomarian, Nikzad (Inventor); Barhen, Jacob (Inventor)

1999-01-01

A method and apparatus for supervised neural learning of time dependent trajectories exploits the concepts of adjoint operators to enable computation of the gradient of an objective functional with respect to the various parameters of the network architecture in a highly efficient manner. Specifically. it combines the advantage of dramatic reductions in computational complexity inherent in adjoint methods with the ability to solve two adjoint systems of equations together forward in time. Not only is a large amount of computation and storage saved. but the handling of real-time applications becomes also possible. The invention has been applied it to two examples of representative complexity which have recently been analyzed in the open literature and demonstrated that a circular trajectory can be learned in approximately 200 iterations compared to the 12000 reported in the literature. A figure eight trajectory was achieved in under 500 iterations compared to 20000 previously required. Tbc trajectories computed using our new method are much closer to the target trajectories than was reported in previous studies.
Neural network training by integration of adjoint systems of equations forward in time

NASA Technical Reports Server (NTRS)

Toomarian, Nikzad (Inventor); Barhen, Jacob (Inventor)

1992-01-01

A method and apparatus for supervised neural learning of time dependent trajectories exploits the concepts of adjoint operators to enable computation of the gradient of an objective functional with respect to the various parameters of the network architecture in a highly efficient manner. Specifically, it combines the advantage of dramatic reductions in computational complexity inherent in adjoint methods with the ability to solve two adjoint systems of equations together forward in time. Not only is a large amount of computation and storage saved, but the handling of real-time applications becomes also possible. The invention has been applied it to two examples of representative complexity which have recently been analyzed in the open literature and demonstrated that a circular trajectory can be learned in approximately 200 iterations compared to the 12000 reported in the literature. A figure eight trajectory was achieved in under 500 iterations compared to 20000 previously required. The trajectories computed using our new method are much closer to the target trajectories than was reported in previous studies.
Visualization assisted by parallel processing

NASA Astrophysics Data System (ADS)

Lange, B.; Rey, H.; Vasques, X.; Puech, W.; Rodriguez, N.

2011-01-01

This paper discusses the experimental results of our visualization model for data extracted from sensors. The objective of this paper is to find a computationally efficient method to produce a real time rendering visualization for a large amount of data. We develop visualization method to monitor temperature variance of a data center. Sensors are placed on three layers and do not cover all the room. We use particle paradigm to interpolate data sensors. Particles model the "space" of the room. In this work we use a partition of the particle set, using two mathematical methods: Delaunay triangulation and Voronoý cells. Avis and Bhattacharya present these two algorithms in. Particles provide information on the room temperature at different coordinates over time. To locate and update particles data we define a computational cost function. To solve this function in an efficient way, we use a client server paradigm. Server computes data and client display this data on different kind of hardware. This paper is organized as follows. The first part presents related algorithm used to visualize large flow of data. The second part presents different platforms and methods used, which was evaluated in order to determine the better solution for the task proposed. The benchmark use the computational cost of our algorithm that formed based on located particles compared to sensors and on update of particles value. The benchmark was done on a personal computer using CPU, multi core programming, GPU programming and hybrid GPU/CPU. GPU programming method is growing in the research field; this method allows getting a real time rendering instates of a precompute rendering. For improving our results, we compute our algorithm on a High Performance Computing (HPC), this benchmark was used to improve multi-core method. HPC is commonly used in data visualization (astronomy, physic, etc) for improving the rendering and getting real-time.
Optimal processor assignment for pipeline computations

NASA Technical Reports Server (NTRS)

Nicol, David M.; Simha, Rahul; Choudhury, Alok N.; Narahari, Bhagirath

1991-01-01

The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual responses times for different processor sizes, find an assignment of processor to tasks. Two objectives are of interest: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series parallel precedence graph with n constituent tasks, an O(np2) algorithm is provided that finds the optimal assignment for the response time optimization problem; it was found that the assignment optimizing the constrained throughput in O(np2log p) time. Special cases of linear, independent, and tree graphs are also considered.

Interconnection requirements in avionic systems

NASA Astrophysics Data System (ADS)

Vergnolle, Claude; Houssay, Bruno

1991-04-01

The future aircraft generation will have thousand smart electromagnetic sensors distributed allover. Each sensor is connected with fibers links to the main-frame computer in charge of the real time signal''s correlation. Such a computer must be compactly built and massively parallel: it needs the use of 3 D optical free-space interconnect between neighbouring boards and reconfigurable interconnects via holographic backplane. The optical interconnect facilities will be also used to build fault-tolerant computer through large redundancy.
The revolution in data gathering systems

NASA Technical Reports Server (NTRS)

Cambra, J. M.; Trover, W. F.

1975-01-01

Data acquisition systems used in NASA's wind tunnels from the 1950's through the present time are summarized as a baseline for assessing the impact of minicomputers and microcomputers on data acquisition and data processing. Emphasis is placed on the cyclic evolution in computer technology which transformed the central computer system, and finally the distributed computer system. Other developments discussed include: medium scale integration, large scale integration, combining the functions of data acquisition and control, and micro and minicomputers.
Quantum computation in the analysis of hyperspectral data

NASA Astrophysics Data System (ADS)

Gomez, Richard B.; Ghoshal, Debabrata; Jayanna, Anil

2004-08-01

Recent research on the topic of quantum computation provides us with some quantum algorithms with higher efficiency and speedup compared to their classical counterparts. In this paper, it is our intent to provide the results of our investigation of several applications of such quantum algorithms - especially the Grover's Search algorithm - in the analysis of Hyperspectral Data. We found many parallels with Grover's method in existing data processing work that make use of classical spectral matching algorithms. Our efforts also included the study of several methods dealing with hyperspectral image analysis work where classical computation methods involving large data sets could be replaced with quantum computation methods. The crux of the problem in computation involving a hyperspectral image data cube is to convert the large amount of data in high dimensional space to real information. Currently, using the classical model, different time consuming methods and steps are necessary to analyze these data including: Animation, Minimum Noise Fraction Transform, Pixel Purity Index algorithm, N-dimensional scatter plot, Identification of Endmember spectra - are such steps. If a quantum model of computation involving hyperspectral image data can be developed and formalized - it is highly likely that information retrieval from hyperspectral image data cubes would be a much easier process and the final information content would be much more meaningful and timely. In this case, dimensionality would not be a curse, but a blessing.
Computational solutions to large-scale data management and analysis

PubMed Central

Schadt, Eric E.; Linderman, Michael D.; Sorenson, Jon; Lee, Lawrence; Nolan, Garry P.

2011-01-01

Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist — such as cloud and heterogeneous computing — to successfully tackle our big data problems. PMID:20717155
GPU accelerated FDTD solver and its application in MRI.

PubMed

Chi, J; Liu, F; Jin, J; Mason, D G; Crozier, S

2010-01-01

The finite difference time domain (FDTD) method is a popular technique for computational electromagnetics (CEM). The large computational power often required, however, has been a limiting factor for its applications. In this paper, we will present a graphics processing unit (GPU)-based parallel FDTD solver and its successful application to the investigation of a novel B1 shimming scheme for high-field magnetic resonance imaging (MRI). The optimized shimming scheme exhibits considerably improved transmit B(1) profiles. The GPU implementation dramatically shortened the runtime of FDTD simulation of electromagnetic field compared with its CPU counterpart. The acceleration in runtime has made such investigation possible, and will pave the way for other studies of large-scale computational electromagnetic problems in modern MRI which were previously impractical.
Architecture independent environment for developing engineering software on MIMD computers

NASA Technical Reports Server (NTRS)

Valimohamed, Karim A.; Lopez, L. A.

1990-01-01

Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management.
Self-Scheduling Parallel Methods for Multiple Serial Codes with Application to WOPWOP

NASA Technical Reports Server (NTRS)

Long, Lyle N.; Brentner, Kenneth S.

2000-01-01

This paper presents a scheme for efficiently running a large number of serial jobs on parallel computers. Two examples are given of computer programs that run relatively quickly, but often they must be run numerous times to obtain all the results needed. It is very common in science and engineering to have codes that are not massive computing challenges in themselves, but due to the number of instances that must be run, they do become large-scale computing problems. The two examples given here represent common problems in aerospace engineering: aerodynamic panel methods and aeroacoustic integral methods. The first example simply solves many systems of linear equations. This is representative of an aerodynamic panel code where someone would like to solve for numerous angles of attack. The complete code for this first example is included in the appendix so that it can be readily used by others as a template. The second example is an aeroacoustics code (WOPWOP) that solves the Ffowcs Williams Hawkings equation to predict the far-field sound due to rotating blades. In this example, one quite often needs to compute the sound at numerous observer locations, hence parallelization is utilized to automate the noise computation for a large number of observers.
Sentinel-1 data massive processing for large scale DInSAR analyses within Cloud Computing environments through the P-SBAS approach

NASA Astrophysics Data System (ADS)

Lanari, Riccardo; Bonano, Manuela; Buonanno, Sabatino; Casu, Francesco; De Luca, Claudio; Fusco, Adele; Manunta, Michele; Manzo, Mariarosaria; Pepe, Antonio; Zinno, Ivana

2017-04-01

The SENTINEL-1 (S1) mission is designed to provide operational capability for continuous mapping of the Earth thanks to its two polar-orbiting satellites (SENTINEL-1A and B) performing C-band synthetic aperture radar (SAR) imaging. It is, indeed, characterized by enhanced revisit frequency, coverage and reliability for operational services and applications requiring long SAR data time series. Moreover, SENTINEL-1 is specifically oriented to interferometry applications with stringent requirements based on attitude and orbit accuracy and it is intrinsically characterized by small spatial and temporal baselines. Consequently, SENTINEL-1 data are particularly suitable to be exploited through advanced interferometric techniques such as the well-known DInSAR algorithm referred to as Small BAseline Subset (SBAS), which allows the generation of deformation time series and displacement velocity maps. In this work we present an advanced interferometric processing chain, based on the Parallel SBAS (P-SBAS) approach, for the massive processing of S1 Interferometric Wide Swath (IWS) data aimed at generating deformation time series in efficient, automatic and systematic way. Such a DInSAR chain is designed to exploit distributed computing infrastructures, and more specifically Cloud Computing environments, to properly deal with the storage and the processing of huge S1 datasets. In particular, since S1 IWS data are acquired with the innovative Terrain Observation with Progressive Scans (TOPS) mode, we could benefit from the structure of S1 data, which are composed by bursts that can be considered as separate acquisitions. Indeed, the processing is intrinsically parallelizable with respect to such independent input data and therefore we basically exploited this coarse granularity parallelization strategy in the majority of the steps of the SBAS processing chain. Moreover, we also implemented more sophisticated parallelization approaches, exploiting both multi-node and multi-core programming techniques. Currently, Cloud Computing environments make available large collections of computing resources and storage that can be effectively exploited through the presented S1 P-SBAS processing chain to carry out interferometric analyses at a very large scale, in reduced time. This allows us to deal also with the problems connected to the use of S1 P-SBAS chain in operational contexts, related to hazard monitoring and risk prevention and mitigation, where handling large amounts of data represents a challenging task. As a significant experimental result we performed a large spatial scale SBAS analysis relevant to the Central and Southern Italy by exploiting the Amazon Web Services Cloud Computing platform. In particular, we processed in parallel 300 S1 acquisitions covering the Italian peninsula from Lazio to Sicily through the presented S1 P-SBAS processing chain, generating 710 interferograms, thus finally obtaining the displacement time series of the whole processed area. This work has been partially supported by the CNR-DPC agreement, the H2020 EPOS-IP project (GA 676564) and the ESA GEP project.
Computational Modeling and Real-Time Control of Patient-Specific Laser Treatment of Cancer

PubMed Central

Fuentes, D.; Oden, J. T.; Diller, K. R.; Hazle, J. D.; Elliott, A.; Shetty, A.; Stafford, R. J.

2014-01-01

An adaptive feedback control system is presented which employs a computational model of bioheat transfer in living tissue to guide, in real-time, laser treatments of prostate cancer monitored by magnetic resonance thermal imaging (MRTI). The system is built on what can be referred to as cyberinfrastructure - a complex structure of high-speed network, large-scale parallel computing devices, laser optics, imaging, visualizations, inverse-analysis algorithms, mesh generation, and control systems that guide laser therapy to optimally control the ablation of cancerous tissue. The computational system has been successfully tested on in-vivo, canine prostate. Over the course of an 18 minute laser induced thermal therapy (LITT) performed at M.D. Anderson Cancer Center (MDACC) in Houston, Texas, the computational models were calibrated to intra-operative real time thermal imaging treatment data and the calibrated models controlled the bioheat transfer to within 5°C of the predetermined treatment plan. The computational arena is in Austin, Texas and managed at the Institute for Computational Engineering and Sciences (ICES). The system is designed to control the bioheat transfer remotely while simultaneously providing real-time remote visualization of the on-going treatment. Post operative histology of the canine prostate reveal that the damage region was within the targeted 1.2cm diameter treatment objective. PMID:19148754
Computational modeling and real-time control of patient-specific laser treatment of cancer.

PubMed

Fuentes, D; Oden, J T; Diller, K R; Hazle, J D; Elliott, A; Shetty, A; Stafford, R J

2009-04-01

An adaptive feedback control system is presented which employs a computational model of bioheat transfer in living tissue to guide, in real-time, laser treatments of prostate cancer monitored by magnetic resonance thermal imaging. The system is built on what can be referred to as cyberinfrastructure-a complex structure of high-speed network, large-scale parallel computing devices, laser optics, imaging, visualizations, inverse-analysis algorithms, mesh generation, and control systems that guide laser therapy to optimally control the ablation of cancerous tissue. The computational system has been successfully tested on in vivo, canine prostate. Over the course of an 18 min laser-induced thermal therapy performed at M.D. Anderson Cancer Center (MDACC) in Houston, Texas, the computational models were calibrated to intra-operative real-time thermal imaging treatment data and the calibrated models controlled the bioheat transfer to within 5 degrees C of the predetermined treatment plan. The computational arena is in Austin, Texas and managed at the Institute for Computational Engineering and Sciences (ICES). The system is designed to control the bioheat transfer remotely while simultaneously providing real-time remote visualization of the on-going treatment. Post-operative histology of the canine prostate reveal that the damage region was within the targeted 1.2 cm diameter treatment objective.
Optimization of Angular-Momentum Biases of Reaction Wheels

NASA Technical Reports Server (NTRS)

Lee, Clifford; Lee, Allan

2008-01-01

RBOT [RWA Bias Optimization Tool (wherein RWA signifies Reaction Wheel Assembly )] is a computer program designed for computing angular momentum biases for reaction wheels used for providing spacecraft pointing in various directions as required for scientific observations. RBOT is currently deployed to support the Cassini mission to prevent operation of reaction wheels at unsafely high speeds while minimizing time in undesirable low-speed range, where elasto-hydrodynamic lubrication films in bearings become ineffective, leading to premature bearing failure. The problem is formulated as a constrained optimization problem in which maximum wheel speed limit is a hard constraint and a cost functional that increases as speed decreases below a low-speed threshold. The optimization problem is solved using a parametric search routine known as the Nelder-Mead simplex algorithm. To increase computational efficiency for extended operation involving large quantity of data, the algorithm is designed to (1) use large time increments during intervals when spacecraft attitudes or rates of rotation are nearly stationary, (2) use sinusoidal-approximation sampling to model repeated long periods of Earth-point rolling maneuvers to reduce computational loads, and (3) utilize an efficient equation to obtain wheel-rate profiles as functions of initial wheel biases based on conservation of angular momentum (in an inertial frame) using pre-computed terms.
Solving large mixed linear models using preconditioned conjugate gradient iteration.

PubMed

Strandén, I; Lidauer, M

1999-12-01

Continuous evaluation of dairy cattle with a random regression test-day model requires a fast solving method and algorithm. A new computing technique feasible in Jacobi and conjugate gradient based iterative methods using iteration on data is presented. In the new computing technique, the calculations in multiplication of a vector by a matrix were recorded to three steps instead of the commonly used two steps. The three-step method was implemented in a general mixed linear model program that used preconditioned conjugate gradient iteration. Performance of this program in comparison to other general solving programs was assessed via estimation of breeding values using univariate, multivariate, and random regression test-day models. Central processing unit time per iteration with the new three-step technique was, at best, one-third that needed with the old technique. Performance was best with the test-day model, which was the largest and most complex model used. The new program did well in comparison to other general software. Programs keeping the mixed model equations in random access memory required at least 20 and 435% more time to solve the univariate and multivariate animal models, respectively. Computations of the second best iteration on data took approximately three and five times longer for the animal and test-day models, respectively, than did the new program. Good performance was due to fast computing time per iteration and quick convergence to the final solutions. Use of preconditioned conjugate gradient based methods in solving large breeding value problems is supported by our findings.
Speeding Up Ecological and Evolutionary Computations in R; Essentials of High Performance Computing for Biologists

PubMed Central

Visser, Marco D.; McMahon, Sean M.; Merow, Cory; Dixon, Philip M.; Record, Sydne; Jongejans, Eelke

2015-01-01

Computation has become a critical component of research in biology. A risk has emerged that computational and programming challenges may limit research scope, depth, and quality. We review various solutions to common computational efficiency problems in ecological and evolutionary research. Our review pulls together material that is currently scattered across many sources and emphasizes those techniques that are especially effective for typical ecological and environmental problems. We demonstrate how straightforward it can be to write efficient code and implement techniques such as profiling or parallel computing. We supply a newly developed R package (aprof) that helps to identify computational bottlenecks in R code and determine whether optimization can be effective. Our review is complemented by a practical set of examples and detailed Supporting Information material (S1–S3 Texts) that demonstrate large improvements in computational speed (ranging from 10.5 times to 14,000 times faster). By improving computational efficiency, biologists can feasibly solve more complex tasks, ask more ambitious questions, and include more sophisticated analyses in their research. PMID:25811842
InfoSymbiotics/DDDAS - The power of Dynamic Data Driven Applications Systems for New Capabilities in Environmental -, Geo-, and Space- Sciences

NASA Astrophysics Data System (ADS)

Darema, F.

2016-12-01

InfoSymbiotics/DDDAS embodies the power of Dynamic Data Driven Applications Systems (DDDAS), a concept whereby an executing application model is dynamically integrated, in a feed-back loop, with the real-time data-acquisition and control components, as well as other data sources of the application system. Advanced capabilities can be created through such new computational approaches in modeling and simulations, and in instrumentation methods, and include: enhancing the accuracy of the application model; speeding-up the computation to allow faster and more comprehensive models of a system, and create decision support systems with the accuracy of full-scale simulations; in addition, the notion of controlling instrumentation processes by the executing application results in more efficient management of application-data and addresses challenges of how to architect and dynamically manage large sets of heterogeneous sensors and controllers, an advance over the static and ad-hoc ways of today - with DDDAS these sets of resources can be managed adaptively and in optimized ways. Large-Scale-Dynamic-Data encompasses the next wave of Big Data, and namely dynamic data arising from ubiquitous sensing and control in engineered, natural, and societal systems, through multitudes of heterogeneous sensors and controllers instrumenting these systems, and where opportunities and challenges at these "large-scales" relate not only to data size but the heterogeneity in data, data collection modalities, fidelities, and timescales, ranging from real-time data to archival data. In tandem with this important dimension of dynamic data, there is an extended view of Big Computing, which includes the collective computing by networked assemblies of multitudes of sensors and controllers, this range from the high-end to the real-time seamlessly integrated and unified, and comprising the Large-Scale-Big-Computing. InfoSymbiotics/DDDAS engenders transformative impact in many application domains, ranging from the nano-scale to the terra-scale and to the extra-terra-scale. The talk will address opportunities for new capabilities together with corresponding research challenges, with illustrative examples from several application areas including environmental sciences, geosciences, and space sciences.
An efficient two-stage approach for image-based FSI analysis of atherosclerotic arteries

PubMed Central

Rayz, Vitaliy L.; Mofrad, Mohammad R. K.; Saloner, David

2010-01-01

Patient-specific biomechanical modeling of atherosclerotic arteries has the potential to aid clinicians in characterizing lesions and determining optimal treatment plans. To attain high levels of accuracy, recent models use medical imaging data to determine plaque component boundaries in three dimensions, and fluid–structure interaction is used to capture mechanical loading of the diseased vessel. As the plaque components and vessel wall are often highly complex in shape, constructing a suitable structured computational mesh is very challenging and can require a great deal of time. Models based on unstructured computational meshes require relatively less time to construct and are capable of accurately representing plaque components in three dimensions. These models unfortunately require additional computational resources and computing time for accurate and meaningful results. A two-stage modeling strategy based on unstructured computational meshes is proposed to achieve a reasonable balance between meshing difficulty and computational resource and time demand. In this method, a coarsegrained simulation of the full arterial domain is used to guide and constrain a fine-scale simulation of a smaller region of interest within the full domain. Results for a patient-specific carotid bifurcation model demonstrate that the two-stage approach can afford a large savings in both time for mesh generation and time and resources needed for computation. The effects of solid and fluid domain truncation were explored, and were shown to minimally affect accuracy of the stress fields predicted with the two-stage approach. PMID:19756798
Consequence modeling using the fire dynamics simulator.

PubMed

Ryder, Noah L; Sutula, Jason A; Schemel, Christopher F; Hamer, Andrew J; Van Brunt, Vincent

2004-11-11

The use of Computational Fluid Dynamics (CFD) and in particular Large Eddy Simulation (LES) codes to model fires provides an efficient tool for the prediction of large-scale effects that include plume characteristics, combustion product dispersion, and heat effects to adjacent objects. This paper illustrates the strengths of the Fire Dynamics Simulator (FDS), an LES code developed by the National Institute of Standards and Technology (NIST), through several small and large-scale validation runs and process safety applications. The paper presents two fire experiments--a small room fire and a large (15 m diameter) pool fire. The model results are compared to experimental data and demonstrate good agreement between the models and data. The validation work is then extended to demonstrate applicability to process safety concerns by detailing a model of a tank farm fire and a model of the ignition of a gaseous fuel in a confined space. In this simulation, a room was filled with propane, given time to disperse, and was then ignited. The model yields accurate results of the dispersion of the gas throughout the space. This information can be used to determine flammability and explosive limits in a space and can be used in subsequent models to determine the pressure and temperature waves that would result from an explosion. The model dispersion results were compared to an experiment performed by Factory Mutual. Using the above examples, this paper will demonstrate that FDS is ideally suited to build realistic models of process geometries in which large scale explosion and fire failure risks can be evaluated with several distinct advantages over more traditional CFD codes. Namely transient solutions to fire and explosion growth can be produced with less sophisticated hardware (lower cost) than needed for traditional CFD codes (PC type computer verses UNIX workstation) and can be solved for longer time histories (on the order of hundreds of seconds of computed time) with minimal computer resources and length of model run. Additionally results that are produced can be analyzed, viewed, and tabulated during and following a model run within a PC environment. There are some tradeoffs, however, as rapid computations in PC's may require a sacrifice in the grid resolution or in the sub-grid modeling, depending on the size of the geometry modeled.
ELT-scale Adaptive Optics real-time control with thes Intel Xeon Phi Many Integrated Core Architecture

NASA Astrophysics Data System (ADS)

Jenkins, David R.; Basden, Alastair; Myers, Richard M.

2018-05-01

We propose a solution to the increased computational demands of Extremely Large Telescope (ELT) scale adaptive optics (AO) real-time control with the Intel Xeon Phi Knights Landing (KNL) Many Integrated Core (MIC) Architecture. The computational demands of an AO real-time controller (RTC) scale with the fourth power of telescope diameter and so the next generation ELTs require orders of magnitude more processing power for the RTC pipeline than existing systems. The Xeon Phi contains a large number (≥64) of low power x86 CPU cores and high bandwidth memory integrated into a single socketed server CPU package. The increased parallelism and memory bandwidth are crucial to providing the performance for reconstructing wavefronts with the required precision for ELT scale AO. Here, we demonstrate that the Xeon Phi KNL is capable of performing ELT scale single conjugate AO real-time control computation at over 1.0kHz with less than 20μs RMS jitter. We have also shown that with a wavefront sensor camera attached the KNL can process the real-time control loop at up to 966Hz, the maximum frame-rate of the camera, with jitter remaining below 20μs RMS. Future studies will involve exploring the use of a cluster of Xeon Phis for the real-time control of the MCAO and MOAO regimes of AO. We find that the Xeon Phi is highly suitable for ELT AO real time control.
Recent Developments in the Application of Biologically Inspired Computation to Chemical Sensing

NASA Astrophysics Data System (ADS)

Marco, S.; Gutierrez-Gálvez, A.

2009-05-01

Biological olfaction outperforms chemical instrumentation in specificity, response time, detection limit, coding capacity, time stability, robustness, size, power consumption, and portability. This biological function provides outstanding performance due, to a large extent, to the unique architecture of the olfactory pathway, which combines a high degree of redundancy, an efficient combinatorial coding along with unmatched chemical information processing mechanisms. The last decade has witnessed important advances in the understanding of the computational primitives underlying the functioning of the olfactory system. In this work, the state of the art concerning biologically inspired computation for chemical sensing will be reviewed. Instead of reviewing the whole body of computational neuroscience of olfaction, we restrict this review to the application of models to the processing of real chemical sensor data.
Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.

PubMed

Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun

2014-01-01

While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread--a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

PubMed Central

Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun

2014-01-01

While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread—a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285

An efficient, explicit finite-rate algorithm to compute flows in chemical nonequilibrium

NASA Technical Reports Server (NTRS)

Palmer, Grant

1989-01-01

An explicit finite-rate code was developed to compute hypersonic viscous chemically reacting flows about three-dimensional bodies. Equations describing the finite-rate chemical reactions were fully coupled to the gas dynamic equations using a new coupling technique. The new technique maintains stability in the explicit finite-rate formulation while permitting relatively large global time steps.
Guide to making time-lapse graphics using the facilities of the National Magnetic Fusion Energy Computing Center

DOE Office of Scientific and Technical Information (OSTI.GOV)

Munro, J.K. Jr.

1980-05-01

The advent of large, fast computers has opened the way to modeling more complex physical processes and to handling very large quantities of experimental data. The amount of information that can be processed in a short period of time is so great that use of graphical displays assumes greater importance as a means of displaying this information. Information from dynamical processes can be displayed conveniently by use of animated graphics. This guide presents the basic techniques for generating black and white animated graphics, with consideration of aesthetic, mechanical, and computational problems. The guide is intended for use by someone whomore » wants to make movies on the National Magnetic Fusion Energy Computing Center (NMFECC) CDC-7600. Problems encountered by a geographically remote user are given particular attention. Detailed information is given that will allow a remote user to do some file checking and diagnosis before giving graphics files to the system for processing into film in order to spot problems without having to wait for film to be delivered. Source listings of some useful software are given in appendices along with descriptions of how to use it. 3 figures, 5 tables.« less
Massive Cloud Computing Processing of P-SBAS Time Series for Displacement Analyses at Large Spatial Scale

NASA Astrophysics Data System (ADS)

Casu, F.; de Luca, C.; Lanari, R.; Manunta, M.; Zinno, I.

2016-12-01

A methodology for computing surface deformation time series and mean velocity maps of large areas is presented. Our approach relies on the availability of a multi-temporal set of Synthetic Aperture Radar (SAR) data collected from ascending and descending orbits over an area of interest, and also permits to estimate the vertical and horizontal (East-West) displacement components of the Earth's surface. The adopted methodology is based on an advanced Cloud Computing implementation of the Differential SAR Interferometry (DInSAR) Parallel Small Baseline Subset (P-SBAS) processing chain which allows the unsupervised processing of large SAR data volumes, from the raw data (level-0) imagery up to the generation of DInSAR time series and maps. The presented solution, which is highly scalable, has been tested on the ascending and descending ENVISAT SAR archives, which have been acquired over a large area of Southern California (US) that extends for about 90.000 km2. Such an input dataset has been processed in parallel by exploiting 280 computing nodes of the Amazon Web Services Cloud environment. Moreover, to produce the final mean deformation velocity maps of the vertical and East-West displacement components of the whole investigated area, we took also advantage of the information available from external GPS measurements that permit to account for possible regional trends not easily detectable by DInSAR and to refer the P-SBAS measurements to an external geodetic datum. The presented results clearly demonstrate the effectiveness of the proposed approach that paves the way to the extensive use of the available ERS and ENVISAT SAR data archives. Furthermore, the proposed methodology can be particularly suitable to deal with the very huge data flow provided by the Sentinel-1 constellation, thus permitting to extend the DInSAR analyses at a nearly global scale. This work is partially supported by: the DPC-CNR agreement, the EPOS-IP project and the ESA GEP project.
Montage Version 3.0

NASA Technical Reports Server (NTRS)

Jacob, Joseph; Katz, Daniel; Prince, Thomas; Berriman, Graham; Good, John; Laity, Anastasia

2006-01-01

The final version (3.0) of the Montage software has been released. To recapitulate from previous NASA Tech Briefs articles about Montage: This software generates custom, science-grade mosaics of astronomical images on demand from input files that comply with the Flexible Image Transport System (FITS) standard and contain image data registered on projections that comply with the World Coordinate System (WCS) standards. This software can be executed on single-processor computers, multi-processor computers, and such networks of geographically dispersed computers as the National Science Foundation s TeraGrid or NASA s Information Power Grid. The primary advantage of running Montage in a grid environment is that computations can be done on a remote supercomputer for efficiency. Multiple computers at different sites can be used for different parts of a computation a significant advantage in cases of computations for large mosaics that demand more processor time than is available at any one site. Version 3.0 incorporates several improvements over prior versions. The most significant improvement is that this version is accessible to scientists located anywhere, through operational Web services that provide access to data from several large astronomical surveys and construct mosaics on either local workstations or remote computational grids as needed.
Scalable Parallel Distance Field Construction for Large-Scale Applications.

PubMed

Yu, Hongfeng; Xie, Jinrong; Ma, Kwan-Liu; Kolla, Hemanth; Chen, Jacqueline H

2015-10-01

Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. Our work greatly extends the usability of distance fields for demanding applications.
Rotordynamics on the PC: Transient Analysis With ARDS

NASA Technical Reports Server (NTRS)

Fleming, David P.

1997-01-01

Personal computers can now do many jobs that formerly required a large mainframe computer. An example is NASA Lewis Research Center's program Analysis of RotorDynamic Systems (ARDS), which uses the component mode synthesis method to analyze the dynamic motion of up to five rotating shafts. As originally written in the early 1980's, this program was considered large for the mainframe computers of the time. ARDS, which was written in Fortran 77, has been successfully ported to a 486 personal computer. Plots appear on the computer monitor via calls programmed for the original CALCOMP plotter; plots can also be output on a standard laser printer. The executable code, which uses the full array sizes of the mainframe version, easily fits on a high-density floppy disk. The program runs under DOS with an extended memory manager. In addition to transient analysis of blade loss, step turns, and base acceleration, with simulation of squeeze-film dampers and rubs, ARDS calculates natural frequencies and unbalance response.
On the Relevancy of Efficient, Integrated Computer and Network Monitoring in HEP Distributed Online Environment

NASA Astrophysics Data System (ADS)

Carvalho, D.; Gavillet, Ph.; Delgado, V.; Albert, J. N.; Bellas, N.; Javello, J.; Miere, Y.; Ruffinoni, D.; Smith, G.

Large Scientific Equipments are controlled by Computer Systems whose complexity is growing driven, on the one hand by the volume and variety of the information, its distributed nature, the sophistication of its treatment and, on the other hand by the fast evolution of the computer and network market. Some people call them genetically Large-Scale Distributed Data Intensive Information Systems or Distributed Computer Control Systems (DCCS) for those systems dealing more with real time control. Taking advantage of (or forced by) the distributed architecture, the tasks are more and more often implemented as Client-Server applications. In this framework the monitoring of the computer nodes, the communications network and the applications becomes of primary importance for ensuring the safe running and guaranteed performance of the system. With the future generation of HEP experiments, such as those at the LHC in view, it is proposed to integrate the various functions of DCCS monitoring into one general purpose Multi-layer System.
The Cell Collective: Toward an open and collaborative approach to systems biology

PubMed Central

2012-01-01

Background Despite decades of new discoveries in biomedical research, the overwhelming complexity of cells has been a significant barrier to a fundamental understanding of how cells work as a whole. As such, the holistic study of biochemical pathways requires computer modeling. Due to the complexity of cells, it is not feasible for one person or group to model the cell in its entirety. Results The Cell Collective is a platform that allows the world-wide scientific community to create these models collectively. Its interface enables users to build and use models without specifying any mathematical equations or computer code - addressing one of the major hurdles with computational research. In addition, this platform allows scientists to simulate and analyze the models in real-time on the web, including the ability to simulate loss/gain of function and test what-if scenarios in real time. Conclusions The Cell Collective is a web-based platform that enables laboratory scientists from across the globe to collaboratively build large-scale models of various biological processes, and simulate/analyze them in real time. In this manuscript, we show examples of its application to a large-scale model of signal transduction. PMID:22871178
Theory and computation of optimal low- and medium-thrust transfers

NASA Technical Reports Server (NTRS)

Chuang, C.-H.

1994-01-01

This report presents two numerical methods considered for the computation of fuel-optimal, low-thrust orbit transfers in large numbers of burns. The origins of these methods are observations made with the extremal solutions of transfers in small numbers of burns; there seems to exist a trend such that the longer the time allowed to perform an optimal transfer the less fuel that is used. These longer transfers are obviously of interest since they require a motor of low thrust; however, we also find a trend that the longer the time allowed to perform the optimal transfer the more burns are required to satisfy optimality. Unfortunately, this usually increases the difficulty of computation. Both of the methods described use small-numbered burn solutions to determine solutions in large numbers of burns. One method is a homotopy method that corrects for problems that arise when a solution requires a new burn or coast arc for optimality. The other method is to simply patch together long transfers from smaller ones. An orbit correction problem is solved to develop this method. This method may also lead to a good guidance law for transfer orbits with long transfer times.
Knowledge representation by connection matrices: A method for the on-board implementation of large expert systems

NASA Technical Reports Server (NTRS)

Kellner, A.

1987-01-01

Extremely large knowledge sources and efficient knowledge access characterizing future real-life artificial intelligence applications represent crucial requirements for on-board artificial intelligence systems due to obvious computer time and storage constraints on spacecraft. A type of knowledge representation and corresponding reasoning mechanism is proposed which is particularly suited for the efficient processing of such large knowledge bases in expert systems.
The Interplanetary Meteoroid Environment for eXploration

NASA Astrophysics Data System (ADS)

Soja, R.; Sommer, M.; Srama, R.; Strub, P.; Grün, E.; Rodmann, J.; Vaubaillon, J.; Hornig, A.; Bausch, L.

2014-07-01

The Interplanetary Meteoroid Environment for eXploration (IMEX) project, funded by the European Space Agency (ESA), aims to characterize dust trails and streams produced by comets in the inner solar system. The goal is to predict meteor showers at any position or time in the solar system, such as at specific spacecraft or planets. This model will allow for the assessment of the dust impact hazard to spacecraft, which is important because hypervelocity impacts of micrometeoroids can damage or destroy spacecraft or their subsystems through physical damage or electromagnetic effects. Such considerations are particularly important in the context of human exploration of the solar system. Additionally, such a model will allow for scientific study of specific trails and their connections to observed dust phenomena, such as cometary trails and new meteor showers at Earth. We have recently expanded the model to include explicit integrations of large numbers of particles from each comet, utilizing the Constellation platform to perform the calculations. This is a distributed computing system, where currently 10,000 users are donating their idle computing time at home and thus generating a virtual supercomputer of 40,000 host PCs connected via the Internet (aerospaceresearch.net). This form of citizen science provides the required computing performance for simulating millions of particles ejected by each of the ˜400 comets, while developing the relationship between scientists and the general public. The result will be a unique set of saved orbital information for a large number of cometary streams, allowing efficient computation of their locations at any point in space and time. Here we will present the results from several test streams and discuss the progress towards obtaining the full set of integrated particles for each of the selected ˜400 short-period comets. individual Constellation users for their computing time.
Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment.

PubMed

Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Abdulhamid, Shafi'i Muhammad; Usman, Mohammed Joda

2017-01-01

Cloud computing infrastructure is suitable for meeting computational needs of large task sizes. Optimal scheduling of tasks in cloud computing environment has been proved to be an NP-complete problem, hence the need for the application of heuristic methods. Several heuristic algorithms have been developed and used in addressing this problem, but choosing the appropriate algorithm for solving task assignment problem of a particular nature is difficult since the methods are developed under different assumptions. Therefore, six rule based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing.
Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment

PubMed Central

Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Usman, Mohammed Joda

2017-01-01

Cloud computing infrastructure is suitable for meeting computational needs of large task sizes. Optimal scheduling of tasks in cloud computing environment has been proved to be an NP-complete problem, hence the need for the application of heuristic methods. Several heuristic algorithms have been developed and used in addressing this problem, but choosing the appropriate algorithm for solving task assignment problem of a particular nature is difficult since the methods are developed under different assumptions. Therefore, six rule based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing. PMID:28467505
Efficient Constant-Time Complexity Algorithm for Stochastic Simulation of Large Reaction Networks.

PubMed

Thanh, Vo Hong; Zunino, Roberto; Priami, Corrado

2017-01-01

Exact stochastic simulation is an indispensable tool for a quantitative study of biochemical reaction networks. The simulation realizes the time evolution of the model by randomly choosing a reaction to fire and update the system state according to a probability that is proportional to the reaction propensity. Two computationally expensive tasks in simulating large biochemical networks are the selection of next reaction firings and the update of reaction propensities due to state changes. We present in this work a new exact algorithm to optimize both of these simulation bottlenecks. Our algorithm employs the composition-rejection on the propensity bounds of reactions to select the next reaction firing. The selection of next reaction firings is independent of the number reactions while the update of propensities is skipped and performed only when necessary. It therefore provides a favorable scaling for the computational complexity in simulating large reaction networks. We benchmark our new algorithm with the state of the art algorithms available in literature to demonstrate its applicability and efficiency.
The rid-redundant procedure in C-Prolog

NASA Technical Reports Server (NTRS)

Chen, Huo-Yan; Wah, Benjamin W.

1987-01-01

C-Prolog can conveniently be used for logical inferences on knowledge bases. However, as similar to many search methods using backward chaining, a large number of redundant computation may be produced in recursive calls. To overcome this problem, the 'rid-redundant' procedure was designed to rid all redundant computations in running multi-recursive procedures. Experimental results obtained for C-Prolog on the Vax 11/780 computer show that there is an order of magnitude improvement in the running time and solvable problem size.
Fast numerical methods for simulating large-scale integrate-and-fire neuronal networks.

PubMed

Rangan, Aaditya V; Cai, David

2007-02-01

We discuss numerical methods for simulating large-scale, integrate-and-fire (I&F) neuronal networks. Important elements in our numerical methods are (i) a neurophysiologically inspired integrating factor which casts the solution as a numerically tractable integral equation, and allows us to obtain stable and accurate individual neuronal trajectories (i.e., voltage and conductance time-courses) even when the I&F neuronal equations are stiff, such as in strongly fluctuating, high-conductance states; (ii) an iterated process of spike-spike corrections within groups of strongly coupled neurons to account for spike-spike interactions within a single large numerical time-step; and (iii) a clustering procedure of firing events in the network to take advantage of localized architectures, such as spatial scales of strong local interactions, which are often present in large-scale computational models-for example, those of the primary visual cortex. (We note that the spike-spike corrections in our methods are more involved than the correction of single neuron spike-time via a polynomial interpolation as in the modified Runge-Kutta methods commonly used in simulations of I&F neuronal networks.) Our methods can evolve networks with relatively strong local interactions in an asymptotically optimal way such that each neuron fires approximately once in [Formula: see text] operations, where N is the number of neurons in the system. We note that quantifications used in computational modeling are often statistical, since measurements in a real experiment to characterize physiological systems are typically statistical, such as firing rate, interspike interval distributions, and spike-triggered voltage distributions. We emphasize that it takes much less computational effort to resolve statistical properties of certain I&F neuronal networks than to fully resolve trajectories of each and every neuron within the system. For networks operating in realistic dynamical regimes, such as strongly fluctuating, high-conductance states, our methods are designed to achieve statistical accuracy when very large time-steps are used. Moreover, our methods can also achieve trajectory-wise accuracy when small time-steps are used.
GPU-computing in econophysics and statistical physics

NASA Astrophysics Data System (ADS)

Preis, T.

2011-03-01

A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction into the field of GPU computing and includes examples. In particular computationally expensive analyses employed in financial market context are coded on a graphics card architecture which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics - the Ising model - is ported to a graphics card architecture as well, resulting in large speedup values.
A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nomura, K; Seymour, R; Wang, W

2009-02-17

A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based onmore » hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e. petaflops {center_dot} day of computing) is estimated as NT = 2.14 (e.g. N = 2.14 million atoms for T = 1 microseconds).« less
Need for speed: An optimized gridding approach for spatially explicit disease simulations.

PubMed

Sellman, Stefan; Tsao, Kimberly; Tildesley, Michael J; Brommesson, Peter; Webb, Colleen T; Wennergren, Uno; Keeling, Matt J; Lindström, Tom

2018-04-01

Numerical models for simulating outbreaks of infectious diseases are powerful tools for informing surveillance and control strategy decisions. However, large-scale spatially explicit models can be limited by the amount of computational resources they require, which poses a problem when multiple scenarios need to be explored to provide policy recommendations. We introduce an easily implemented method that can reduce computation time in a standard Susceptible-Exposed-Infectious-Removed (SEIR) model without introducing any further approximations or truncations. It is based on a hierarchical infection process that operates on entire groups of spatially related nodes (cells in a grid) in order to efficiently filter out large volumes of susceptible nodes that would otherwise have required expensive calculations. After the filtering of the cells, only a subset of the nodes that were originally at risk are then evaluated for actual infection. The increase in efficiency is sensitive to the exact configuration of the grid, and we describe a simple method to find an estimate of the optimal configuration of a given landscape as well as a method to partition the landscape into a grid configuration. To investigate its efficiency, we compare the introduced methods to other algorithms and evaluate computation time, focusing on simulated outbreaks of foot-and-mouth disease (FMD) on the farm population of the USA, the UK and Sweden, as well as on three randomly generated populations with varying degree of clustering. The introduced method provided up to 500 times faster calculations than pairwise computation, and consistently performed as well or better than other available methods. This enables large scale, spatially explicit simulations such as for the entire continental USA without sacrificing realism or predictive power.
Need for speed: An optimized gridding approach for spatially explicit disease simulations

PubMed Central

Tildesley, Michael J.; Brommesson, Peter; Webb, Colleen T.; Wennergren, Uno; Lindström, Tom

2018-01-01

Numerical models for simulating outbreaks of infectious diseases are powerful tools for informing surveillance and control strategy decisions. However, large-scale spatially explicit models can be limited by the amount of computational resources they require, which poses a problem when multiple scenarios need to be explored to provide policy recommendations. We introduce an easily implemented method that can reduce computation time in a standard Susceptible-Exposed-Infectious-Removed (SEIR) model without introducing any further approximations or truncations. It is based on a hierarchical infection process that operates on entire groups of spatially related nodes (cells in a grid) in order to efficiently filter out large volumes of susceptible nodes that would otherwise have required expensive calculations. After the filtering of the cells, only a subset of the nodes that were originally at risk are then evaluated for actual infection. The increase in efficiency is sensitive to the exact configuration of the grid, and we describe a simple method to find an estimate of the optimal configuration of a given landscape as well as a method to partition the landscape into a grid configuration. To investigate its efficiency, we compare the introduced methods to other algorithms and evaluate computation time, focusing on simulated outbreaks of foot-and-mouth disease (FMD) on the farm population of the USA, the UK and Sweden, as well as on three randomly generated populations with varying degree of clustering. The introduced method provided up to 500 times faster calculations than pairwise computation, and consistently performed as well or better than other available methods. This enables large scale, spatially explicit simulations such as for the entire continental USA without sacrificing realism or predictive power. PMID:29624574

Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

DOE PAGES

Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.; ...

2016-11-07

Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.

Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
Large-Scale Sentinel-1 Processing for Solid Earth Science and Urgent Response using Cloud Computing and Machine Learning

NASA Astrophysics Data System (ADS)

Hua, H.; Owen, S. E.; Yun, S. H.; Agram, P. S.; Manipon, G.; Starch, M.; Sacco, G. F.; Bue, B. D.; Dang, L. B.; Linick, J. P.; Malarout, N.; Rosen, P. A.; Fielding, E. J.; Lundgren, P.; Moore, A. W.; Liu, Z.; Farr, T.; Webb, F.; Simons, M.; Gurrola, E. M.

2017-12-01

With the increased availability of open SAR data (e.g. Sentinel-1 A/B), new challenges are being faced with processing and analyzing the voluminous SAR datasets to make geodetic measurements. Upcoming SAR missions such as NISAR are expected to generate close to 100TB per day. The Advanced Rapid Imaging and Analysis (ARIA) project can now generate geocoded unwrapped phase and coherence products from Sentinel-1 TOPS mode data in an automated fashion, using the ISCE software. This capability is currently being exercised on various study sites across the United States and around the globe, including Hawaii, Central California, Iceland and South America. The automated and large-scale SAR data processing and analysis capabilities use cloud computing techniques to speed the computations and provide scalable processing power and storage. Aspects such as how to processing these voluminous SLCs and interferograms at global scales, keeping up with the large daily SAR data volumes, and how to handle the voluminous data rates are being explored. Scene-partitioning approaches in the processing pipeline help in handling global-scale processing up to unwrapped interferograms with stitching done at a late stage. We have built an advanced science data system with rapid search functions to enable access to the derived data products. Rapid image processing of Sentinel-1 data to interferograms and time series is already being applied to natural hazards including earthquakes, floods, volcanic eruptions, and land subsidence due to fluid withdrawal. We will present the status of the ARIA science data system for generating science-ready data products and challenges that arise from being able to process SAR datasets to derived time series data products at large scales. For example, how do we perform large-scale data quality screening on interferograms? What approaches can be used to minimize compute, storage, and data movement costs for time series analysis in the cloud? We will also present some of our findings from applying machine learning and data analytics on the processed SAR data streams. We will also present lessons learned on how to ease the SAR community onto interfacing with these cloud-based SAR science data systems.
The short time Fourier transform and local signals

NASA Astrophysics Data System (ADS)

Okumura, Shuhei

In this thesis, I examine the theoretical properties of the short time discrete Fourier transform (STFT). The STFT is obtained by applying the Fourier transform by a fixed-sized, moving window to input series. We move the window by one time point at a time, so we have overlapping windows. I present several theoretical properties of the STFT, applied to various types of complex-valued, univariate time series inputs, and their outputs in closed forms. In particular, just like the discrete Fourier transform, the STFT's modulus time series takes large positive values when the input is a periodic signal. One main point is that a white noise time series input results in the STFT output being a complex-valued stationary time series and we can derive the time and time-frequency dependency structure such as the cross-covariance functions. Our primary focus is the detection of local periodic signals. I present a method to detect local signals by computing the probability that the squared modulus STFT time series has consecutive large values exceeding some threshold after one exceeding observation following one observation less than the threshold. We discuss a method to reduce the computation of such probabilities by the Box-Cox transformation and the delta method, and show that it works well in comparison to the Monte Carlo simulation method.
Near-realtime simulations of biolelectric activity in small mammalian hearts using graphical processing units

PubMed Central

Vigmond, Edward J.; Boyle, Patrick M.; Leon, L. Joshua; Plank, Gernot

2014-01-01

Simulations of cardiac bioelectric phenomena remain a significant challenge despite continual advancements in computational machinery. Spanning large temporal and spatial ranges demands millions of nodes to accurately depict geometry, and a comparable number of timesteps to capture dynamics. This study explores a new hardware computing paradigm, the graphics processing unit (GPU), to accelerate cardiac models, and analyzes results in the context of simulating a small mammalian heart in real time. The ODEs associated with membrane ionic flow were computed on traditional CPU and compared to GPU performance, for one to four parallel processing units. The scalability of solving the PDE responsible for tissue coupling was examined on a cluster using up to 128 cores. Results indicate that the GPU implementation was between 9 and 17 times faster than the CPU implementation and scaled similarly. Solving the PDE was still 160 times slower than real time. PMID:19964295
Parallel simulation of tsunami inundation on a large-scale supercomputer

NASA Astrophysics Data System (ADS)

Oishi, Y.; Imamura, F.; Sugawara, D.

2013-12-01

An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
A Modular Environment for Geophysical Inversion and Run-time Autotuning using Heterogeneous Computing Systems

NASA Astrophysics Data System (ADS)

Myre, Joseph M.

Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general purpose processors and special- purpose accelerators, the speed and problem size of many simulations could be dramatically increased. Ultimately this results in enhanced simulation capabilities that allows, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH---a framework for reducing the complexity of programming heterogeneous computer systems, 2) geophysical inversion routines which can be used to characterize physical systems, and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes. Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that this environment provides scientists and engineers with means to reduce the programmatic complexity of their applications, to perform geophysical inversions for characterizing physical systems, and to determine high-performing run-time configurations of heterogeneous computing systems using a run-time autotuner.
A Comparison of Missing-Data Procedures for Arima Time-Series Analysis

ERIC Educational Resources Information Center

Velicer, Wayne F.; Colby, Suzanne M.

2005-01-01

Missing data are a common practical problem for longitudinal designs. Time-series analysis is a longitudinal method that involves a large number of observations on a single unit. Four different missing-data methods (deletion, mean substitution, mean of adjacent observations, and maximum likelihood estimation) were evaluated. Computer-generated…
Television viewing and physical activity among Latino children

USDA-ARS?s Scientific Manuscript database

Watching television and using other forms of media such as video games, computers, print, music and movies takes up a surprisingly large amount of our children’s time. U.S. children spend more time watching television than any other activity except sleep. According to a recent nationwide report on c...
PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis

PubMed Central

2014-01-01

Background High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the different types of NGS data, there are certain common challenging steps involved in analysing those data. Spliced alignment is one such fundamental step in NGS data analysis which is extremely computational intensive as well as time consuming. There exists serious problem even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tools which although supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we have introduced PVT (Pipelined Version of TopHat) where we take up a modular approach by breaking TopHat’s serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. Thus we address the discrepancies in TopHat so as to analyze large NGS data efficiently. Results We analysed the SRA dataset (SRX026839 and SRX026838) consisting of single end reads and SRA data SRR1027730 consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during ‘spliced alignment’ and breaks the job into a pipeline of multiple stages (each comprising of different step(s)) to improve its resource utilization, thus reducing the execution time. Conclusions PVT provides an improvement over TopHat for spliced alignment of NGS data analysis. PVT thus resulted in the reduction of the execution time to ~23% for the single end read dataset. Further, PVT designed for paired end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover we propose PVT-Cloud which implements PVT pipeline in cloud computing system. PMID:24894600
PVT: an efficient computational procedure to speed up next-generation sequence analysis.

PubMed

Maji, Ranjan Kumar; Sarkar, Arijita; Khatua, Sunirmal; Dasgupta, Subhasis; Ghosh, Zhumur

2014-06-04

High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the different types of NGS data, there are certain common challenging steps involved in analysing those data. Spliced alignment is one such fundamental step in NGS data analysis which is extremely computational intensive as well as time consuming. There exists serious problem even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tools which although supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we have introduced PVT (Pipelined Version of TopHat) where we take up a modular approach by breaking TopHat's serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. Thus we address the discrepancies in TopHat so as to analyze large NGS data efficiently. We analysed the SRA dataset (SRX026839 and SRX026838) consisting of single end reads and SRA data SRR1027730 consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during 'spliced alignment' and breaks the job into a pipeline of multiple stages (each comprising of different step(s)) to improve its resource utilization, thus reducing the execution time. PVT provides an improvement over TopHat for spliced alignment of NGS data analysis. PVT thus resulted in the reduction of the execution time to ~23% for the single end read dataset. Further, PVT designed for paired end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover we propose PVT-Cloud which implements PVT pipeline in cloud computing system.
A parallel implementation of the network identification by multiple regression (NIR) algorithm to reverse-engineer regulatory gene networks.

PubMed

Gregoretti, Francesco; Belcastro, Vincenzo; di Bernardo, Diego; Oliva, Gennaro

2010-04-21

The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computational-intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However it cannot be used with large networks with thousands of nodes--as is the case in biological networks--due to the high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches a very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks that is crucial for a wide range of biomedical applications.
Airloads on Bluff Bodies, with Application to the Rotor-Induced Downloads on Tilt-Rotor Aircraft.

DTIC Science & Technology

1983-09-01

interference aerodynamics would be tion on hover performance (Ref. (11). to study the two-dimensional sec- tion characteristics of a wing in the wake of a...resources for large numbers of vortices; a typical case requires 10-15 min CPU time on the Ames Cray IS computer. Figure 6 shows a typical result. Here...CPU time per case on a Prime 550UPPER SURFACE (WINDWARD) computer to converge to a steady solution; this would be equivalent to one or two seconds on
The economics of time shared computing: Congestion, user costs and capacity

NASA Technical Reports Server (NTRS)

Agnew, C. E.

1982-01-01

Time shared systems permit the fixed costs of computing resources to be spread over large numbers of users. However, bottleneck results in the theory of closed queueing networks can be used to show that this economy of scale will be offset by the increased congestion that results as more users are added to the system. If one considers the total costs, including the congestion cost, there is an optimal number of users for a system which equals the saturation value usually used to define system capacity.
Large-Scale Calculations for Material Sciences Using Accelerators to Improve Time- and Energy-to-Solution

DOE PAGES

Eisenbach, Markus

2017-01-01

A major impediment to deploying next-generation high-performance computational systems is the required electrical power, often measured in units of megawatts. The solution to this problem is driving the introduction of novel machine architectures, such as those employing many-core processors and specialized accelerators. In this article, we describe the use of a hybrid accelerated architecture to achieve both reduced time to solution and the associated reduction in the electrical cost for a state-of-the-art materials science computation.
Numerical solutions for patterns statistics on Markov chains.

PubMed

Nuel, Gregory

2006-01-01

We propose here a review of the methods available to compute pattern statistics on text generated by a Markov source. Theoretical, but also numerical aspects are detailed for a wide range of techniques (exact, Gaussian, large deviations, binomial and compound Poisson). The SPatt package (Statistics for Pattern, free software available at http://stat.genopole.cnrs.fr/spatt) implementing all these methods is then used to compare all these approaches in terms of computational time and reliability in the most complete pattern statistics benchmark available at the present time.
Systolic Processor Array For Recognition Of Spectra

NASA Technical Reports Server (NTRS)

Chow, Edward T.; Peterson, John C.

1995-01-01

Spectral signatures of materials detected and identified quickly. Spectral Analysis Systolic Processor Array (SPA2) relatively inexpensive and satisfies need to analyze large, complex volume of multispectral data generated by imaging spectrometers to extract desired information: computational performance needed to do this in real time exceeds that of current supercomputers. Locates highly similar segments or contiguous subsegments in two different spectra at time. Compares sampled spectra from instruments with data base of spectral signatures of known materials. Computes and reports scores that express degrees of similarity between sampled and data-base spectra.
Merging the Machines of Modern Science

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wolf, Laura; Collins, Jim

Two recent projects have harnessed supercomputing resources at the US Department of Energy’s Argonne National Laboratory in a novel way to support major fusion science and particle collider experiments. Using leadership computing resources, one team ran fine-grid analysis of real-time data to make near-real-time adjustments to an ongoing experiment, while a second team is working to integrate Argonne’s supercomputers into the Large Hadron Collider/ATLAS workflow. Together these efforts represent a new paradigm of the high-performance computing center as a partner in experimental science.
Biocellion: accelerating computer simulation of multicellular biological system models

PubMed Central

Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

2014-01-01

Motivation: Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. Results: We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Availability and implementation: Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. Contact: seunghwa.kang@pnnl.gov PMID:25064572
Computational and experimental studies of LEBUs at high device Reynolds numbers

NASA Technical Reports Server (NTRS)

Bertelrud, Arild; Watson, R. D.

1988-01-01

The present paper summarizes computational and experimental studies for large-eddy breakup devices (LEBUs). LEBU optimization (using a computational approach considering compressibility, Reynolds number, and the unsteadiness of the flow) and experiments with LEBUs at high Reynolds numbers in flight are discussed. The measurements include streamwise as well as spanwise distributions of local skin friction. The unsteady flows around the LEBU devices and far downstream are characterized by strain-gage measurements on the devices and hot-wire readings downstream. Computations are made with available time-averaged and quasi-stationary techniques to find suitable device profiles with minimum drag.

Subgrid or Reynolds stress-modeling for three-dimensional turbulence computations

NASA Technical Reports Server (NTRS)

Rubesin, M. W.

1975-01-01

A review is given of recent advances in two distinct computational methods for evaluating turbulence fields, namely, statistical Reynolds stress modeling and turbulence simulation, where large eddies are followed in time. It is shown that evaluation of the mean Reynolds stresses, rather than use of a scalar eddy viscosity, permits an explanation of streamline curvature effects found in several experiments. Turbulence simulation, with a new volume averaging technique and third-order accurate finite-difference computing is shown to predict the decay of isotropic turbulence in incompressible flow with rather modest computer storage requirements, even at Reynolds numbers of aerodynamic interest.
Static Schedulers for Embedded Real-Time Systems

DTIC Science & Technology

1989-12-01

Because of the need for having efficient scheduling algorithms in large scale real time systems , software engineers put a lot of effort on developing...provide static schedulers for he Embedded Real Time Systems with single processor using Ada programming language. The independent nonpreemptable...support the Computer Aided Rapid Prototyping for Embedded Real Time Systems so that we determine whether the system, as designed, meets the required
A Fast Approach to Automatic Detection of Brain Lesions

PubMed Central

Koley, Subhranil; Chakraborty, Chandan; Mainero, Caterina; Fischl, Bruce; Aganj, Iman

2017-01-01

Template matching is a popular approach to computer-aided detection of brain lesions from magnetic resonance (MR) images. The outcomes are often sufficient for localizing lesions and assisting clinicians in diagnosis. However, processing large MR volumes with three-dimensional (3D) templates is demanding in terms of computational resources, hence the importance of the reduction of computational complexity of template matching, particularly in situations in which time is crucial (e.g. emergent stroke). In view of this, we make use of 3D Gaussian templates with varying radii and propose a new method to compute the normalized cross-correlation coefficient as a similarity metric between the MR volume and the template to detect brain lesions. Contrary to the conventional fast Fourier transform (FFT) based approach, whose runtime grows as O(N logN) with the number of voxels, the proposed method computes the cross-correlation in O(N). We show through our experiments that the proposed method outperforms the FFT approach in terms of computational time, and retains comparable accuracy. PMID:29082383
Ensemble modeling of stochastic unsteady open-channel flow in terms of its time-space evolutionary probability distribution - Part 2: numerical application

NASA Astrophysics Data System (ADS)

Dib, Alain; Kavvas, M. Levent

2018-03-01

The characteristic form of the Saint-Venant equations is solved in a stochastic setting by using a newly proposed Fokker-Planck Equation (FPE) methodology. This methodology computes the ensemble behavior and variability of the unsteady flow in open channels by directly solving for the flow variables' time-space evolutionary probability distribution. The new methodology is tested on a stochastic unsteady open-channel flow problem, with an uncertainty arising from the channel's roughness coefficient. The computed statistical descriptions of the flow variables are compared to the results obtained through Monte Carlo (MC) simulations in order to evaluate the performance of the FPE methodology. The comparisons show that the proposed methodology can adequately predict the results of the considered stochastic flow problem, including the ensemble averages, variances, and probability density functions in time and space. Unlike the large number of simulations performed by the MC approach, only one simulation is required by the FPE methodology. Moreover, the total computational time of the FPE methodology is smaller than that of the MC approach, which could prove to be a particularly crucial advantage in systems with a large number of uncertain parameters. As such, the results obtained in this study indicate that the proposed FPE methodology is a powerful and time-efficient approach for predicting the ensemble average and variance behavior, in both space and time, for an open-channel flow process under an uncertain roughness coefficient.
Analysis of potential errors in real-time streamflow data and methods of data verification by digital computer

USGS Publications Warehouse

Lystrom, David J.

1972-01-01

Various methods of verifying real-time streamflow data are outlined in part II. Relatively large errors (those greater than 20-30 percent) can be detected readily by use of well-designed verification programs for a digital computer, and smaller errors can be detected only by discharge measurements and field observations. The capability to substitute a simulated discharge value for missing or erroneous data is incorporated in some of the verification routines described. The routines represent concepts ranging from basic statistical comparisons to complex watershed modeling and provide a selection from which real-time data users can choose a suitable level of verification.
Design of a fault tolerant airborne digital computer. Volume 2: Computational requirements and technology

NASA Technical Reports Server (NTRS)

Ratner, R. S.; Shapiro, E. B.; Zeidler, H. M.; Wahlstrom, S. E.; Clark, C. B.; Goldberg, J.

1973-01-01

This final report summarizes the work on the design of a fault tolerant digital computer for aircraft. Volume 2 is composed of two parts. Part 1 is concerned with the computational requirements associated with an advanced commercial aircraft. Part 2 reviews the technology that will be available for the implementation of the computer in the 1975-1985 period. With regard to the computation task 26 computations have been categorized according to computational load, memory requirements, criticality, permitted down-time, and the need to save data in order to effect a roll-back. The technology part stresses the impact of large scale integration (LSI) on the realization of logic and memory. Also considered was module interconnection possibilities so as to minimize fault propagation.
Computational complexity of ecological and evolutionary spatial dynamics

PubMed Central

Ibsen-Jensen, Rasmus; Chatterjee, Krishnendu; Nowak, Martin A.

2015-01-01

There are deep, yet largely unexplored, connections between computer science and biology. Both disciplines examine how information proliferates in time and space. Central results in computer science describe the complexity of algorithms that solve certain classes of problems. An algorithm is deemed efficient if it can solve a problem in polynomial time, which means the running time of the algorithm is a polynomial function of the length of the input. There are classes of harder problems for which the fastest possible algorithm requires exponential time. Another criterion is the space requirement of the algorithm. There is a crucial distinction between algorithms that can find a solution, verify a solution, or list several distinct solutions in given time and space. The complexity hierarchy that is generated in this way is the foundation of theoretical computer science. Precise complexity results can be notoriously difficult. The famous question whether polynomial time equals nondeterministic polynomial time (i.e., P = NP) is one of the hardest open problems in computer science and all of mathematics. Here, we consider simple processes of ecological and evolutionary spatial dynamics. The basic question is: What is the probability that a new invader (or a new mutant) will take over a resident population? We derive precise complexity results for a variety of scenarios. We therefore show that some fundamental questions in this area cannot be answered by simple equations (assuming that P is not equal to NP). PMID:26644569
Parallel task processing of very large datasets

NASA Astrophysics Data System (ADS)

Romig, Phillip Richardson, III

This research concerns the use of distributed computer technologies for the analysis and management of very large datasets. Improvements in sensor technology, an emphasis on global change research, and greater access to data warehouses all are increase the number of non-traditional users of remotely sensed data. We present a framework for distributed solutions to the challenges of datasets which exceed the online storage capacity of individual workstations. This framework, called parallel task processing (PTP), incorporates both the task- and data-level parallelism exemplified by many image processing operations. An implementation based on the principles of PTP, called Tricky, is also presented. Additionally, we describe the challenges and practical issues in modeling the performance of parallel task processing with large datasets. We present a mechanism for estimating the running time of each unit of work within a system and an algorithm that uses these estimates to simulate the execution environment and produce estimated runtimes. Finally, we describe and discuss experimental results which validate the design. Specifically, the system (a) is able to perform computation on datasets which exceed the capacity of any one disk, (b) provides reduction of overall computation time as a result of the task distribution even with the additional cost of data transfer and management, and (c) in the simulation mode accurately predicts the performance of the real execution environment.
Molecular dynamics based enhanced sampling of collective variables with very large time steps.

PubMed

Chen, Pei-Yang; Tuckerman, Mark E

2018-01-14

Enhanced sampling techniques that target a set of collective variables and that use molecular dynamics as the driving engine have seen widespread application in the computational molecular sciences as a means to explore the free-energy landscapes of complex systems. The use of molecular dynamics as the fundamental driver of the sampling requires the introduction of a time step whose magnitude is limited by the fastest motions in a system. While standard multiple time-stepping methods allow larger time steps to be employed for the slower and computationally more expensive forces, the maximum achievable increase in time step is limited by resonance phenomena, which inextricably couple fast and slow motions. Recently, we introduced deterministic and stochastic resonance-free multiple time step algorithms for molecular dynamics that solve this resonance problem and allow ten- to twenty-fold gains in the large time step compared to standard multiple time step algorithms [P. Minary et al., Phys. Rev. Lett. 93, 150201 (2004); B. Leimkuhler et al., Mol. Phys. 111, 3579-3594 (2013)]. These methods are based on the imposition of isokinetic constraints that couple the physical system to Nosé-Hoover chains or Nosé-Hoover Langevin schemes. In this paper, we show how to adapt these methods for collective variable-based enhanced sampling techniques, specifically adiabatic free-energy dynamics/temperature-accelerated molecular dynamics, unified free-energy dynamics, and by extension, metadynamics, thus allowing simulations employing these methods to employ similarly very large time steps. The combination of resonance-free multiple time step integrators with free-energy-based enhanced sampling significantly improves the efficiency of conformational exploration.
Molecular dynamics based enhanced sampling of collective variables with very large time steps

NASA Astrophysics Data System (ADS)

Chen, Pei-Yang; Tuckerman, Mark E.

2018-01-01

Enhanced sampling techniques that target a set of collective variables and that use molecular dynamics as the driving engine have seen widespread application in the computational molecular sciences as a means to explore the free-energy landscapes of complex systems. The use of molecular dynamics as the fundamental driver of the sampling requires the introduction of a time step whose magnitude is limited by the fastest motions in a system. While standard multiple time-stepping methods allow larger time steps to be employed for the slower and computationally more expensive forces, the maximum achievable increase in time step is limited by resonance phenomena, which inextricably couple fast and slow motions. Recently, we introduced deterministic and stochastic resonance-free multiple time step algorithms for molecular dynamics that solve this resonance problem and allow ten- to twenty-fold gains in the large time step compared to standard multiple time step algorithms [P. Minary et al., Phys. Rev. Lett. 93, 150201 (2004); B. Leimkuhler et al., Mol. Phys. 111, 3579-3594 (2013)]. These methods are based on the imposition of isokinetic constraints that couple the physical system to Nosé-Hoover chains or Nosé-Hoover Langevin schemes. In this paper, we show how to adapt these methods for collective variable-based enhanced sampling techniques, specifically adiabatic free-energy dynamics/temperature-accelerated molecular dynamics, unified free-energy dynamics, and by extension, metadynamics, thus allowing simulations employing these methods to employ similarly very large time steps. The combination of resonance-free multiple time step integrators with free-energy-based enhanced sampling significantly improves the efficiency of conformational exploration.
Organization of the secure distributed computing based on multi-agent system

NASA Astrophysics Data System (ADS)

Khovanskov, Sergey; Rumyantsev, Konstantin; Khovanskova, Vera

2018-04-01

Nowadays developing methods for distributed computing is received much attention. One of the methods of distributed computing is using of multi-agent systems. The organization of distributed computing based on the conventional network computers can experience security threats performed by computational processes. Authors have developed the unified agent algorithm of control system of computing network nodes operation. Network PCs is used as computing nodes. The proposed multi-agent control system for the implementation of distributed computing allows in a short time to organize using of the processing power of computers any existing network to solve large-task by creating a distributed computing. Agents based on a computer network can: configure a distributed computing system; to distribute the computational load among computers operated agents; perform optimization distributed computing system according to the computing power of computers on the network. The number of computers connected to the network can be increased by connecting computers to the new computer system, which leads to an increase in overall processing power. Adding multi-agent system in the central agent increases the security of distributed computing. This organization of the distributed computing system reduces the problem solving time and increase fault tolerance (vitality) of computing processes in a changing computing environment (dynamic change of the number of computers on the network). Developed a multi-agent system detects cases of falsification of the results of a distributed system, which may lead to wrong decisions. In addition, the system checks and corrects wrong results.
Thermodynamic evaluation of transonic compressor rotors using the finite volume approach

NASA Technical Reports Server (NTRS)

Moore, John; Nicholson, Stephen; Moore, Joan G.

1986-01-01

The development of a computational capability to handle viscous flow with an explicit time-marching method based on the finite volume approach is summarized. Emphasis is placed on the extensions to the computational procedure which allow the handling of shock induced separation and large regions of strong backflow. Appendices contain abstracts of papers and whole reports generated during the contract period.
The Differences in Motivations of Online Game Players and Offline Game Players: A Combined Analysis of Three Studies at Higher Education Level

ERIC Educational Resources Information Center

Hainey, Tom; Connolly, Thomas; Stansfield, Mark; Boyle, Elizabeth

2011-01-01

Computer games have become a highly popular form of entertainment and have had a large impact on how University students spend their leisure time. Due to their highly motivating properties computer games have come to the attention of educationalists who wish to exploit these highly desirable properties for educational purposes. Several studies…
A non-voxel-based broad-beam (NVBB) framework for IMRT treatment planning.

PubMed

Lu, Weiguo

2010-12-07

We present a novel framework that enables very large scale intensity-modulated radiation therapy (IMRT) planning in limited computation resources with improvements in cost, plan quality and planning throughput. Current IMRT optimization uses a voxel-based beamlet superposition (VBS) framework that requires pre-calculation and storage of a large amount of beamlet data, resulting in large temporal and spatial complexity. We developed a non-voxel-based broad-beam (NVBB) framework for IMRT capable of direct treatment parameter optimization (DTPO). In this framework, both objective function and derivative are evaluated based on the continuous viewpoint, abandoning 'voxel' and 'beamlet' representations. Thus pre-calculation and storage of beamlets are no longer needed. The NVBB framework has linear complexities (O(N(3))) in both space and time. The low memory, full computation and data parallelization nature of the framework render its efficient implementation on the graphic processing unit (GPU). We implemented the NVBB framework and incorporated it with the TomoTherapy treatment planning system (TPS). The new TPS runs on a single workstation with one GPU card (NVBB-GPU). Extensive verification/validation tests were performed in house and via third parties. Benchmarks on dose accuracy, plan quality and throughput were compared with the commercial TomoTherapy TPS that is based on the VBS framework and uses a computer cluster with 14 nodes (VBS-cluster). For all tests, the dose accuracy of these two TPSs is comparable (within 1%). Plan qualities were comparable with no clinically significant difference for most cases except that superior target uniformity was seen in the NVBB-GPU for some cases. However, the planning time using the NVBB-GPU was reduced many folds over the VBS-cluster. In conclusion, we developed a novel NVBB framework for IMRT optimization. The continuous viewpoint and DTPO nature of the algorithm eliminate the need for beamlets and lead to better plan quality. The computation parallelization on a GPU instead of a computer cluster significantly reduces hardware and service costs. Compared with using the current VBS framework on a computer cluster, the planning time is significantly reduced using the NVBB framework on a single workstation with a GPU card.
Computations of Internal and External Axisymmetric Nozzle Aerodynamics at Transonic Speeds

NASA Technical Reports Server (NTRS)

Dalbello, Teryn; Georgiadis, Nicholas; Yoder, Dennis; Keith, Theo

2003-01-01

Computational Fluid Dynamics (CFD) analyses of axisymmetric circular-arc boattail nozzles have been completed in support of NASA's Next Generation Launch Technology Program to investigate the effects of high-speed nozzle geometries on the nozzle internal flow and the surrounding boattail regions. These computations span the very difficult transonic flight regime, with shock-induced separations and strong adverse pressure gradients. External afterbody and internal nozzle pressure distributions computed with the Wind code are compared with experimental data. A range of turbulence models were examined in Wind, including an Explicit Algebraic Stress model (EASM). Computations on two nozzle geometries have been completed at freestream Mach numbers ranging from 0.6 to 0.9, driven by nozzle pressure ratios (NPR) ranging from 2.9 to 5. Results obtained on converging-only geometry indicate reasonable agreement to experimental data, with the EASM and Shear Stress Transport (SST) turbulence models providing the best agreement. Calculations completed on a converging-diverging geometry involving large-scale internal flow separation did not converge to a true steady-state solution when run with variable timestepping (steady-state). Calculations obtained using constant timestepping (time-accurate) indicate less variations in flow properties compared with steady-state solutions. This failure to converge to a steady-state solution was found to be the result of difficulties in using variable time-stepping with large-scale separations present in the flow. Nevertheless, time-averaged boattail surface pressure coefficient and internal nozzle pressures show fairly good agreement with experimental data. The SST turbulence model demonstrates the best over-all agreement with experimental data.
Application of high-performance computing to numerical simulation of human movement

NASA Technical Reports Server (NTRS)

Anderson, F. C.; Ziegler, J. M.; Pandy, M. G.; Whalen, R. T.

1995-01-01

We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
Parallel Calculation of Sensitivity Derivatives for Aircraft Design using Automatic Differentiation

NASA Technical Reports Server (NTRS)

Bischof, c. H.; Green, L. L.; Haigler, K. J.; Knauff, T. L., Jr.

1994-01-01

Sensitivity derivative (SD) calculation via automatic differentiation (AD) typical of that required for the aerodynamic design of a transport-type aircraft is considered. Two ways of computing SD via code generated by the ADIFOR automatic differentiation tool are compared for efficiency and applicability to problems involving large numbers of design variables. A vector implementation on a Cray Y-MP computer is compared with a coarse-grained parallel implementation on an IBM SP1 computer, employing a Fortran M wrapper. The SD are computed for a swept transport wing in turbulent, transonic flow; the number of geometric design variables varies from 1 to 60 with coupling between a wing grid generation program and a state-of-the-art, 3-D computational fluid dynamics program, both augmented for derivative computation via AD. For a small number of design variables, the Cray Y-MP implementation is much faster. As the number of design variables grows, however, the IBM SP1 becomes an attractive alternative in terms of compute speed, job turnaround time, and total memory available for solutions with large numbers of design variables. The coarse-grained parallel implementation also can be moved easily to a network of workstations.
The Shortlist Method for fast computation of the Earth Mover's Distance and finding optimal solutions to transportation problems.

PubMed

Gottschlich, Carsten; Schuhmacher, Dominic

2014-01-01

Finding solutions to the classical transportation problem is of great importance, since this optimization problem arises in many engineering and computer science applications. Especially the Earth Mover's Distance is used in a plethora of applications ranging from content-based image retrieval, shape matching, fingerprint recognition, object tracking and phishing web page detection to computing color differences in linguistics and biology. Our starting point is the well-known revised simplex algorithm, which iteratively improves a feasible solution to optimality. The Shortlist Method that we propose substantially reduces the number of candidates inspected for improving the solution, while at the same time balancing the number of pivots required. Tests on simulated benchmarks demonstrate a considerable reduction in computation time for the new method as compared to the usual revised simplex algorithm implemented with state-of-the-art initialization and pivot strategies. As a consequence, the Shortlist Method facilitates the computation of large scale transportation problems in viable time. In addition we describe a novel method for finding an initial feasible solution which we coin Modified Russell's Method.
The Shortlist Method for Fast Computation of the Earth Mover's Distance and Finding Optimal Solutions to Transportation Problems

PubMed Central

Gottschlich, Carsten; Schuhmacher, Dominic

2014-01-01

Finding solutions to the classical transportation problem is of great importance, since this optimization problem arises in many engineering and computer science applications. Especially the Earth Mover's Distance is used in a plethora of applications ranging from content-based image retrieval, shape matching, fingerprint recognition, object tracking and phishing web page detection to computing color differences in linguistics and biology. Our starting point is the well-known revised simplex algorithm, which iteratively improves a feasible solution to optimality. The Shortlist Method that we propose substantially reduces the number of candidates inspected for improving the solution, while at the same time balancing the number of pivots required. Tests on simulated benchmarks demonstrate a considerable reduction in computation time for the new method as compared to the usual revised simplex algorithm implemented with state-of-the-art initialization and pivot strategies. As a consequence, the Shortlist Method facilitates the computation of large scale transportation problems in viable time. In addition we describe a novel method for finding an initial feasible solution which we coin Modified Russell's Method. PMID:25310106
The relationship between playing computer or video games with mental health and social relationships among students in guidance schools, Kermanshah.

PubMed

Reshadat, S; Ghasemi, S R; Ahmadian, M; RajabiGilan, N

2014-01-09

Computer or video games are a popular recreational activity and playing them may constitute a large part of leisure time. This cross-sectional study aimed to evaluate the relationship between playing computer or video games with mental health and social relationships among students in guidance schools in Kermanshah, Islamic Republic of Iran, in 2012. Our total sample was 573 students and our tool was the General Health Questionnaire (GHQ-28) and social relationships questionnaires. Survey respondents reported spending an average of 71.07 (SD 72.1) min/day on computer or video games. There was a significant relationship between time spent playing games and general mental health (P < 0.04) and depression (P < 0.03). There was also a significant difference between playing and not playing computer or video games with social relationships and their subscales, including trans-local relationships (P < 0.0001) and association relationships (P < 0.01) among all participants. There was also a significant relationship between social relationships and time spent playing games (P < 0.02) and its dimensions, except for family relationships.

Challenges in scaling NLO generators to leadership computers

NASA Astrophysics Data System (ADS)

Benjamin, D.; Childers, JT; Hoeche, S.; LeCompte, T.; Uram, T.

2017-10-01

Exascale computing resources are roughly a decade away and will be capable of 100 times more computing than current supercomputers. In the last year, Energy Frontier experiments crossed a milestone of 100 million core-hours used at the Argonne Leadership Computing Facility, Oak Ridge Leadership Computing Facility, and NERSC. The Fortran-based leading-order parton generator called Alpgen was successfully scaled to millions of threads to achieve this level of usage on Mira. Sherpa and MadGraph are next-to-leading order generators used heavily by LHC experiments for simulation. Integration times for high-multiplicity or rare processes can take a week or more on standard Grid machines, even using all 16-cores. We will describe our ongoing work to scale the Sherpa generator to thousands of threads on leadership-class machines and reduce run-times to less than a day. This work allows the experiments to leverage large-scale parallel supercomputers for event generation today, freeing tens of millions of grid hours for other work, and paving the way for future applications (simulation, reconstruction) on these and future supercomputers.
Computational Study of Axisymmetric Off-Design Nozzle Flows

NASA Technical Reports Server (NTRS)

DalBello, Teryn; Georgiadis, Nicholas; Yoder, Dennis; Keith, Theo

2003-01-01

Computational Fluid Dynamics (CFD) analyses of axisymmetric circular-arc boattail nozzles operating off-design at transonic Mach numbers have been completed. These computations span the very difficult transonic flight regime with shock-induced separations and strong adverse pressure gradients. External afterbody and internal nozzle pressure distributions computed with the Wind code are compared with experimental data. A range of turbulence models were examined, including the Explicit Algebraic Stress model. Computations have been completed at freestream Mach numbers of 0.9 and 1.2, and nozzle pressure ratios (NPR) of 4 and 6. Calculations completed with variable time-stepping (steady-state) did not converge to a true steady-state solution. Calculations obtained using constant timestepping (timeaccurate) indicate less variations in flow properties compared with steady-state solutions. This failure to converge to a steady-state solution was the result of using variable time-stepping with large-scale separations present in the flow. Nevertheless, time-averaged boattail surface pressure coefficient and internal nozzle pressures show reasonable agreement with experimental data. The SST turbulence model demonstrates the best overall agreement with experimental data.
Fast Learning for Immersive Engagement in Energy Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bush, Brian W; Bugbee, Bruce; Gruchalla, Kenny M

The fast computation which is critical for immersive engagement with and learning from energy simulations would be furthered by developing a general method for creating rapidly computed simplified versions of NREL's computation-intensive energy simulations. Created using machine learning techniques, these 'reduced form' simulations can provide statistically sound estimates of the results of the full simulations at a fraction of the computational cost with response times - typically less than one minute of wall-clock time - suitable for real-time human-in-the-loop design and analysis. Additionally, uncertainty quantification techniques can document the accuracy of the approximate models and their domain of validity. Approximationmore » methods are applicable to a wide range of computational models, including supply-chain models, electric power grid simulations, and building models. These reduced-form representations cannot replace or re-implement existing simulations, but instead supplement them by enabling rapid scenario design and quality assurance for large sets of simulations. We present an overview of the framework and methods we have implemented for developing these reduced-form representations.« less
Implementation of a digital optical matrix-vector multiplier using a holographic look-up table and residue arithmetic

NASA Technical Reports Server (NTRS)

Habiby, Sarry F.

1987-01-01

The design and implementation of a digital (numerical) optical matrix-vector multiplier are presented. The objective is to demonstrate the operation of an optical processor designed to minimize computation time in performing a practical computing application. This is done by using the large array of processing elements in a Hughes liquid crystal light valve, and relying on the residue arithmetic representation, a holographic optical memory, and position coded optical look-up tables. In the design, all operations are performed in effectively one light valve response time regardless of matrix size. The features of the design allowing fast computation include the residue arithmetic representation, the mapping approach to computation, and the holographic memory. In addition, other features of the work include a practical light valve configuration for efficient polarization control, a model for recording multiple exposures in silver halides with equal reconstruction efficiency, and using light from an optical fiber for a reference beam source in constructing the hologram. The design can be extended to implement larger matrix arrays without increasing computation time.
Computation of Asteroid Proper Elements on the Grid

NASA Astrophysics Data System (ADS)

Novakovic, B.; Balaz, A.; Knezevic, Z.; Potocnik, M.

2009-12-01

A procedure of gridification of the computation of asteroid proper orbital elements is described. The need to speed up the time consuming computations and make them more efficient is justified by the large increase of observational data expected from the next generation all sky surveys. We give the basic notion of proper elements and of the contemporary theories and methods used to compute them for different populations of objects. Proper elements for nearly 70,000 asteroids are derived since the beginning of use of the Grid infrastructure for the purpose. The average time for the catalogs update is significantly shortened with respect to the time needed with stand-alone workstations. We also present basics of the Grid computing, the concepts of Grid middleware and its Workload management system. The practical steps we undertook to efficiently gridify our application are described in full detail. We present the results of a comprehensive testing of the performance of different Grid sites, and offer some practical conclusions based on the benchmark results and on our experience. Finally, we propose some possibilities for the future work.
Computer Drawing Method for Operating Characteristic Curve of PV Power Plant Array Unit

NASA Astrophysics Data System (ADS)

Tan, Jianbin

2018-02-01

According to the engineering design of large-scale grid-connected photovoltaic power stations and the research and development of many simulation and analysis systems, it is necessary to draw a good computer graphics of the operating characteristic curves of photovoltaic array elements and to propose a good segmentation non-linear interpolation algorithm. In the calculation method, Component performance parameters as the main design basis, the computer can get 5 PV module performances. At the same time, combined with the PV array series and parallel connection, the computer drawing of the performance curve of the PV array unit can be realized. At the same time, the specific data onto the module of PV development software can be calculated, and the good operation of PV array unit can be improved on practical application.
An efficient, large-scale, non-lattice-detection algorithm for exhaustive structural auditing of biomedical ontologies.

PubMed

Zhang, Guo-Qiang; Xing, Guangming; Cui, Licong

2018-04-01

One of the basic challenges in developing structural methods for systematic audition on the quality of biomedical ontologies is the computational cost usually involved in exhaustive sub-graph analysis. We introduce ANT-LCA, a new algorithm for computing all non-trivial lowest common ancestors (LCA) of each pair of concepts in the hierarchical order induced by an ontology. The computation of LCA is a fundamental step for non-lattice approach for ontology quality assurance. Distinct from existing approaches, ANT-LCA only computes LCAs for non-trivial pairs, those having at least one common ancestor. To skip all trivial pairs that may be of no practical interest, ANT-LCA employs a simple but innovative algorithmic strategy combining topological order and dynamic programming to keep track of non-trivial pairs. We provide correctness proofs and demonstrate a substantial reduction in computational time for two largest biomedical ontologies: SNOMED CT and Gene Ontology (GO). ANT-LCA achieved an average computation time of 30 and 3 sec per version for SNOMED CT and GO, respectively, about 2 orders of magnitude faster than the best known approaches. Our algorithm overcomes a fundamental computational barrier in sub-graph based structural analysis of large ontological systems. It enables the implementation of a new breed of structural auditing methods that not only identifies potential problematic areas, but also automatically suggests changes to fix the issues. Such structural auditing methods can lead to more effective tools supporting ontology quality assurance work. Copyright © 2018 Elsevier Inc. All rights reserved.
Solution of large nonlinear quasistatic structural mechanics problems on distributed-memory multiprocessor computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blanford, M.

1997-12-31

Most commercially-available quasistatic finite element programs assemble element stiffnesses into a global stiffness matrix, then use a direct linear equation solver to obtain nodal displacements. However, for large problems (greater than a few hundred thousand degrees of freedom), the memory size and computation time required for this approach becomes prohibitive. Moreover, direct solution does not lend itself to the parallel processing needed for today`s multiprocessor systems. This talk gives an overview of the iterative solution strategy of JAS3D, the nonlinear large-deformation quasistatic finite element program. Because its architecture is derived from an explicit transient-dynamics code, it does not ever assemblemore » a global stiffness matrix. The author describes the approach he used to implement the solver on multiprocessor computers, and shows examples of problems run on hundreds of processors and more than a million degrees of freedom. Finally, he describes some of the work he is presently doing to address the challenges of iterative convergence for ill-conditioned problems.« less
Computer usage and task-switching during resident's working day: Disruptive or not?

PubMed

Méan, Marie; Garnier, Antoine; Wenger, Nathalie; Castioni, Julien; Waeber, Gérard; Marques-Vidal, Pedro

2017-01-01

Recent implementation of electronic health records (EHR) has dramatically changed medical ward organization. While residents in general internal medicine use EHR systems half of their working time, whether computer usage impacts residents' workflow remains uncertain. We aimed to observe the frequency of task-switches occurring during resident's work and to assess whether computer usage was associated with task-switching. In a large Swiss academic university hospital, we conducted, between May 26 and July 24, 2015 a time-motion study to assess how residents in general internal medicine organize their working day. We observed 49 day and 17 evening shifts of 36 residents, amounting to 697 working hours. During day shifts, residents spent 5.4 hours using a computer (mean total working time: 11.6 hours per day). On average, residents switched 15 times per hour from a task to another. Task-switching peaked between 8:00-9:00 and 16:00-17:00. Task-switching was not associated with resident's characteristics and no association was found between task-switching and extra hours (Spearman r = 0.220, p = 0.137 for day and r = 0.483, p = 0.058 for evening shifts). Computer usage occurred more frequently at the beginning or ends of day shifts and was associated with decreased overall task-switching. Task-switching occurs very frequently during resident's working day. Despite the fact that residents used a computer half of their working time, computer usage was associated with decreased task-switching. Whether frequent task-switches and computer usage impact the quality of patient care and resident's work must be evaluated in further studies.
Java Performance for Scientific Applications on LLNL Computer Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kapfer, C; Wissink, A

2002-05-10

Languages in use for high performance computing at the laboratory--Fortran (f77 and f90), C, and C++--have many years of development behind them and are generally considered the fastest available. However, Fortran and C do not readily extend to object-oriented programming models, limiting their capability for very complex simulation software. C++ facilitates object-oriented programming but is a very complex and error-prone language. Java offers a number of capabilities that these other languages do not. For instance it implements cleaner (i.e., easier to use and less prone to errors) object-oriented models than C++. It also offers networking and security as part ofmore » the language standard, and cross-platform executables that make it architecture neutral, to name a few. These features have made Java very popular for industrial computing applications. The aim of this paper is to explain the trade-offs in using Java for large-scale scientific applications at LLNL. Despite its advantages, the computational science community has been reluctant to write large-scale computationally intensive applications in Java due to concerns over its poor performance. However, considerable progress has been made over the last several years. The Java Grande Forum [1] has been promoting the use of Java for large-scale computing. Members have introduced efficient array libraries, developed fast just-in-time (JIT) compilers, and built links to existing packages used in high performance parallel computing.« less
Numerical Technology for Large-Scale Computational Electromagnetics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharpe, R; Champagne, N; White, D

The key bottleneck of implicit computational electromagnetics tools for large complex geometries is the solution of the resulting linear system of equations. The goal of this effort was to research and develop critical numerical technology that alleviates this bottleneck for large-scale computational electromagnetics (CEM). The mathematical operators and numerical formulations used in this arena of CEM yield linear equations that are complex valued, unstructured, and indefinite. Also, simultaneously applying multiple mathematical modeling formulations to different portions of a complex problem (hybrid formulations) results in a mixed structure linear system, further increasing the computational difficulty. Typically, these hybrid linear systems aremore » solved using a direct solution method, which was acceptable for Cray-class machines but does not scale adequately for ASCI-class machines. Additionally, LLNL's previously existing linear solvers were not well suited for the linear systems that are created by hybrid implicit CEM codes. Hence, a new approach was required to make effective use of ASCI-class computing platforms and to enable the next generation design capabilities. Multiple approaches were investigated, including the latest sparse-direct methods developed by our ASCI collaborators. In addition, approaches that combine domain decomposition (or matrix partitioning) with general-purpose iterative methods and special purpose pre-conditioners were investigated. Special-purpose pre-conditioners that take advantage of the structure of the matrix were adapted and developed based on intimate knowledge of the matrix properties. Finally, new operator formulations were developed that radically improve the conditioning of the resulting linear systems thus greatly reducing solution time. The goal was to enable the solution of CEM problems that are 10 to 100 times larger than our previous capability.« less
GPU acceleration of Dock6's Amber scoring computation.

PubMed

Yang, Hailong; Zhou, Qiongqiong; Li, Bo; Wang, Yongjian; Luan, Zhongzhi; Qian, Depei; Li, Hanlu

2010-01-01

Dressing the problem of virtual screening is a long-term goal in the drug discovery field, which if properly solved, can significantly shorten new drugs' R&D cycle. The scoring functionality that evaluates the fitness of the docking result is one of the major challenges in virtual screening. In general, scoring functionality in docking requires a large amount of floating-point calculations, which usually takes several weeks or even months to be finished. This time-consuming procedure is unacceptable, especially when highly fatal and infectious virus arises such as SARS and H1N1, which forces the scoring task to be done in a limited time. This paper presents how to leverage the computational power of GPU to accelerate Dock6's (http://dock.compbio.ucsf.edu/DOCK_6/) Amber (J. Comput. Chem. 25: 1157-1174, 2004) scoring with NVIDIA CUDA (NVIDIA Corporation Technical Staff, Compute Unified Device Architecture - Programming Guide, NVIDIA Corporation, 2008) (Compute Unified Device Architecture) platform. We also discuss many factors that will greatly influence the performance after porting the Amber scoring to GPU, including thread management, data transfer, and divergence hidden. Our experiments show that the GPU-accelerated Amber scoring achieves a 6.5× speedup with respect to the original version running on AMD dual-core CPU for the same problem size. This acceleration makes the Amber scoring more competitive and efficient for large-scale virtual screening problems.
Evaluation of a finite multipole expansion technique for the computation of electrostatic potentials of dibenzo-p-dioxins and related systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murray, J.S.; Grice, M.E.; Politzer, P.

1990-01-01

The electrostatic potential V(r) that the nuclei and electrons of a molecule create in the surrounding space is well established as a guide in the study of molecular reactivity, and particularly, of biological recognition processes. Its rigorous computation is, however, very demanding of computer time for large molecules, such as those of interest in recognition interactions. The authors have accordingly investigated the use of an approximate finite multicenter multipole expansion technique to determine its applicability for producing reliable electrostatic potentials of dibenzo-p-dioxins and related molecules, with significantly reduced amounts of computer time, at distances of interest in recognition studies. Amore » comparative analysis of the potentials of three dibenzo-q-dioxins and a substituted naphthalene molecule computed using both the multipole expansion technique and GAUSSIAN 82 at the STO-5G level has been carried out. Overall they found that regions of negative and positive V(r) at 1.75 A above the molecular plane are very well reproduced by the multipole expansion technique, with up to a twenty-fold improvement in computer time.« less
An assessment of the potential of PFEM-2 for solving long real-time industrial applications

NASA Astrophysics Data System (ADS)

Gimenez, Juan M.; Ramajo, Damián E.; Márquez Damián, Santiago; Nigro, Norberto M.; Idelsohn, Sergio R.

2017-07-01

The latest generation of the particle finite element method (PFEM-2) is a numerical method based on the Lagrangian formulation of the equations, which presents advantages in terms of robustness and efficiency over classical Eulerian methodologies when certain kind of flows are simulated, especially those where convection plays an important role. These situations are often encountered in real engineering problems, where very complex geometries and operating conditions require very large and long computations. The advantages that the parallelism introduced in the computational fluid dynamics making affordable computations with very fine spatial discretizations are well known. However, it is not possible to have the time parallelized, despite the effort that is being dedicated to use space-time formulations. In this sense, PFEM-2 adds a valuable feature in that its strong stability with little loss of accuracy provides an interesting way of satisfying the real-life computation needs. After having already demonstrated in previous publications its ability to achieve academic-based solutions with a good compromise between accuracy and efficiency, in this work, the method is revisited and employed to solve several nonacademic problems of technological interest, which fall into that category. Simulations concerning oil-water separation, waste-water treatment, metallurgical foundries, and safety assessment are presented. These cases are selected due to their particular requirements of long simulation times and or intensive interface treatment. Thus, large time-steps may be employed with PFEM-2 without compromising the accuracy and robustness of the simulation, as occurs with Eulerian alternatives, showing the potentiality of the methodology for solving not only academic tests but also real engineering problems.
IMPLEMENTATION OF THE IMPROVED QUASI-STATIC METHOD IN RATTLESNAKE/MOOSE FOR TIME-DEPENDENT RADIATION TRANSPORT MODELLING

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zachary M. Prince; Jean C. Ragusa; Yaqi Wang

Because of the recent interest in reactor transient modeling and the restart of the Transient Reactor (TREAT) Facility, there has been a need for more efficient, robust methods in computation frameworks. This is the impetus of implementing the Improved Quasi-Static method (IQS) in the RATTLESNAKE/MOOSE framework. IQS has implemented with CFEM diffusion by factorizing flux into time-dependent amplitude and spacial- and weakly time-dependent shape. The shape evaluation is very similar to a flux diffusion solve and is computed at large (macro) time steps. While the amplitude evaluation is a PRKE solve where the parameters are dependent on the shape andmore » is computed at small (micro) time steps. IQS has been tested with a custom one-dimensional example and the TWIGL ramp benchmark. These examples prove it to be a viable and effective method for highly transient cases. More complex cases are intended to be applied to further test the method and its implementation.« less
Computational helioseismology in the frequency domain: acoustic waves in axisymmetric solar models with flows

NASA Astrophysics Data System (ADS)

Gizon, Laurent; Barucq, Hélène; Duruflé, Marc; Hanson, Chris S.; Leguèbe, Michael; Birch, Aaron C.; Chabassier, Juliette; Fournier, Damien; Hohage, Thorsten; Papini, Emanuele

2017-04-01

Context. Local helioseismology has so far relied on semi-analytical methods to compute the spatial sensitivity of wave travel times to perturbations in the solar interior. These methods are cumbersome and lack flexibility. Aims: Here we propose a convenient framework for numerically solving the forward problem of time-distance helioseismology in the frequency domain. The fundamental quantity to be computed is the cross-covariance of the seismic wavefield. Methods: We choose sources of wave excitation that enable us to relate the cross-covariance of the oscillations to the Green's function in a straightforward manner. We illustrate the method by considering the 3D acoustic wave equation in an axisymmetric reference solar model, ignoring the effects of gravity on the waves. The symmetry of the background model around the rotation axis implies that the Green's function can be written as a sum of longitudinal Fourier modes, leading to a set of independent 2D problems. We use a high-order finite-element method to solve the 2D wave equation in frequency space. The computation is embarrassingly parallel, with each frequency and each azimuthal order solved independently on a computer cluster. Results: We compute travel-time sensitivity kernels in spherical geometry for flows, sound speed, and density perturbations under the first Born approximation. Convergence tests show that travel times can be computed with a numerical precision better than one millisecond, as required by the most precise travel-time measurements. Conclusions: The method presented here is computationally efficient and will be used to interpret travel-time measurements in order to infer, e.g., the large-scale meridional flow in the solar convection zone. It allows the implementation of (full-waveform) iterative inversions, whereby the axisymmetric background model is updated at each iteration.
Extending the Binomial Checkpointing Technique for Resilience

DOE Office of Scientific and Technical Information (OSTI.GOV)

Walther, Andrea; Narayanan, Sri Hari Krishna

In terms of computing time, adjoint methods offer a very attractive alternative to compute gradient information, re- quired, e.g., for optimization purposes. However, together with this very favorable temporal complexity result comes a memory requirement that is in essence proportional with the operation count of the underlying function, e.g., if algo- rithmic differentiation is used to provide the adjoints. For this reason, checkpointing approaches in many variants have become popular. This paper analyzes an extension of the so-called binomial approach to cover also possible failures of the computing systems. Such a measure of precaution is of special interest for massivemore » parallel simulations and adjoint calculations where the mean time between failure of the large scale computing system is smaller than the time needed to complete the calculation of the adjoint information. We de- scribe the extensions of standard checkpointing approaches required for such resilience, provide a corresponding imple- mentation and discuss numerical results.« less
Extending the length and time scales of Gram–Schmidt Lyapunov vector computations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Costa, Anthony B., E-mail: acosta@northwestern.edu; Green, Jason R., E-mail: jason.green@umb.edu; Department of Chemistry, University of Massachusetts Boston, Boston, MA 02125

Lyapunov vectors have found growing interest recently due to their ability to characterize systems out of thermodynamic equilibrium. The computation of orthogonal Gram–Schmidt vectors requires multiplication and QR decomposition of large matrices, which grow as N{sup 2} (with the particle count). This expense has limited such calculations to relatively small systems and short time scales. Here, we detail two implementations of an algorithm for computing Gram–Schmidt vectors. The first is a distributed-memory message-passing method using Scalapack. The second uses the newly-released MAGMA library for GPUs. We compare the performance of both codes for Lennard–Jones fluids from N=100 to 1300 betweenmore » Intel Nahalem/Infiniband DDR and NVIDIA C2050 architectures. To our best knowledge, these are the largest systems for which the Gram–Schmidt Lyapunov vectors have been computed, and the first time their calculation has been GPU-accelerated. We conclude that Lyapunov vector calculations can be significantly extended in length and time by leveraging the power of GPU-accelerated linear algebra.« less
Screen time and physical violence in 10 to 16-year-old Canadian youth.

PubMed

Janssen, Ian; Boyce, William F; Pickett, William

2012-04-01

To examine the independent associations between television, computer, and video game use with physical violence in youth. The study population consisted of a representative cross-sectional sample of 9,672 Canadian youth in grades 6-10 and a 1-year longitudinal sample of 1,861 youth in grades 9-10. The number of weekly hours watching television, playing video games, and using a computer was determined. Violence was defined as engagement in ≥2 physical fights in the previous year and/or perpetration of ≥2-3 monthly episodes of physical bullying. Logistic regression was used to examine associations. In the cross-sectional sample, computer use was associated with violence independent of television and video game use. Video game use was associated with violence in girls but not boys. Television use was not associated with violence after controlling for the other screen time measures. In the longitudinal sample, video game use was a significant predictor of violence after controlling for the other screen time measures. Computer and video game use were the screen time measures most strongly related to violence in this large sample of youth.
GPU-accelerated iterative reconstruction for limited-data tomography in CBCT systems.

PubMed

de Molina, Claudia; Serrano, Estefania; Garcia-Blas, Javier; Carretero, Jesus; Desco, Manuel; Abella, Monica

2018-05-15

Standard cone-beam computed tomography (CBCT) involves the acquisition of at least 360 projections rotating through 360 degrees. Nevertheless, there are cases in which only a few projections can be taken in a limited angular span, such as during surgery, where rotation of the source-detector pair is limited to less than 180 degrees. Reconstruction of limited data with the conventional method proposed by Feldkamp, Davis and Kress (FDK) results in severe artifacts. Iterative methods may compensate for the lack of data by including additional prior information, although they imply a high computational burden and memory consumption. We present an accelerated implementation of an iterative method for CBCT following the Split Bregman formulation, which reduces computational time through GPU-accelerated kernels. The implementation enables the reconstruction of large volumes (>1024 3 pixels) using partitioning strategies in forward- and back-projection operations. We evaluated the algorithm on small-animal data for different scenarios with different numbers of projections, angular span, and projection size. Reconstruction time varied linearly with the number of projections and quadratically with projection size but remained almost unchanged with angular span. Forward- and back-projection operations represent 60% of the total computational burden. Efficient implementation using parallel processing and large-memory management strategies together with GPU kernels enables the use of advanced reconstruction approaches which are needed in limited-data scenarios. Our GPU implementation showed a significant time reduction (up to 48 ×) compared to a CPU-only implementation, resulting in a total reconstruction time from several hours to few minutes.

Hierarchical Time-Lagged Independent Component Analysis: Computing Slow Modes and Reaction Coordinates for Large Molecular Systems.

PubMed

Pérez-Hernández, Guillermo; Noé, Frank

2016-12-13

Analysis of molecular dynamics, for example using Markov models, often requires the identification of order parameters that are good indicators of the rare events, i.e. good reaction coordinates. Recently, it has been shown that the time-lagged independent component analysis (TICA) finds the linear combinations of input coordinates that optimally represent the slow kinetic modes and may serve in order to define reaction coordinates between the metastable states of the molecular system. A limitation of the method is that both computing time and memory requirements scale with the square of the number of input features. For large protein systems, this exacerbates the use of extensive feature sets such as the distances between all pairs of residues or even heavy atoms. Here we derive a hierarchical TICA (hTICA) method that approximates the full TICA solution by a hierarchical, divide-and-conquer calculation. By using hTICA on distances between heavy atoms we identify previously unknown relaxation processes in the bovine pancreatic trypsin inhibitor.
Anticipation of the landing shock phenomenon in flight simulation

NASA Technical Reports Server (NTRS)

Mcfarland, Richard E.

1987-01-01

An aircraft landing may be described as a controlled crash because a runway surface is intercepted. In a simulation model the transition from aerodynamic flight to weight on wheels involves a single computational cycle during which stiff differential equations are activated; with a significant probability these initial conditions are unrealistic. This occurs because of the finite cycle time, during which large restorative forces will accompany unrealistic initial oleo compressions. This problem was recognized a few years ago at Ames Research Center during simulation studies of a supersonic transport. The mathematical model of this vehicle severely taxed computational resources, and required a large cycle time. The ground strike problem was solved by a described technique called anticipation equations. This extensively used technique has not been previously reported. The technique of anticipating a significant event is a useful tool in the general field of discrete flight simulation. For the differential equations representing a landing gear model stiffness, rate of interception and cycle time may combine to produce an unrealistic simulation of the continuum.
An Implicit Solver on A Parallel Block-Structured Adaptive Mesh Grid for FLASH

NASA Astrophysics Data System (ADS)

Lee, D.; Gopal, S.; Mohapatra, P.

2012-07-01

We introduce a fully implicit solver for FLASH based on a Jacobian-Free Newton-Krylov (JFNK) approach with an appropriate preconditioner. The main goal of developing this JFNK-type implicit solver is to provide efficient high-order numerical algorithms and methodology for simulating stiff systems of differential equations on large-scale parallel computer architectures. A large number of natural problems in nonlinear physics involve a wide range of spatial and time scales of interest. A system that encompasses such a wide magnitude of scales is described as "stiff." A stiff system can arise in many different fields of physics, including fluid dynamics/aerodynamics, laboratory/space plasma physics, low Mach number flows, reactive flows, radiation hydrodynamics, and geophysical flows. One of the big challenges in solving such a stiff system using current-day computational resources lies in resolving time and length scales varying by several orders of magnitude. We introduce FLASH's preliminary implementation of a time-accurate JFNK-based implicit solver in the framework of FLASH's unsplit hydro solver.
ASIC-based architecture for the real-time computation of 2D convolution with large kernel size

NASA Astrophysics Data System (ADS)

Shao, Rui; Zhong, Sheng; Yan, Luxin

2015-12-01

Bidimensional convolution is a low-level processing algorithm of interest in many areas, but its high computational cost constrains the size of the kernels, especially in real-time embedded systems. This paper presents a hardware architecture for the ASIC-based implementation of 2-D convolution with medium-large kernels. Aiming to improve the efficiency of storage resources on-chip, reducing off-chip bandwidth of these two issues, proposed construction of a data cache reuse. Multi-block SPRAM to cross cached images and the on-chip ping-pong operation takes full advantage of the data convolution calculation reuse, design a new ASIC data scheduling scheme and overall architecture. Experimental results show that the structure can achieve 40× 32 size of template real-time convolution operations, and improve the utilization of on-chip memory bandwidth and on-chip memory resources, the experimental results show that the structure satisfies the conditions to maximize data throughput output , reducing the need for off-chip memory bandwidth.
Portable parallel stochastic optimization for the design of aeropropulsion components

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Rhodes, G. S.

1994-01-01

This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as review of portable, parallel programming environments. The first effort was to implement the MSO methodology for a problem using the portable parallel programming language, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be well-applied towards large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed-Civil Transport, and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
Analysis of large 16S rRNA Illumina data sets: Impact of singleton read filtering on microbial community description.

PubMed

Auer, Lucas; Mariadassou, Mahendra; O'Donohue, Michael; Klopp, Christophe; Hernandez-Raquet, Guillermina

2017-11-01

Next-generation sequencing technologies give access to large sets of data, which are extremely useful in the study of microbial diversity based on 16S rRNA gene. However, the production of such large data sets is not only marred by technical biases and sequencing noise but also increases computation time and disc space use. To improve the accuracy of OTU predictions and overcome both computations, storage and noise issues, recent studies and tools suggested removing all single reads and low abundant OTUs, considering them as noise. Although the effect of applying an OTU abundance threshold on α- and β-diversity has been well documented, the consequences of removing single reads have been poorly studied. Here, we test the effect of singleton read filtering (SRF) on microbial community composition using in silico simulated data sets as well as sequencing data from synthetic and real communities displaying different levels of diversity and abundance profiles. Scalability to large data sets is also assessed using a complete MiSeq run. We show that SRF drastically reduces the chimera content and computational time, enabling the analysis of a complete MiSeq run in just a few minutes. Moreover, SRF accurately determines the actual community diversity: the differences in α- and β-community diversity obtained with SRF and standard procedures are much smaller than the intrinsic variability of technical and biological replicates. © 2017 John Wiley & Sons Ltd.
Large-eddy simulations with wall models

NASA Technical Reports Server (NTRS)

Cabot, W.

1995-01-01

The near-wall viscous and buffer regions of wall-bounded flows generally require a large expenditure of computational resources to be resolved adequately, even in large-eddy simulation (LES). Often as much as 50% of the grid points in a computational domain are devoted to these regions. The dense grids that this implies also generally require small time steps for numerical stability and/or accuracy. It is commonly assumed that the inner wall layers are near equilibrium, so that the standard logarithmic law can be applied as the boundary condition for the wall stress well away from the wall, for example, in the logarithmic region, obviating the need to expend large amounts of grid points and computational time in this region. This approach is commonly employed in LES of planetary boundary layers, and it has also been used for some simple engineering flows. In order to calculate accurately a wall-bounded flow with coarse wall resolution, one requires the wall stress as a boundary condition. The goal of this work is to determine the extent to which equilibrium and boundary layer assumptions are valid in the near-wall regions, to develop models for the inner layer based on such assumptions, and to test these modeling ideas in some relatively simple flows with different pressure gradients, such as channel flow and flow over a backward-facing step. Ultimately, models that perform adequately in these situations will be applied to more complex flow configurations, such as an airfoil.
SNAVA-A real-time multi-FPGA multi-model spiking neural network simulation architecture.

PubMed

Sripad, Athul; Sanchez, Giovanny; Zapata, Mireya; Pirrone, Vito; Dorta, Taho; Cambria, Salvatore; Marti, Albert; Krishnamourthy, Karthikeyan; Madrenas, Jordi

2018-01-01

Spiking Neural Networks (SNN) for Versatile Applications (SNAVA) simulation platform is a scalable and programmable parallel architecture that supports real-time, large-scale, multi-model SNN computation. This parallel architecture is implemented in modern Field-Programmable Gate Arrays (FPGAs) devices to provide high performance execution and flexibility to support large-scale SNN models. Flexibility is defined in terms of programmability, which allows easy synapse and neuron implementation. This has been achieved by using a special-purpose Processing Elements (PEs) for computing SNNs, and analyzing and customizing the instruction set according to the processing needs to achieve maximum performance with minimum resources. The parallel architecture is interfaced with customized Graphical User Interfaces (GUIs) to configure the SNN's connectivity, to compile the neuron-synapse model and to monitor SNN's activity. Our contribution intends to provide a tool that allows to prototype SNNs faster than on CPU/GPU architectures but significantly cheaper than fabricating a customized neuromorphic chip. This could be potentially valuable to the computational neuroscience and neuromorphic engineering communities. Copyright © 2017 Elsevier Ltd. All rights reserved.
Computer aided manual validation of mass spectrometry-based proteomic data.

PubMed

Curran, Timothy G; Bryson, Bryan D; Reigelhaupt, Michael; Johnson, Hannah; White, Forest M

2013-06-15

Advances in mass spectrometry-based proteomic technologies have increased the speed of analysis and the depth provided by a single analysis. Computational tools to evaluate the accuracy of peptide identifications from these high-throughput analyses have not kept pace with technological advances; currently the most common quality evaluation methods are based on statistical analysis of the likelihood of false positive identifications in large-scale data sets. While helpful, these calculations do not consider the accuracy of each identification, thus creating a precarious situation for biologists relying on the data to inform experimental design. Manual validation is the gold standard approach to confirm accuracy of database identifications, but is extremely time-intensive. To palliate the increasing time required to manually validate large proteomic datasets, we provide computer aided manual validation software (CAMV) to expedite the process. Relevant spectra are collected, catalogued, and pre-labeled, allowing users to efficiently judge the quality of each identification and summarize applicable quantitative information. CAMV significantly reduces the burden associated with manual validation and will hopefully encourage broader adoption of manual validation in mass spectrometry-based proteomics. Copyright © 2013 Elsevier Inc. All rights reserved.
Solving optimization problems on computational grids.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wright, S. J.; Mathematics and Computer Science

2001-05-01

Multiprocessor computing platforms, which have become more and more widely available since the mid-1980s, are now heavily used by organizations that need to solve very demanding computational problems. Parallel computing is now central to the culture of many research communities. Novel parallel approaches were developed for global optimization, network optimization, and direct-search methods for nonlinear optimization. Activity was particularly widespread in parallel branch-and-bound approaches for various problems in combinatorial and network optimization. As the cost of personal computers and low-end workstations has continued to fall, while the speed and capacity of processors and networks have increased dramatically, 'cluster' platforms havemore » become popular in many settings. A somewhat different type of parallel computing platform know as a computational grid (alternatively, metacomputer) has arisen in comparatively recent times. Broadly speaking, this term refers not to a multiprocessor with identical processing nodes but rather to a heterogeneous collection of devices that are widely distributed, possibly around the globe. The advantage of such platforms is obvious: they have the potential to deliver enormous computing power. Just as obviously, however, the complexity of grids makes them very difficult to use. The Condor team, headed by Miron Livny at the University of Wisconsin, were among the pioneers in providing infrastructure for grid computations. More recently, the Globus project has developed technologies to support computations on geographically distributed platforms consisting of high-end computers, storage and visualization devices, and other scientific instruments. In 1997, we started the metaneos project as a collaborative effort between optimization specialists and the Condor and Globus groups. Our aim was to address complex, difficult optimization problems in several areas, designing and implementing the algorithms and the software infrastructure need to solve these problems on computational grids. This article describes some of the results we have obtained during the first three years of the metaneos project. Our efforts have led to development of the runtime support library MW for implementing algorithms with master-worker control structure on Condor platforms. This work is discussed here, along with work on algorithms and codes for integer linear programming, the quadratic assignment problem, and stochastic linear programmming. Our experiences in the metaneos project have shown that cheap, powerful computational grids can be used to tackle large optimization problems of various types. In an industrial or commercial setting, the results demonstrate that one may not have to buy powerful computational servers to solve many of the large problems arising in areas such as scheduling, portfolio optimization, or logistics; the idle time on employee workstations (or, at worst, an investment in a modest cluster of PCs) may do the job. For the optimization research community, our results motivate further work on parallel, grid-enabled algorithms for solving very large problems of other types. The fact that very large problems can be solved cheaply allows researchers to better understand issues of 'practical' complexity and of the role of heuristics.« less
The TeraShake Computational Platform for Large-Scale Earthquake Simulations

NASA Astrophysics Data System (ADS)

Cui, Yifeng; Olsen, Kim; Chourasia, Amit; Moore, Reagan; Maechling, Philip; Jordan, Thomas

Geoscientific and computer science researchers with the Southern California Earthquake Center (SCEC) are conducting a large-scale, physics-based, computationally demanding earthquake system science research program with the goal of developing predictive models of earthquake processes. The computational demands of this program continue to increase rapidly as these researchers seek to perform physics-based numerical simulations of earthquake processes for larger meet the needs of this research program, a multiple-institution team coordinated by SCEC has integrated several scientific codes into a numerical modeling-based research tool we call the TeraShake computational platform (TSCP). A central component in the TSCP is a highly scalable earthquake wave propagation simulation program called the TeraShake anelastic wave propagation (TS-AWP) code. In this chapter, we describe how we extended an existing, stand-alone, wellvalidated, finite-difference, anelastic wave propagation modeling code into the highly scalable and widely used TS-AWP and then integrated this code into the TeraShake computational platform that provides end-to-end (initialization to analysis) research capabilities. We also describe the techniques used to enhance the TS-AWP parallel performance on TeraGrid supercomputers, as well as the TeraShake simulations phases including input preparation, run time, data archive management, and visualization. As a result of our efforts to improve its parallel efficiency, the TS-AWP has now shown highly efficient strong scaling on over 40K processors on IBM’s BlueGene/L Watson computer. In addition, the TSCP has developed into a computational system that is useful to many members of the SCEC community for performing large-scale earthquake simulations.
Stream computing for biomedical signal processing: A QRS complex detection case-study.

PubMed

Murphy, B M; O'Driscoll, C; Boylan, G B; Lightbody, G; Marnane, W P

2015-01-01

Recent developments in "Big Data" have brought significant gains in the ability to process large amounts of data on commodity server hardware. Stream computing is a relatively new paradigm in this area, addressing the need to process data in real time with very low latency. While this approach has been developed for dealing with large scale data from the world of business, security and finance, there is a natural overlap with clinical needs for physiological signal processing. In this work we present a case study of streams processing applied to a typical physiological signal processing problem: QRS detection from ECG data.
Software environment for implementing engineering applications on MIMD computers

NASA Technical Reports Server (NTRS)

Lopez, L. A.; Valimohamed, K. A.; Schiff, S.

1990-01-01

In this paper the concept for a software environment for developing engineering application systems for multiprocessor hardware (MIMD) is presented. The philosophy employed is to solve the largest problems possible in a reasonable amount of time, rather than solve existing problems faster. In the proposed environment most of the problems concerning parallel computation and handling of large distributed data spaces are hidden from the application program developer, thereby facilitating the development of large-scale software applications. Applications developed under the environment can be executed on a variety of MIMD hardware; it protects the application software from the effects of a rapidly changing MIMD hardware technology.
GRADIENT: Graph Analytic Approach for Discovering Irregular Events, Nascent and Temporal

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogan, Emilie

2015-03-31

Finding a time-ordered signature within large graphs is a computationally complex problem due to the combinatorial explosion of potential patterns. GRADIENT is designed to search and understand that problem space.
GRADIENT: Graph Analytic Approach for Discovering Irregular Events, Nascent and Temporal

ScienceCinema

Hogan, Emilie

2018-01-16

Finding a time-ordered signature within large graphs is a computationally complex problem due to the combinatorial explosion of potential patterns. GRADIENT is designed to search and understand that problem space.
A Distributed Processing Approach to Payroll Time Reporting for a Large School District.

ERIC Educational Resources Information Center

Freeman, Raoul J.

1983-01-01

Describes a system for payroll reporting from geographically disparate locations in which data is entered, edited, and verified locally on minicomputers and then uploaded to a central computer for the standard payroll process. Communications and hardware, time-reporting software, data input techniques, system implementation, and its advantages are…
Rarefaction Wave Eliminator Concepts For A Large Blast/Thermal Simulator.

DTIC Science & Technology

1985-02-01

hard copies of the pressure-time records. Final data process- ing was completed with the computer, printer , and plotter. Plots of pressure- time records...F ATTN: Prof 0. Zinke Fayetteville, AR 72701 Cdr, CRDC, AMCCOM ATTI: 4O-SPS-IL University of California PM=-J Lawrence Livermore Lab SOM-RSP-A ATTN
Operational evaluation of high-throughput community-based mass prophylaxis using Just-in-time training.

PubMed

Spitzer, James D; Hupert, Nathaniel; Duckart, Jonathan; Xiong, Wei

2007-01-01

Community-based mass prophylaxis is a core public health operational competency, but staffing needs may overwhelm the local trained health workforce. Just-in-time (JIT) training of emergency staff and computer modeling of workforce requirements represent two complementary approaches to address this logistical problem. Multnomah County, Oregon, conducted a high-throughput point of dispensing (POD) exercise to test JIT training and computer modeling to validate POD staffing estimates. The POD had 84% non-health-care worker staff and processed 500 patients per hour. Post-exercise modeling replicated observed staff utilization levels and queue formation, including development and amelioration of a large medical evaluation queue caused by lengthy processing times and understaffing in the first half-hour of the exercise. The exercise confirmed the feasibility of using JIT training for high-throughput antibiotic dispensing clinics staffed largely by nonmedical professionals. Patient processing times varied over the course of the exercise, with important implications for both staff reallocation and future POD modeling efforts. Overall underutilization of staff revealed the opportunity for greater efficiencies and even higher future throughputs.
Perspectives on the Future of CFD

NASA Technical Reports Server (NTRS)

Kwak, Dochan

2000-01-01

This viewgraph presentation gives an overview of the future of computational fluid dynamics (CFD), which in the past has pioneered the field of flow simulation. Over time CFD has progressed as computing power. Numerical methods have been advanced as CPU and memory capacity increases. Complex configurations are routinely computed now and direct numerical simulations (DNS) and large eddy simulations (LES) are used to study turbulence. As the computing resources changed to parallel and distributed platforms, computer science aspects such as scalability (algorithmic and implementation) and portability and transparent codings have advanced. Examples of potential future (or current) challenges include risk assessment, limitations of the heuristic model, and the development of CFD and information technology (IT) tools.
Application of computational aero-acoustics to real world problems

NASA Technical Reports Server (NTRS)

Hardin, Jay C.

1996-01-01

The application of computational aeroacoustics (CAA) to real problems is discussed in relation to the analysis performed with the aim of assessing the application of the various techniques. It is considered that the applications are limited by the inability of the computational resources to resolve the large range of scales involved in high Reynolds number flows. Possible simplifications are discussed. It is considered that problems remain to be solved in relation to the efficient use of the power of parallel computers and in the development of turbulent modeling schemes. The goal of CAA is stated as being the implementation of acoustic design studies on a computer terminal with reasonable run times.

a Recursive Approach to Compute Normal Forms

NASA Astrophysics Data System (ADS)

HSU, L.; MIN, L. J.; FAVRETTO, L.

2001-06-01

Normal forms are instrumental in the analysis of dynamical systems described by ordinary differential equations, particularly when singularities close to a bifurcation are to be characterized. However, the computation of a normal form up to an arbitrary order is numerically hard. This paper focuses on the computer programming of some recursive formulas developed earlier to compute higher order normal forms. A computer program to reduce the system to its normal form on a center manifold is developed using the Maple symbolic language. However, it should be stressed that the program relies essentially on recursive numerical computations, while symbolic calculations are used only for minor tasks. Some strategies are proposed to save computation time. Examples are presented to illustrate the application of the program to obtain high order normalization or to handle systems with large dimension.
Structural Analysis Methods for Structural Health Management of Future Aerospace Vehicles

NASA Technical Reports Server (NTRS)

Tessler, Alexander

2007-01-01

Two finite element based computational methods, Smoothing Element Analysis (SEA) and the inverse Finite Element Method (iFEM), are reviewed, and examples of their use for structural health monitoring are discussed. Due to their versatility, robustness, and computational efficiency, the methods are well suited for real-time structural health monitoring of future space vehicles, large space structures, and habitats. The methods may be effectively employed to enable real-time processing of sensing information, specifically for identifying three-dimensional deformed structural shapes as well as the internal loads. In addition, they may be used in conjunction with evolutionary algorithms to design optimally distributed sensors. These computational tools have demonstrated substantial promise for utilization in future Structural Health Management (SHM) systems.
The Red Atrapa Sismos (Quake Catcher Network in Mexico): assessing performance during large and damaging earthquakes.

USGS Publications Warehouse

Dominguez, Luis A.; Yildirim, Battalgazi; Husker, Allen L.; Cochran, Elizabeth S.; Christensen, Carl; Cruz-Atienza, Victor M.

2015-01-01

Each volunteer computer monitors ground motion and communicates using the Berkeley Open Infrastructure for Network Computing (BOINC, Anderson, 2004). Using a standard short‐term average, long‐term average (STLA) algorithm (Earle and Shearer, 1994; Cochran, Lawrence, Christensen, Chung, 2009; Cochran, Lawrence, Christensen, and Jakka, 2009), volunteer computer and sensor systems detect abrupt changes in the acceleration recordings. Each time a possible trigger signal is declared, a small package of information containing sensor and ground‐motion information is streamed to one of the QCN servers (Chung et al., 2011). Trigger signals, correlated in space and time, are then processed by the QCN server to look for potential earthquakes.
A computational procedure for multibody systems including flexible beam dynamics

NASA Technical Reports Server (NTRS)

Downer, J. D.; Park, K. C.; Chiou, J. C.

1990-01-01

A computational procedure suitable for the solution of equations of motions for flexible multibody systems has been developed. The flexible beams are modeled using a fully nonlinear theory which accounts for both finite rotations and large deformations. The present formulation incorporates physical measures of conjugate Cauchy stress and covariant strain increments. As a consequence, the beam model can easily be interfaced with real-time strain measurements and feedback control systems. A distinct feature of the present work is the computational preservation of total energy for undamped systems; this is obtained via an objective strain increment/stress update procedure combined with an energy-conserving time integration algorithm which contains an accurate update of angular orientations. The procedure is demonstrated via several example problems.
A FPGA-based architecture for real-time image matching

NASA Astrophysics Data System (ADS)

Wang, Jianhui; Zhong, Sheng; Xu, Wenhui; Zhang, Weijun; Cao, Zhiguo

2013-10-01

Image matching is a fundamental task in computer vision. It is used to establish correspondence between two images taken at different viewpoint or different time from the same scene. However, its large computational complexity has been a challenge to most embedded systems. This paper proposes a single FPGA-based image matching system, which consists of SIFT feature detection, BRIEF descriptor extraction and BRIEF matching. It optimizes the FPGA architecture for the SIFT feature detection to reduce the FPGA resources utilization. Moreover, we implement BRIEF description and matching on FPGA also. The proposed system can implement image matching at 30fps (frame per second) for 1280x720 images. Its processing speed can meet the demand of most real-life computer vision applications.
Adaptive compressive ghost imaging based on wavelet trees and sparse representation.

PubMed

Yu, Wen-Kai; Li, Ming-Fei; Yao, Xu-Ri; Liu, Xue-Feng; Wu, Ling-An; Zhai, Guang-Jie

2014-03-24

Compressed sensing is a theory which can reconstruct an image almost perfectly with only a few measurements by finding its sparsest representation. However, the computation time consumed for large images may be a few hours or more. In this work, we both theoretically and experimentally demonstrate a method that combines the advantages of both adaptive computational ghost imaging and compressed sensing, which we call adaptive compressive ghost imaging, whereby both the reconstruction time and measurements required for any image size can be significantly reduced. The technique can be used to improve the performance of all computational ghost imaging protocols, especially when measuring ultra-weak or noisy signals, and can be extended to imaging applications at any wavelength.
Fast Algorithms for Mining Co-evolving Time Series

DTIC Science & Technology

2011-09-01

Keogh et al., 2001, 2004] and (b) forecasting, like an autoregressive integrated moving average model ( ARIMA ) and related meth- ods [Box et al., 1994...computing hardware? We develop models to mine time series with missing values, to extract compact representation from time sequences, to segment the...sequences, and to do forecasting. For large scale data, we propose algorithms for learning time series models , in particular, including Linear Dynamical
Un algorithme efficace d'intégration plastique pour un matériau obéissant au critère anisotrope de Hill

NASA Astrophysics Data System (ADS)

Titeux, Isabelle; Li, Yuming M.; Debray, Karl; Guo, Ying-Qiao

2004-11-01

This Note deals with an efficient algorithm to carry out the plastic integration and compute the stresses due to large strains for materials satisfying the Hill's anisotropic yield criterion. The classical algorithm of plastic integration such as 'Return Mapping Method' is largely used for nonlinear analyses of structures and numerical simulations of forming processes, but it requires an iterative schema and may have convergence problems. A new direct algorithm based on a scalar method is developed which allows us to directly obtain the plastic multiplier without an iteration procedure; thus the computation time is largely reduced and the numerical problems are avoided. To cite this article: I. Titeux et al., C. R. Mecanique 332 (2004).
Parenting style, the home environment, and screen time of 5-year-old children; the 'be active, eat right' study.

PubMed

Veldhuis, Lydian; van Grieken, Amy; Renders, Carry M; Hirasing, Remy A; Raat, Hein

2014-01-01

The global increase in childhood overweight and obesity has been ascribed partly to increases in children's screen time. Parents have a large influence on their children's screen time. Studies investigating parenting and early childhood screen time are limited. In this study, we investigated associations of parenting style and the social and physical home environment on watching TV and using computers or game consoles among 5-year-old children. This study uses baseline data concerning 5-year-old children (n = 3067) collected for the 'Be active, eat right' study. Children of parents with a higher score on the parenting style dimension involvement, were more likely to spend >30 min/day on computers or game consoles. Overall, families with an authoritative or authoritarian parenting style had lower percentages of children's screen time compared to families with an indulgent or neglectful style, but no significant difference in OR was found. In families with rules about screen time, children were less likely to watch TV>2 hrs/day and more likely to spend >30 min/day on computers or game consoles. The number of TVs and computers or game consoles in the household was positively associated with screen time, and children with a TV or computer or game console in their bedroom were more likely to watch TV>2 hrs/day or spend >30 min/day on computers or game consoles. The magnitude of the association between parenting style and screen time of 5-year-olds was found to be relatively modest. The associations found between the social and physical environment and children's screen time are independent of parenting style. Interventions to reduce children's screen time might be most effective when they support parents specifically with introducing family rules related to screen time and prevent the presence of a TV or computer or game console in the child's room.
Parenting Style, the Home Environment, and Screen Time of 5-Year-Old Children; The ‘Be Active, Eat Right’ Study

PubMed Central

Veldhuis, Lydian; van Grieken, Amy; Renders, Carry M.; HiraSing, Remy A.; Raat, Hein

2014-01-01

Introduction The global increase in childhood overweight and obesity has been ascribed partly to increases in children's screen time. Parents have a large influence on their children's screen time. Studies investigating parenting and early childhood screen time are limited. In this study, we investigated associations of parenting style and the social and physical home environment on watching TV and using computers or game consoles among 5-year-old children. Methods This study uses baseline data concerning 5-year-old children (n = 3067) collected for the ‘Be active, eat right’ study. Results Children of parents with a higher score on the parenting style dimension involvement, were more likely to spend >30 min/day on computers or game consoles. Overall, families with an authoritative or authoritarian parenting style had lower percentages of children's screen time compared to families with an indulgent or neglectful style, but no significant difference in OR was found. In families with rules about screen time, children were less likely to watch TV>2 hrs/day and more likely to spend >30 min/day on computers or game consoles. The number of TVs and computers or game consoles in the household was positively associated with screen time, and children with a TV or computer or game console in their bedroom were more likely to watch TV>2 hrs/day or spend >30 min/day on computers or game consoles. Conclusion The magnitude of the association between parenting style and screen time of 5-year-olds was found to be relatively modest. The associations found between the social and physical environment and children's screen time are independent of parenting style. Interventions to reduce children's screen time might be most effective when they support parents specifically with introducing family rules related to screen time and prevent the presence of a TV or computer or game console in the child's room. PMID:24533092
Efficiency and large deviations in time-asymmetric stochastic heat engines

DOE PAGES

Gingrich, Todd R.; Rotskoff, Grant M.; Vaikuntanathan, Suriyanarayanan; ...

2014-10-24

In a stochastic heat engine driven by a cyclic non-equilibrium protocol, fluctuations in work and heat give rise to a fluctuating efficiency. Using computer simulations and tools from large deviation theory, we have examined these fluctuations in detail for a model two-state engine. We find in general that the form of efficiency probability distributions is similar to those described by Verley et al (2014 Nat. Commun. 5 4721), in particular featuring a local minimum in the long-time limit. In contrast to the time-symmetric engine protocols studied previously, however, this minimum need not occur at the value characteristic of a reversible Carnot engine. Furthermore, while the local minimum may reside at the global minimum of a large deviation rate function, it does not generally correspond to the least likely efficiency measured over finite time. Lastly, we introduce a general approximation for the finite-time efficiency distribution,more » $$P(\\eta )$$, based on large deviation statistics of work and heat, that remains very accurate even when $$P(\\eta )$$ deviates significantly from its large deviation form.« less
Computer versus paper--does it make any difference in test performance?

PubMed

Karay, Yassin; Schauber, Stefan K; Stosch, Christoph; Schüttpelz-Brauns, Katrin

2015-01-01

CONSTRUCT: In this study, we examine the differences in test performance between the paper-based and the computer-based version of the Berlin formative Progress Test. In this context it is the first study that allows controlling for students' prior performance. Computer-based tests make possible a more efficient examination procedure for test administration and review. Although university staff will benefit largely from computer-based tests, the question arises if computer-based tests influence students' test performance. A total of 266 German students from the 9th and 10th semester of medicine (comparable with the 4th-year North American medical school schedule) participated in the study (paper = 132, computer = 134). The allocation of the test format was conducted as a randomized matched-pair design in which students were first sorted according to their prior test results. The organizational procedure, the examination conditions, the room, and seating arrangements, as well as the order of questions and answers, were identical in both groups. The sociodemographic variables and pretest scores of both groups were comparable. The test results from the paper and computer versions did not differ. The groups remained within the allotted time, but students using the computer version (particularly the high performers) needed significantly less time to complete the test. In addition, we found significant differences in guessing behavior. Low performers using the computer version guess significantly more than low-performing students in the paper-pencil version. Participants in computer-based tests are not at a disadvantage in terms of their test results. The computer-based test required less processing time. The reason for the longer processing time when using the paper-pencil version might be due to the time needed to write the answer down, controlling for transferring the answer correctly. It is still not known why students using the computer version (particularly low-performing students) guess at a higher rate. Further studies are necessary to understand this finding.
Challenges and opportunities of cloud computing for atmospheric sciences

NASA Astrophysics Data System (ADS)

Pérez Montes, Diego A.; Añel, Juan A.; Pena, Tomás F.; Wallom, David C. H.

2016-04-01

Cloud computing is an emerging technological solution widely used in many fields. Initially developed as a flexible way of managing peak demand it has began to make its way in scientific research. One of the greatest advantages of cloud computing for scientific research is independence of having access to a large cyberinfrastructure to fund or perform a research project. Cloud computing can avoid maintenance expenses for large supercomputers and has the potential to 'democratize' the access to high-performance computing, giving flexibility to funding bodies for allocating budgets for the computational costs associated with a project. Two of the most challenging problems in atmospheric sciences are computational cost and uncertainty in meteorological forecasting and climate projections. Both problems are closely related. Usually uncertainty can be reduced with the availability of computational resources to better reproduce a phenomenon or to perform a larger number of experiments. Here we expose results of the application of cloud computing resources for climate modeling using cloud computing infrastructures of three major vendors and two climate models. We show how the cloud infrastructure compares in performance to traditional supercomputers and how it provides the capability to complete experiments in shorter periods of time. The monetary cost associated is also analyzed. Finally we discuss the future potential of this technology for meteorological and climatological applications, both from the point of view of operational use and research.
Squid - a simple bioinformatics grid.

PubMed

Carvalho, Paulo C; Glória, Rafael V; de Miranda, Antonio B; Degrave, Wim M

2005-08-03

BLAST is a widely used genetic research tool for analysis of similarity between nucleotide and protein sequences. This paper presents a software application entitled "Squid" that makes use of grid technology. The current version, as an example, is configured for BLAST applications, but adaptation for other computing intensive repetitive tasks can be easily accomplished in the open source version. This enables the allocation of remote resources to perform distributed computing, making large BLAST queries viable without the need of high-end computers. Most distributed computing / grid solutions have complex installation procedures requiring a computer specialist, or have limitations regarding operating systems. Squid is a multi-platform, open-source program designed to "keep things simple" while offering high-end computing power for large scale applications. Squid also has an efficient fault tolerance and crash recovery system against data loss, being able to re-route jobs upon node failure and recover even if the master machine fails. Our results show that a Squid application, working with N nodes and proper network resources, can process BLAST queries almost N times faster than if working with only one computer. Squid offers high-end computing, even for the non-specialist, and is freely available at the project web site. Its open-source and binary Windows distributions contain detailed instructions and a "plug-n-play" instalation containing a pre-configured example.
Development of seismic tomography software for hybrid supercomputers

NASA Astrophysics Data System (ADS)

Nikitin, Alexandr; Serdyukov, Alexandr; Duchkov, Anton

2015-04-01

Seismic tomography is a technique used for computing velocity model of geologic structure from first arrival travel times of seismic waves. The technique is used in processing of regional and global seismic data, in seismic exploration for prospecting and exploration of mineral and hydrocarbon deposits, and in seismic engineering for monitoring the condition of engineering structures and the surrounding host medium. As a consequence of development of seismic monitoring systems and increasing volume of seismic data, there is a growing need for new, more effective computational algorithms for use in seismic tomography applications with improved performance, accuracy and resolution. To achieve this goal, it is necessary to use modern high performance computing systems, such as supercomputers with hybrid architecture that use not only CPUs, but also accelerators and co-processors for computation. The goal of this research is the development of parallel seismic tomography algorithms and software package for such systems, to be used in processing of large volumes of seismic data (hundreds of gigabytes and more). These algorithms and software package will be optimized for the most common computing devices used in modern hybrid supercomputers, such as Intel Xeon CPUs, NVIDIA Tesla accelerators and Intel Xeon Phi co-processors. In this work, the following general scheme of seismic tomography is utilized. Using the eikonal equation solver, arrival times of seismic waves are computed based on assumed velocity model of geologic structure being analyzed. In order to solve the linearized inverse problem, tomographic matrix is computed that connects model adjustments with travel time residuals, and the resulting system of linear equations is regularized and solved to adjust the model. The effectiveness of parallel implementations of existing algorithms on target architectures is considered. During the first stage of this work, algorithms were developed for execution on supercomputers using multicore CPUs only, with preliminary performance tests showing good parallel efficiency on large numerical grids. Porting of the algorithms to hybrid supercomputers is currently ongoing.
Computer model to simulate testing at the National Transonic Facility

NASA Technical Reports Server (NTRS)

Mineck, Raymond E.; Owens, Lewis R., Jr.; Wahls, Richard A.; Hannon, Judith A.

1995-01-01

A computer model has been developed to simulate the processes involved in the operation of the National Transonic Facility (NTF), a large cryogenic wind tunnel at the Langley Research Center. The simulation was verified by comparing the simulated results with previously acquired data from three experimental wind tunnel test programs in the NTF. The comparisons suggest that the computer model simulates reasonably well the processes that determine the liquid nitrogen (LN2) consumption, electrical consumption, fan-on time, and the test time required to complete a test plan at the NTF. From these limited comparisons, it appears that the results from the simulation model are generally within about 10 percent of the actual NTF test results. The use of actual data acquisition times in the simulation produced better estimates of the LN2 usage, as expected. Additional comparisons are needed to refine the model constants. The model will typically produce optimistic results since the times and rates included in the model are typically the optimum values. Any deviation from the optimum values will lead to longer times or increased LN2 and electrical consumption for the proposed test plan. Computer code operating instructions and listings of sample input and output files have been included.
Quantum error correction in crossbar architectures

NASA Astrophysics Data System (ADS)

Helsen, Jonas; Steudtner, Mark; Veldhorst, Menno; Wehner, Stephanie

2018-07-01

A central challenge for the scaling of quantum computing systems is the need to control all qubits in the system without a large overhead. A solution for this problem in classical computing comes in the form of so-called crossbar architectures. Recently we made a proposal for a large-scale quantum processor (Li et al arXiv:1711.03807 (2017)) to be implemented in silicon quantum dots. This system features a crossbar control architecture which limits parallel single-qubit control, but allows the scheme to overcome control scaling issues that form a major hurdle to large-scale quantum computing systems. In this work, we develop a language that makes it possible to easily map quantum circuits to crossbar systems, taking into account their architecture and control limitations. Using this language we show how to map well known quantum error correction codes such as the planar surface and color codes in this limited control setting with only a small overhead in time. We analyze the logical error behavior of this surface code mapping for estimated experimental parameters of the crossbar system and conclude that logical error suppression to a level useful for real quantum computation is feasible.
Effect of local minima on adiabatic quantum optimization.

PubMed

Amin, M H S

2008-04-04

We present a perturbative method to estimate the spectral gap for adiabatic quantum optimization, based on the structure of the energy levels in the problem Hamiltonian. We show that, for problems that have an exponentially large number of local minima close to the global minimum, the gap becomes exponentially small making the computation time exponentially long. The quantum advantage of adiabatic quantum computation may then be accessed only via the local adiabatic evolution, which requires phase coherence throughout the evolution and knowledge of the spectrum. Such problems, therefore, are not suitable for adiabatic quantum computation.
Flood damage assessment using computer-assisted analysis of color infrared photography

USGS Publications Warehouse

Anderson, William H.

1978-01-01

Use of digitized aerial photographs for flood damage assessment in agriculture is new and largely untested. However, under flooding circumstances similar to the 1975 Red River Valley flood, computer-assisted techniques can be extremely useful, especially if detailed crop damage estimates are needed within a relatively short period of time.Airphoto interpretation techniques, manual or computer-assisted, are not intended to replace conventional ground survey and sampling procedures. But their use should be considered a valuable addition to the tools currently available for assessing agricultural flood damage.
Pulsar discovery by global volunteer computing.

PubMed

Knispel, B; Allen, B; Cordes, J M; Deneva, J S; Anderson, D; Aulbert, C; Bhat, N D R; Bock, O; Bogdanov, S; Brazier, A; Camilo, F; Champion, D J; Chatterjee, S; Crawford, F; Demorest, P B; Fehrmann, H; Freire, P C C; Gonzalez, M E; Hammer, D; Hessels, J W T; Jenet, F A; Kasian, L; Kaspi, V M; Kramer, M; Lazarus, P; van Leeuwen, J; Lorimer, D R; Lyne, A G; Machenschalk, B; McLaughlin, M A; Messenger, C; Nice, D J; Papa, M A; Pletsch, H J; Prix, R; Ransom, S M; Siemens, X; Stairs, I H; Stappers, B W; Stovall, K; Venkataraman, A

2010-09-10

Einstein@Home aggregates the computer power of hundreds of thousands of volunteers from 192 countries to mine large data sets. It has now found a 40.8-hertz isolated pulsar in radio survey data from the Arecibo Observatory taken in February 2007. Additional timing observations indicate that this pulsar is likely a disrupted recycled pulsar. PSR J2007+2722's pulse profile is remarkably wide with emission over almost the entire spin period; the pulsar likely has closely aligned magnetic and spin axes. The massive computing power provided by volunteers should enable many more such discoveries.

Self-Calibrated In-Process Photogrammetry for Large Raw Part Measurement and Alignment before Machining

PubMed Central

Mendikute, Alberto; Zatarain, Mikel; Bertelsen, Álvaro; Leizea, Ibai

2017-01-01

Photogrammetry methods are being used more and more as a 3D technique for large scale metrology applications in industry. Optical targets are placed on an object and images are taken around it, where measuring traceability is provided by precise off-process pre-calibrated digital cameras and scale bars. According to the 2D target image coordinates, target 3D coordinates and camera views are jointly computed. One of the applications of photogrammetry is the measurement of raw part surfaces prior to its machining. For this application, post-process bundle adjustment has usually been adopted for computing the 3D scene. With that approach, a high computation time is observed, leading in practice to time consuming and user dependent iterative review and re-processing procedures until an adequate set of images is taken, limiting its potential for fast, easy-to-use, and precise measurements. In this paper, a new efficient procedure is presented for solving the bundle adjustment problem in portable photogrammetry. In-process bundle computing capability is demonstrated on a consumer grade desktop PC, enabling quasi real time 2D image and 3D scene computing. Additionally, a method for the self-calibration of camera and lens distortion has been integrated into the in-process approach due to its potential for highest precision when using low cost non-specialized digital cameras. Measurement traceability is set only by scale bars available in the measuring scene, avoiding the uncertainty contribution of off-process camera calibration procedures or the use of special purpose calibration artifacts. The developed self-calibrated in-process photogrammetry has been evaluated both in a pilot case scenario and in industrial scenarios for raw part measurement, showing a total in-process computing time typically below 1 s per image up to a maximum of 2 s during the last stages of the computed industrial scenes, along with a relative precision of 1/10,000 (e.g., 0.1 mm error in 1 m) with an error RMS below 0.2 pixels at image plane, ranging at the same performance reported for portable photogrammetry with precise off-process pre-calibrated cameras. PMID:28891946
Self-Calibrated In-Process Photogrammetry for Large Raw Part Measurement and Alignment before Machining.

PubMed

Mendikute, Alberto; Yagüe-Fabra, José A; Zatarain, Mikel; Bertelsen, Álvaro; Leizea, Ibai

2017-09-09

Photogrammetry methods are being used more and more as a 3D technique for large scale metrology applications in industry. Optical targets are placed on an object and images are taken around it, where measuring traceability is provided by precise off-process pre-calibrated digital cameras and scale bars. According to the 2D target image coordinates, target 3D coordinates and camera views are jointly computed. One of the applications of photogrammetry is the measurement of raw part surfaces prior to its machining. For this application, post-process bundle adjustment has usually been adopted for computing the 3D scene. With that approach, a high computation time is observed, leading in practice to time consuming and user dependent iterative review and re-processing procedures until an adequate set of images is taken, limiting its potential for fast, easy-to-use, and precise measurements. In this paper, a new efficient procedure is presented for solving the bundle adjustment problem in portable photogrammetry. In-process bundle computing capability is demonstrated on a consumer grade desktop PC, enabling quasi real time 2D image and 3D scene computing. Additionally, a method for the self-calibration of camera and lens distortion has been integrated into the in-process approach due to its potential for highest precision when using low cost non-specialized digital cameras. Measurement traceability is set only by scale bars available in the measuring scene, avoiding the uncertainty contribution of off-process camera calibration procedures or the use of special purpose calibration artifacts. The developed self-calibrated in-process photogrammetry has been evaluated both in a pilot case scenario and in industrial scenarios for raw part measurement, showing a total in-process computing time typically below 1 s per image up to a maximum of 2 s during the last stages of the computed industrial scenes, along with a relative precision of 1/10,000 (e.g. 0.1 mm error in 1 m) with an error RMS below 0.2 pixels at image plane, ranging at the same performance reported for portable photogrammetry with precise off-process pre-calibrated cameras.
Parallel Computational Protein Design.

PubMed

Zhou, Yichao; Donald, Bruce R; Zeng, Jianyang

2017-01-01

Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that guarantees to find the global minimum energy solution (GMEC) is to combine both dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computation bottleneck of large-scale computational protein design process. To address this issue, we extend and add a new module to the OSPREY program that was previously developed in the Donald lab (Gainza et al., Methods Enzymol 523:87, 2013) to implement a GPU-based massively parallel A* algorithm for improving protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedups in large protein design cases with a small memory overhead comparing to the traditional A* search algorithm implementation, while still guaranteeing the optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle the problems in which the conformation space is too large and the global optimal solution cannot be computed previously. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with the state-of-the-art rotamer pruning algorithms such as iMinDEE (Gainza et al., PLoS Comput Biol 8:e1002335, 2012) and DEEPer (Hallen et al., Proteins 81:18-39, 2013) to also consider continuous backbone and side-chain flexibility.
Hybrid techniques for the digital control of mechanical and optical systems

NASA Astrophysics Data System (ADS)

Acernese, Fausto; Barone, Fabrizio; De Rosa, Rosario; Eleuteri, Antonio; Milano, Leopoldo; Pardi, Silvio; Ricciardi, Iolanda; Russo, Guido

2004-07-01

One of the main requirements of a digital system for the control of interferometric detectors of gravitational waves is the computing power, that is a direct consequence of the increasing complexity of the digital algorithms necessary for the control signals generation. For this specific task many specialised non standard real-time architectures have been developed, often very expensive and difficult to upgrade. On the other hand, such computing power is generally fully available for off-line applications on standard Pc based systems. Therefore, a possible and obvious solution may be provided by the integration of both the the real-time and off-line architecture resulting in a hybrid control system architecture based on standards available components, trying to get both the advantages of the perfect data synchronization provided by the real-time systems and by the large computing power available on Pc based systems. Such integration may be provided by the implementation of the link between the two different architectures through the standard Ethernet network, whose data transfer speed is largely increasing in these years, using the TCP/IP and UDP protocols. In this paper we describe the architecture of an hybrid Ethernet based real-time control system protoype we implemented in Napoli, discussing its characteristics and performances. Finally we discuss a possible application to the real-time control of a suspended mass of the mode cleaner of the 3m prototype optical interferometer for gravitational wave detection (IDGW-3P) operational in Napoli.
Classification of large-scale fundus image data sets: a cloud-computing framework.

PubMed

Roychowdhury, Sohini

2016-08-01

Large medical image data sets with high dimensionality require substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using the boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance the borderline classification performances in automated screening systems.
Robot vibration control using inertial damping forces

NASA Technical Reports Server (NTRS)

Lee, Soo Han; Book, Wayne J.

1991-01-01

This paper concerns the suppression of the vibration of a large flexible robot by inertial forces of a small robot which is located at the tip of the large robot. A controller for generating damping forces to a large robot is designed based on the two time scale model. The controller does not need to calculate the quasi-steady variables and is efficient in computation. Simulation results show the effectiveness of the inertial forces and the controller designed.
Robot vibration control using inertial damping forces

NASA Technical Reports Server (NTRS)

Lee, Soo Han; Book, Wayne J.

1989-01-01

The suppression is examined of the vibration of a large flexible robot by inertial forces of a small robot which is located at the tip of the large robot. A controller for generating damping forces to a large robot is designed based on the two time scale mode. The controller does not need to calculate the quasi-steady state variables and is efficient in computation. Simulation results show the effectiveness of the inertial forces and the controller designed.
The role of the host in a cooperating mainframe and workstation environment, volumes 1 and 2

NASA Technical Reports Server (NTRS)

Kusmanoff, Antone; Martin, Nancy L.

1989-01-01

In recent years, advancements made in computer systems have prompted a move from centralized computing based on timesharing a large mainframe computer to distributed computing based on a connected set of engineering workstations. A major factor in this advancement is the increased performance and lower cost of engineering workstations. The shift to distributed computing from centralized computing has led to challenges associated with the residency of application programs within the system. In a combined system of multiple engineering workstations attached to a mainframe host, the question arises as to how does a system designer assign applications between the larger mainframe host and the smaller, yet powerful, workstation. The concepts related to real time data processing are analyzed and systems are displayed which use a host mainframe and a number of engineering workstations interconnected by a local area network. In most cases, distributed systems can be classified as having a single function or multiple functions and as executing programs in real time or nonreal time. In a system of multiple computers, the degree of autonomy of the computers is important; a system with one master control computer generally differs in reliability, performance, and complexity from a system in which all computers share the control. This research is concerned with generating general criteria principles for software residency decisions (host or workstation) for a diverse yet coupled group of users (the clustered workstations) which may need the use of a shared resource (the mainframe) to perform their functions.
Cone-Beam Computed Tomography–Guided Positioning of Laryngeal Cancer Patients with Large Interfraction Time Trends in Setup and Nonrigid Anatomy Variations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gangsaas, Anne, E-mail: a.gangsaas@erasmusmc.nl; Astreinidou, Eleftheria; Quint, Sandra

2013-10-01

Purpose: To investigate interfraction setup variations of the primary tumor, elective nodes, and vertebrae in laryngeal cancer patients and to validate protocols for cone beam computed tomography (CBCT)-guided correction. Methods and Materials: For 30 patients, CBCT-measured displacements in fractionated treatments were used to investigate population setup errors and to simulate residual setup errors for the no action level (NAL) offline protocol, the extended NAL (eNAL) protocol, and daily CBCT acquisition with online analysis and repositioning. Results: Without corrections, 12 of 26 patients treated with radical radiation therapy would have experienced a gradual change (time trend) in primary tumor setup ≥4more » mm in the craniocaudal (CC) direction during the fractionated treatment (11/12 in caudal direction, maximum 11 mm). Due to these trends, correction of primary tumor displacements with NAL resulted in large residual CC errors (required margin 6.7 mm). With the weekly correction vector adjustments in eNAL, the trends could be largely compensated (CC margin 3.5 mm). Correlation between movements of the primary and nodal clinical target volumes (CTVs) in the CC direction was poor (r{sup 2}=0.15). Therefore, even with online setup corrections of the primary CTV, the required CC margin for the nodal CTV was as large as 6.8 mm. Also for the vertebrae, large time trends were observed for some patients. Because of poor CC correlation (r{sup 2}=0.19) between displacements of the primary CTV and the vertebrae, even with daily online repositioning of the vertebrae, the required CC margin around the primary CTV was 6.9 mm. Conclusions: Laryngeal cancer patients showed substantial interfraction setup variations, including large time trends, and poor CC correlation between primary tumor displacements and motion of the nodes and vertebrae (internal tumor motion). These trends and nonrigid anatomy variations have to be considered in the choice of setup verification protocol and planning target volume margins. eNAL could largely compensate time trends with minor prolongation of fraction time.« less
MIDAS, prototype Multivariate Interactive Digital Analysis System, phase 1. Volume 1: System description

NASA Technical Reports Server (NTRS)

Kriegler, F. J.

1974-01-01

The MIDAS System is described as a third-generation fast multispectral recognition system able to keep pace with the large quantity and high rates of data acquisition from present and projected sensors. A principal objective of the MIDAS program is to provide a system well interfaced with the human operator and thus to obtain large overall reductions in turnaround time and significant gains in throughput. The hardware and software are described. The system contains a mini-computer to control the various high-speed processing elements in the data path, and a classifier which implements an all-digital prototype multivariate-Gaussian maximum likelihood decision algorithm operating at 200,000 pixels/sec. Sufficient hardware was developed to perform signature extraction from computer-compatible tapes, compute classifier coefficients, control the classifier operation, and diagnose operation.
The nonequilibrium quantum many-body problem as a paradigm for extreme data science

NASA Astrophysics Data System (ADS)

Freericks, J. K.; Nikolić, B. K.; Frieder, O.

2014-12-01

Generating big data pervades much of physics. But some problems, which we call extreme data problems, are too large to be treated within big data science. The nonequilibrium quantum many-body problem on a lattice is just such a problem, where the Hilbert space grows exponentially with system size and rapidly becomes too large to fit on any computer (and can be effectively thought of as an infinite-sized data set). Nevertheless, much progress has been made with computational methods on this problem, which serve as a paradigm for how one can approach and attack extreme data problems. In addition, viewing these physics problems from a computer-science perspective leads to new approaches that can be tried to solve more accurately and for longer times. We review a number of these different ideas here.
Least Squares Shadowing Sensitivity Analysis of Chaotic Flow Around a Two-Dimensional Airfoil

NASA Technical Reports Server (NTRS)

Blonigan, Patrick J.; Wang, Qiqi; Nielsen, Eric J.; Diskin, Boris

2016-01-01

Gradient-based sensitivity analysis has proven to be an enabling technology for many applications, including design of aerospace vehicles. However, conventional sensitivity analysis methods break down when applied to long-time averages of chaotic systems. This breakdown is a serious limitation because many aerospace applications involve physical phenomena that exhibit chaotic dynamics, most notably high-resolution large-eddy and direct numerical simulations of turbulent aerodynamic flows. A recently proposed methodology, Least Squares Shadowing (LSS), avoids this breakdown and advances the state of the art in sensitivity analysis for chaotic flows. The first application of LSS to a chaotic flow simulated with a large-scale computational fluid dynamics solver is presented. The LSS sensitivity computed for this chaotic flow is verified and shown to be accurate, but the computational cost of the current LSS implementation is high.
HPC enabled real-time remote processing of laparoscopic surgery

NASA Astrophysics Data System (ADS)

Ronaghi, Zahra; Sapra, Karan; Izard, Ryan; Duffy, Edward; Smith, Melissa C.; Wang, Kuang-Ching; Kwartowitz, David M.

2016-03-01

Laparoscopic surgery is a minimally invasive surgical technique. The benefit of small incisions has a disadvantage of limited visualization of subsurface tissues. Image-guided surgery (IGS) uses pre-operative and intra-operative images to map subsurface structures. One particular laparoscopic system is the daVinci-si robotic surgical system. The video streams generate approximately 360 megabytes of data per second. Real-time processing this large stream of data on a bedside PC, single or dual node setup, has become challenging and a high-performance computing (HPC) environment may not always be available at the point of care. To process this data on remote HPC clusters at the typical 30 frames per second rate, it is required that each 11.9 MB video frame be processed by a server and returned within 1/30th of a second. We have implement and compared performance of compression, segmentation and registration algorithms on Clemson's Palmetto supercomputer using dual NVIDIA K40 GPUs per node. Our computing framework will also enable reliability using replication of computation. We will securely transfer the files to remote HPC clusters utilizing an OpenFlow-based network service, Steroid OpenFlow Service (SOS) that can increase performance of large data transfers over long-distance and high bandwidth networks. As a result, utilizing high-speed OpenFlow- based network to access computing clusters with GPUs will improve surgical procedures by providing real-time medical image processing and laparoscopic data.
Building A Community Focused Data and Modeling Collaborative platform with Hardware Virtualization Technology

NASA Astrophysics Data System (ADS)

Michaelis, A.; Wang, W.; Melton, F. S.; Votava, P.; Milesi, C.; Hashimoto, H.; Nemani, R. R.; Hiatt, S. H.

2009-12-01

As the length and diversity of the global earth observation data records grow, modeling and analyses of biospheric conditions increasingly requires multiple terabytes of data from a diversity of models and sensors. With network bandwidth beginning to flatten, transmission of these data from centralized data archives presents an increasing challenge, and costs associated with local storage and management of data and compute resources are often significant for individual research and application development efforts. Sharing community valued intermediary data sets, results and codes from individual efforts with others that are not in direct funded collaboration can also be a challenge with respect to time, cost and expertise. We purpose a modeling, data and knowledge center that houses NASA satellite data, climate data and ancillary data where a focused community may come together to share modeling and analysis codes, scientific results, knowledge and expertise on a centralized platform, named Ecosystem Modeling Center (EMC). With the recent development of new technologies for secure hardware virtualization, an opportunity exists to create specific modeling, analysis and compute environments that are customizable, “archiveable” and transferable. Allowing users to instantiate such environments on large compute infrastructures that are directly connected to large data archives may significantly reduce costs and time associated with scientific efforts by alleviating users from redundantly retrieving and integrating data sets and building modeling analysis codes. The EMC platform also provides the possibility for users receiving indirect assistance from expertise through prefabricated compute environments, potentially reducing study “ramp up” times.
Biocellion: accelerating computer simulation of multicellular biological system models.

PubMed

Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

2014-11-01

Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Use of DAGMan in CRAB3 to Improve the Splitting of CMS User Jobs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wolf, M.; Mascheroni, M.; Woodard, A.

CRAB3 is a workload management tool used by CMS physicists to analyze data acquired by the Compact Muon Solenoid (CMS) detector at the CERN Large Hadron Collider (LHC). Research in high energy physics often requires the analysis of large collections of files, referred to as datasets. The task is divided into jobs that are distributed among a large collection of worker nodes throughout the Worldwide LHC Computing Grid (WLCG). Splitting a large analysis task into optimally sized jobs is critical to efficient use of distributed computing resources. Jobs that are too big will have excessive runtimes and will not distributemore » the work across all of the available nodes. However, splitting the project into a large number of very small jobs is also inefficient, as each job creates additional overhead which increases load on infrastructure resources. Currently this splitting is done manually, using parameters provided by the user. However the resources needed for each job are difficult to predict because of frequent variations in the performance of the user code and the content of the input dataset. As a result, dividing a task into jobs by hand is difficult and often suboptimal. In this work we present a new feature called “automatic splitting” which removes the need for users to manually specify job splitting parameters. We discuss how HTCondor DAGMan can be used to build dynamic Directed Acyclic Graphs (DAGs) to optimize the performance of large CMS analysis jobs on the Grid. We use DAGMan to dynamically generate interconnected DAGs that estimate the processing time the user code will require to analyze each event. This is used to calculate an estimate of the total processing time per job, and a set of analysis jobs are run using this estimate as a specified time limit. Some jobs may not finish within the alloted time; they are terminated at the time limit, and the unfinished data is regrouped into smaller jobs and resubmitted.« less
Use of DAGMan in CRAB3 to improve the splitting of CMS user jobs

NASA Astrophysics Data System (ADS)

Wolf, M.; Mascheroni, M.; Woodard, A.; Belforte, S.; Bockelman, B.; Hernandez, J. M.; Vaandering, E.

2017-10-01

CRAB3 is a workload management tool used by CMS physicists to analyze data acquired by the Compact Muon Solenoid (CMS) detector at the CERN Large Hadron Collider (LHC). Research in high energy physics often requires the analysis of large collections of files, referred to as datasets. The task is divided into jobs that are distributed among a large collection of worker nodes throughout the Worldwide LHC Computing Grid (WLCG). Splitting a large analysis task into optimally sized jobs is critical to efficient use of distributed computing resources. Jobs that are too big will have excessive runtimes and will not distribute the work across all of the available nodes. However, splitting the project into a large number of very small jobs is also inefficient, as each job creates additional overhead which increases load on infrastructure resources. Currently this splitting is done manually, using parameters provided by the user. However the resources needed for each job are difficult to predict because of frequent variations in the performance of the user code and the content of the input dataset. As a result, dividing a task into jobs by hand is difficult and often suboptimal. In this work we present a new feature called “automatic splitting” which removes the need for users to manually specify job splitting parameters. We discuss how HTCondor DAGMan can be used to build dynamic Directed Acyclic Graphs (DAGs) to optimize the performance of large CMS analysis jobs on the Grid. We use DAGMan to dynamically generate interconnected DAGs that estimate the processing time the user code will require to analyze each event. This is used to calculate an estimate of the total processing time per job, and a set of analysis jobs are run using this estimate as a specified time limit. Some jobs may not finish within the alloted time; they are terminated at the time limit, and the unfinished data is regrouped into smaller jobs and resubmitted.
Using an object-based grid system to evaluate a newly developed EP approach to formulate SVMs as applied to the classification of organophosphate nerve agents

NASA Astrophysics Data System (ADS)

Land, Walker H., Jr.; Lewis, Michael; Sadik, Omowunmi; Wong, Lut; Wanekaya, Adam; Gonzalez, Richard J.; Balan, Arun

2004-04-01

This paper extends the classification approaches described in reference [1] in the following way: (1.) developing and evaluating a new method for evolving organophosphate nerve agent Support Vector Machine (SVM) classifiers using Evolutionary Programming, (2.) conducting research experiments using a larger database of organophosphate nerve agents, and (3.) upgrading the architecture to an object-based grid system for evaluating the classification of EP derived SVMs. Due to the increased threats of chemical and biological weapons of mass destruction (WMD) by international terrorist organizations, a significant effort is underway to develop tools that can be used to detect and effectively combat biochemical warfare. This paper reports the integration of multi-array sensors with Support Vector Machines (SVMs) for the detection of organophosphates nerve agents using a grid computing system called Legion. Grid computing is the use of large collections of heterogeneous, distributed resources (including machines, databases, devices, and users) to support large-scale computations and wide-area data access. Finally, preliminary results using EP derived support vector machines designed to operate on distributed systems have provided accurate classification results. In addition, distributed training time architectures are 50 times faster when compared to standard iterative training time methods.
Tools and Techniques for Measuring and Improving Grid Performance

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Frumkin, M.; Smith, W.; VanderWijngaart, R.; Wong, P.; Biegel, Bryan (Technical Monitor)

2001-01-01

This viewgraph presentation provides information on NASA's geographically dispersed computing resources, and the various methods by which the disparate technologies are integrated within a nationwide computational grid. Many large-scale science and engineering projects are accomplished through the interaction of people, heterogeneous computing resources, information systems and instruments at different locations. The overall goal is to facilitate the routine interactions of these resources to reduce the time spent in design cycles, particularly for NASA's mission critical projects. The IPG (Information Power Grid) seeks to implement NASA's diverse computing resources in a fashion similar to the way in which electric power is made available.
Computer hardware and software for robotic control

NASA Technical Reports Server (NTRS)

Davis, Virgil Leon

1987-01-01

The KSC has implemented an integrated system that coordinates state-of-the-art robotic subsystems. It is a sensor based real-time robotic control system performing operations beyond the capability of an off-the-shelf robot. The integrated system provides real-time closed loop adaptive path control of position and orientation of all six axes of a large robot; enables the implementation of a highly configurable, expandable testbed for sensor system development; and makes several smart distributed control subsystems (robot arm controller, process controller, graphics display, and vision tracking) appear as intelligent peripherals to a supervisory computer coordinating the overall systems.

Intermediate Palomar Transient Factory: Realtime Image Subtraction Pipeline

DOE PAGES

Cao, Yi; Nugent, Peter E.; Kasliwal, Mansi M.

2016-09-28

A fast-turnaround pipeline for realtime data reduction plays an essential role in discovering and permitting followup observations to young supernovae and fast-evolving transients in modern time-domain surveys. In this paper, we present the realtime image subtraction pipeline in the intermediate Palomar Transient Factory. By using highperformance computing, efficient databases, and machine-learning algorithms, this pipeline manages to reliably deliver transient candidates within 10 minutes of images being taken. Our experience in using high-performance computing resources to process big data in astronomy serves as a trailblazer to dealing with data from large-scale time-domain facilities in the near future.
Intermediate Palomar Transient Factory: Realtime Image Subtraction Pipeline

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cao, Yi; Nugent, Peter E.; Kasliwal, Mansi M.

A fast-turnaround pipeline for realtime data reduction plays an essential role in discovering and permitting followup observations to young supernovae and fast-evolving transients in modern time-domain surveys. In this paper, we present the realtime image subtraction pipeline in the intermediate Palomar Transient Factory. By using highperformance computing, efficient databases, and machine-learning algorithms, this pipeline manages to reliably deliver transient candidates within 10 minutes of images being taken. Our experience in using high-performance computing resources to process big data in astronomy serves as a trailblazer to dealing with data from large-scale time-domain facilities in the near future.
HIFiRE-1 Turbulent Shock Boundary Layer Interaction - Flight Data and Computations

NASA Technical Reports Server (NTRS)

Kimmel, Roger L.; Prabhu, Dinesh

2015-01-01

The Hypersonic International Flight Research Experimentation (HIFiRE) program is a hypersonic flight test program executed by the Air Force Research Laboratory (AFRL) and Australian Defence Science and Technology Organisation (DSTO). This flight contained a cylinder-flare induced shock boundary layer interaction (SBLI). Computations of the interaction were conducted for a number of times during the ascent. The DPLR code used for predictions was calibrated against ground test data prior to exercising the code at flight conditions. Generally, the computations predicted the upstream influence and interaction pressures very well. Plateau pressures on the cylinder were predicted well at all conditions. Although the experimental heat transfer showed a large amount of scatter, especially at low heating levels, the measured heat transfer agreed well with computations. The primary discrepancy between the experiment and computation occurred in the pressures measured on the flare during second stage burn. Measured pressures exhibited large overshoots late in the second stage burn, the mechanism of which is unknown. The good agreement between flight measurements and CFD helps validate the philosophy of calibrating CFD against ground test, prior to exercising it at flight conditions.
Improving User Notification on Frequently Changing HPC Environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fuson, Christopher B; Renaud, William A

2016-01-01

Today s HPC centers user environments can be very complex. Centers often contain multiple large complicated computational systems each with their own user environment. Changes to a system s environment can be very impactful; however, a center s user environment is, in one-way or another, frequently changing. Because of this, it is vital for centers to notify users of change. For users, untracked changes can be costly, resulting in unnecessary debug time as well as wasting valuable compute allocations and research time. Communicating frequent change to diverse user communities is a common and ongoing task for HPC centers. This papermore » will cover the OLCF s current processes and methods used to communicate change to users of the center s large Cray systems and supporting resources. The paper will share lessons learned and goals as well as practices, tools, and methods used to continually improve and reach members of the OLCF user community.« less
Optical computed tomography in PRESAGE® three-dimensional dosimetry: Challenges and prospective.

PubMed

Khezerloo, Davood; Nedaie, Hassan Ali; Farhood, Bagher; Zirak, Alireza; Takavar, Abbas; Banaee, Nooshin; Ahmadalidokht, Isa; Kron, Tomas

2017-01-01

With the advent of new complex but precise radiotherapy techniques, the demands for an accurate, feasible three-dimensional (3D) dosimetry system have been increased. A 3D dosimeter system generally should not only have accurate and precise results but should also feasible, inexpensive, and time consuming. Recently, one of the new candidates for 3D dosimetry is optical computed tomography (CT) with a radiochromic dosimeter such as PRESAGE®. Several generations of optical CT have been developed since the 90s. At the same time, a large attempt has been also done to introduce the robust dosimeters that compatible with optical CT scanners. In 2004, PRESAGE® dosimeter as a new radiochromic solid plastic dosimeters was introduced. In this decade, a large number of efforts have been carried out to enhance optical scanning methods. This article attempts to review and reflect on the results of these investigations.
Automatic measurements and computations for radiochemical analyses

USGS Publications Warehouse

Rosholt, J.N.; Dooley, J.R.

1960-01-01

In natural radioactive sources the most important radioactive daughter products useful for geochemical studies are protactinium-231, the alpha-emitting thorium isotopes, and the radium isotopes. To resolve the abundances of these thorium and radium isotopes by their characteristic decay and growth patterns, a large number of repeated alpha activity measurements on the two chemically separated elements were made over extended periods of time. Alpha scintillation counting with automatic measurements and sample changing is used to obtain the basic count data. Generation of the required theoretical decay and growth functions, varying with time, and the least squares solution of the overdetermined simultaneous count rate equations are done with a digital computer. Examples of the complex count rate equations which may be solved and results of a natural sample containing four ??-emitting isotopes of thorium are illustrated. These methods facilitate the determination of the radioactive sources on the large scale required for many geochemical investigations.
Message-passing-interface-based parallel FDTD investigation on the EM scattering from a 1-D rough sea surface using uniaxial perfectly matched layer absorbing boundary.

PubMed

Li, J; Guo, L-X; Zeng, H; Han, X-B

2009-06-01

A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of sea surface for different polarization and the bistatic scattering from a sea surface with large incident angle and large wind speed.
Large-Eddy Simulation of Aeroacoustic Applications

NASA Technical Reports Server (NTRS)

Pruett, C. David; Sochacki, James S.

1999-01-01

This report summarizes work accomplished under a one-year NASA grant from NASA Langley Research Center (LaRC). The effort culminates three years of NASA-supported research under three consecutive one-year grants. The period of support was April 6, 1998, through April 5, 1999. By request, the grant period was extended at no-cost until October 6, 1999. Its predecessors have been directed toward adapting the numerical tool of large-eddy simulation (LES) to aeroacoustic applications, with particular focus on noise suppression in subsonic round jets. In LES, the filtered Navier-Stokes equations are solved numerically on a relatively coarse computational grid. Residual stresses, generated by scales of motion too small to be resolved on the coarse grid, are modeled. Although most LES incorporate spatial filtering, time-domain filtering affords certain conceptual and computational advantages, particularly for aeroacoustic applications. Consequently, this work has focused on the development of subgrid-scale (SGS) models that incorporate time-domain filters.
Time-dependent compressibility of poly (methyl methacrylate) (PMMA) : an experimental and molecular dynamics investigation

NASA Astrophysics Data System (ADS)

Sane, Sandeep Bhalchandra

This thesis contains three chapters, which describe different aspects of an investigation of the bulk response of Poly(Methyl Methacrylate) (PMMA). The first chapter describes the physical measurements by means of a Belcher/McKinney-type apparatus. Used earlier for the measurement of the bulk response of Poly(Vinyl Acetate), it was now adapted for making measurements at higher temperatures commensurate with the glass transition temperature of PMMA. The dynamic bulk compliance of PMMA was measured at atmospheric pressure over a wide range of temperatures and frequencies, from which the master curves for the bulk compliance were generated by means of the time-temperature superposition principle. It was found that the extent of the transition ranges for the bulk and shear response were comparable. Comparison of the shift factors for bulk and shear responses supports the idea that different molecular mechanisms contribute to shear and bulk deformations. The second chapter delineates molecular dynamics computations for the bulk response for a range of pressures and temperatures. The model(s) consisted of 2256 atoms formed into three polymer chains with fifty monomer units per chain per unit cell. The time scales accessed were limited to tens of pico seconds. It was found that, in addition to the typical energy minimization and temperature annealing cycles for establishing equilibrium models, it is advantageous to subject the model samples to a cycle of relatively large pressures (GPa-range) for improving the equilibrium state. On comparing the computations with the experimentally determined "glassy" behavior, one finds that, although the computations were limited to small samples in a physical sense, the primary limitation rests in the very short times (pico seconds). The molecular dynamics computations do not model the physically observed temperature sensitivity of PMMA, even if one employs a hypothetical time-temperature shift to account for the large difference in time scales between experiment and computation. The values computed by the molecular dynamics method do agree with the values measured at the coldest temperature and at the highest frequency of one kiloHertz. The third chapter draws on measurements of uniaxial, shear and Poisson response conducted previously in our laboratory. With the availability of four time or frequency-dependent material functions for the same material, the process of interconversion between different material functions was investigated. Computed material functions were evaluated against the direct experimental measurements and the limitations imposed on successful interconversion due to the experimental errors in the underlying physical data were explored. Differences were observed that are larger than the experimental errors would suggest.
Numerical Simulation of Forced and Free-to-Roll Delta-Wing Motions

NASA Technical Reports Server (NTRS)

Chaderjian, Neal M.; Schiff, Lewis B.

1996-01-01

The three-dimensional, Reynolds-averaged, Navier-Stokes (RANS) equations are used to numerically simulate nonsteady vortical flow about a 65-deg sweep delta wing at 30-deg angle of attack. Two large-amplitude, high-rate, forced-roll motions, and a damped free-to-roll motion are presented. The free-to-roll motion is computed by coupling the time-dependent RANS equations to the flight dynamic equation of motion. The computed results are in good agreement with the forces, moments, and roll-angle time histories. Vortex breakdown is present in each case. Significant time lags in the vortex breakdown motions relative to the body motions strongly influence the dynamic forces and moments.
Trends in life science grid: from computing grid to knowledge grid.

PubMed

Konagaya, Akihiko

2006-12-18

Grid computing has great potential to become a standard cyberinfrastructure for life sciences which often require high-performance computing and large data handling which exceeds the computing capacity of a single institution. This survey reviews the latest grid technologies from the viewpoints of computing grid, data grid and knowledge grid. Computing grid technologies have been matured enough to solve high-throughput real-world life scientific problems. Data grid technologies are strong candidates for realizing "resourceome" for bioinformatics. Knowledge grids should be designed not only from sharing explicit knowledge on computers but also from community formulation for sharing tacit knowledge among a community. Extending the concept of grid from computing grid to knowledge grid, it is possible to make use of a grid as not only sharable computing resources, but also as time and place in which people work together, create knowledge, and share knowledge and experiences in a community.
Trends in life science grid: from computing grid to knowledge grid

PubMed Central

Konagaya, Akihiko

2006-01-01

Background Grid computing has great potential to become a standard cyberinfrastructure for life sciences which often require high-performance computing and large data handling which exceeds the computing capacity of a single institution. Results This survey reviews the latest grid technologies from the viewpoints of computing grid, data grid and knowledge grid. Computing grid technologies have been matured enough to solve high-throughput real-world life scientific problems. Data grid technologies are strong candidates for realizing "resourceome" for bioinformatics. Knowledge grids should be designed not only from sharing explicit knowledge on computers but also from community formulation for sharing tacit knowledge among a community. Conclusion Extending the concept of grid from computing grid to knowledge grid, it is possible to make use of a grid as not only sharable computing resources, but also as time and place in which people work together, create knowledge, and share knowledge and experiences in a community. PMID:17254294
Numerical computation of spherical harmonics of arbitrary degree and order by extending exponent of floating point numbers

NASA Astrophysics Data System (ADS)

Fukushima, Toshio

2012-04-01

By extending the exponent of floating point numbers with an additional integer as the power index of a large radix, we compute fully normalized associated Legendre functions (ALF) by recursion without underflow problem. The new method enables us to evaluate ALFs of extremely high degree as 232 = 4,294,967,296, which corresponds to around 1 cm resolution on the Earth's surface. By limiting the application of exponent extension to a few working variables in the recursion, choosing a suitable large power of 2 as the radix, and embedding the contents of the basic arithmetic procedure of floating point numbers with the exponent extension directly in the program computing the recurrence formulas, we achieve the evaluation of ALFs in the double-precision environment at the cost of around 10% increase in computational time per single ALF. This formulation realizes meaningful execution of the spherical harmonic synthesis and/or analysis of arbitrary degree and order.
A large-scale solar dynamics observatory image dataset for computer vision applications.

PubMed

Kucuk, Ahmet; Banda, Juan M; Angryk, Rafal A

2017-01-01

The National Aeronautics Space Agency (NASA) Solar Dynamics Observatory (SDO) mission has given us unprecedented insight into the Sun's activity. By capturing approximately 70,000 images a day, this mission has created one of the richest and biggest repositories of solar image data available to mankind. With such massive amounts of information, researchers have been able to produce great advances in detecting solar events. In this resource, we compile SDO solar data into a single repository in order to provide the computer vision community with a standardized and curated large-scale dataset of several hundred thousand solar events found on high resolution solar images. This publicly available resource, along with the generation source code, will accelerate computer vision research on NASA's solar image data by reducing the amount of time spent performing data acquisition and curation from the multiple sources we have compiled. By improving the quality of the data with thorough curation, we anticipate a wider adoption and interest from the computer vision to the solar physics community.
Computational tools for exact conditional logistic regression.

PubMed

Corcoran, C; Mehta, C; Patel, N; Senchaudhuri, P

Logistic regression analyses are often challenged by the inability of unconditional likelihood-based approximations to yield consistent, valid estimates and p-values for model parameters. This can be due to sparseness or separability in the data. Conditional logistic regression, though useful in such situations, can also be computationally unfeasible when the sample size or number of explanatory covariates is large. We review recent developments that allow efficient approximate conditional inference, including Monte Carlo sampling and saddlepoint approximations. We demonstrate through real examples that these methods enable the analysis of significantly larger and more complex data sets. We find in this investigation that for these moderately large data sets Monte Carlo seems a better alternative, as it provides unbiased estimates of the exact results and can be executed in less CPU time than can the single saddlepoint approximation. Moreover, the double saddlepoint approximation, while computationally the easiest to obtain, offers little practical advantage. It produces unreliable results and cannot be computed when a maximum likelihood solution does not exist. Copyright 2001 John Wiley & Sons, Ltd.
Accelerating three-dimensional FDTD calculations on GPU clusters for electromagnetic field simulation.

PubMed

Nagaoka, Tomoaki; Watanabe, Soichi

2012-01-01

Electromagnetic simulation with anatomically realistic computational human model using the finite-difference time domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the computational human model, we adapt three-dimensional FDTD code to a multi-GPU cluster environment with Compute Unified Device Architecture and Message Passing Interface. Our multi-GPU cluster system consists of three nodes. The seven GPU boards (NVIDIA Tesla C2070) are mounted on each node. We examined the performance of the FDTD calculation on multi-GPU cluster environment. We confirmed that the FDTD calculation on the multi-GPU clusters is faster than that on a multi-GPU (a single workstation), and we also found that the GPU cluster system calculate faster than a vector supercomputer. In addition, our GPU cluster system allowed us to perform the large-scale FDTD calculation because were able to use GPU memory of over 100 GB.
A Parallel Pipelined Renderer for the Time-Varying Volume Data

NASA Technical Reports Server (NTRS)

Chiueh, Tzi-Cker; Ma, Kwan-Liu

1997-01-01

This paper presents a strategy for efficiently rendering time-varying volume data sets on a distributed-memory parallel computer. Time-varying volume data take large storage space and visualizing them requires reading large files continuously or periodically throughout the course of the visualization process. Instead of using all the processors to collectively render one volume at a time, a pipelined rendering process is formed by partitioning processors into groups to render multiple volumes concurrently. In this way, the overall rendering time may be greatly reduced because the pipelined rendering tasks are overlapped with the I/O required to load each volume into a group of processors; moreover, parallelization overhead may be reduced as a result of partitioning the processors. We modify an existing parallel volume renderer to exploit various levels of rendering parallelism and to study how the partitioning of processors may lead to optimal rendering performance. Two factors which are important to the overall execution time are re-source utilization efficiency and pipeline startup latency. The optimal partitioning configuration is the one that balances these two factors. Tests on Intel Paragon computers show that in general optimal partitionings do exist for a given rendering task and result in 40-50% saving in overall rendering time.
Design of a high-speed digital processing element for parallel simulation

NASA Technical Reports Server (NTRS)

Milner, E. J.; Cwynar, D. S.

1983-01-01

A prototype of a custom designed computer to be used as a processing element in a multiprocessor based jet engine simulator is described. The purpose of the custom design was to give the computer the speed and versatility required to simulate a jet engine in real time. Real time simulations are needed for closed loop testing of digital electronic engine controls. The prototype computer has a microcycle time of 133 nanoseconds. This speed was achieved by: prefetching the next instruction while the current one is executing, transporting data using high speed data busses, and using state of the art components such as a very large scale integration (VLSI) multiplier. Included are discussions of processing element requirements, design philosophy, the architecture of the custom designed processing element, the comprehensive instruction set, the diagnostic support software, and the development status of the custom design.
Computational singular perturbation analysis of stochastic chemical systems with stiffness

NASA Astrophysics Data System (ADS)

Wang, Lijin; Han, Xiaoying; Cao, Yanzhao; Najm, Habib N.

2017-04-01

Computational singular perturbation (CSP) is a useful method for analysis, reduction, and time integration of stiff ordinary differential equation systems. It has found dominant utility, in particular, in chemical reaction systems with a large range of time scales at continuum and deterministic level. On the other hand, CSP is not directly applicable to chemical reaction systems at micro or meso-scale, where stochasticity plays an non-negligible role and thus has to be taken into account. In this work we develop a novel stochastic computational singular perturbation (SCSP) analysis and time integration framework, and associated algorithm, that can be used to not only construct accurately and efficiently the numerical solutions to stiff stochastic chemical reaction systems, but also analyze the dynamics of the reduced stochastic reaction systems. The algorithm is illustrated by an application to a benchmark stochastic differential equation model, and numerical experiments are carried out to demonstrate the effectiveness of the construction.
Viscous-inviscid interaction method including wake effects for three-dimensional wing-body configurations

NASA Technical Reports Server (NTRS)

Streett, C. L.

1981-01-01

A viscous-inviscid interaction method has been developed by using a three-dimensional integral boundary-layer method which produces results in good agreement with a finite-difference method in a fraction of the computer time. The integral method is stable and robust and incorporates a model for computation in a small region of streamwise separation. A locally two-dimensional wake model, accounting for thickness and curvature effects, is also included in the interaction procedure. Computation time spent in converging an interacted result is, many times, only slightly greater than that required to converge an inviscid calculation. Results are shown from the interaction method, run at experimental angle of attack, Reynolds number, and Mach number, on a wing-body test case for which viscous effects are large. Agreement with experiment is good; in particular, the present wake model improves prediction of the spanwise lift distribution and lower surface cove pressure.

Boundary Conditions for Jet Flow Computations

NASA Technical Reports Server (NTRS)

Hayder, M. E.; Turkel, E.

1994-01-01

Ongoing activities are focused on capturing the sound source in a supersonic jet through careful large eddy simulation (LES). One issue that is addressed is the effect of the boundary conditions, both inflow and outflow, on the predicted flow fluctuations, which represent the sound source. In this study, we examine the accuracy of several boundary conditions to determine their suitability for computations of time-dependent flows. Various boundary conditions are used to compute the flow field of a laminar axisymmetric jet excited at the inflow by a disturbance given by the corresponding eigenfunction of the linearized stability equations. We solve the full time dependent Navier-Stokes equations by a high order numerical scheme. For very small excitations, the computed growth of the modes closely corresponds to that predicted by the linear theory. We then vary the excitation level to see the effect of the boundary conditions in the nonlinear flow regime.
Real-time depth processing for embedded platforms

NASA Astrophysics Data System (ADS)

Rahnama, Oscar; Makarov, Aleksej; Torr, Philip

2017-05-01

Obtaining depth information of a scene is an important requirement in many computer-vision and robotics applications. For embedded platforms, passive stereo systems have many advantages over their active counterparts (i.e. LiDAR, Infrared). They are power efficient, cheap, robust to lighting conditions and inherently synchronized to the RGB images of the scene. However, stereo depth estimation is a computationally expensive task that operates over large amounts of data. For embedded applications which are often constrained by power consumption, obtaining accurate results in real-time is a challenge. We demonstrate a computationally and memory efficient implementation of a stereo block-matching algorithm in FPGA. The computational core achieves a throughput of 577 fps at standard VGA resolution whilst consuming less than 3 Watts of power. The data is processed using an in-stream approach that minimizes memory-access bottlenecks and best matches the raster scan readout of modern digital image sensors.
Bioinspired principles for large-scale networked sensor systems: an overview.

PubMed

Jacobsen, Rune Hylsberg; Zhang, Qi; Toftegaard, Thomas Skjødeberg

2011-01-01

Biology has often been used as a source of inspiration in computer science and engineering. Bioinspired principles have found their way into network node design and research due to the appealing analogies between biological systems and large networks of small sensors. This paper provides an overview of bioinspired principles and methods such as swarm intelligence, natural time synchronization, artificial immune system and intercellular information exchange applicable for sensor network design. Bioinspired principles and methods are discussed in the context of routing, clustering, time synchronization, optimal node deployment, localization and security and privacy.
Recent Scientific Evidence and Technical Developments in Cardiovascular Computed Tomography.

PubMed

Marcus, Roy; Ruff, Christer; Burgstahler, Christof; Notohamiprodjo, Mike; Nikolaou, Konstantin; Geisler, Tobias; Schroeder, Stephen; Bamberg, Fabian

2016-05-01

In recent years, coronary computed tomography angiography has become an increasingly safe and noninvasive modality for the evaluation of the anatomical structure of the coronary artery tree with diagnostic benefits especially in patients with a low-to-intermediate pretest probability of disease. Currently, increasing evidence from large randomized diagnostic trials is accumulating on the diagnostic impact of computed tomography angiography for the management of patients with acute and stable chest pain syndrome. At the same time, technical advances have substantially reduced adverse effects and limiting factors, such as radiation exposure, the amount of iodinated contrast agent, and scanning time, rendering the technique appropriate for broader clinical applications. In this work, we review the latest developments in computed tomography technology and describe the scientific evidence on the use of cardiac computed tomography angiography to evaluate patients with acute and stable chest pain syndrome. Copyright © 2016 Sociedad Española de Cardiología. Published by Elsevier España, S.L.U. All rights reserved.
Distributed Issues for Ada Real-Time Systems

DTIC Science & Technology

1990-07-23

NUMBERS Distributed Issues for Ada Real - Time Systems MDA 903-87- C- 0056 S. AUTHOR(S) Thomas E. Griest 7. PERFORMING ORGANiZATION NAME(S) AND ADORESS(ES) 8...considerations. I Adding to the problem of distributed real - time systems is the issue of maintaining a common sense of time among all of the processors...because -omeone is waiting for the final output of a very large set of computations. However in real - time systems , consistent meeting of short-term
Evaluation of Kirkwood-Buff integrals via finite size scaling: a large scale molecular dynamics study

NASA Astrophysics Data System (ADS)

Dednam, W.; Botha, A. E.

2015-01-01

Solvation of bio-molecules in water is severely affected by the presence of co-solvent within the hydration shell of the solute structure. Furthermore, since solute molecules can range from small molecules, such as methane, to very large protein structures, it is imperative to understand the detailed structure-function relationship on the microscopic level. For example, it is useful know the conformational transitions that occur in protein structures. Although such an understanding can be obtained through large-scale molecular dynamic simulations, it is often the case that such simulations would require excessively large simulation times. In this context, Kirkwood-Buff theory, which connects the microscopic pair-wise molecular distributions to global thermodynamic properties, together with the recently developed technique, called finite size scaling, may provide a better method to reduce system sizes, and hence also the computational times. In this paper, we present molecular dynamics trial simulations of biologically relevant low-concentration solvents, solvated by aqueous co-solvent solutions. In particular we compare two different methods of calculating the relevant Kirkwood-Buff integrals. The first (traditional) method computes running integrals over the radial distribution functions, which must be obtained from large system-size NVT or NpT simulations. The second, newer method, employs finite size scaling to obtain the Kirkwood-Buff integrals directly by counting the particle number fluctuations in small, open sub-volumes embedded within a larger reservoir that can be well approximated by a much smaller simulation cell. In agreement with previous studies, which made a similar comparison for aqueous co-solvent solutions, without the additional solvent, we conclude that the finite size scaling method is also applicable to the present case, since it can produce computationally more efficient results which are equivalent to the more costly radial distribution function method.
Towards Large-area Field-scale Operational Evapotranspiration for Water Use Mapping

NASA Astrophysics Data System (ADS)

Senay, G. B.; Friedrichs, M.; Morton, C.; Huntington, J. L.; Verdin, J.

2017-12-01

Field-scale evapotranspiration (ET) estimates are needed for improving surface and groundwater use and water budget studies. Ideally, field-scale ET estimates would be at regional to national levels and cover long time periods. As a result of large data storage and computational requirements associated with processing field-scale satellite imagery such as Landsat, numerous challenges remain to develop operational ET estimates over large areas for detailed water use and availability studies. However, the combination of new science, data availability, and cloud computing technology is enabling unprecedented capabilities for ET mapping. To demonstrate this capability, we used Google's Earth Engine cloud computing platform to create nationwide annual ET estimates with 30-meter resolution Landsat ( 16,000 images) and gridded weather data using the Operational Simplified Surface Energy Balance (SSEBop) model in support of the National Water Census, a USGS research program designed to build decision support capacity for water management agencies and other natural resource managers. By leveraging Google's Earth Engine Application Programming Interface (API) and developing software in a collaborative, open-platform environment, we rapidly advance from research towards applications for large-area field-scale ET mapping. Cloud computing of the Landsat image archive combined with other satellite, climate, and weather data, is creating never imagined opportunities for assessing ET model behavior and uncertainty, and ultimately providing the ability for more robust operational monitoring and assessment of water use at field-scales.
Using Amazon's Elastic Compute Cloud to dynamically scale CMS computational resources

NASA Astrophysics Data System (ADS)

Evans, D.; Fisk, I.; Holzman, B.; Melo, A.; Metson, S.; Pordes, R.; Sheldon, P.; Tiradani, A.

2011-12-01

Large international scientific collaborations such as the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider have traditionally addressed their data reduction and analysis needs by building and maintaining dedicated computational infrastructure. Emerging cloud computing services such as Amazon's Elastic Compute Cloud (EC2) offer short-term CPU and storage resources with costs based on usage. These services allow experiments to purchase computing resources as needed, without significant prior planning and without long term investments in facilities and their management. We have demonstrated that services such as EC2 can successfully be integrated into the production-computing model of CMS, and find that they work very well as worker nodes. The cost-structure and transient nature of EC2 services makes them inappropriate for some CMS production services and functions. We also found that the resources are not truely "on-demand" as limits and caps on usage are imposed. Our trial workflows allow us to make a cost comparison between EC2 resources and dedicated CMS resources at a University, and conclude that it is most cost effective to purchase dedicated resources for the "base-line" needs of experiments such as CMS. However, if the ability to use cloud computing resources is built into an experiment's software framework before demand requires their use, cloud computing resources make sense for bursting during times when spikes in usage are required.
Acceleration of discrete stochastic biochemical simulation using GPGPU.

PubMed

Sumiyoshi, Kei; Hirata, Kazuki; Hiroi, Noriko; Funahashi, Akira

2015-01-01

For systems made up of a small number of molecules, such as a biochemical network in a single cell, a simulation requires a stochastic approach, instead of a deterministic approach. The stochastic simulation algorithm (SSA) simulates the stochastic behavior of a spatially homogeneous system. Since stochastic approaches produce different results each time they are used, multiple runs are required in order to obtain statistical results; this results in a large computational cost. We have implemented a parallel method for using SSA to simulate a stochastic model; the method uses a graphics processing unit (GPU), which enables multiple realizations at the same time, and thus reduces the computational time and cost. During the simulation, for the purpose of analysis, each time course is recorded at each time step. A straightforward implementation of this method on a GPU is about 16 times faster than a sequential simulation on a CPU with hybrid parallelization; each of the multiple simulations is run simultaneously, and the computational tasks within each simulation are parallelized. We also implemented an improvement to the memory access and reduced the memory footprint, in order to optimize the computations on the GPU. We also implemented an asynchronous data transfer scheme to accelerate the time course recording function. To analyze the acceleration of our implementation on various sizes of model, we performed SSA simulations on different model sizes and compared these computation times to those for sequential simulations with a CPU. When used with the improved time course recording function, our method was shown to accelerate the SSA simulation by a factor of up to 130.
Acceleration of discrete stochastic biochemical simulation using GPGPU

PubMed Central

Sumiyoshi, Kei; Hirata, Kazuki; Hiroi, Noriko; Funahashi, Akira

2015-01-01

For systems made up of a small number of molecules, such as a biochemical network in a single cell, a simulation requires a stochastic approach, instead of a deterministic approach. The stochastic simulation algorithm (SSA) simulates the stochastic behavior of a spatially homogeneous system. Since stochastic approaches produce different results each time they are used, multiple runs are required in order to obtain statistical results; this results in a large computational cost. We have implemented a parallel method for using SSA to simulate a stochastic model; the method uses a graphics processing unit (GPU), which enables multiple realizations at the same time, and thus reduces the computational time and cost. During the simulation, for the purpose of analysis, each time course is recorded at each time step. A straightforward implementation of this method on a GPU is about 16 times faster than a sequential simulation on a CPU with hybrid parallelization; each of the multiple simulations is run simultaneously, and the computational tasks within each simulation are parallelized. We also implemented an improvement to the memory access and reduced the memory footprint, in order to optimize the computations on the GPU. We also implemented an asynchronous data transfer scheme to accelerate the time course recording function. To analyze the acceleration of our implementation on various sizes of model, we performed SSA simulations on different model sizes and compared these computation times to those for sequential simulations with a CPU. When used with the improved time course recording function, our method was shown to accelerate the SSA simulation by a factor of up to 130. PMID:25762936
Fast and Accurate Simulation Technique for Large Irregular Arrays

NASA Astrophysics Data System (ADS)

Bui-Van, Ha; Abraham, Jens; Arts, Michel; Gueuning, Quentin; Raucy, Christopher; Gonzalez-Ovejero, David; de Lera Acedo, Eloy; Craeye, Christophe

2018-04-01

A fast full-wave simulation technique is presented for the analysis of large irregular planar arrays of identical 3-D metallic antennas. The solution method relies on the Macro Basis Functions (MBF) approach and an interpolatory technique to compute the interactions between MBFs. The Harmonic-polynomial (HARP) model is established for the near-field interactions in a modified system of coordinates. For extremely large arrays made of complex antennas, two approaches assuming a limited radius of influence for mutual coupling are considered: one is based on a sparse-matrix LU decomposition and the other one on a tessellation of the array in the form of overlapping sub-arrays. The computation of all embedded element patterns is sped up with the help of the non-uniform FFT algorithm. Extensive validations are shown for arrays of log-periodic antennas envisaged for the low-frequency SKA (Square Kilometer Array) radio-telescope. The analysis of SKA stations with such a large number of elements has not been treated yet in the literature. Validations include comparison with results obtained with commercial software and with experiments. The proposed method is particularly well suited to array synthesis, in which several orders of magnitude can be saved in terms of computation time.
A transient response analysis of the space shuttle vehicle during liftoff

NASA Technical Reports Server (NTRS)

Brunty, J. A.

1990-01-01

A proposed transient response method is formulated for the liftoff analysis of the space shuttle vehicles. It uses a power series approximation with unknown coefficients for the interface forces between the space shuttle and mobile launch platform. This allows the equation of motion of the two structures to be solved separately with the unknown coefficients at the end of each step. These coefficients are obtained by enforcing the interface compatibility conditions between the two structures. Once the unknown coefficients are determined, the total response is computed for that time step. The method is validated by a numerical example of a cantilevered beam and by the liftoff analysis of the space shuttle vehicles. The proposed method is compared to an iterative transient response analysis method used by Martin Marietta for their space shuttle liftoff analysis. It is shown that the proposed method uses less computer time than the iterative method and does not require as small a time step for integration. The space shuttle vehicle model is reduced using two different types of component mode synthesis (CMS) methods, the Lanczos method and the Craig and Bampton CMS method. By varying the cutoff frequency in the Craig and Bampton method it was shown that the space shuttle interface loads can be computed with reasonable accuracy. Both the Lanczos CMS method and Craig and Bampton CMS method give similar results. A substantial amount of computer time is saved using the Lanczos CMS method over that of the Craig and Bampton method. However, when trying to compute a large number of Lanczos vectors, input/output computer time increased and increased the overall computer time. The application of several liftoff release mechanisms that can be adapted to the proposed method are discussed.
AnRAD: A Neuromorphic Anomaly Detection Framework for Massive Concurrent Data Streams.

PubMed

Chen, Qiuwen; Luley, Ryan; Wu, Qing; Bishop, Morgan; Linderman, Richard W; Qiu, Qinru

2018-05-01

The evolution of high performance computing technologies has enabled the large-scale implementation of neuromorphic models and pushed the research in computational intelligence into a new era. Among the machine learning applications, unsupervised detection of anomalous streams is especially challenging due to the requirements of detection accuracy and real-time performance. Designing a computing framework that harnesses the growing computing power of the multicore systems while maintaining high sensitivity and specificity to the anomalies is an urgent research topic. In this paper, we propose anomaly recognition and detection (AnRAD), a bioinspired detection framework that performs probabilistic inferences. We analyze the feature dependency and develop a self-structuring method that learns an efficient confabulation network using unlabeled data. This network is capable of fast incremental learning, which continuously refines the knowledge base using streaming data. Compared with several existing anomaly detection approaches, our method provides competitive detection quality. Furthermore, we exploit the massive parallel structure of the AnRAD framework. Our implementations of the detection algorithm on the graphic processing unit and the Xeon Phi coprocessor both obtain substantial speedups over the sequential implementation on general-purpose microprocessor. The framework provides real-time service to concurrent data streams within diversified knowledge contexts, and can be applied to large problems with multiple local patterns. Experimental results demonstrate high computing performance and memory efficiency. For vehicle behavior detection, the framework is able to monitor up to 16000 vehicles (data streams) and their interactions in real time with a single commodity coprocessor, and uses less than 0.2 ms for one testing subject. Finally, the detection network is ported to our spiking neural network simulator to show the potential of adapting to the emerging neuromorphic architectures.
Computational study of noise in a large signal transduction network.

PubMed

Intosalmi, Jukka; Manninen, Tiina; Ruohonen, Keijo; Linne, Marja-Leena

2011-06-21

Biochemical systems are inherently noisy due to the discrete reaction events that occur in a random manner. Although noise is often perceived as a disturbing factor, the system might actually benefit from it. In order to understand the role of noise better, its quality must be studied in a quantitative manner. Computational analysis and modeling play an essential role in this demanding endeavor. We implemented a large nonlinear signal transduction network combining protein kinase C, mitogen-activated protein kinase, phospholipase A2, and β isoform of phospholipase C networks. We simulated the network in 300 different cellular volumes using the exact Gillespie stochastic simulation algorithm and analyzed the results in both the time and frequency domain. In order to perform simulations in a reasonable time, we used modern parallel computing techniques. The analysis revealed that time and frequency domain characteristics depend on the system volume. The simulation results also indicated that there are several kinds of noise processes in the network, all of them representing different kinds of low-frequency fluctuations. In the simulations, the power of noise decreased on all frequencies when the system volume was increased. We concluded that basic frequency domain techniques can be applied to the analysis of simulation results produced by the Gillespie stochastic simulation algorithm. This approach is suited not only to the study of fluctuations but also to the study of pure noise processes. Noise seems to have an important role in biochemical systems and its properties can be numerically studied by simulating the reacting system in different cellular volumes. Parallel computing techniques make it possible to run massive simulations in hundreds of volumes and, as a result, accurate statistics can be obtained from computational studies. © 2011 Intosalmi et al; licensee BioMed Central Ltd.
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier

1992-01-01

Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine

PubMed Central

Bao, Shunxing; Weitendorf, Frederick D.; Plassard, Andrew J.; Huo, Yuankai; Gokhale, Aniruddha; Landman, Bennett A.

2016-01-01

The field of big data is generally concerned with the scale of processing at which traditional computational paradigms break down. In medical imaging, traditional large scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit that is controlled by a job scheduler. Typically, a shared-storage network file system (NFS) is used to host imaging data. However, data transfer from storage to processing nodes can saturate network bandwidth when data is frequently uploaded/retrieved from the NFS, e.g., “short” processing times and/or “large” datasets. Recently, an alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. The benefits of using such a framework must be formally evaluated against a traditional approach to characterize the point at which simply “large scale” processing transitions into “big data” and necessitates alternative computational frameworks. The proposed Hadoop system was implemented on a production lab-cluster alongside a standard Sun Grid Engine (SGE). Theoretical models for wall-clock time and resource time for both approaches are introduced and validated. To provide real example data, three T1 image archives were retrieved from a university secure, shared web database and used to empirically assess computational performance under three configurations of cluster hardware (using 72, 109, or 209 CPU cores) with differing job lengths. Empirical results match the theoretical models. Based on these data, a comparative analysis is presented for when the Hadoop framework will be relevant and non-relevant for medical imaging. PMID:28736473
Time delay in Swiss cheese gravitational lensing

NASA Astrophysics Data System (ADS)

Chen, B.; Kantowski, R.; Dai, X.

2010-08-01

We compute time delays for gravitational lensing in a flat Λ dominated cold dark matter Swiss cheese universe. We assume a primary and secondary pair of light rays are deflected by a single point mass condensation described by a Kottler metric (Schwarzschild with Λ) embedded in an otherwise homogeneous cosmology. We find that the cosmological constant’s effect on the difference in arrival times is nonlinear and at most around 0.002% for a large cluster lens; however, we find differences from time delays predicted by conventional linear lensing theory that can reach ˜4% for these large lenses. The differences in predicted delay times are due to the failure of conventional lensing to incorporate the lensing mass into the mean mass density of the universe.
Sound production due to large-scale coherent structures

NASA Technical Reports Server (NTRS)

Gatski, T. B.

1979-01-01

The acoustic pressure fluctuations due to large-scale finite amplitude disturbances in a free turbulent shear flow are calculated. The flow is decomposed into three component scales; the mean motion, the large-scale wave-like disturbance, and the small-scale random turbulence. The effect of the large-scale structure on the flow is isolated by applying both a spatial and phase average on the governing differential equations and by initially taking the small-scale turbulence to be in energetic equilibrium with the mean flow. The subsequent temporal evolution of the flow is computed from global energetic rate equations for the different component scales. Lighthill's theory is then applied to the region with the flowfield as the source and an observer located outside the flowfield in a region of uniform velocity. Since the time history of all flow variables is known, a minimum of simplifying assumptions for the Lighthill stress tensor is required, including no far-field approximations. A phase average is used to isolate the pressure fluctuations due to the large-scale structure, and also to isolate the dynamic process responsible. Variation of mean square pressure with distance from the source is computed to determine the acoustic far-field location and decay rate, and, in addition, spectra at various acoustic field locations are computed and analyzed. Also included are the effects of varying the growth and decay of the large-scale disturbance on the sound produced.
A weight based genetic algorithm for selecting views

NASA Astrophysics Data System (ADS)

Talebian, Seyed H.; Kareem, Sameem A.

2013-03-01

Data warehouse is a technology designed for supporting decision making. Data warehouse is made by extracting large amount of data from different operational systems; transforming it to a consistent form and loading it to the central repository. The type of queries in data warehouse environment differs from those in operational systems. In contrast to operational systems, the analytical queries that are issued in data warehouses involve summarization of large volume of data and therefore in normal circumstance take a long time to be answered. On the other hand, the result of these queries must be answered in a short time to enable managers to make decisions as short time as possible. As a result, an essential need in this environment is in improving the performances of queries. One of the most popular methods to do this task is utilizing pre-computed result of queries. In this method, whenever a new query is submitted by the user instead of calculating the query on the fly through a large underlying database, the pre-computed result or views are used to answer the queries. Although, the ideal option would be pre-computing and saving all possible views, but, in practice due to disk space constraint and overhead due to view updates it is not considered as a feasible choice. Therefore, we need to select a subset of possible views to save on disk. The problem of selecting the right subset of views is considered as an important challenge in data warehousing. In this paper we suggest a Weighted Based Genetic Algorithm (WBGA) for solving the view selection problem with two objectives.
Markets and Models for Large-Scale Courseware Development.

ERIC Educational Resources Information Center

Bunderson, C. Victor

Computer-assisted instruction (CAI) is not making an important, visible impact on the educational system of this country. Though its instructional value has been proven time after time, the high cost of the hardware and the lack of quality courseware is preventing CAI from becoming a market success. In order for CAI to reach its market potential…

Neural Computations in a Dynamical System with Multiple Time Scales.

PubMed

Mi, Yuanyuan; Lin, Xiaohan; Wu, Si

2016-01-01

Neural systems display rich short-term dynamics at various levels, e.g., spike-frequency adaptation (SFA) at the single-neuron level, and short-term facilitation (STF) and depression (STD) at the synapse level. These dynamical features typically cover a broad range of time scales and exhibit large diversity in different brain regions. It remains unclear what is the computational benefit for the brain to have such variability in short-term dynamics. In this study, we propose that the brain can exploit such dynamical features to implement multiple seemingly contradictory computations in a single neural circuit. To demonstrate this idea, we use continuous attractor neural network (CANN) as a working model and include STF, SFA and STD with increasing time constants in its dynamics. Three computational tasks are considered, which are persistent activity, adaptation, and anticipative tracking. These tasks require conflicting neural mechanisms, and hence cannot be implemented by a single dynamical feature or any combination with similar time constants. However, with properly coordinated STF, SFA and STD, we show that the network is able to implement the three computational tasks concurrently. We hope this study will shed light on the understanding of how the brain orchestrates its rich dynamics at various levels to realize diverse cognitive functions.
Application of multi-grid method on the simulation of incremental forging processes

NASA Astrophysics Data System (ADS)

Ramadan, Mohamad; Khaled, Mahmoud; Fourment, Lionel

2016-10-01

Numerical simulation becomes essential in manufacturing large part by incremental forging processes. It is a splendid tool allowing to show physical phenomena however behind the scenes, an expensive bill should be paid, that is the computational time. That is why many techniques are developed to decrease the computational time of numerical simulation. Multi-Grid method is a numerical procedure that permits to reduce computational time of numerical calculation by performing the resolution of the system of equations on several mesh of decreasing size which allows to smooth faster the low frequency of the solution as well as its high frequency. In this paper a Multi-Grid method is applied to cogging process in the software Forge 3. The study is carried out using increasing number of degrees of freedom. The results shows that calculation time is divide by two for a mesh of 39,000 nodes. The method is promising especially if coupled with Multi-Mesh method.
The reliable solution and computation time of variable parameters logistic model

NASA Astrophysics Data System (ADS)

Wang, Pengfei; Pan, Xinnong

2018-05-01

The study investigates the reliable computation time (RCT, termed as T c) by applying a double-precision computation of a variable parameters logistic map (VPLM). Firstly, by using the proposed method, we obtain the reliable solutions for the logistic map. Secondly, we construct 10,000 samples of reliable experiments from a time-dependent non-stationary parameters VPLM and then calculate the mean T c. The results indicate that, for each different initial value, the T cs of the VPLM are generally different. However, the mean T c trends to a constant value when the sample number is large enough. The maximum, minimum, and probable distribution functions of T c are also obtained, which can help us to identify the robustness of applying a nonlinear time series theory to forecasting by using the VPLM output. In addition, the T c of the fixed parameter experiments of the logistic map is obtained, and the results suggest that this T c matches the theoretical formula-predicted value.
Optical signal processing using photonic reservoir computing

NASA Astrophysics Data System (ADS)

Salehi, Mohammad Reza; Dehyadegari, Louiza

2014-10-01

As a new approach to recognition and classification problems, photonic reservoir computing has such advantages as parallel information processing, power efficient and high speed. In this paper, a photonic structure has been proposed for reservoir computing which is investigated using a simple, yet, non-partial noisy time series prediction task. This study includes the application of a suitable topology with self-feedbacks in a network of SOA's - which lends the system a strong memory - and leads to adjusting adequate parameters resulting in perfect recognition accuracy (100%) for noise-free time series, which shows a 3% improvement over previous results. For the classification of noisy time series, the rate of accuracy showed a 4% increase and amounted to 96%. Furthermore, an analytical approach was suggested to solve rate equations which led to a substantial decrease in the simulation time, which is an important parameter in classification of large signals such as speech recognition, and better results came up compared with previous works.
A Fast Method for the Segmentation of Synaptic Junctions and Mitochondria in Serial Electron Microscopic Images of the Brain.

PubMed

Márquez Neila, Pablo; Baumela, Luis; González-Soriano, Juncal; Rodríguez, Jose-Rodrigo; DeFelipe, Javier; Merchán-Pérez, Ángel

2016-04-01

Recent electron microscopy (EM) imaging techniques permit the automatic acquisition of a large number of serial sections from brain samples. Manual segmentation of these images is tedious, time-consuming and requires a high degree of user expertise. Therefore, there is considerable interest in developing automatic segmentation methods. However, currently available methods are computationally demanding in terms of computer time and memory usage, and to work properly many of them require image stacks to be isotropic, that is, voxels must have the same size in the X, Y and Z axes. We present a method that works with anisotropic voxels and that is computationally efficient allowing the segmentation of large image stacks. Our approach involves anisotropy-aware regularization via conditional random field inference and surface smoothing techniques to improve the segmentation and visualization. We have focused on the segmentation of mitochondria and synaptic junctions in EM stacks from the cerebral cortex, and have compared the results to those obtained by other methods. Our method is faster than other methods with similar segmentation results. Our image regularization procedure introduces high-level knowledge about the structure of labels. We have also reduced memory requirements with the introduction of energy optimization in overlapping partitions, which permits the regularization of very large image stacks. Finally, the surface smoothing step improves the appearance of three-dimensional renderings of the segmented volumes.
Computer Science Techniques Applied to Parallel Atomistic Simulation

NASA Astrophysics Data System (ADS)

Nakano, Aiichiro

1998-03-01

Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) Space-time multiresolution schemes; ii) fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and a M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.
Efficient Control Law Simulation for Multiple Mobile Robots

DOE Office of Scientific and Technical Information (OSTI.GOV)

Driessen, B.J.; Feddema, J.T.; Kotulski, J.D.

1998-10-06

In this paper we consider the problem of simulating simple control laws involving large numbers of mobile robots. Such simulation can be computationally prohibitive if the number of robots is large enough, say 1 million, due to the 0(N2 ) cost of each time step. This work therefore uses hierarchical tree-based methods for calculating the control law. These tree-based approaches have O(NlogN) cost per time step, thus allowing for efficient simulation involving a large number of robots. For concreteness, a decentralized control law which involves only the distance and bearing to the closest neighbor robot will be considered. The timemore » to calculate the control law for each robot at each time step is demonstrated to be O(logN).« less
ALMA test interferometer control system: past experiences and future developments

NASA Astrophysics Data System (ADS)

Marson, Ralph G.; Pokorny, Martin; Kern, Jeff; Stauffer, Fritz; Perrigouard, Alain; Gustafsson, Birger; Ramey, Ken

2004-09-01

The Atacama Large Millimeter Array (ALMA) will, when it is completed in 2012, be the world's largest millimeter & sub-millimeter radio telescope. It will consist of 64 antennas, each one 12 meters in diameter, connected as an interferometer. The ALMA Test Interferometer Control System (TICS) was developed as a prototype for the ALMA control system. Its initial task was to provide sufficient functionality for the evaluation of the prototype antennas. The main antenna evaluation tasks include surface measurements via holography and pointing accuracy, measured at both optical and millimeter wavelengths. In this paper we will present the design of TICS, which is a distributed computing environment. In the test facility there are four computers: three real-time computers running VxWorks (one on each antenna and a central one) and a master computer running Linux. These computers communicate via Ethernet, and each of the real-time computers is connected to the hardware devices via an extension of the CAN bus. We will also discuss our experience with this system and outline changes we are making in light of our experiences.
77 FR 40494 - Procedures for the Handling of Retaliation Complaints Under Section 219 of the Consumer Product...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-07-10

... cases involving important or novel legal issues, large numbers of employees, alleged violations that... computation of time before those tribunals and express filing deadlines as days rather than business days...
Efficient mapping algorithms for scheduling robot inverse dynamics computation on a multiprocessor system

NASA Technical Reports Server (NTRS)

Lee, C. S. G.; Chen, C. L.

1989-01-01

Two efficient mapping algorithms for scheduling the robot inverse dynamics computation consisting of m computational modules with precedence relationship to be executed on a multiprocessor system consisting of p identical homogeneous processors with processor and communication costs to achieve minimum computation time are presented. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time. The minimax optimization is performed on the objective function to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning and the scheduling problems; both have been known to be NP-complete. Thus, to speed up the searching for a solution, two heuristic algorithms were proposed to obtain fast but suboptimal mapping solutions. The first algorithm utilizes the level and the communication intensity of the task modules to construct an ordered priority list of ready modules and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by the heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and the validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
A Science Cloud: OneSpaceNet

NASA Astrophysics Data System (ADS)

Morikawa, Y.; Murata, K. T.; Watari, S.; Kato, H.; Yamamoto, K.; Inoue, S.; Tsubouchi, K.; Fukazawa, K.; Kimura, E.; Tatebe, O.; Shimojo, S.

2010-12-01

Main methodologies of Solar-Terrestrial Physics (STP) so far are theoretical, experimental and observational, and computer simulation approaches. Recently "informatics" is expected as a new (fourth) approach to the STP studies. Informatics is a methodology to analyze large-scale data (observation data and computer simulation data) to obtain new findings using a variety of data processing techniques. At NICT (National Institute of Information and Communications Technology, Japan) we are now developing a new research environment named "OneSpaceNet". The OneSpaceNet is a cloud-computing environment specialized for science works, which connects many researchers with high-speed network (JGN: Japan Gigabit Network). The JGN is a wide-area back-born network operated by NICT; it provides 10G network and many access points (AP) over Japan. The OneSpaceNet also provides with rich computer resources for research studies, such as super-computers, large-scale data storage area, licensed applications, visualization devices (like tiled display wall: TDW), database/DBMS, cluster computers (4-8 nodes) for data processing and communication devices. What is amazing in use of the science cloud is that a user simply prepares a terminal (low-cost PC). Once connecting the PC to JGN2plus, the user can make full use of the rich resources of the science cloud. Using communication devices, such as video-conference system, streaming and reflector servers, and media-players, the users on the OneSpaceNet can make research communications as if they belong to a same (one) laboratory: they are members of a virtual laboratory. The specification of the computer resources on the OneSpaceNet is as follows: The size of data storage we have developed so far is almost 1PB. The number of the data files managed on the cloud storage is getting larger and now more than 40,000,000. What is notable is that the disks forming the large-scale storage are distributed to 5 data centers over Japan (but the storage system performs as one disk). There are three supercomputers allocated on the cloud, one from Tokyo, one from Osaka and the other from Nagoya. One's simulation job data on any supercomputers are saved on the cloud data storage (same directory); it is a kind of virtual computing environment. The tiled display wall has 36 panels acting as one display; the pixel (resolution) size of it is as large as 18000x4300. This size is enough to preview or analyze the large-scale computer simulation data. It also allows us to take a look of multiple (e.g., 100 pictures) on one screen together with many researchers. In our talk we also present a brief report of the initial results using the OneSpaceNet for Global MHD simulations as an example of successful use of our science cloud; (i) Ultra-high time resolution visualization of Global MHD simulations on the large-scale storage and parallel processing system on the cloud, (ii) Database of real-time Global MHD simulation and statistic analyses of the data, and (iii) 3D Web service of Global MHD simulations.
Advances in Parallelization for Large Scale Oct-Tree Mesh Generation

NASA Technical Reports Server (NTRS)

O'Connell, Matthew; Karman, Steve L.

2015-01-01

Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.
A large-grain mapping approach for multiprocessor systems through data flow model. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Kim, Hwa-Soo

1991-01-01

A large-grain level mapping method is presented of numerical oriented applications onto multiprocessor systems. The method is based on the large-grain data flow representation of the input application and it assumes a general interconnection topology of the multiprocessor system. The large-grain data flow model was used because such representation best exhibits inherited parallelism in many important applications, e.g., CFD models based on partial differential equations can be presented in large-grain data flow format, very effectively. A generalized interconnection topology of the multiprocessor architecture is considered, including such architectural issues as interprocessor communication cost, with the aim to identify the 'best matching' between the application and the multiprocessor structure. The objective is to minimize the total execution time of the input algorithm running on the target system. The mapping strategy consists of the following: (1) large-grain data flow graph generation from the input application using compilation techniques; (2) data flow graph partitioning into basic computation blocks; and (3) physical mapping onto the target multiprocessor using a priority allocation scheme for the computation blocks.
Users' manual for computer program for three-dimensional analysis of coupler-cavity traveling wave tubes

NASA Technical Reports Server (NTRS)

Omalley, T. A.

1984-01-01

The use of the coupled cavity traveling wave tube for space communications has led to an increased interest in improving the efficiency of the basic interaction process in these devices through velocity resynchronization and other methods. A flexible, three dimensional, axially symmetric, large signal computer program was developed for use on the IBM 370 time sharing system. A users' manual for this program is included.
Computing Systems Configuration for Highly Integrated Guidance and Control Systems

DTIC Science & Technology

1988-06-01

conmmunication ear lea imlustrielaiservenant dais an projet. Cela eat renda , possible entre auies par l’adoption dene mibodologie do travai coammune, par...computed graph results to data processors for post processing, or commnicating with system I/O modules. The ESU PI- Bus interface logic includes extra ...the extra constraint checking helps to find more problems at compile time), and it is especially well- suited for large software systems written by a
Rational calculation accuracy in acousto-optical matrix-vector processor

NASA Astrophysics Data System (ADS)

Oparin, V. V.; Tigin, Dmitry V.

1994-01-01

The high speed of parallel computations for a comparatively small-size processor and acceptable power consumption makes the usage of acousto-optic matrix-vector multiplier (AOMVM) attractive for processing of large amounts of information in real time. The limited accuracy of computations is an essential disadvantage of such a processor. The reduced accuracy requirements allow for considerable simplification of the AOMVM architecture and the reduction of the demands on its components.
Data Intensive Analysis of Biomolecular Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Straatsma, TP; Soares, Thereza A.

2007-12-01

The advances in biomolecular modeling and simulation made possible by the availability of increasingly powerful high performance computing resources is extending molecular simulations to biological more relevant system size and time scales. At the same time, advances in simulation methodologies are allowing more complex processes to be described more accurately. These developments make a systems approach to computational structural biology feasible, but this will require a focused emphasis on the comparative analysis of the increasing number of molecular simulations that are being carried out for biomolecular systems with more realistic models, multi-component environments, and for longer simulation times. Just asmore » in the case of the analysis of the large data sources created by the new high-throughput experimental technologies, biomolecular computer simulations contribute to the progress in biology through comparative analysis. The continuing increase in available protein structures allows the comparative analysis of the role of structure and conformational flexibility in protein function, and is the foundation of the discipline of structural bioinformatics. This creates the opportunity to derive general findings from the comparative analysis of molecular dynamics simulations of a wide range of proteins, protein-protein complexes and other complex biological systems. Because of the importance of protein conformational dynamics for protein function, it is essential that the analysis of molecular trajectories is carried out using a novel, more integrative and systematic approach. We are developing a much needed rigorous computer science based framework for the efficient analysis of the increasingly large data sets resulting from molecular simulations. Such a suite of capabilities will also provide the required tools for access and analysis of a distributed library of generated trajectories. Our research is focusing on the following areas: (1) the development of an efficient analysis framework for very large scale trajectories on massively parallel architectures, (2) the development of novel methodologies that allow automated detection of events in these very large data sets, and (3) the efficient comparative analysis of multiple trajectories. The goal of the presented work is the development of new algorithms that will allow biomolecular simulation studies to become an integral tool to address the challenges of post-genomic biological research. The strategy to deliver the required data intensive computing applications that can effectively deal with the volume of simulation data that will become available is based on taking advantage of the capabilities offered by the use of large globally addressable memory architectures. The first requirement is the design of a flexible underlying data structure for single large trajectories that will form an adaptable framework for a wide range of analysis capabilities. The typical approach to trajectory analysis is to sequentially process trajectories time frame by time frame. This is the implementation found in molecular simulation codes such as NWChem, and has been designed in this way to be able to run on workstation computers and other architectures with an aggregate amount of memory that would not allow entire trajectories to be held in core. The consequence of this approach is an I/O dominated solution that scales very poorly on parallel machines. We are currently using an approach of developing tools specifically intended for use on large scale machines with sufficient main memory that entire trajectories can be held in core. This greatly reduces the cost of I/O as trajectories are read only once during the analysis. In our current Data Intensive Analysis (DIANA) implementation, each processor determines and skips to the entry within the trajectory that typically will be available in multiple files and independently from all other processors read the appropriate frames.« less
Data Intensive Analysis of Biomolecular Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Straatsma, TP

2008-03-01

The advances in biomolecular modeling and simulation made possible by the availability of increasingly powerful high performance computing resources is extending molecular simulations to biological more relevant system size and time scales. At the same time, advances in simulation methodologies are allowing more complex processes to be described more accurately. These developments make a systems approach to computational structural biology feasible, but this will require a focused emphasis on the comparative analysis of the increasing number of molecular simulations that are being carried out for biomolecular systems with more realistic models, multi-component environments, and for longer simulation times. Just asmore » in the case of the analysis of the large data sources created by the new high-throughput experimental technologies, biomolecular computer simulations contribute to the progress in biology through comparative analysis. The continuing increase in available protein structures allows the comparative analysis of the role of structure and conformational flexibility in protein function, and is the foundation of the discipline of structural bioinformatics. This creates the opportunity to derive general findings from the comparative analysis of molecular dynamics simulations of a wide range of proteins, protein-protein complexes and other complex biological systems. Because of the importance of protein conformational dynamics for protein function, it is essential that the analysis of molecular trajectories is carried out using a novel, more integrative and systematic approach. We are developing a much needed rigorous computer science based framework for the efficient analysis of the increasingly large data sets resulting from molecular simulations. Such a suite of capabilities will also provide the required tools for access and analysis of a distributed library of generated trajectories. Our research is focusing on the following areas: (1) the development of an efficient analysis framework for very large scale trajectories on massively parallel architectures, (2) the development of novel methodologies that allow automated detection of events in these very large data sets, and (3) the efficient comparative analysis of multiple trajectories. The goal of the presented work is the development of new algorithms that will allow biomolecular simulation studies to become an integral tool to address the challenges of post-genomic biological research. The strategy to deliver the required data intensive computing applications that can effectively deal with the volume of simulation data that will become available is based on taking advantage of the capabilities offered by the use of large globally addressable memory architectures. The first requirement is the design of a flexible underlying data structure for single large trajectories that will form an adaptable framework for a wide range of analysis capabilities. The typical approach to trajectory analysis is to sequentially process trajectories time frame by time frame. This is the implementation found in molecular simulation codes such as NWChem, and has been designed in this way to be able to run on workstation computers and other architectures with an aggregate amount of memory that would not allow entire trajectories to be held in core. The consequence of this approach is an I/O dominated solution that scales very poorly on parallel machines. We are currently using an approach of developing tools specifically intended for use on large scale machines with sufficient main memory that entire trajectories can be held in core. This greatly reduces the cost of I/O as trajectories are read only once during the analysis. In our current Data Intensive Analysis (DIANA) implementation, each processor determines and skips to the entry within the trajectory that typically will be available in multiple files and independently from all other processors read the appropriate frames.« less
Implementation of Tree and Butterfly Barriers with Optimistic Time Management Algorithms for Discrete Event Simulation

NASA Astrophysics Data System (ADS)

Rizvi, Syed S.; Shah, Dipali; Riasat, Aasia

The Time Wrap algorithm [3] offers a run time recovery mechanism that deals with the causality errors. These run time recovery mechanisms consists of rollback, anti-message, and Global Virtual Time (GVT) techniques. For rollback, there is a need to compute GVT which is used in discrete-event simulation to reclaim the memory, commit the output, detect the termination, and handle the errors. However, the computation of GVT requires dealing with transient message problem and the simultaneous reporting problem. These problems can be dealt in an efficient manner by the Samadi's algorithm [8] which works fine in the presence of causality errors. However, the performance of both Time Wrap and Samadi's algorithms depends on the latency involve in GVT computation. Both algorithms give poor latency for large simulation systems especially in the presence of causality errors. To improve the latency and reduce the processor ideal time, we implement tree and butterflies barriers with the optimistic algorithm. Our analysis shows that the use of synchronous barriers such as tree and butterfly with the optimistic algorithm not only minimizes the GVT latency but also minimizes the processor idle time.
Rich client data exploration and research prototyping for NOAA

NASA Astrophysics Data System (ADS)

Grossberg, Michael; Gladkova, Irina; Guch, Ingrid; Alabi, Paul; Shahriar, Fazlul; Bonev, George; Aizenman, Hannah

2009-08-01

Data from satellites and model simulations is increasing exponentially as observations and model computing power improve rapidly. Not only is technology producing more data, but it often comes from sources all over the world. Researchers and scientists who must collaborate are also located globally. This work presents a software design and technologies which will make it possible for groups of researchers to explore large data sets visually together without the need to download these data sets locally. The design will also make it possible to exploit high performance computing remotely and transparently to analyze and explore large data sets. Computer power, high quality sensing, and data storage capacity have improved at a rate that outstrips our ability to develop software applications that exploit these resources. It is impractical for NOAA scientists to download all of the satellite and model data that may be relevant to a given problem and the computing environments available to a given researcher range from supercomputers to only a web browser. The size and volume of satellite and model data are increasing exponentially. There are at least 50 multisensor satellite platforms collecting Earth science data. On the ground and in the sea there are sensor networks, as well as networks of ground based radar stations, producing a rich real-time stream of data. This new wealth of data would have limited use were it not for the arrival of large-scale high-performance computation provided by parallel computers, clusters, grids, and clouds. With these computational resources and vast archives available, it is now possible to analyze subtle relationships which are global, multi-modal and cut across many data sources. Researchers, educators, and even the general public, need tools to access, discover, and use vast data center archives and high performance computing through a simple yet flexible interface.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.