Sample records for cpu time spent

  1. The Creation of a CPU Timer for High Fidelity Programs

    NASA Technical Reports Server (NTRS)

    Dick, Aidan A.

    2011-01-01

    Using the C and C++ programming languages, a tool was developed that measures the efficiency of a program by recording the amount of CPU time that various functions consume. By inserting the tool between lines of code in the program, one can receive a detailed report of the absolute and relative time consumption associated with each section. After adapting the generic tool for a high-fidelity launch vehicle simulation program called MAVERIC, the components of a frequently used function called "derivatives ( )" were measured. Out of the 34 sub-functions in "derivatives ( )", it was found that the top 8 sub-functions made up 83.1% of the total time spent. In order to decrease the overall run time of MAVERIC, a change was implemented in the sub-function "Event_Controller ( )". Reformatting "Event_Controller ( )" led to a 36.9% decrease in the total CPU time spent by that sub-function, and a 3.2% decrease in the total CPU time spent by the overarching function "derivatives ( )".
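
    The record above describes timing sections of code and reporting absolute and relative CPU time. A minimal sketch of that idea follows, written in Python rather than the C/C++ of the original tool. The class name SectionTimer and the two timed workloads are hypothetical and are not taken from the report.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulates CPU time per named section (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def section(self, name):
        start = time.process_time()  # CPU time of this process, not wall-clock time
        try:
            yield
        finally:
            self.totals[name] += time.process_time() - start

    def report(self):
        """Return (name, seconds, percent of measured total), largest first."""
        grand_total = sum(self.totals.values())
        rows = []
        for name, seconds in sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True):
            share = 100.0 * seconds / grand_total if grand_total > 0 else 0.0
            rows.append((name, seconds, share))
        return rows


timer = SectionTimer()
with timer.section("sum_of_squares"):
    total = sum(i * i for i in range(200_000))
with timer.section("sorting"):
    ordered = sorted(range(200_000), reverse=True)

rows = timer.report()
for name, seconds, share in rows:
    print(f"{name:>16}: {seconds:.6f} s CPU ({share:.1f}%)")
```

    Wrapping each region in a context manager keeps the instrumentation next to the code it measures, which is roughly the "insert between lines of code" workflow the abstract mentions.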

  2. 32 CFR 701.53 - FOIA fee schedule.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... human time) and machine time. (1) Human time. Human time is all the time spent by humans performing the...) Machine time. Machine time involves only direct costs of the central processing unit (CPU), input/output... exist to calculate CPU time, no machine costs can be passed on to the requester. When CPU calculations...

  3. 32 CFR 701.53 - FOIA fee schedule.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... human time) and machine time. (1) Human time. Human time is all the time spent by humans performing the...) Machine time. Machine time involves only direct costs of the central processing unit (CPU), input/output... exist to calculate CPU time, no machine costs can be passed on to the requester. When CPU calculations...

  4. 32 CFR 701.53 - FOIA fee schedule.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... human time) and machine time. (1) Human time. Human time is all the time spent by humans performing the...) Machine time. Machine time involves only direct costs of the central processing unit (CPU), input/output... exist to calculate CPU time, no machine costs can be passed on to the requester. When CPU calculations...

  5. Symptoms of problematic cellular phone use, functional impairment and its association with depression among adolescents in Southern Taiwan.

    PubMed

    Yen, Cheng-Fang; Tang, Tze-Chun; Yen, Ju-Yu; Lin, Huang-Chi; Huang, Chi-Fen; Liu, Shu-Chun; Ko, Chih-Hung

    2009-08-01

    The aims of this study were: (1) to examine the prevalence of symptoms of problematic cellular phone use (CPU); (2) to examine the associations between the symptoms of problematic CPU, functional impairment caused by CPU and the characteristics of CPU; (3) to establish the optimal cut-off point of the number of symptoms for functional impairment caused by CPU; and (4) to examine the association between problematic CPU and depression in adolescents. A total of 10,191 adolescent students in Southern Taiwan were recruited into this study. Participants' self-reported symptoms of problematic CPU and functional impairments caused by CPU were collected. The associations of symptoms of problematic CPU with functional impairments and with the characteristics of CPU were examined. The cut-off point of the number of symptoms for functional impairment was also determined. The association between problematic CPU and depression was examined by logistic regression analysis. The results indicated that the symptoms of problematic CPU were prevalent in adolescents. The adolescents who had any one of the symptoms of problematic CPU were more likely to report at least one dimension of functional impairment caused by CPU, called more on cellular phones, sent more text messages, or spent more time and higher fees on CPU. Having four or more symptoms of problematic CPU had the highest potential to differentiate between the adolescents with and without functional impairment caused by CPU. Adolescents who had significant depression were more likely to have four or more symptoms of problematic CPU. The results of this study may provide a basis for detecting symptoms of problematic CPU in adolescents.

  6. Machine learning based job status prediction in scientific clusters

    DOE PAGES

    Yoo, Wucherl; Sim, Alex; Wu, Kesheng

    2016-09-01

    Large high-performance computing systems are built with an increasing number of components, with more CPU cores, more memory, and more storage space. At the same time, scientific applications have been growing in complexity. Together, they are leading to more frequent unsuccessful job statuses on HPC systems. From measured job statuses, 23.4% of CPU time was spent on unsuccessful jobs. Here, we set out to study whether these unsuccessful job statuses could be anticipated from known job characteristics. To explore this possibility, we have developed a job status prediction method for the execution of jobs on scientific clusters. The Random Forests algorithm was applied to extract and characterize the patterns of unsuccessful job statuses. Experimental results show that our method can predict the unsuccessful job statuses from the monitored ongoing job executions in 99.8% of the cases with 83.6% recall and 94.8% precision. Lastly, this prediction accuracy can be sufficiently high that it can be used for mitigation procedures of predicted failures.
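
    The abstract above quotes recall and precision for predicting unsuccessful jobs. As a reminder of how those two rates are defined, here is a short sketch. The counts (836 true positives, 164 false negatives, 46 false positives) are invented so that the arithmetic lands near the quoted rates; they are not data from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts: "positive" means a job predicted to end unsuccessfully.
precision, recall = precision_recall(true_positives=836,
                                     false_positives=46,
                                     false_negatives=164)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

    With these assumed counts, recall answers "what share of the truly unsuccessful jobs were flagged" and precision answers "what share of the flagged jobs were truly unsuccessful".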

  7. Machine learning based job status prediction in scientific clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoo, Wucherl; Sim, Alex; Wu, Kesheng

    Large high-performance computing systems are built with an increasing number of components, with more CPU cores, more memory, and more storage space. At the same time, scientific applications have been growing in complexity. Together, they are leading to more frequent unsuccessful job statuses on HPC systems. From measured job statuses, 23.4% of CPU time was spent on unsuccessful jobs. Here, we set out to study whether these unsuccessful job statuses could be anticipated from known job characteristics. To explore this possibility, we have developed a job status prediction method for the execution of jobs on scientific clusters. The Random Forests algorithm was applied to extract and characterize the patterns of unsuccessful job statuses. Experimental results show that our method can predict the unsuccessful job statuses from the monitored ongoing job executions in 99.8% of the cases with 83.6% recall and 94.8% precision. Lastly, this prediction accuracy can be sufficiently high that it can be used for mitigation procedures of predicted failures.

  8. Multi-GPU Jacobian accelerated computing for soft-field tomography.

    PubMed

    Borsic, A; Attardo, E A; Halter, R J

    2012-10-01

    Image reconstruction in soft-field tomography is based on an inverse problem formulation, where a forward model is fitted to the data. In medical applications, where the anatomy presents complex shapes, it is common to use finite element models (FEMs) to represent the volume of interest and solve a partial differential equation that models the physics of the system. Over the last decade, there has been a shifting interest from 2D modeling to 3D modeling, as the underlying physics of most problems are 3D. Although the increased computational power of modern computers allows working with much larger FEM models, the computational time required to reconstruct 3D images on a fine 3D FEM model can be significant, on the order of hours. For example, in electrical impedance tomography (EIT) applications using a dense 3D FEM mesh with half a million elements, a single reconstruction iteration takes approximately 15-20 min with optimized routines running on a modern multi-core PC. It is desirable to accelerate image reconstruction to enable researchers to more easily and rapidly explore data and reconstruction parameters. Furthermore, providing high-speed reconstructions is essential for some promising clinical applications of EIT. For 3D problems, 70% of the computing time is spent building the Jacobian matrix, and 25% of the time in forward solving. In this work, we focus on accelerating the Jacobian computation by using single and multiple GPUs. First, we discuss an optimized implementation on a modern multi-core PC architecture and show how computing time is bounded by the CPU-to-memory bandwidth; this factor limits the rate at which data can be fetched by the CPU. Gains associated with the use of multiple CPU cores are minimal, since data operands cannot be fetched fast enough to saturate the processing power of even a single CPU core. GPUs have much higher memory bandwidth than CPUs and better parallelism. We are able to obtain acceleration factors of 20 times on a single NVIDIA S1070 GPU, and of 50 times on four GPUs, bringing the Jacobian computing time for a fine 3D mesh from 12 min to 14 s. We regard this as an important step toward gaining interactive reconstruction times in 3D imaging, particularly when coupled in the future with acceleration of the forward problem. While we demonstrate results for EIT, these results apply to any soft-field imaging modality where the Jacobian matrix is computed with the adjoint method.
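
    The abstract above reports that the Jacobian takes 70% of the computing time and that its runtime fell from 12 min to 14 s. A short worked example follows, using Amdahl's law to estimate what that implies for a whole reconstruction iteration. Applying Amdahl's law here is an illustration added to this listing and is not a calculation from the paper; it assumes the remaining 30% of the time is unchanged.

```python
def overall_speedup(accelerated_fraction, factor):
    """Amdahl's law: speedup of the whole when only one fraction gets faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


jacobian_fraction = 0.70                 # share of time spent on the Jacobian (from the abstract)
jacobian_factor = (12 * 60) / 14         # 12 min -> 14 s, both figures from the abstract

whole_iteration = overall_speedup(jacobian_fraction, jacobian_factor)
upper_bound = 1.0 / (1.0 - jacobian_fraction)  # limit if the Jacobian cost dropped to zero

print(f"Jacobian speedup:        {jacobian_factor:.1f}x")
print(f"Estimated overall:       {whole_iteration:.2f}x")
print(f"Bound without more work: {upper_bound:.2f}x")
```

    Under these assumptions the overall gain stays close to 3x however fast the Jacobian becomes, which is consistent with the authors' remark that the forward problem also needs acceleration.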

  9. Multi-GPU Jacobian Accelerated Computing for Soft Field Tomography

    PubMed Central

    Borsic, A.; Attardo, E. A.; Halter, R. J.

    2012-01-01

    Image reconstruction in soft-field tomography is based on an inverse problem formulation, where a forward model is fitted to the data. In medical applications, where the anatomy presents complex shapes, it is common to use Finite Element Models to represent the volume of interest and to solve a partial differential equation that models the physics of the system. Over the last decade, there has been a shifting interest from 2D modeling to 3D modeling, as the underlying physics of most problems are three-dimensional. Though the increased computational power of modern computers allows working with much larger FEM models, the computational time required to reconstruct 3D images on a fine 3D FEM model can be significant, on the order of hours. For example, in Electrical Impedance Tomography applications using a dense 3D FEM mesh with half a million elements, a single reconstruction iteration takes approximately 15 to 20 minutes with optimized routines running on a modern multi-core PC. It is desirable to accelerate image reconstruction to enable researchers to more easily and rapidly explore data and reconstruction parameters. Further, providing high-speed reconstructions is essential for some promising clinical applications of EIT. For 3D problems, 70% of the computing time is spent building the Jacobian matrix, and 25% of the time in forward solving. In the present work, we focus on accelerating the Jacobian computation by using single and multiple GPUs. First, we discuss an optimized implementation on a modern multi-core PC architecture and show how computing time is bounded by the CPU-to-memory bandwidth; this factor limits the rate at which data can be fetched by the CPU. Gains associated with use of multiple CPU cores are minimal, since data operands cannot be fetched fast enough to saturate the processing power of even a single CPU core. GPUs have much higher memory bandwidth than CPUs and better parallelism. We are able to obtain acceleration factors of 20 times on a single NVIDIA S1070 GPU, and of 50 times on 4 GPUs, bringing the Jacobian computing time for a fine 3D mesh from 12 minutes to 14 seconds. We regard this as an important step towards gaining interactive reconstruction times in 3D imaging, particularly when coupled in the future with acceleration of the forward problem. While we demonstrate results for Electrical Impedance Tomography, these results apply to any soft-field imaging modality where the Jacobian matrix is computed with the Adjoint Method. PMID:23010857

  10. An efficient compression scheme for bitmap indices

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Otoo, Ekow J.; Shoshani, Arie

    2004-04-13

    When using an out-of-core indexing method to answer a query, it is generally assumed that the I/O cost dominates the overall query response time. Because of this, most research on indexing methods concentrates on reducing the sizes of indices. For bitmap indices, compression has been used for this purpose. However, in most cases, operations on these compressed bitmaps, mostly bitwise logical operations such as AND, OR, and NOT, spend more time in CPU than in I/O. To speed up these operations, a number of specialized bitmap compression schemes have been developed, the best known of which is the byte-aligned bitmap code (BBC). They are usually faster in performing logical operations than the general-purpose compression schemes, but the time spent in CPU still dominates the total query response time. To reduce the query response time, we designed a CPU-friendly scheme named the word-aligned hybrid (WAH) code. In this paper, we prove that the sizes of WAH compressed bitmap indices are about two words per row for a large range of attributes. This size is smaller than typical sizes of commonly used indices, such as a B-tree. Therefore, WAH compressed indices are not only appropriate for low cardinality attributes but also for high cardinality attributes. In the worst case, the time to operate on compressed bitmaps is proportional to the total size of the bitmaps involved. The total size of the bitmaps required to answer a query on one attribute is proportional to the number of hits. These indicate that WAH compressed bitmap indices are optimal. To verify their effectiveness, we generated bitmap indices for four different datasets and measured the response time of many range queries. Tests confirm that sizes of compressed bitmap indices are indeed smaller than B-tree indices, and query processing with WAH compressed indices is much faster than with BBC compressed indices, projection indices and B-tree indices. In addition, we also verified that the average query response time is proportional to the index size. This indicates that the compressed bitmap indices are efficient for very large datasets.
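
    The abstract above names the word-aligned hybrid (WAH) code without showing its layout. The sketch below is a simplified illustration of the general idea with 32-bit words: a literal word carries 31 bitmap bits, and a fill word carries a run of identical 31-bit groups. It is not the authors' implementation; in particular, it pads the final partial group with zeros and stores the bit count separately, which is an assumption made for brevity.

```python
GROUP = 31                    # bitmap bits carried by one 32-bit word
FILL_FLAG = 1 << 31           # most significant bit set: this word is a fill
FILL_VALUE = 1 << 30          # next bit: the bit value repeated by the fill
MAX_RUN = (1 << 30) - 1       # low 30 bits: run length, counted in groups
ALL_ONES = (1 << GROUP) - 1


def wah_encode(bits):
    """Encode a list of 0/1 values into a list of 32-bit word values."""
    words = []
    for start in range(0, len(bits), GROUP):
        chunk = bits[start:start + GROUP]
        chunk = chunk + [0] * (GROUP - len(chunk))   # zero-pad the last group
        value = 0
        for b in chunk:
            value = (value << 1) | b
        if value in (0, ALL_ONES):
            fill_bit = FILL_VALUE if value else 0
            last = words[-1] if words else 0
            same_fill = (last & FILL_FLAG) and (last & FILL_VALUE) == fill_bit
            if same_fill and (last & MAX_RUN) < MAX_RUN:
                words[-1] = last + 1                 # extend the current run
            else:
                words.append(FILL_FLAG | fill_bit | 1)
        else:
            words.append(value)                      # literal word, top bit 0
    return words


def wah_decode(words, n_bits):
    """Expand the words back into the original list of n_bits values."""
    bits = []
    for w in words:
        if w & FILL_FLAG:
            bit = 1 if w & FILL_VALUE else 0
            bits.extend([bit] * (GROUP * (w & MAX_RUN)))
        else:
            bits.extend((w >> shift) & 1 for shift in range(GROUP - 1, -1, -1))
    return bits[:n_bits]


bits = [0] * 310 + [1, 0, 1] + [1] * 62
words = wah_encode(bits)
print(len(bits), "bits ->", len(words), "words")
```

    Because every word is either a whole literal or a whole run, logical operations can proceed one machine word at a time, which is the property the abstract calls CPU-friendly.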

  11. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing

    PubMed Central

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-01-01

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, the GPU is employed to accelerate image processing by massive parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the CPU parallel imaging part, the advanced vector extension (AVX) method is first introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only are the bottlenecks of memory limitation and frequent data transfer broken, but various optimized strategies are also applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on a single-core CPU by 270 times and realizes real-time imaging in that the imaging rate outperforms the raw data generation rate. PMID:27070606

  12. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing.

    PubMed

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-04-07

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, the GPU is employed to accelerate image processing by massive parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the CPU parallel imaging part, the advanced vector extension (AVX) method is first introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only are the bottlenecks of memory limitation and frequent data transfer broken, but various optimized strategies are also applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on a single-core CPU by 270 times and realizes real-time imaging in that the imaging rate outperforms the raw data generation rate.

  13. Dynamic Quantum Allocation and Swap-Time Variability in Time-Sharing Operating Systems.

    ERIC Educational Resources Information Center

    Bhat, U. Narayan; Nance, Richard E.

    The effects of dynamic quantum allocation and swap-time variability on central processing unit (CPU) behavior are investigated using a model that allows both quantum length and swap-time to be state-dependent random variables. Effective CPU utilization is defined to be the proportion of a CPU busy period that is devoted to program processing, i.e.…

  14. Hybrid Computational Architecture for Multi-Scale Modeling of Materials and Devices

    DTIC Science & Technology

    2016-01-03

    Equivalent: Total Number: Sub Contractors (DD882) Names of Faculty Supported Names of Under Graduate students supported Names of Personnel receiving masters...GHz, 20 cores (40 with hyper-threading (HT))

    Single node performance

    Node      # of cores     Total CPU time   User CPU time   System CPU time   Elapsed time
    ...
    INTEL20   40 (with HT)   534.785          529.984         4.800             541.179
              20             468.873          466.119         2.754             476.878
              10             671.798          669.653         2.145             680.510
              8              772.269          770.256         2.013
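
    The snippet above lists total, user, and system CPU time next to elapsed time. The sketch below checks that, in the three complete rows quoted, total CPU time equals user plus system time to within rounding, and then shows one way to collect the same quantities for a Python workload with os.times(). The helper name cpu_times_for and the sample workload are hypothetical; the report does not say how its figures were gathered.

```python
import os


def cpu_times_for(work):
    """Run work() and return this process's user, system, total CPU and elapsed seconds."""
    before = os.times()
    work()
    after = os.times()
    user = after.user - before.user
    system = after.system - before.system
    return {
        "user": user,
        "system": system,
        "total": user + system,
        "elapsed": after.elapsed - before.elapsed,
    }


# Complete rows quoted in the snippet: (cores, total, user, system, elapsed)
reported_rows = [
    ("40 (with HT)", 534.785, 529.984, 4.800, 541.179),
    ("20", 468.873, 466.119, 2.754, 476.878),
    ("10", 671.798, 669.653, 2.145, 680.510),
]
consistent = [abs(total - (user + system)) < 0.002
              for _, total, user, system, _ in reported_rows]
print("total = user + system in every complete row:", all(consistent))

timings = cpu_times_for(lambda: sum(i * i for i in range(300_000)))
print(timings)
```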

  15. Study on efficiency of time computation in x-ray imaging simulation base on Monte Carlo algorithm using graphics processing unit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Setiani, Tia Dwi, E-mail: tiadwisetiani@gmail.com; Suprijadi; Nuclear Physics and Biophysics Research Division, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung Jalan Ganesha 10 Bandung, 40132

    Monte Carlo (MC) is one of the powerful techniques for simulation in x-ray imaging. The MC method can simulate radiation transport within matter with high accuracy and provides a natural way to simulate radiation transport in complex systems. One of the codes based on the MC algorithm that is widely used for radiographic image simulation is MC-GPU, a code developed by Andrea Basal. This study aimed to investigate the computation time of x-ray imaging simulation on a GPU (Graphics Processing Unit) compared to a standard CPU (Central Processing Unit). Furthermore, the effect of physical parameters on the quality of radiographic images and the comparison of image quality resulting from simulation on the GPU and CPU are evaluated in this paper. The simulations were run on a CPU in serial mode and on two GPUs with 384 cores and 2304 cores. In the GPU simulations, each core calculates one photon, so a large number of photons were calculated simultaneously. Results show that the simulations on the GPU were significantly accelerated compared to the CPU. The simulations on the 2304-core GPU were performed about 64-114 times faster than on the CPU, while the simulations on the 384-core GPU were performed about 20-31 times faster than on a single CPU core. Another result shows that optimum image quality was obtained with the number of histories starting from 10^8 and energies from 60 keV to 90 keV. Analyzed by a statistical approach, the quality of the GPU and CPU images is relatively the same.

  16. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

    PubMed

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.

  17. File Usage Analysis and Resource Usage Prediction: a Measurement-Based Study. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Devarakonda, Murthy V.-S.

    1987-01-01

    A probabilistic scheme was developed to predict process resource usage in UNIX. Given the identity of the program being run, the scheme predicts CPU time, file I/O, and memory requirements of a process at the beginning of its life. The scheme uses a state-transition model of the program's resource usage in its past executions for prediction. The states of the model are the resource regions obtained from an off-line cluster analysis of processes run on the system. The proposed method is shown to work on data collected from a VAX 11/780 running 4.3 BSD UNIX. The results show that the predicted values correlate well with the actual. The coefficient of correlation between the predicted and actual values of CPU time is 0.84. Errors in prediction are mostly small. Some 82% of errors in CPU time prediction are less than 0.5 standard deviations of process CPU time.

  18. Predictability of process resource usage - A measurement-based study on UNIX

    NASA Technical Reports Server (NTRS)

    Devarakonda, Murthy V.; Iyer, Ravishankar K.

    1989-01-01

    A probabilistic scheme is developed to predict process resource usage in UNIX. Given the identity of the program being run, the scheme predicts CPU time, file I/O, and memory requirements of a process at the beginning of its life. The scheme uses a state-transition model of the program's resource usage in its past executions for prediction. The states of the model are the resource regions obtained from an off-line cluster analysis of processes run on the system. The proposed method is shown to work on data collected from a VAX 11/780 running 4.3 BSD UNIX. The results show that the predicted values correlate well with the actual. The correlation coefficient between the predicted and actual values of CPU time is 0.84. Errors in prediction are mostly small. Some 82 percent of errors in CPU time prediction are less than 0.5 standard deviations of process CPU time.

  19. Predictability of process resource usage: A measurement-based study of UNIX

    NASA Technical Reports Server (NTRS)

    Devarakonda, Murthy V.; Iyer, Ravishankar K.

    1987-01-01

    A probabilistic scheme is developed to predict process resource usage in UNIX. Given the identity of the program being run, the scheme predicts CPU time, file I/O, and memory requirements of a process at the beginning of its life. The scheme uses a state-transition model of the program's resource usage in its past executions for prediction. The states of the model are the resource regions obtained from an off-line cluster analysis of processes run on the system. The proposed method is shown to work on data collected from a VAX 11/780 running 4.3 BSD UNIX. The results show that the predicted values correlate well with the actual. The correlation coefficient between the predicted and actual values of CPU time is 0.84. Errors in prediction are mostly small. Some 82% of errors in CPU time prediction are less than 0.5 standard deviations of process CPU time.

  20. A Spiking Neural Simulator Integrating Event-Driven and Time-Driven Computation Schemes Using Parallel CPU-GPU Co-Processing: A Case Study.

    PubMed

    Naveros, Francisco; Luque, Niceto R; Garrido, Jesús A; Carrillo, Richard R; Anguita, Mancia; Ros, Eduardo

    2015-07-01

    Time-driven simulation methods in traditional CPU architectures perform well and precisely when simulating small-scale spiking neural networks. Nevertheless, they still have drawbacks when simulating large-scale systems. Conversely, event-driven simulation methods in CPUs and time-driven simulation methods in graphic processing units (GPUs) can outperform CPU time-driven methods under certain conditions. With this performance improvement in mind, we have developed an event-and-time-driven spiking neural network simulator suitable for a hybrid CPU-GPU platform. Our neural simulator is able to efficiently simulate bio-inspired spiking neural networks consisting of different neural models, which can be distributed heterogeneously in both small layers and large layers or subsystems. For the sake of efficiency, the low-activity parts of the neural network can be simulated in CPU using event-driven methods while the high-activity subsystems can be simulated in either CPU (a few neurons) or GPU (thousands or millions of neurons) using time-driven methods. In this brief, we have undertaken a comparative study of these different simulation methods. For benchmarking the different simulation methods and platforms, we have used a cerebellar-inspired neural-network model consisting of a very dense granular layer and a Purkinje layer with a smaller number of cells (according to biological ratios). Thus, this cerebellar-like network includes a dense diverging neural layer (increasing the dimensionality of its internal representation and sparse coding) and a converging neural layer (integration) similar to many other biologically inspired and also artificial neural networks.

  1. On the cost of approximating and recognizing a noise perturbed straight line or a quadratic curve segment in the plane. [central processing units]

    NASA Technical Reports Server (NTRS)

    Cooper, D. B.; Yalabik, N.

    1975-01-01

    Approximation of noisy data in the plane by straight lines or elliptic or single-branch hyperbolic curve segments arises in pattern recognition, data compaction, and other problems. The efficient search for and approximation of data by such curves were examined. Recursive least-squares linear curve-fitting was used, and ellipses and hyperbolas are parameterized as quadratic functions in x and y. The error minimized by the algorithm is interpreted, and central processing unit (CPU) times for estimating parameters for fitting straight lines and quadratic curves were determined and compared. CPU time for data search was also determined for the case of straight line fitting. Quadratic curve fitting is shown to require about six times as much CPU time as does straight line fitting, and curves relating CPU time and fitting error were determined for straight line fitting. Results are derived on early sequential determination of whether or not the underlying curve is a straight line.
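
    The abstract above mentions recursive least-squares fitting of straight lines. A minimal sketch of a recursive least-squares update for the line y = a*x + b follows. It is a textbook formulation chosen for illustration and is not taken from the paper; the function name rls_line_fit, the initial covariance scale delta, and the sample points are all assumptions.

```python
def rls_line_fit(points, delta=1e6):
    """Fit y = a*x + b one point at a time with recursive least squares.

    P is the symmetric 2x2 matrix [[p11, p12], [p12, p22]], started at
    delta * I so that the first observations dominate the initial guess.
    """
    a, b = 0.0, 0.0
    p11, p12, p22 = delta, 0.0, delta
    for x, y in points:
        # g = P * phi, with regressor phi = [x, 1]
        g1 = p11 * x + p12
        g2 = p12 * x + p22
        denom = 1.0 + x * g1 + g2          # 1 + phi^T * P * phi
        k1, k2 = g1 / denom, g2 / denom    # gain vector
        err = y - (a * x + b)              # prediction error before the update
        a += k1 * err
        b += k2 * err
        # P <- P - k * g^T (stays symmetric because k is proportional to g)
        p11 -= k1 * g1
        p12 -= k1 * g2
        p22 -= k2 * g2
    return a, b


points = [(float(x), 2.0 * x + 1.0) for x in range(10)]   # noise-free samples
slope, intercept = rls_line_fit(points)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

    Each new point costs a fixed number of arithmetic operations, so the per-point CPU time does not grow with the amount of data already processed.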

  2. Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs

    NASA Astrophysics Data System (ADS)

    Niemeyer, Kyle E.; Sung, Chih-Jen

    2014-01-01

    The chemical kinetics ODEs arising from operator-split reactive-flow simulations were solved on GPUs using explicit integration algorithms. Nonstiff chemical kinetics of a hydrogen oxidation mechanism (9 species and 38 irreversible reactions) were computed using the explicit fifth-order Runge-Kutta-Cash-Karp method, and the GPU-accelerated version performed faster than single- and six-core CPU versions by factors of 126 and 25, respectively, for 524,288 ODEs. Moderately stiff kinetics, represented with mechanisms for hydrogen/carbon-monoxide (13 species and 54 irreversible reactions) and methane (53 species and 634 irreversible reactions) oxidation, were computed using the stabilized explicit second-order Runge-Kutta-Chebyshev (RKC) algorithm. The GPU-based RKC implementation demonstrated an increase in performance of nearly 59 and 10 times, for problem sizes consisting of 262,144 ODEs and larger, than the single- and six-core CPU-based RKC algorithms using the hydrogen/carbon-monoxide mechanism. With the methane mechanism, RKC-GPU performed more than 65 and 11 times faster, for problem sizes consisting of 131,072 ODEs and larger, than the single- and six-core RKC-CPU versions, and up to 57 times faster than the six-core CPU-based implicit VODE algorithm on 65,536 ODEs. In the presence of more severe stiffness, such as ethylene oxidation (111 species and 1566 irreversible reactions), RKC-GPU performed more than 17 times faster than RKC-CPU on six cores for 32,768 ODEs and larger, and at best 4.5 times faster than VODE on six CPU cores for 65,536 ODEs. With a larger time step size, RKC-GPU performed at best 2.5 times slower than six-core VODE for 8192 ODEs and larger. Therefore, the need for developing new strategies for integrating stiff chemistry on GPUs was discussed.

  3. Novel hybrid GPU-CPU implementation of parallelized Monte Carlo parametric expectation maximization estimation method for population pharmacokinetic data analysis.

    PubMed

    Ng, C M

    2013-10-01

    The development of a population PK/PD model, an essential component for model-based drug development, is both time- and labor-intensive. A graphical-processing unit (GPU) computing technology has been proposed and used to accelerate many scientific computations. The objective of this study was to develop a hybrid GPU-CPU implementation of parallelized Monte Carlo parametric expectation maximization (MCPEM) estimation algorithm for population PK data analysis. A hybrid GPU-CPU implementation of the MCPEM algorithm (MCPEMGPU) and identical algorithm that is designed for the single CPU (MCPEMCPU) were developed using MATLAB in a single computer equipped with dual Xeon 6-Core E5690 CPU and a NVIDIA Tesla C2070 GPU parallel computing card that contained 448 stream processors. Two different PK models with rich/sparse sampling design schemes were used to simulate population data in assessing the performance of MCPEMCPU and MCPEMGPU. Results were analyzed by comparing the parameter estimation and model computation times. Speedup factor was used to assess the relative benefit of parallelized MCPEMGPU over MCPEMCPU in shortening model computation time. The MCPEMGPU consistently achieved shorter computation time than the MCPEMCPU and can offer more than 48-fold speedup using a single GPU card. The novel hybrid GPU-CPU implementation of parallelized MCPEM algorithm developed in this study holds a great promise in serving as the core for the next-generation of modeling software for population PK/PD analysis.

  4. GPU Optimizations for a Production Molecular Docking Code

    PubMed Central

    Landaverde, Raphael; Herbordt, Martin C.

    2015-01-01

    Modeling molecular docking is critical to both understanding life processes and designing new drugs. In previous work we created the first published GPU-accelerated docking code (PIPER) which achieved a roughly 5× speed-up over a contemporaneous 4-core CPU. Advances in GPU architecture and in the CPU code, however, have since reduced this relative performance by a factor of 10. In this paper we describe the upgrade of GPU PIPER. This required an entire rewrite, including algorithm changes and moving most remaining non-accelerated CPU code onto the GPU. The result is a 7× improvement in GPU performance and a 3.3× speedup over the CPU-only code. We find that this difference in time is almost entirely due to the difference in run times of the 3D FFT library functions on CPU (MKL) and GPU (cuFFT), respectively. The GPU code has been integrated into the ClusPro docking server which has over 4000 active users. PMID:26594667

  5. GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

    PubMed Central

    Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

    2016-01-01

    Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905

  6. GPU Optimizations for a Production Molecular Docking Code.

    PubMed

    Landaverde, Raphael; Herbordt, Martin C

    2014-09-01

    Modeling molecular docking is critical to both understanding life processes and designing new drugs. In previous work we created the first published GPU-accelerated docking code (PIPER) which achieved a roughly 5× speed-up over a contemporaneous 4-core CPU. Advances in GPU architecture and in the CPU code, however, have since reduced this relative performance by a factor of 10. In this paper we describe the upgrade of GPU PIPER. This required an entire rewrite, including algorithm changes and moving most remaining non-accelerated CPU code onto the GPU. The result is a 7× improvement in GPU performance and a 3.3× speedup over the CPU-only code. We find that this difference in time is almost entirely due to the difference in run times of the 3D FFT library functions on CPU (MKL) and GPU (cuFFT), respectively. The GPU code has been integrated into the ClusPro docking server which has over 4000 active users.

  7. The Research and Test of Fast Radio Burst Real-time Search Algorithm Based on GPU Acceleration

    NASA Astrophysics Data System (ADS)

    Wang, J.; Chen, M. Z.; Pei, X.; Wang, Z. Q.

    2017-03-01

    To meet the research needs of the Nanshan 25 m radio telescope of Xinjiang Astronomical Observatory (XAO) and to study key technology for the planned QiTai radio Telescope (QTT), the receiver group of XAO studied a GPU (Graphics Processing Unit) based real-time FRB search algorithm, developed from the original CPU (Central Processing Unit) based FRB search algorithm, and built a real-time FRB search system. A comparison of the GPU and CPU systems shows that, while preserving search accuracy, the GPU-accelerated algorithm is 35-45 times faster than the CPU algorithm.

  8. High effective inverse dynamics modelling for dual-arm robot

    NASA Astrophysics Data System (ADS)

    Shen, Haoyu; Liu, Yanli; Wu, Hongtao

    2018-05-01

    To address the problem of inverse dynamics modelling for a dual-arm robot, a recursive inverse dynamics modelling method based on the decoupled natural orthogonal complement is presented. In this model, Decoupled Natural Orthogonal Complement matrices are used to eliminate the constraint forces in the Newton-Euler kinematic equations, and screws are used to express the kinematic and dynamic variables. On this basis, a dedicated simulation program was developed with the symbolic software Mathematica, and a simulation study was conducted on a dual-arm robot. Simulation results show that the proposed method saves an enormous amount of CPU time compared with the recursive Newton-Euler kinematic equations and that the results are correct and reasonable, which verifies the reliability and efficiency of the method.

  9. Numericware i: Identical by State Matrix Calculator

    PubMed Central

    Kim, Bongsong; Beavis, William D

    2017-01-01

    We introduce software, Numericware i, to compute identical by state (IBS) matrix based on genotypic data. Calculating an IBS matrix with a large dataset requires large computer memory and takes lengthy processing time. Numericware i addresses these challenges with 2 algorithmic methods: multithreading and forward chopping. The multithreading allows computational routines to concurrently run on multiple central processing unit (CPU) processors. The forward chopping addresses memory limitation by dividing a dataset into appropriately sized subsets. Numericware i allows calculation of the IBS matrix for a large genotypic dataset using a laptop or a desktop computer. For comparison with different software, we calculated genetic relationship matrices using Numericware i, SPAGeDi, and TASSEL with the same genotypic dataset. Numericware i calculates IBS coefficients between 0 and 2, whereas SPAGeDi and TASSEL produce different ranges of values including negative values. The Pearson correlation coefficient between the matrices from Numericware i and TASSEL was high at .9972, whereas SPAGeDi showed low correlation with Numericware i (.0505) and TASSEL (.0587). With a high-dimensional dataset of 500 entities by 10 000 000 SNPs, Numericware i spent 382 minutes using 19 CPU threads and 64 GB memory by dividing the dataset into 3 pieces, whereas SPAGeDi and TASSEL failed with the same dataset. Numericware i is freely available for Windows and Linux under CC-BY 4.0 license at https://figshare.com/s/f100f33a8857131eb2db. PMID:28469375
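    The "forward chopping" idea above, dividing the dataset into subsets and accumulating partial results, can be illustrated with a minimal Python sketch. It is not the Numericware i implementation: the 0/1/2 genotype coding, the per-SNP score `2 - |a - b|`, and the names `ibs_matrix` and `chunk_size` are assumptions chosen only so that coefficients fall between 0 and 2, as the abstract reports. Multithreading is omitted.

```python
def ibs_matrix(genotypes, chunk_size):
    """Average identity-by-state score per pair of individuals.

    genotypes: list of rows, one per individual, each a list of SNP calls
    coded 0, 1 or 2 (assumed coding). SNP columns are processed in chunks
    so that only chunk_size columns are handled at a time.
    """
    n = len(genotypes)
    n_snps = len(genotypes[0])
    totals = [[0.0] * n for _ in range(n)]
    for start in range(0, n_snps, chunk_size):
        # Take one slice of SNP columns (the "chopped" subset).
        chunk = [row[start:start + chunk_size] for row in genotypes]
        for i in range(n):
            for j in range(i, n):
                # Assumed per-SNP score: 2 when identical, 0 when opposite homozygotes.
                partial = sum(2 - abs(a - b) for a, b in zip(chunk[i], chunk[j]))
                totals[i][j] += partial
                totals[j][i] = totals[i][j]
    # Normalise by the number of SNPs so every coefficient lies in [0, 2].
    return [[t / n_snps for t in row] for row in totals]
```

    Because the partial sums are simply added, the result does not depend on the chunk size; only peak memory per pass does.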

  10. SU-E-J-91: FFT Based Medical Image Registration Using a Graphics Processing Unit (GPU).

    PubMed

    Luce, J; Hoggarth, M; Lin, J; Block, A; Roeske, J

    2012-06-01

    To evaluate the efficiency gains obtained from using a Graphics Processing Unit (GPU) to perform a Fourier Transform (FT) based image registration. Fourier-based image registration involves obtaining the FT of the component images, and analyzing them in Fourier space to determine the translations and rotations of one image set relative to another. An important property of FT registration is that by enlarging the images (adding additional pixels), one can obtain translations and rotations with sub-pixel resolution. The expense, however, is an increased computational time. GPUs may decrease the computational time associated with FT image registration by taking advantage of their parallel architecture to perform matrix computations much more efficiently than a Central Processor Unit (CPU). In order to evaluate the computational gains produced by a GPU, images with known translational shifts were utilized. A program was written in the Interactive Data Language (IDL; Exelis, Boulder, CO) to perform CPU-based calculations. Subsequently, the program was modified using GPU bindings (Tech-X, Boulder, CO) to perform GPU-based computation on the same system. Multiple image sizes were used, ranging from 256×256 to 2304×2304. The time required to complete the full algorithm by the CPU and GPU were benchmarked and the speed increase was defined as the ratio of the CPU-to-GPU computational time. The ratio of the CPU-to-GPU time was greater than 1.0 for all images, which indicates the GPU is performing the algorithm faster than the CPU. The smallest improvement, a 1.21 ratio, was found with the smallest image size of 256×256, and the largest speedup, a 4.25 ratio, was observed with the largest image size of 2304×2304. GPU programming resulted in a significant decrease in computational time associated with a FT image registration algorithm. The inclusion of the GPU may provide near real-time, sub-pixel registration capability.
© 2012 American Association of Physicists in Medicine.
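    As an illustration of Fourier-based translation recovery, the following Python sketch uses phase correlation on a small image. It is an assumption-laden stand-in, not the authors' IDL program: it uses a naive DFT from the standard library instead of an FFT library, handles only whole-pixel circular shifts (no rotation, no sub-pixel enlargement), and all function names are hypothetical.

```python
import cmath

def dft(seq, inverse=False):
    """Naive 1-D discrete Fourier transform (O(n^2)); fine for tiny examples."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t, x in enumerate(seq))
        out.append(s / n if inverse else s)
    return out

def dft2(img, inverse=False):
    """2-D transform: transform every row, then every column."""
    rows = [dft(r, inverse) for r in img]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def phase_correlation_shift(ref, moved):
    """Return (dy, dx) such that moved[y][x] == ref[y - dy][x - dx] (circular)."""
    F, G = dft2(ref), dft2(moved)
    cross = []
    for f_row, g_row in zip(F, G):
        row = []
        for f, g in zip(f_row, g_row):
            p = g * f.conjugate()
            mag = abs(p)
            # Keep only the phase; drop coefficients that are numerically zero.
            row.append(p / mag if mag > 1e-12 else 0j)
        cross.append(row)
    corr = dft2(cross, inverse=True)
    # The correlation surface peaks at the translation between the images.
    _, dy, dx = max((corr[y][x].real, y, x)
                    for y in range(len(corr)) for x in range(len(corr[0])))
    return dy, dx
```

    The speed increase reported above is then simply the CPU time divided by the GPU time for the same registration.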

  11. A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection

    PubMed Central

    Chen, Yaw-Chung

    2015-01-01

    The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms. PMID:26437335
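    The two-stage flow described above (cheap CPU pre-filter, then full inspection of suspicious packets only) might be sketched as follows. This is a hypothetical illustration, not the HPMA algorithm: the k-byte prefix filter is an assumed stand-in for the paper's pre-filter, and `full_match` runs on the CPU here where the paper sends packets to a GPU matcher.

```python
def build_prefilter(signatures, k=2):
    """Collect the first k bytes of every signature (assumes len(sig) >= k)."""
    return {sig[:k] for sig in signatures}

def cpu_prefilter(payload, prefixes, k=2):
    """Cheap test: does any k-byte window of the payload start a signature?"""
    return any(payload[i:i + k] in prefixes
               for i in range(len(payload) - k + 1))

def full_match(payload, signatures):
    """Stand-in for the expensive second stage (the GPU matcher in the paper)."""
    return [sig for sig in signatures if sig in payload]

def inspect(packets, signatures, k=2):
    """Return ({packet index: matched signatures}, number of packets forwarded)."""
    prefixes = build_prefilter(signatures, k)
    alerts, forwarded = {}, 0
    for idx, payload in enumerate(packets):
        if cpu_prefilter(payload, prefixes, k):
            forwarded += 1  # only suspicious packets reach the second stage
            hits = full_match(payload, signatures)
            if hits:
                alerts[idx] = hits
    return alerts, forwarded
```

    The pre-filter may forward harmless packets (false positives) but never drops a packet that contains a full signature, so the second stage sees less traffic without missing matches.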

  12. A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.

    PubMed

    Lee, Chun-Liang; Lin, Yi-Shan; Chen, Yaw-Chung

    2015-01-01

    The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.

  13. Clinical implementation of a GPU-based simplified Monte Carlo method for a treatment planning system of proton beam therapy.

    PubMed

    Kohno, R; Hotta, K; Nishioka, S; Matsubara, K; Tansho, R; Suzuki, T

    2011-11-21

    We implemented the simplified Monte Carlo (SMC) method on graphics processing unit (GPU) architecture under the computer-unified device architecture platform developed by NVIDIA. The GPU-based SMC was clinically applied for four patients with head and neck, lung, or prostate cancer. The results were compared to those obtained by a traditional CPU-based SMC with respect to the computation time and discrepancy. In the CPU- and GPU-based SMC calculations, the estimated mean statistical errors of the calculated doses in the planning target volume region were within 0.5% rms. The dose distributions calculated by the GPU- and CPU-based SMCs were similar, within statistical errors. The GPU-based SMC showed 12.30-16.00 times faster performance than the CPU-based SMC. The computation time per beam arrangement using the GPU-based SMC for the clinical cases ranged from 9 to 67 s. The results demonstrate the successful application of the GPU-based SMC to clinical proton treatment planning.

  14. The Performance of the NAS HSPs in 1st Half of 1994

    NASA Technical Reports Server (NTRS)

    Bergeron, Robert J.; Walter, Howard (Technical Monitor)

    1995-01-01

    During the first six months of 1994, the NAS (Numerical Aerodynamic Simulation) 16-CPU Y-MP C90 Von Neumann (VN) delivered an average throughput of 4.045 GFLOPS while the ACSF (Aeronautics Consolidated Supercomputer Facility) 8-CPU Y-MP C90 Eagle averaged 1.658 GFLOPS. The VN rate represents a machine efficiency of 26.3% whereas the Eagle rate corresponds to a machine efficiency of 21.6%. VN displayed a greater efficiency than Eagle primarily because the stronger workload demand for its CPU cycles allowed it to devote more time to user programs and less time to idle. An additional factor increasing VN efficiency was the ability of the UNICOS 8.0 Operating System to deliver a larger fraction of CPU time to user programs. Although measurements indicate increasing vector length for both workloads, insufficient vector lengths continue to hinder HSP (High Speed Processor) performance. To improve HSP performance, NAS should continue to encourage the HSP users to modify their codes to increase program vector length.

  15. Static and Dynamic Frequency Scaling on Multicore CPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bao, Wenlei; Hong, Changwan; Chunduri, Sudheer

    2016-12-28

    Dynamic voltage and frequency scaling (DVFS) adapts CPU power consumption by modifying a processor’s operating frequency (and the associated voltage). Typical approaches employing DVFS involve default strategies such as running at the lowest or the highest frequency, or observing the CPU’s runtime behavior and dynamically adapting the voltage/frequency configuration based on CPU usage. In this paper, we argue that many previous approaches suffer from inherent limitations, such as not accounting for the processor-specific impact of frequency changes on energy for different workload types. We first propose a lightweight runtime-based approach that automatically adapts the frequency based on the CPU workload and is agnostic of the processor characteristics. We then show that further improvements can be achieved for affine kernels in the application, using a compile-time characterization instead of run-time monitoring to select the frequency and number of CPU cores to use. Our framework relies on a one-time energy characterization of CPU-specific DVFS profiles followed by a compile-time categorization of loop-based code segments in the application. These are combined to determine a priori the frequency and the number of cores to use to execute the application so as to optimize energy or energy-delay product, outperforming the runtime approach. Extensive evaluation on 60 benchmarks and five multi-core CPUs shows that our approach systematically outperforms the powersave Linux governor, while improving overall performance.
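    Choosing a frequency and core count ahead of time to minimise energy or energy-delay product can be pictured with a small Python sketch. It assumes a pre-measured table of configurations; the field names (`freq_ghz`, `cores`, `power_w`, `time_s`) and the function `best_config` are hypothetical and do not come from the paper's framework.

```python
def best_config(profiles, objective="edp"):
    """Pick the profile minimising energy (J) or energy-delay product (J*s).

    Each profile is a dict with average power in watts and run time in
    seconds for one (frequency, core count) configuration.
    """
    def score(p):
        energy = p["power_w"] * p["time_s"]  # joules
        if objective == "energy":
            return energy
        return energy * p["time_s"]          # energy-delay product
    return min(profiles, key=score)
```

    With such a table, a slow low-power setting can win on energy alone while a faster, hungrier setting wins on energy-delay product, which is why the objective has to be stated explicitly.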

  16. Characterization and referral patterns of ST-elevation myocardial infarction patients admitted to chest pain units rather than directly to catherization laboratories. Data from the German Chest Pain Unit Registry.

    PubMed

    Schmidt, Frank P; Perne, Andrea; Hochadel, Matthias; Giannitsis, Evangelos; Darius, Harald; Maier, Lars S; Schmitt, Claus; Heusch, Gerd; Voigtländer, Thomas; Mudra, Harald; Gori, Tommaso; Senges, Jochen; Münzel, Thomas

    2017-03-15

    Direct transfer to the catheterization laboratory for primary percutaneous coronary intervention (PCI) is standard of care for patients with ST-segment elevation myocardial infarction (STEMI). Nevertheless, a significant number of STEMI-patients are initially treated in chest pain units (CPUs) of admitting hospitals. Thus, it is important to characterize these patients and to define why an important deviation from recommended clinical pathways occurs and in particular to quantify the impact of deviation on critical time intervals. 1679 STEMI patients admitted to a CPU in the period from 2010 to 2015 were enrolled in the German CPU registry (8.5% of 19,666). 55.9% of the patients were delivered by an emergency medical system (EMS), 16.1% transferred from other hospitals and 15.2% referred by a general practitioner (GP). 12.7% were self-referrals. 55% did not get a pre-hospital ECG. Compared to the EMS, referral by GPs markedly delayed critical time intervals while a pre-hospital ECG demonstrating ST-segment elevation reduced door-to-balloon time. When compared to STEMI patients (n=21,674) enrolled in the ALKK-registry, CPU-STEMI patients had a lower risk profile, their treatment in the CPU was guideline-conform and in-hospital mortality was low (1.5%). CPU-STEMI patients represent a numerically significant group because a pre-hospital ECG was not documented. Treatment in the CPU is guideline-conform and the intra-hospital mortality is low. The lack of a pre-hospital ECG and admission via the GP substantially delay critical time intervals, suggesting that in patients with symptoms suggestive of an ACS, the EMS should be contacted and not the GP. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  17. Simulation Testing of Embedded Flight Software

    NASA Technical Reports Server (NTRS)

    Shahabuddin, Mohammad; Reinholtz, William

    2004-01-01

    Virtual Real Time (VRT) is a computer program for testing embedded flight software by computational simulation in a workstation, in contradistinction to testing it in its target central processing unit (CPU). The disadvantages of testing in the target CPU include the need for an expensive test bed, the necessity for testers and programmers to take turns using the test bed, and the lack of software tools for debugging in a real-time environment. By virtue of its architecture, most of the flight software of the type in question is amenable to development and testing on workstations, for which there is an abundance of commercially available debugging and analysis software tools. Unfortunately, the timing of a workstation differs from that of a target CPU in a test bed. VRT, in conjunction with closed-loop simulation software, provides a capability for executing embedded flight software on a workstation in a close-to-real-time environment. A scale factor is used to convert between execution time in VRT on a workstation and execution on a target CPU. VRT includes high-resolution operating-system timers that enable the synchronization of flight software with simulation software and ground software, all running on different workstations.

  18. Use of general purpose graphics processing units with MODFLOW

    USGS Publications Warehouse

    Hughes, Joseph D.; White, Jeremy T.

    2013-01-01

    To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
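    For readers unfamiliar with the ingredients named above, the sketch below combines compressed sparse row (CSR) storage with a Jacobi-preconditioned conjugate gradient iteration in plain Python. It is a textbook-style illustration under stated assumptions (symmetric positive-definite matrix, nonzero diagonal, serial CPU execution), not the UPCG solver; the function names are hypothetical.

```python
def csr_matvec(data, indices, indptr, x):
    """y = A @ x for a matrix stored in compressed sparse row form."""
    return [sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def jacobi_pcg(data, indices, indptr, b, tol=1e-10, max_iter=100):
    """Solve A x = b with conjugate gradients and a diagonal preconditioner."""
    n = len(b)
    # Extract the diagonal of A; the Jacobi preconditioner is its inverse.
    diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                diag[i] = data[k]
    x = [0.0] * n
    r = list(b)                                # residual for the zero start vector
    z = [ri / di for ri, di in zip(r, diag)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:             # converged on the residual norm
            break
        Ap = csr_matvec(data, indices, indptr, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

    Every step inside the loop is a matrix-vector product, a dot product, or a vector update, which is why the abstract can keep all basic linear algebra on the GPGPU and copy data only before the first iteration and after the last.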

  19. Memory interface simulator: A computer design aid

    NASA Technical Reports Server (NTRS)

    Taylor, D. S.; Williams, T.; Weatherbee, J. E.

    1972-01-01

    Results are presented of a study conducted with a digital simulation model being used in the design of the Automatically Reconfigurable Modular Multiprocessor System (ARMMS), a candidate computer system for future manned and unmanned space missions. The model simulates the activity involved as instructions are fetched from random access memory for execution in one of the system central processing units. A series of model runs measured instruction execution time under various assumptions pertaining to the CPU's and the interface between the CPU's and RAM. Design tradeoffs are presented in the following areas: Bus widths, CPU microprogram read only memory cycle time, multiple instruction fetch, and instruction mix.

  20. Efficient methods for implementation of multi-level nonrigid mass-preserving image registration on GPUs and multi-threaded CPUs.

    PubMed

    Ellingwood, Nathan D; Yin, Youbing; Smith, Matthew; Lin, Ching-Long

    2016-04-01

    Faster and more accurate methods for registration of images are important for research involved in conducting population-based studies that utilize medical imaging, as well as improvements for use in clinical applications. We present a novel computation- and memory-efficient multi-level method on graphics processing units (GPU) for performing registration of two computed tomography (CT) volumetric lung images. We developed a computation- and memory-efficient Diffeomorphic Multi-level B-Spline Transform Composite (DMTC) method to implement nonrigid mass-preserving registration of two CT lung images on GPU. The framework consists of a hierarchy of B-Spline control grids of increasing resolution. A similarity criterion known as the sum of squared tissue volume difference (SSTVD) was adopted to preserve lung tissue mass. The use of SSTVD consists of the calculation of the tissue volume, the Jacobian, and their derivatives, which makes its implementation on GPU challenging due to memory constraints. The use of the DMTC method enabled reduced computation and memory storage of variables with minimal communication between GPU and Central Processing Unit (CPU) due to ability to pre-compute values. The method was assessed on six healthy human subjects. Resultant GPU-generated displacement fields were compared against the previously validated CPU counterpart fields, showing good agreement with an average normalized root mean square error (nRMS) of 0.044±0.015. Runtime and performance speedup are compared between single-threaded CPU, multi-threaded CPU, and GPU algorithms. Best performance speedup occurs at the highest resolution in the GPU implementation for the SSTVD cost and cost gradient computations, with a speedup of 112 times that of the single-threaded CPU version and 11 times over the twelve-threaded version when considering average time per iteration using a Nvidia Tesla K20X GPU. 
The proposed GPU-based DMTC method outperforms its multi-threaded CPU version in terms of runtime. Total registration runtime was reduced to 2.9 min on the GPU version, compared to 12.8 min on the twelve-threaded CPU version and 112.5 min on a single-threaded CPU. Furthermore, the GPU implementation discussed in this work can be adapted for use of other cost functions that require calculation of the first derivatives. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  1. Polydrug use among college students in Brazil: a nationwide survey.

    PubMed

    Oliveira, Lúcio Garcia de; Alberghini, Denis Guilherme; Santos, Bernardo dos; Andrade, Arthur Guerra de

    2013-01-01

    To estimate the frequency of polydrug use (alcohol and illicit drugs) among college students and its associations with gender and age group. A nationwide sample of 12,544 college students was asked to complete a questionnaire on their use of drugs according to three time parameters (lifetime, past 12 months, and last 30 days). The co-use of drugs was investigated as concurrent polydrug use (CPU) and simultaneous polydrug use (SPU), a subcategory of CPU that involves the use of drugs at the same time or in close temporal proximity. Almost 26% of college students reported having engaged in CPU in the past 12 months. Among these students, 37% had engaged in SPU. In the past 30 days, 17% of college students had engaged in CPU. Among these, 35% had engaged in SPU. Marijuana was the illicit drug most frequently used with alcohol (either as CPU or SPU), especially among males. Among females, the most commonly reported combination was alcohol and prescribed medications. A high proportion of Brazilian college students may be engaging in polydrug use. College administrators should keep themselves informed to be able to identify such use and to develop educational interventions to prevent such behavior.

  2. Spectrum Savings from High Performance Recording and Playback Onboard the Test Article

    DTIC Science & Technology

    2013-02-20

    execute within a Windows 7 environment, and data is recorded on SSDs. The underlying database is implemented using MySQL. Figure 1 illustrates the... MySQL database. This is effectively the time at which the recorded data are available for retransmission. CPU and Memory utilization were collected...17.7% MySQL avg. 3.9% EQDR Total avg. 21.6% Table 1 CPU Utilization with 260 Mbits/sec Load The difference between the total System CPU (27.8

  3. Optimum element density studies for finite-element thermal analysis of hypersonic aircraft structures

    NASA Technical Reports Server (NTRS)

    Ko, William L.; Olona, Timothy; Muramoto, Kyle M.

    1990-01-01

    Different finite element models previously set up for thermal analysis of the space shuttle orbiter structure are discussed and their shortcomings identified. Element density criteria are established for the finite element thermal modelings of space shuttle orbiter-type large, hypersonic aircraft structures. These criteria are based on rigorous studies on solution accuracies using different finite element models having different element densities set up for one cell of the orbiter wing. Also, a method for optimization of the transient thermal analysis computer central processing unit (CPU) time is discussed. Based on the newly established element density criteria, the orbiter wing midspan segment was modeled for the examination of thermal analysis solution accuracies and the extent of computation CPU time requirements. The results showed that the distributions of the structural temperatures and the thermal stresses obtained from this wing segment model were satisfactory and the computation CPU time was at the acceptable level. The studies offered the hope that modeling the large, hypersonic aircraft structures using high-density elements for transient thermal analysis is possible if a CPU optimization technique was used.

  4. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    PubMed

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
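    The abstract does not spell out its load-prediction dynamic scheduling, so the following Python sketch shows only one plausible reading: split each batch of work in proportion to the throughput predicted for each device, then refine the prediction from measured timings. The function names, the smoothing factor `alpha`, and the device labels are assumptions.

```python
def split_by_predicted_rate(n_items, rates):
    """Divide n_items among devices in proportion to predicted items per second."""
    total = sum(rates.values())
    shares = {dev: int(n_items * rate / total) for dev, rate in rates.items()}
    # Rounding down can leave a few items unassigned; give them to the fastest device.
    fastest = max(rates, key=rates.get)
    shares[fastest] += n_items - sum(shares.values())
    return shares

def update_rate(old_rate, items_done, seconds, alpha=0.5):
    """Blend the previous prediction with the throughput just measured."""
    measured = items_done / seconds
    return (1 - alpha) * old_rate + alpha * measured
```

    For example, with hypothetical rates of 4 items/s for the CPU cores and 16 items/s for the GPU, a batch of 1000 items is split 200/800, and the rates are re-estimated after each time step so the split tracks actual performance.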

  5. Hypoxia/oxidative stress alters the pharmacokinetics of CPU86017-RS through mitochondrial dysfunction and NADPH oxidase activation.

    PubMed

    Gao, Jie; Ding, Xuan-sheng; Zhang, Yu-mao; Dai, De-zai; Liu, Mei; Zhang, Can; Dai, Yin

    2013-12-01

    Hypoxia/oxidative stress can alter the pharmacokinetics (PK) of CPU86017-RS, a novel antiarrhythmic agent. The aim of this study was to investigate the mechanisms underlying the alteration of PK of CPU86017-RS by hypoxia/oxidative stress. Male SD rats exposed to normal or intermittent hypoxia (10% O2) were administered CPU86017-RS (20, 40 or 80 mg/kg, ig) for 8 consecutive days. The PK parameters of CPU86017-RS were examined on d 8. In a separate set of experiments, female SD rats were injected with isoproterenol (ISO) for 5 consecutive days to induce a stress-related status, then CPU86017-RS (80 mg/kg, ig) was administered, and the tissue distributions were examined. The levels of Mn-SOD (manganese containing superoxide dismutase), endoplasmic reticulum (ER) stress sensor proteins (ATF-6, activating transcription factor 6 and PERK, PKR-like ER kinase) and activation of NADPH oxidase (NOX) were detected with Western blotting. Rat liver microsomes were incubated under N2 for in vitro study. The Cmax, t1/2, MRT (mean residence time) and AUC (area under the curve) of CPU86017-RS were significantly increased in the hypoxic rats receiving the 3 different doses of CPU86017-RS. The hypoxia-induced alteration of PK was associated with significantly reduced Mn-SOD level, and increased ATF-6, PERK and NOX levels. In ISO-treated rats, the distributions of CPU86017-RS in plasma, heart, kidney, and liver were markedly increased, and NOX levels in heart, kidney, and liver were significantly upregulated. Co-administration of the NOX blocker apocynin eliminated the abnormalities in the PK and tissue distributions of CPU86017-RS induced by hypoxia/oxidative stress. The metabolism of CPU86017-RS in the N2-treated liver microsomes was significantly reduced; addition of N-acetylcysteine (NAC), but not vitamin C, effectively reversed this change.
The altered PK and metabolism of CPU86017-RS induced by hypoxia/oxidative stress are produced by mitochondrial abnormalities, NOX activation and ER stress; these abnormalities are significantly alleviated by apocynin or NAC.

  6. Performance analysis of the FDTD method applied to holographic volume gratings: Multi-core CPU versus GPU computing

    NASA Astrophysics Data System (ADS)

    Francés, J.; Bleda, S.; Neipp, C.; Márquez, A.; Pascual, I.; Beléndez, A.

    2013-03-01

    The finite-difference time-domain method (FDTD) allows electromagnetic field distribution analysis as a function of time and space. The method is applied to analyze holographic volume gratings (HVGs) for the near-field distribution at optical wavelengths. Usually, this application requires the simulation of wide areas, which implies more memory and processing time. In this work, we propose a specific implementation of the FDTD method including several add-ons for a precise simulation of optical diffractive elements. Values in the near-field region are computed considering the illumination of the grating by means of a plane wave for different angles of incidence and including absorbing boundaries as well. We compare the results obtained by FDTD with those obtained using a matrix method (MM) applied to diffraction gratings. In addition, we have developed two optimized versions of the algorithm, for both CPU and GPU, in order to analyze the improvement of using the new NVIDIA Fermi GPU architecture versus highly tuned multi-core CPU as a function of the simulation size. In particular, the optimized CPU implementation takes advantage of the arithmetic and data transfer streaming SIMD (single instruction multiple data) extensions (SSE) included explicitly in the code and also of multi-threading by means of OpenMP directives. A good agreement between the results obtained using both FDTD and MM methods is obtained, thus validating our methodology. Moreover, the performance of the GPU is compared to the SSE+OpenMP CPU implementation, and it is quantitatively determined that a highly optimized CPU program can be competitive for a wider range of simulation sizes, whereas GPU computing becomes more powerful for large-scale simulations.

  7. Assessment of Linear Finite-Difference Poisson-Boltzmann Solvers

    PubMed Central

    Wang, Jun; Luo, Ray

    2009-01-01

    CPU time and memory usage are two vital issues that any numerical solvers for the Poisson-Boltzmann equation have to face in biomolecular applications. In this study we systematically analyzed the CPU time and memory usage of five commonly used finite-difference solvers with a large and diversified set of biomolecular structures. Our comparative analysis shows that modified incomplete Cholesky conjugate gradient and geometric multigrid are the most efficient in the diversified test set. For the two efficient solvers, our test shows that their CPU times increase approximately linearly with the numbers of grids. Their CPU times also increase almost linearly with the negative logarithm of the convergence criterion at very similar rate. Our comparison further shows that geometric multigrid performs better in the large set of tested biomolecules. However, modified incomplete Cholesky conjugate gradient is superior to geometric multigrid in molecular dynamics simulations of tested molecules. We also investigated other significant components in numerical solutions of the Poisson-Boltzmann equation. It turns out that the time-limiting step is the free boundary condition setup for the linear systems for the selected proteins if the electrostatic focusing is not used. Thus, development of future numerical solvers for the Poisson-Boltzmann equation should balance all aspects of the numerical procedures in realistic biomolecular applications. PMID:20063271

  8. Exact diagonalization of quantum lattice models on coprocessors

    NASA Astrophysics Data System (ADS)

    Siro, T.; Harju, A.

    2016-10-01

    We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
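
    The quantity timed in this record is one step of the Lanczos iteration. As a rough illustration only, a single step can be sketched in plain Python as below. The function names `lanczos_step` and `dense_matvec` are hypothetical, the matrix is a toy dense one, and nothing here reflects the authors' OpenMP or CUDA implementations.

```python
import math


def dense_matvec(matrix):
    """Wrap a dense symmetric matrix (list of rows) as a matvec callable."""
    return lambda x: [sum(a * xi for a, xi in zip(row, x)) for row in matrix]


def lanczos_step(matvec, v_prev, v, beta):
    """Perform one Lanczos step for a symmetric operator.

    v_prev and v are the previous and current (normalized) Lanczos vectors,
    beta is the off-diagonal coefficient linking them (0.0 on the first step).
    Returns (alpha, beta_next, v_next); v_next is None if the Krylov
    space is exhausted (beta_next == 0).
    """
    w = matvec(v)                                    # dominant cost: H * v
    alpha = sum(wi * vi for wi, vi in zip(w, v))     # diagonal coefficient
    w = [wi - alpha * vi - beta * pi
         for wi, vi, pi in zip(w, v, v_prev)]        # orthogonalize
    beta_next = math.sqrt(sum(wi * wi for wi in w))  # next off-diagonal
    v_next = [wi / beta_next for wi in w] if beta_next > 0 else None
    return alpha, beta_next, v_next
```

    In a sketch like this the matrix-vector product dominates the cost of a step, so wrapping one call to `lanczos_step` in a timer gives a per-step execution time of the kind the record compares across platforms.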

  9. Instrumentation complex for Langley Research Center's National Transonic Facility

    NASA Technical Reports Server (NTRS)

    Russell, C. H.; Bryant, C. S.

    1977-01-01

    The instrumentation discussed in the present paper was developed to ensure reliable operation for a 2.5-meter cryogenic high-Reynolds-number fan-driven transonic wind tunnel. It will incorporate four CPU's and associated analog and digital input/output equipment, necessary for acquiring research data, controlling the tunnel parameters, and monitoring the process conditions. Connected in a multipoint distributed network, the CPU's will support data base management and processing; research measurement data acquisition and display; process monitoring; and communication control. The design will allow essential processes to continue, in the case of major hardware failures, by switching input/output equipment to alternate CPU's and by eliminating nonessential functions. It will also permit software modularization by CPU activity and thereby reduce complexity and development time.

  10. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    PubMed

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
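
    For readers unfamiliar with the SW computation that this record accelerates, the core recurrence can be sketched as follows. This is a minimal serial version with a linear gap penalty; the scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions, and the sketch does not model the frequency distance filtration scheme or the CUDA-SWfr parallelization described in the abstract.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of sequences a and b.

    Uses the standard Smith-Waterman recurrence with a linear gap penalty
    and keeps only two rows of the dynamic-programming table in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes it local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

    Each database sequence needs one such table fill, which is why a cheap filter that discards unpromising sequences before this step can reduce the total work.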

  11. Dense GPU-enhanced surface reconstruction from stereo endoscopic images for intraoperative registration.

    PubMed

    Rohl, Sebastian; Bodenstedt, Sebastian; Suwelack, Stefan; Dillmann, Rudiger; Speidel, Stefanie; Kenngott, Hannes; Muller-Stich, Beat P

    2012-03-01

    In laparoscopic surgery, soft tissue deformations substantially change the surgical site, thus impeding the use of preoperative planning during intraoperative navigation. Extracting depth information from endoscopic images and building a surface model of the surgical field-of-view is one way to represent this constantly deforming environment. The information can then be used for intraoperative registration. Stereo reconstruction is a typical problem within computer vision. However, most of the available methods do not fulfill the specific requirements in a minimally invasive setting such as the need of real-time performance, the problem of view-dependent specular reflections and large curved areas with partly homogeneous or periodic textures and occlusions. In this paper, the authors present an approach toward intraoperative surface reconstruction based on stereo endoscopic images. The authors describe our answer to this problem through correspondence analysis, disparity correction and refinement, 3D reconstruction, point cloud smoothing and meshing. Real-time performance is achieved by implementing the algorithms on the gpu. The authors also present a new hybrid cpu-gpu algorithm that unifies the advantages of the cpu and the gpu version. In a comprehensive evaluation using in vivo data, in silico data from the literature and virtual data from a newly developed simulation environment, the cpu, the gpu, and the hybrid cpu-gpu versions of the surface reconstruction are compared to a cpu and a gpu algorithm from the literature. The recommended approach toward intraoperative surface reconstruction can be conducted in real-time depending on the image resolution (20 fps for the gpu and 14fps for the hybrid cpu-gpu version on resolution of 640 × 480). It is robust to homogeneous regions without texture, large image changes, noise or errors from camera calibration, and it reconstructs the surface down to sub millimeter accuracy. 
In all the experiments within the simulation environment, the mean distance to ground truth data is between 0.05 and 0.6 mm for the hybrid cpu-gpu version. The hybrid cpu-gpu algorithm shows markedly better performance than its cpu and gpu counterparts (mean distance reduction 26% and 45%, respectively, for the experiments in the simulation environment). The recommended approach for surface reconstruction is fast, robust, and accurate. It can represent changes in the intraoperative environment and can be used to adapt a preoperative model within the surgical site by registration of these two models.

  12. A performance comparison of the IBM RS/6000 and the Astronautics ZS-1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, W.M.; Abraham, S.G.; Davidson, E.S.

    1991-01-01

    Concurrent uniprocessor architectures, of which vector and superscalar are two examples, are designed to capitalize on fine-grain parallelism. The authors have developed a performance evaluation method for comparing and improving these architectures, and in this article they present the methodology and a detailed case study of two machines. The runtime of many programs is dominated by time spent in loop constructs - for example, Fortran Do-loops. Loops generally comprise two logical processes: The access process generates addresses for memory operations while the execute process operates on floating-point data. Memory access patterns typically can be generated independently of the data in the execute process. This independence allows the access process to slip ahead, thereby hiding memory latency. The IBM 360/91 was designed in 1967 to achieve slip dynamically, at runtime. One CPU unit executes integer operations while another handles floating-point operations. Other machines, including the VAX 9000 and the IBM RS/6000, use a similar approach.

  13. Wang-Landau sampling: Saving CPU time

    NASA Astrophysics Data System (ADS)

    Ferreira, L. S.; Jorge, L. N.; Leão, S. A.; Caparica, A. A.

    2018-04-01

    In this work we propose an improvement to the Wang-Landau (WL) method that allows an economy in CPU time of about 60% leading to the same results with the same accuracy. We used the 2D Ising model to show that one can initiate all WL simulations using the outputs of an advanced WL level from a previous simulation. We showed that up to the seventh WL level (f6) the simulations are not biased yet and can proceed to any value that the simulation from the very beginning would reach. As a result the initial WL levels can be simulated just once. It was also observed that the saving in CPU time is larger for larger lattice sizes, exactly where the computational cost is considerable. We carried out high-resolution simulations beginning initially from the first WL level (f0) and another beginning from the eighth WL level (f7) using all the data at the end of the previous level and showed that the results for the critical temperature Tc and the critical static exponents β and γ coincide within the error bars. Finally we applied the same procedure to the 1/2-spin Baxter-Wu model and the economy in CPU time was of about 64%.
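
    The level labels in this record (f0, f6, f7) refer to the Wang-Landau modification factor. Assuming the conventional schedule from the original WL method, in which f0 = e and each level takes the square root of the previous factor, ln f halves at every level. The helper name `wl_ln_f_schedule` below is hypothetical, and the abstract does not state that exactly this schedule was used.

```python
def wl_ln_f_schedule(n_levels, ln_f0=1.0):
    """Return ln(f_i) for WL levels i = 0 .. n_levels-1.

    Assumes the conventional rule f_{i+1} = sqrt(f_i), i.e. ln f halves
    at every level, starting from ln f_0 = 1 (f_0 = e).
    """
    return [ln_f0 / 2 ** i for i in range(n_levels)]


if __name__ == "__main__":
    for level, ln_f in enumerate(wl_ln_f_schedule(8)):
        print(f"level {level + 1} (f{level}): ln f = {ln_f}")
```

    Under that assumption the seventh level (f6) has ln f = 1/64 and the eighth (f7) has ln f = 1/128, so restarting from the end of an earlier level skips the levels with the largest modification factors.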

  14. Evaluation of the CPU time for solving the radiative transfer equation with high-order resolution schemes applying the normalized weighting-factor method

    NASA Astrophysics Data System (ADS)

    Xamán, J.; Zavala-Guillén, I.; Hernández-López, I.; Uriarte-Flores, J.; Hernández-Pérez, I.; Macías-Melo, E. V.; Aguilar-Castro, K. M.

    2018-03-01

    In this paper, we evaluated the convergence rate (CPU time) of a new mathematical formulation for the numerical solution of the radiative transfer equation (RTE) with several High-Order (HO) and High-Resolution (HR) schemes. In computational fluid dynamics, this procedure is known as the Normalized Weighting-Factor (NWF) method and it is adopted here. The NWF method is used to incorporate the high-order resolution schemes in the discretized RTE. The NWF method is compared, in terms of computer time needed to obtain a converged solution, with the widely used deferred-correction (DC) technique for the calculations of a two-dimensional cavity with emitting-absorbing-scattering gray media using the discrete ordinates method. Six parameters, viz. the grid size, the order of quadrature, the absorption coefficient, the emissivity of the boundary surface, the under-relaxation factor, and the scattering albedo are considered to evaluate ten schemes. The results showed that using the DC method, in general, the scheme that had the lowest CPU time is the SOU. In contrast with the results of the DC procedure, the CPU time for the DIAMOND and QUICK schemes using the NWF method is shown to be between 3.8 and 23.1% faster and between 12.6 and 56.1% faster, respectively. However, the other schemes are more time consuming when the NWF is used instead of the DC method. Additionally, a second test case was presented and the results showed that depending on the problem under consideration, the NWF procedure may be computationally faster or slower than the DC method. As an example, the CPU times for the QUICK and SMART schemes are 61.8 and 203.7% slower, respectively, when the NWF formulation is used for the second test case. Finally, future research to explore the computational cost of the NWF method in more complex problems is required.

  15. On localization attacks against cloud infrastructure

    NASA Astrophysics Data System (ADS)

    Ge, Linqiang; Yu, Wei; Sistani, Mohammad Ali

    2013-05-01

    One of the key characteristics of cloud computing is the device and location independence that enables the user to access systems regardless of their location. Because cloud computing is heavily based on sharing resource, it is vulnerable to cyber attacks. In this paper, we investigate a localization attack that enables the adversary to leverage central processing unit (CPU) resources to localize the physical location of server used by victims. By increasing and reducing CPU usage through the malicious virtual machine (VM), the response time from the victim VM will increase and decrease correspondingly. In this way, by embedding the probing signal into the CPU usage and correlating the same pattern in the response time from the victim VM, the adversary can find the location of victim VM. To determine attack accuracy, we investigate features in both the time and frequency domains. We conduct both theoretical and experimental study to demonstrate the effectiveness of such an attack.

  16. CUDA Fortran acceleration for the finite-difference time-domain method

    NASA Astrophysics Data System (ADS)

    Hadi, Mohammed F.; Esmaeili, Seyed A.

    2013-05-01

    A detailed description of programming the three-dimensional finite-difference time-domain (FDTD) method to run on graphical processing units (GPUs) using CUDA Fortran is presented. Two FDTD-to-CUDA thread-block mapping designs are investigated and their performances compared. Comparative assessment of trade-offs between GPU's shared memory and L1 cache is also discussed. This presentation is for the benefit of FDTD programmers who work exclusively with Fortran and are reluctant to port their codes to C in order to utilize GPU computing. The derived CUDA Fortran code is compared with an optimized CPU version that runs on a workstation-class CPU to present a realistic GPU to CPU run time comparison and thus help in making better informed investment decisions on FDTD code redesigns and equipment upgrades. All analyses are mirrored with CUDA C simulations to put in perspective the present state of CUDA Fortran development.

  17. A report documenting the completion of the Los Alamos National Laboratory portion of the ASC level II milestone "Visualization on the supercomputing platform"

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahrens, James P; Patchett, John M; Lo, Li - Ta

    2011-01-24

    This report provides documentation for the completion of the Los Alamos portion of the ASC Level II 'Visualization on the Supercomputing Platform' milestone. This ASC Level II milestone is a joint milestone between Sandia National Laboratory and Los Alamos National Laboratory. The milestone text is shown in Figure 1 with the Los Alamos portions highlighted in boldfaced text. Visualization and analysis of petascale data is limited by several factors which must be addressed as ACES delivers the Cielo platform. Two primary difficulties are: (1) Performance of interactive rendering, which is the most computationally intensive portion of the visualization process. For terascale platforms, commodity clusters with graphics processors (GPUs) have been used for interactive rendering. For petascale platforms, visualization and rendering may be able to run efficiently on the supercomputer platform itself. (2) I/O bandwidth, which limits how much information can be written to disk. If we simply analyze the sparse information that is saved to disk we miss the opportunity to analyze the rich information produced every timestep by the simulation. For the first issue, we are pursuing in-situ analysis, in which simulations are coupled directly with analysis libraries at runtime. This milestone will evaluate the visualization and rendering performance of current and next generation supercomputers in contrast to GPU-based visualization clusters, and evaluate the performance of common analysis libraries coupled with the simulation that analyze and write data to disk during a running simulation. This milestone will explore, evaluate and advance the maturity level of these technologies and their applicability to problems of interest to the ASC program. In conclusion, we improved CPU-based rendering performance by a factor of 2-10 on our tests. In addition, we evaluated CPU and GPU-based rendering performance. 
We encourage production visualization experts to consider using CPU-based rendering solutions when it is appropriate. For example, on remote supercomputers CPU-based rendering can offer a means of viewing data without having to offload the data or geometry onto a GPU-based visualization system. In terms of comparative performance of the CPU and GPU we believe that further optimizations of the performance of both CPU- and GPU-based rendering are possible. The simulation community is currently confronting this reality as they work to port their simulations to different hardware architectures. What is interesting about CPU rendering of massive datasets is that for the past two decades GPU performance has significantly outperformed CPU-based systems. Based on our advancements, evaluations and explorations we believe that CPU-based rendering has returned as one viable option for the visualization of massive datasets.

  18. Application of queueing models to multiprogrammed computer systems operating in a time-critical environment

    NASA Technical Reports Server (NTRS)

    Eckhardt, D. E., Jr.

    1979-01-01

    A model of a central processor (CPU) which services background applications in the presence of time critical activity is presented. The CPU is viewed as an M/M/1 queueing system subject to periodic interrupts by deterministic, time critical process. The Laplace transform of the distribution of service times for the background applications is developed. The use of state of the art queueing models for studying the background processing capability of time critical computer systems is discussed and the results of a model validation study which support this application of queueing models are presented.
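
    As background for the M/M/1 view of the CPU used in this record, the textbook steady-state results can be sketched as below. The second helper is only a crude assumption of this sketch: it shrinks the service rate by the fraction of CPU taken by time-critical work, which is not the Laplace-transform model developed in the report. All function and parameter names are hypothetical.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Textbook steady-state M/M/1 results for rates in jobs per unit time."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative and service_rate > 0")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival_rate >= service_rate")
    rho = arrival_rate / service_rate
    return {
        "utilization": rho,
        "mean_jobs_in_system": rho / (1.0 - rho),
        "mean_response_time": 1.0 / (service_rate - arrival_rate),
    }


def background_metrics(arrival_rate, service_rate, interrupt_fraction):
    """Crude approximation (an assumption of this sketch, not the report's
    model): time-critical work removes a fixed fraction of CPU capacity,
    so background jobs see a proportionally lower service rate."""
    if not 0.0 <= interrupt_fraction < 1.0:
        raise ValueError("interrupt_fraction must be in [0, 1)")
    return mm1_metrics(arrival_rate, service_rate * (1.0 - interrupt_fraction))
```

    For example, with 2 background jobs arriving per second and a service rate of 5 per second, reserving 20% of the CPU for time-critical work raises the mean response time in this approximation from 1/3 s to 1/2 s.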

  19. Aligner optimization increases accuracy and decreases compute times in multi-species sequence data.

    PubMed

    Robinson, Kelly M; Hawkins, Aziah S; Santana-Cruz, Ivette; Adkins, Ricky S; Shetty, Amol C; Nagaraj, Sushma; Sadzewicz, Lisa; Tallon, Luke J; Rasko, David A; Fraser, Claire M; Mahurkar, Anup; Silva, Joana C; Dunning Hotopp, Julie C

    2017-09-01

    As sequencing technologies have evolved, the tools to analyze these sequences have made similar advances. However, for multi-species samples, we observed important and adverse differences in alignment specificity and computation time for bwa-mem (Burrows-Wheeler aligner-maximum exact matches) relative to bwa-aln. Therefore, we sought to optimize bwa-mem for alignment of data from multi-species samples in order to reduce alignment time and increase the specificity of alignments. In the multi-species cases examined, there was one majority member (i.e. Plasmodium falciparum or Brugia malayi) and one minority member (i.e. human or the Wolbachia endosymbiont wBm) of the sequence data. Increasing bwa-mem seed length from the default value reduced the number of read pairs from the majority sequence member that incorrectly aligned to the reference genome of the minority sequence member. Combining both source genomes into a single reference genome increased the specificity of mapping, while also reducing the central processing unit (CPU) time. In Plasmodium, at a seed length of 18 nt, 24.1 % of reads mapped to the human genome using 1.7±0.1 CPU hours, while 83.6 % of reads mapped to the Plasmodium genome using 0.2±0.0 CPU hours (total: 107.7 % reads mapping; in 1.9±0.1 CPU hours). In contrast, 97.1 % of the reads mapped to a combined Plasmodium-human reference in only 0.7±0.0 CPU hours. Overall, the results suggest that combining all references into a single reference database and using a 23 nt seed length reduces the computational time, while maximizing specificity. Similar results were found for simulated sequence reads from a mock metagenomic data set. We found similar improvements to computation time in a publicly available human-only data set.

  20. An efficient mixed-precision, hybrid CPU-GPU implementation of a nonlinearly implicit one-dimensional particle-in-cell algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Guangye; Chacon, Luis; Barnes, Daniel C

    2012-01-01

    Recently, a fully implicit, energy- and charge-conserving particle-in-cell method has been developed for multi-scale, full-f kinetic simulations [G. Chen, et al., J. Comput. Phys. 230, 18 (2011)]. The method employs a Jacobian-free Newton-Krylov (JFNK) solver and is capable of using very large timesteps without loss of numerical stability or accuracy. A fundamental feature of the method is the segregation of particle orbit integrations from the field solver, while remaining fully self-consistent. This provides great flexibility, and dramatically improves the solver efficiency by reducing the degrees of freedom of the associated nonlinear system. However, it requires a particle push per nonlinear residual evaluation, which makes the particle push the most time-consuming operation in the algorithm. This paper describes a very efficient mixed-precision, hybrid CPU-GPU implementation of the implicit PIC algorithm. The JFNK solver is kept on the CPU (in double precision), while the inherent data parallelism of the particle mover is exploited by implementing it in single-precision on a graphics processing unit (GPU) using CUDA. Performance-oriented optimizations, with the aid of an analytical performance model, the roofline model, are employed. Despite being highly dynamic, the adaptive, charge-conserving particle mover algorithm achieves up to 300-400 GOp/s (including single-precision floating-point, integer, and logic operations) on a Nvidia GeForce GTX580, corresponding to 20-25% absolute GPU efficiency (against the peak theoretical performance) and 50-70% intrinsic efficiency (against the algorithm's maximum operational throughput, which neglects all latencies). This is about 200-300 times faster than an equivalent serial CPU implementation. When the single-precision GPU particle mover is combined with a double-precision CPU JFNK field solver, overall performance gains of 100 vs. the double-precision CPU-only serial version are obtained, with no apparent loss of robustness or accuracy when applied to a challenging long-time scale ion acoustic wave simulation.
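
    The roofline model mentioned in this record bounds attainable throughput by the lesser of peak compute and memory bandwidth times operational intensity. The sketch below shows that bound and the efficiency arithmetic in the abstract; the hardware numbers in the usage lines are made-up placeholders rather than GTX580 specifications, and the function names are hypothetical.

```python
def roofline_attainable(peak_ops_per_s, bandwidth_bytes_per_s, ops_per_byte):
    """Roofline bound: min(peak compute, bandwidth * operational intensity)."""
    return min(peak_ops_per_s, bandwidth_bytes_per_s * ops_per_byte)


def implied_peak(achieved_ops_per_s, absolute_efficiency):
    """Peak throughput implied by an achieved rate and its efficiency."""
    if not 0.0 < absolute_efficiency <= 1.0:
        raise ValueError("absolute_efficiency must be in (0, 1]")
    return achieved_ops_per_s / absolute_efficiency


if __name__ == "__main__":
    # Placeholder device: 1000 GOp/s peak, 100 GB/s memory bandwidth.
    for intensity in (2.0, 20.0):
        bound = roofline_attainable(1000e9, 100e9, intensity)
        print(f"{intensity} op/byte -> bound {bound / 1e9:.0f} GOp/s")
```

    Assuming the endpoints of the reported ranges pair up, 300 GOp/s at 20% and 400 GOp/s at 25% absolute efficiency both imply a theoretical peak of roughly 1500-1600 GOp/s.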

  1. Finite and spectral cell method for wave propagation in heterogeneous materials

    NASA Astrophysics Data System (ADS)

    Joulaian, Meysam; Duczek, Sascha; Gabbert, Ulrich; Düster, Alexander

    2014-09-01

    In the current paper we present a fast, reliable technique for simulating wave propagation in complex structures made of heterogeneous materials. The proposed approach, the spectral cell method, is a combination of the finite cell method and the spectral element method that significantly lowers preprocessing and computational expenditure. The spectral cell method takes advantage of explicit time-integration schemes coupled with a diagonal mass matrix to reduce the time spent on solving the equation system. By employing a fictitious domain approach, this method also helps to eliminate some of the difficulties associated with mesh generation. Besides introducing a proper, specific mass lumping technique, we also study the performance of the low-order and high-order versions of this approach based on several numerical examples. Our results show that the high-order version of the spectral cell method, when combined with explicit time-integration algorithms, requires less memory storage and less CPU time than other possible versions. Moreover, as the implementation of the proposed method in available finite element programs is straightforward, these properties turn the method into a viable tool for practical applications such as structural health monitoring [1-3], quantitative ultrasound applications [4], or the active control of vibrations and noise [5, 6].

  2. Self-organized neural maps of human protein sequences.

    PubMed Central

    Ferrán, E. A.; Pflugfelder, B.; Ferrara, P.

    1994-01-01

    We have recently described a method based on artificial neural networks to cluster protein sequences into families. The network was trained with Kohonen's unsupervised learning algorithm using, as inputs, the matrix patterns derived from the dipeptide composition of the proteins. We present here a large-scale application of that method to classify the 1,758 human protein sequences stored in the SwissProt database (release 19.0), whose lengths are greater than 50 amino acids. In the final 2-dimensional topologically ordered map of 15 x 15 neurons, proteins belonging to known families were associated with the same neuron or with neighboring ones. Also, as an attempt to reduce the time-consuming learning procedure, we compared 2 learning protocols: one of 500 epochs (100 SUN CPU-hours [CPU-h]), and another one of 30 epochs (6.7 CPU-h). A further reduction of learning-computing time, by a factor of about 3.3, with similar protein clustering results, was achieved using a matrix of 11 x 11 components to represent the sequences. Although network training is time consuming, the classification of a new protein in the final ordered map is very fast (14.6 CPU-seconds). We also show a comparison between the artificial neural network approach and conventional methods of biosequence analysis. PMID:8019421

  3. The METAL System. Volume I and Volume II. Appendices.

    DTIC Science & Technology

    1981-01-01

    demands, and fair CPU time were measured. The fair measure reported here includes the pure CPU time plus a pro-rated portion of the time consumed by the...syntactic class or the form matched. NO = noun VB = verb OTR = other part of speech IT-12 Although the above feature is not used by the system at present...indicate the syntactic class of the form matched. NO = noun other than gerund ("content", "dark", "African") INF = infinitive ("direct", "equal", "content

  4. eWaterCycle: A high resolution global hydrological model

    NASA Astrophysics Data System (ADS)

    van de Giesen, Nick; Bierkens, Marc; Drost, Niels; Hut, Rolf; Sutanudjaja, Edwin

    2014-05-01

    In 2013, the eWaterCycle project was started, which has the ambitious goal to run a high resolution global hydrological model. The starting point was the PCR-GLOBWB model built by Utrecht University. The software behind this model will partially be re-engineered so that it can run in a High Performance Computing (HPC) environment. The aim is to have a spatial resolution of 1km x 1km. The idea is also to run the model in real-time and forecasting mode, using data assimilation. An on-demand hydraulic model will be available for detailed flow and flood forecasting in support of navigation and disaster management. The project faces a set of scientific challenges. First, to enable the model to run in an HPC environment, model runs were analyzed to examine in which parts of the program most CPU time was spent. These parts were re-coded in Open MPI to allow for parallel processing. Different parallelization strategies are conceivable. In our case, it was decided to use watershed logic as a first step to distribute the analysis. There is rather limited recent experience with HPC in hydrology and there is much to be learned and adjusted, both on the hydrological modeling side and the computer science side. For example, an interesting early observation was that hydrological models are, due to their localized parameterization, much more memory intensive than models of sister-disciplines such as meteorology and oceanography. Because it would be deadly to have to swap information between CPU and hard drive, memory management becomes crucial. A standard Ensemble Kalman Filter (enKF) would, for example, have excessive memory demands. To circumvent these problems, an alternative to the enKF was developed that produces equivalent results. This presentation shows the most recent results from the model, including a 5km x 5km simulation and a proof of concept for the new data assimilation approach. 
Finally, some early ideas about financial sustainability of an operational global hydrological model are presented.
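
    The first step described in this record, finding which parts of a program consume the most CPU time, can be illustrated with a small accumulator built on Python's `time.process_time`. This is a generic sketch, not the tooling the eWaterCycle project used; the section names and the helper names `cpu_section` and `relative_shares` are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def cpu_section(name, totals):
    """Add the CPU time spent inside the with-block to totals[name]."""
    start = time.process_time()
    try:
        yield
    finally:
        totals[name] = totals.get(name, 0.0) + time.process_time() - start


def relative_shares(totals):
    """Convert absolute CPU seconds per section into fractions of the total."""
    total = sum(totals.values())
    return {k: (v / total if total > 0 else 0.0) for k, v in totals.items()}


if __name__ == "__main__":
    totals = {}
    with cpu_section("routing", totals):        # stand-in workload
        sum(i * i for i in range(500_000))
    with cpu_section("evaporation", totals):    # stand-in workload
        sum(i for i in range(100_000))
    for name, share in relative_shares(totals).items():
        print(f"{name}: {totals[name]:.4f} s CPU ({share:.1%})")
```

    `time.process_time` counts CPU time of the current process and excludes time spent sleeping, which is usually the relevant measure when deciding which sections to parallelize first.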

  5. High performance computing for deformable image registration: towards a new paradigm in adaptive radiotherapy.

    PubMed

    Samant, Sanjiv S; Xia, Junyi; Muyan-Ozcelik, Pinar; Owens, John D

    2008-08-01

    Readily available temporal imaging or time series volumetric (4D) imaging has become an indispensable component of treatment planning and adaptive radiotherapy (ART) at many radiotherapy centers. Deformable image registration (DIR) is also used in other areas of medical imaging, including motion corrected image reconstruction. Due to long computation time, clinical applications of DIR in radiation therapy and elsewhere have been limited and consequently relegated to offline analysis. With the recent advances in hardware and software, graphics processing unit (GPU) based computing is an emerging technology for general purpose computation, including DIR, and is suitable for highly parallelized computing. However, traditional general purpose computation on the GPU is limited because of the constraints of the available programming platforms. As well, compared to CPU programming, the GPU currently has reduced dedicated processor memory, which can limit the useful working data set for parallelized processing. We present an implementation of the demons algorithm using the NVIDIA 8800 GTX GPU and the new CUDA programming language. The GPU performance will be compared with single threading and multithreading CPU implementations on an Intel dual core 2.4 GHz CPU using the C programming language. CUDA provides a C-like language programming interface, and allows for direct access to the highly parallel compute units in the GPU. Comparisons for volumetric clinical lung images acquired using 4DCT were carried out. Computation time for 100 iterations in the range of 1.8-13.5 s was observed for the GPU with image size ranging from 2.0 x 10^6 to 14.2 x 10^6 pixels. The GPU registration was 55-61 times faster than the CPU for the single threading implementation, and 34-39 times faster for the multithreading implementation. For CPU based computing, the computational time generally has a linear dependence on image size for medical imaging data. 
Computational efficiency is characterized in terms of time per megapixels per iteration (TPMI) with units of seconds per megapixels per iteration (or spmi). For the demons algorithm, our CPU implementation yielded largely invariant values of TPMI. The mean TPMIs were 0.527 spmi and 0.335 spmi for the single threading and multithreading cases, respectively, with <2% variation over the considered image data range. For GPU computing, we achieved TPMI =0.00916 spmi with 3.7% variation, indicating optimized memory handling under CUDA. The paradigm of GPU based real-time DIR opens up a host of clinical applications for medical imaging.
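
    As a rough consistency check on the figures quoted in this abstract, the sketch below recomputes TPMI from the reported GPU timings and derives the speedups implied by the mean TPMI values. The function name `tpmi` is illustrative and not taken from the paper, and pairing the shortest run time with the smallest image is an assumption.

```python
def tpmi(seconds, megapixels, iterations):
    """Time per megapixel per iteration, in spmi."""
    return seconds / (megapixels * iterations)

# Reported GPU range: 1.8-13.5 s for 100 iterations on 2.0-14.2 megapixel images.
# Assumption: the shortest time belongs to the smallest image.
gpu_small = tpmi(1.8, 2.0, 100)
gpu_large = tpmi(13.5, 14.2, 100)

# Mean TPMI values reported in the abstract (spmi).
CPU_SINGLE, CPU_MULTI, GPU_MEAN = 0.527, 0.335, 0.00916

speedup_single = CPU_SINGLE / GPU_MEAN
speedup_multi = CPU_MULTI / GPU_MEAN

print(f"GPU TPMI range: {gpu_small:.5f} to {gpu_large:.5f} spmi")
print(f"speedup vs single threading: {speedup_single:.1f}x")
print(f"speedup vs multithreading:   {speedup_multi:.1f}x")
```

    Both implied speedups fall inside the 55-61 and 34-39 ranges stated in the abstract, which suggests the quoted numbers are mutually consistent.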

  6. A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.

    PubMed

    Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming

    2017-06-16

    Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limited computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially when using the Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and MIC at the same time. In addition, we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.

  7. Design Alternatives to Improve Access Time Performance of Disk Drives Under DOS and UNIX

    NASA Astrophysics Data System (ADS)

    Hospodor, Andy

    For the past 25 years, improvements in CPU performance have overshadowed improvements in the access time performance of disk drives. CPU performance has been slanted towards greater instruction execution rates, measured in millions of instructions per second (MIPS). However, the slant for performance of disk storage has been towards capacity and corresponding increased storage densities. The IBM PC, introduced in 1982, processed only a fraction of a MIP. Follow-on CPUs, such as the 80486 and 80586, sported 5-10 MIPS by 1992. Single user PCs and workstations, with one CPU and one disk drive, became the dominant application, as implied by their production volumes. However, disk drives did not enjoy a corresponding improvement in access time performance, although the potential still exists. The time to access a disk drive improves (decreases) in two ways: by altering the mechanical properties of the drive or by adding cache to the drive. This paper explores the improvement to access time performance of disk drives using cache, prefetch, faster rotation rates, and faster seek acceleration.
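
    The two routes to a shorter access time named in this abstract (mechanical changes and cache) can be illustrated with a textbook model. The sketch below is not taken from the paper: it assumes access time is average seek time plus half a revolution of rotational latency, and that a cache serves a fixed fraction of requests. All drive parameters are hypothetical.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm

def access_time_ms(seek_ms, rpm):
    """Mechanical access time: average seek plus average rotational latency."""
    return seek_ms + rotational_latency_ms(rpm)

def effective_access_ms(hit_ratio, cache_ms, disk_ms):
    """Expected access time when a fraction hit_ratio of requests hits the cache."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms

# Hypothetical drive parameters, chosen only for illustration.
base = access_time_ms(seek_ms=12.0, rpm=3600)
faster_spin = access_time_ms(seek_ms=12.0, rpm=5400)
with_cache = effective_access_ms(hit_ratio=0.5, cache_ms=1.0, disk_ms=base)

print(f"baseline:        {base:.2f} ms")
print(f"faster rotation: {faster_spin:.2f} ms")
print(f"with cache:      {with_cache:.2f} ms")
```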

  8. Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

    PubMed Central

    Hadjis, Stefan; Abuzaid, Firas; Zhang, Ce; Ré, Christopher

    2016-01-01

    We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve a 4.5× throughput improvement over Caffe on popular networks like CaffeNet. Moreover, with these improvements, the end-to-end training time for CNNs is directly proportional to the FLOPS delivered by the CPU, which enables us to efficiently train hybrid CPU-GPU systems for CNNs. PMID:27314106

  9. Caffe con Troll: Shallow Ideas to Speed Up Deep Learning.

    PubMed

    Hadjis, Stefan; Abuzaid, Firas; Zhang, Ce; Ré, Christopher

    2015-01-01

    We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve a 4.5× throughput improvement over Caffe on popular networks like CaffeNet. Moreover, with these improvements, the end-to-end training time for CNNs is directly proportional to the FLOPS delivered by the CPU, which enables us to efficiently train hybrid CPU-GPU systems for CNNs.

  10. OpenMP GNU and Intel Fortran programs for solving the time-dependent Gross-Pitaevskii equation

    NASA Astrophysics Data System (ADS)

    Young-S., Luis E.; Muruganandam, Paulsamy; Adhikari, Sadhan K.; Lončar, Vladimir; Vudragović, Dušan; Balaž, Antun

    2017-11-01

    We present an Open Multi-Processing (OpenMP) version of Fortran 90 programs for solving the Gross-Pitaevskii (GP) equation for a Bose-Einstein condensate in one, two, and three spatial dimensions, optimized for use with GNU and Intel compilers. We use the split-step Crank-Nicolson algorithm for imaginary- and real-time propagation, which enables efficient calculation of stationary and non-stationary solutions, respectively. The present OpenMP programs are designed for computers with multi-core processors and optimized for compiling with both the commercially licensed Intel Fortran compiler and the popular free open-source GNU Fortran compiler. The programs are easy to use and are elaborated with helpful comments for the users. All input parameters are listed at the beginning of each program. Different output files provide physical quantities such as energy, chemical potential, root-mean-square sizes, densities, etc. We also present speedup test results for new versions of the programs. Program files doi:http://dx.doi.org/10.17632/y8zk3jgn84.2 Licensing provisions: Apache License 2.0 Programming language: OpenMP GNU and Intel Fortran 90. Computer: Any multi-core personal computer or workstation with the appropriate OpenMP-capable Fortran compiler installed. Number of processors used: All available CPU cores on the executing computer. Journal reference of previous version: Comput. Phys. Commun. 180 (2009) 1888; ibid. 204 (2016) 209. Does the new version supersede the previous version?: Not completely. It does supersede previous Fortran programs from both references above, but not OpenMP C programs from Comput. Phys. Commun. 204 (2016) 209. 
Nature of problem: The present Open Multi-Processing (OpenMP) Fortran programs, optimized for use with commercially-licensed Intel Fortran and free open-source GNU Fortran compilers, solve the time-dependent nonlinear partial differential (GP) equation for a trapped Bose-Einstein condensate in one (1d), two (2d), and three (3d) spatial dimensions for six different trap symmetries: axially and radially symmetric traps in 3d, circularly symmetric traps in 2d, fully isotropic (spherically symmetric) and fully anisotropic traps in 2d and 3d, as well as 1d traps, where no spatial symmetry is considered. Solution method: We employ the split-step Crank-Nicolson algorithm to discretize the time-dependent GP equation in space and time. The discretized equation is then solved by imaginary- or real-time propagation, employing adequately small space and time steps, to yield the solution of stationary and non-stationary problems, respectively. Reasons for the new version: Previously published Fortran programs [1,2] have now become popular tools [3] for solving the GP equation. These programs have been translated to the C programming language [4] and later extended to the more complex scenario of dipolar atoms [5]. Now virtually all computers have multi-core processors and some have motherboards with more than one physical central processing unit (CPU), which may increase the number of available CPU cores on a single computer to several tens. The C programs have been adapted to be very fast on such multi-core modern computers using general-purpose graphic processing units (GPGPU) with Nvidia CUDA and computer clusters using Message Passing Interface (MPI) [6]. Nevertheless, previously developed Fortran programs are also commonly used for scientific computation and most of them use a single CPU core at a time in modern multi-core laptops, desktops, and workstations. 
Unless the Fortran programs are made aware and capable of making efficient use of the available CPU cores, the solution of even a realistic dynamical 1d problem, not to mention the more complicated 2d and 3d problems, could be time consuming using the Fortran programs. Previously, we published auto-parallel Fortran programs [2] suitable for Intel (but not GNU) compiler for solving the GP equation. Hence, a need for the full OpenMP version of the Fortran programs to reduce the execution time cannot be overemphasized. To address this issue, we provide here such OpenMP Fortran programs, optimized for both Intel and GNU Fortran compilers and capable of using all available CPU cores, which can significantly reduce the execution time. Summary of revisions: Previous Fortran programs [1] for solving the time-dependent GP equation in 1d, 2d, and 3d with different trap symmetries have been parallelized using the OpenMP interface to reduce the execution time on multi-core processors. There are six different trap symmetries considered, resulting in six programs for imaginary-time propagation and six for real-time propagation, for a total of 12 programs included in the BEC-GP-OMP-FOR software package. All input data (number of atoms, scattering length, harmonic oscillator trap length, trap anisotropy, etc.) are conveniently placed at the beginning of each program, as before [2]. Present programs introduce a new input parameter, which is designated by Number_of_Threads and defines the number of CPU cores of the processor to be used in the calculation. If one sets the value 0 for this parameter, all available CPU cores will be used. For the most efficient calculation it is advisable to leave one CPU core unused for the background system's jobs. For example, on a machine with 20 CPU cores such as the one we used for testing, it is advisable to use up to 19 CPU cores. However, the total number of used CPU cores can be divided into more than one job. 
For instance, one can run three simulations simultaneously using 10, 4, and 5 CPU cores, respectively, for a total of 19 used CPU cores on a 20-core computer. The Fortran source programs are located in the directory src, and can be compiled by the make command using the makefile in the root directory BEC-GP-OMP-FOR of the software package. The examples of produced output files can be found in the directory output, although some large density files are omitted, to save space. The programs calculate the values of actually used dimensionless nonlinearities from the physical input parameters, where the input parameters correspond to the identical nonlinearity values as in the previously published programs [1], so that the output files of the old and new programs can be directly compared. The output files are conveniently named such that their contents can be easily identified, following the naming convention introduced in Ref. [2]. For example, a file whose name is the name of the individual program followed by -out.txt represents the general output file containing input data, time and space steps, nonlinearity, energy and chemical potential, and was named fort.7 in the old Fortran version of programs [1]. A file whose name ends in -den.txt is the output file with the condensate density, which had the names fort.3 and fort.4 in the old Fortran version [1] for imaginary- and real-time propagation programs, respectively. Other possible density outputs, such as the initial density, are commented out in the programs to have a simpler set of output files, but users can uncomment and re-enable them, if needed. In addition, there are output files for reduced (integrated) 1d and 2d densities for different programs. In the real-time programs there is also an output file reporting the dynamics of evolution of root-mean-square sizes after a perturbation is introduced. The supplied real-time programs solve the stationary GP equation, and then calculate the dynamics. 
As the imaginary-time programs are more accurate than the real-time programs for the solution of a stationary problem, one can first solve the stationary problem using the imaginary-time programs, adapt the real-time programs to read the pre-calculated wave function and then study the dynamics. In that case the parameter NSTP in the real-time programs should be set to zero and the space mesh and nonlinearity parameters should be identical in both programs. The reader is advised to consult our previous publication where a complete description of the output files is given [2]. A readme.txt file, included in the root directory, explains the procedure to compile and run the programs. We tested our programs on a workstation with two 10-core Intel Xeon E5-2650 v3 CPUs. The parameters used for testing are given in sample input files, provided in the corresponding directory together with the programs. In Table 1 we present wall-clock execution times for runs on 1, 6, and 19 CPU cores for programs compiled using Intel and GNU Fortran compilers. The corresponding columns "Intel speedup" and "GNU speedup" give the ratio of wall-clock execution times of runs on 1 and 19 CPU cores, and denote the actual measured speedup for 19 CPU cores. In all cases and for all numbers of CPU cores, although the GNU Fortran compiler gives excellent results, the Intel Fortran compiler turns out to be slightly faster. Note that during these tests we always ran only a single simulation on a workstation at a time, to avoid any possible interference issues. Therefore, the obtained wall-clock times are more reliable than the ones that could be measured with two or more jobs running simultaneously. We also studied the speedup of the programs as a function of the number of CPU cores used. The performance of the Intel and GNU Fortran compilers is illustrated in Fig. 1, where we plot the speedup and actual wall-clock times as functions of the number of CPU cores for 2d and 3d programs. 
We see that the speedup increases monotonically with the number of CPU cores in all cases and has large values (between 10 and 14 for 3d programs) for the maximal number of cores. This fully justifies the development of OpenMP programs, which enable much faster and more efficient solving of the GP equation. However, a slow saturation in the speedup with the further increase in the number of CPU cores is observed in all cases, as expected. The speedup tends to increase for programs in higher dimensions, as they become more complex and have to process more data. This is why the speedups of the supplied 2d and 3d programs are larger than those of 1d programs. Also, for a single program the speedup increases with the size of the spatial grid, i.e., with the number of spatial discretization points, since this increases the amount of calculations performed by the program. To demonstrate this, we tested the supplied real2d-th program and varied the number of spatial discretization points NX=NY from 20 to 1000. The measured speedup obtained when running this program on 19 CPU cores as a function of the number of discretization points is shown in Fig. 2. The speedup first increases rapidly with the number of discretization points and eventually saturates. Additional comments: Example inputs provided with the programs take less than 30 minutes to run on a workstation with two Intel Xeon E5-2650 v3 processors (2 QPI links, 10 CPU cores, 25 MB cache, 2.3 GHz).
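
    The speedup defined in this abstract (ratio of wall-clock times on 1 and 19 CPU cores) and the advice on dividing cores among jobs can be sketched as follows. Amdahl's law is not mentioned in the abstract; it is used here only as an assumed model for the reported saturation, and all function names are illustrative.

```python
def speedup(t_one_core, t_n_cores):
    """Measured speedup: ratio of wall-clock times on 1 core and on n cores."""
    return t_one_core / t_n_cores

def efficiency(s, n_cores):
    """Parallel efficiency: speedup per core used."""
    return s / n_cores

def amdahl_parallel_fraction(s, n_cores):
    """Parallel fraction p implied by Amdahl's law, S = 1 / ((1 - p) + p / n)."""
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n_cores)

def fits(jobs, total_cores, reserved=1):
    """True if the per-job core counts leave `reserved` cores for the system."""
    return sum(jobs) <= total_cores - reserved

# Speedups of 10 and 14 on 19 cores bracket the range quoted for the 3d programs.
for s in (10.0, 14.0):
    print(f"S={s:4.1f}  efficiency={efficiency(s, 19):.2f}  "
          f"implied parallel fraction={amdahl_parallel_fraction(s, 19):.3f}")

# The three-job split from the abstract on a 20-core machine.
print(fits([10, 4, 5], total_cores=20))
```

    Under this assumed model, a speedup between 10 and 14 on 19 cores corresponds to roughly 95-98% of the run time being parallelizable.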

  11. Techniques for increasing the efficiency of Earth gravity calculations for precision orbit determination

    NASA Technical Reports Server (NTRS)

    Smith, R. L.; Lyubomirsky, A. S.

    1981-01-01

    Two techniques were analyzed. The first is a representation using Chebyshev expansions in three-dimensional cells. The second technique employs a temporary file for storing the components of the nonspherical gravity force. Computer storage requirements and relative CPU time requirements are presented. The Chebyshev gravity representation can provide a significant reduction in CPU time in precision orbit calculations, but at the cost of a large amount of direct-access storage space, which is required for a global model.
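
    To illustrate why a Chebyshev representation trades storage for CPU time, here is a one-dimensional sketch: coefficients are computed once per cell and stored, and each later evaluation is a short Clenshaw recurrence. The report works with three-dimensional cells; the inverse-square test function, the cell bounds, and the number of terms below are assumptions made only for this example.

```python
import math

def chebyshev_coefficients(f, a, b, n):
    """Coefficients c_0..c_{n-1} of the Chebyshev interpolant of f on [a, b]."""
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(0.5 * (b - a) * x + 0.5 * (b + a)) for x in nodes]
    return [
        2.0 / n * sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n)
                      for k in range(n))
        for j in range(n)
    ]

def chebyshev_eval(coeffs, a, b, r):
    """Evaluate the stored expansion at r in [a, b] with the Clenshaw recurrence."""
    x = (2.0 * r - a - b) / (b - a)  # map [a, b] onto [-1, 1]
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + 0.5 * coeffs[0]

def inverse_square(r):
    """Stand-in for an expensive force component (hypothetical test function)."""
    return 1.0 / (r * r)

# One hypothetical radial cell; coefficients are computed once and reused.
CELL = (1.0, 1.1)
coeffs = chebyshev_coefficients(inverse_square, *CELL, n=8)

max_error = max(
    abs(chebyshev_eval(coeffs, *CELL, r) - inverse_square(r))
    for r in [CELL[0] + 0.001 * i for i in range(101)]
)
print(f"max error over the cell: {max_error:.2e}")
```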

  12. Personal Computer and Workstation Operating Systems Tutorial

    DTIC Science & Technology

    1994-03-01

    to a RAM area where it is executed by the CPU. The program consists of instructions that perform operations on data. The CPU will perform two basic...memory to improve system performance. More often the user will buy a new fixed disk so the computer will hold more programs internally. The trend today...MHZ. Another way to view how fast the information is going into the register is in a time domain rather than a frequency domain knowing that time and

  13. Lossless data compression for improving the performance of a GPU-based beamformer.

    PubMed

    Lok, U-Wai; Fan, Gang-Wei; Li, Pai-Chi

    2015-04-01

    The powerful parallel computation ability of a graphics processing unit (GPU) makes it feasible to perform dynamic receive beamforming. However, a real-time GPU-based beamformer requires a high data rate to transfer radio-frequency (RF) data from hardware to software memory, as well as from central processing unit (CPU) to GPU memory. There are data compression methods (e.g. Joint Photographic Experts Group (JPEG)) available for the hardware front end to reduce data size, alleviating the data transfer requirement of the hardware interface. Nevertheless, the required decoding time may even be larger than the transmission time of its original data, in turn degrading the overall performance of the GPU-based beamformer. This article proposes and implements a lossless compression-decompression algorithm, which enables parallel compression and decompression of data. By this means, the data transfer requirement of the hardware interface and the transmission time of CPU to GPU data transfers are reduced, without sacrificing image quality. In simulation results, the compression ratio reached around 1.7. The encoder design of our lossless compression approach requires low hardware resources and reasonable latency in a field programmable gate array. In addition, the transmission time of transferring data from CPU to GPU with the parallel decoding process improved by threefold, as compared with transferring original uncompressed data. These results show that our proposed lossless compression plus parallel decoder approach not only mitigates the transmission bandwidth requirement to transfer data from the hardware front end to the software system but also reduces the transmission time for CPU to GPU data transfer. © The Author(s) 2014.

  14. Benchmarking worker nodes using LHCb productions and comparing with HEPSpec06

    NASA Astrophysics Data System (ADS)

    Charpentier, P.

    2017-10-01

    In order to estimate the capabilities of a computing slot with limited processing time, it is necessary to know with a rather good precision its “power”. This allows for example pilot jobs to match a task for which the required CPU-work is known, or to define the number of events to be processed knowing the CPU-work per event. Otherwise one always has the risk that the task is aborted because it exceeds the CPU capabilities of the resource. It also allows a better accounting of the consumed resources. The traditional way the CPU power is estimated in WLCG since 2007 is using the HEP-Spec06 benchmark (HS06) suite that was verified at the time to scale properly with a set of typical HEP applications. However, the hardware architecture of processors has evolved, all WLCG experiments moved to using 64-bit applications and use different compilation flags from those advertised for running HS06. It is therefore interesting to check the scaling of HS06 with the HEP applications. For this purpose, we have been using CPU intensive massive simulation productions from the LHCb experiment and compared their event throughput to the HS06 rating of the worker nodes. We also compared it with a much faster benchmark script that is used by the DIRAC framework used by LHCb for evaluating at run time the performance of the worker nodes. This contribution reports on the finding of these comparisons: the main observation is that the scaling with HS06 is no longer fulfilled, while the fast benchmarks have a better scaling but are less precise. One can also clearly see that some hardware or software features when enabled on the worker nodes may enhance their performance beyond expectation from either benchmark, depending on external factors.
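
    The matching described in this abstract, choosing how many events to process from the slot power and the known CPU-work per event, amounts to a short calculation. The sketch below is an assumption about how such a sizing could be done, not the DIRAC implementation; the function name, the safety margin, and all numbers are hypothetical.

```python
import math

def events_that_fit(power_hs06, seconds_left, work_per_event, safety=0.8):
    """Events a slot can process: available CPU-work divided by CPU-work per event.

    power_hs06     -- estimated power of the slot (HS06)
    seconds_left   -- remaining processing time in the slot (s)
    work_per_event -- CPU-work needed per event (HS06.s)
    safety         -- hypothetical margin against an over-estimated power
    """
    available_work = power_hs06 * seconds_left * safety
    return math.floor(available_work / work_per_event)

# Hypothetical slot: 10 HS06, 24 hours left, 2000 HS06.s per event.
n_events = events_that_fit(power_hs06=10.0, seconds_left=86_400, work_per_event=2000.0)
print(n_events)
```

    If the power estimate is too high, the computed event count overshoots and the task risks being aborted, which is why the precision of the benchmark matters.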

  15. Performance of the OVERFLOW-MLP and LAURA-MLP CFD Codes on the NASA Ames 512 CPU Origin System

    NASA Technical Reports Server (NTRS)

    Taft, James R.

    2000-01-01

    The shared memory Multi-Level Parallelism (MLP) technique, developed last year at NASA Ames, has been very successful in dramatically improving the performance of important NASA CFD codes. This new and very simple parallel programming technique was first inserted into the OVERFLOW production CFD code in FY 1998. The OVERFLOW-MLP code's parallel performance scaled linearly to 256 CPUs on the NASA Ames 256 CPU Origin 2000 system (steger). Overall performance exceeded 20.1 GFLOP/s, or about 4.5x the performance of a dedicated 16 CPU C90 system. All of this was achieved without any major modification to the original vector based code. The OVERFLOW-MLP code is now in production on the in-house Origin systems as well as being used offsite at commercial aerospace companies. Partially as a result of this work, NASA Ames has purchased a new 512 CPU Origin 2000 system to further test the limits of parallel performance for NASA codes of interest. This paper presents the performance obtained from the latest optimization efforts on this machine for the LAURA-MLP and OVERFLOW-MLP codes. The Langley Aerothermodynamics Upwind Relaxation Algorithm (LAURA) code is a key simulation tool in the development of the next generation shuttle, interplanetary reentry vehicles, and nearly all "X" plane development. This code sustains about 4-5 GFLOP/s on a dedicated 16 CPU C90. At this rate, expected workloads would require over 100 C90 CPU years of computing over the next few calendar years. It is not feasible to expect that this would be affordable or available to the user community. Dramatic performance gains on cheaper systems are needed. This code is expected to be perhaps the largest consumer of NASA Ames compute cycles per run in the coming year. The OVERFLOW CFD code is extensively used in the government and commercial aerospace communities to evaluate new aircraft designs. 
It is one of the largest consumers of NASA supercomputing cycles and large simulations of highly resolved full aircraft are routinely undertaken. Typical large problems might require 100s of Cray C90 CPU hours to complete. The dramatic performance gains with the 256 CPU steger system are exciting. Obtaining results in hours instead of months is revolutionizing the way in which aircraft manufacturers are looking at future aircraft simulation work. Figure 2 below is a current state of the art plot of OVERFLOW-MLP performance on the 512 CPU Lomax system. As can be seen, the chart indicates that OVERFLOW-MLP continues to scale linearly with CPU count up to 512 CPUs on a large 35 million point full aircraft RANS simulation. At this point performance is such that a fully converged simulation of 2500 time steps is completed in less than 2 hours of elapsed time. Further work over the next few weeks will improve the performance of this code even further. The LAURA code has been converted to the MLP format as well. This code is currently being optimized for the 512 CPU system. Performance statistics indicate that the goal of 100 GFLOP/s will be achieved by year's end. This amounts to 20x the 16 CPU C90 result and strongly demonstrates the viability of the new parallel systems rapidly solving very large simulations in a production environment.

  16. Inexact adaptive Newton methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertiger, W.I.; Kelsey, F.J.

    1985-02-01

    The Inexact Adaptive Newton method (IAN) is a modification of the Adaptive Implicit Method [1] (AIM) with improved Newton convergence. Both methods simplify the Jacobian at each time step by zeroing coefficients in regions where saturations are changing slowly. The methods differ in how the diagonal block terms are treated. On test problems with up to 3,000 cells, IAN consistently saves approximately 30% of the CPU time when compared to the fully implicit method. AIM shows similar savings on some problems, but takes as much CPU time as fully implicit on other test problems due to poor Newton convergence.

  17. Semiempirical Quantum Chemical Calculations Accelerated on a Hybrid Multicore CPU-GPU Computing Platform.

    PubMed

    Wu, Xin; Koslowski, Axel; Thiel, Walter

    2012-07-10

    In this work, we demonstrate that semiempirical quantum chemical calculations can be accelerated significantly by leveraging the graphics processing unit (GPU) as a coprocessor on a hybrid multicore CPU-GPU computing platform. Semiempirical calculations using the MNDO, AM1, PM3, OM1, OM2, and OM3 model Hamiltonians were systematically profiled for three types of test systems (fullerenes, water clusters, and solvated crambin) to identify the most time-consuming sections of the code. The corresponding routines were ported to the GPU and optimized employing both existing library functions and a GPU kernel that carries out a sequence of noniterative Jacobi transformations during pseudodiagonalization. The overall computation times for single-point energy calculations and geometry optimizations of large molecules were reduced by one order of magnitude for all methods, as compared to runs on a single CPU core.

  18. A GPU-based calculation using the three-dimensional FDTD method for electromagnetic field analysis.

    PubMed

    Nagaoka, Tomoaki; Watanabe, Soichi

    2010-01-01

    Numerical simulations of numerical human models using the finite-difference time-domain (FDTD) method have recently become common in a number of fields in biomedical engineering. However, FDTD calculations run too slowly. We therefore focus on general-purpose computing on graphics processing units (GPGPU). The three-dimensional FDTD method was implemented on the GPU using Compute Unified Device Architecture (CUDA). In this study, we used the NVIDIA Tesla C1060 as a GPGPU board. The performance of the GPU is evaluated in comparison with the performance of a conventional CPU and a vector supercomputer. The results indicate that three-dimensional FDTD calculations using a GPU can significantly reduce run time in comparison with a conventional CPU, even with a native GPU implementation of the three-dimensional FDTD method, while the GPU/CPU speed ratio varies with the calculation domain and thread block size.

  19. Research on control law accelerator of digital signal process chip TMS320F28035 for real-time data acquisition and processing

    NASA Astrophysics Data System (ADS)

    Zhao, Shuangle; Zhang, Xueyi; Sun, Shengli; Wang, Xudong

    2017-08-01

    TI C2000 series digital signal processor (DSP) chips are widely used in electrical engineering, measurement and control, communications, and other professional fields, and the TMS320F28035 is one of the most representative of them. A DSP program must both acquire and process data. If it is written conventionally in C or assembly language and runs sequentially, the analogue-to-digital (AD) converter cannot acquire data in real time and much data is often lost. The control law accelerator (CLA) can run in parallel with the main central processing unit (CPU) at the same clock frequency, and it supports floating-point operations. Therefore, the CLA coprocessor is used in the program: the CLA kernel is responsible for data processing, while the main CPU is responsible for the AD conversion. This approach reduces data processing time and achieves real-time data acquisition.

  20. Restricted Collision List method for faster Direct Simulation Monte-Carlo (DSMC) collisions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Macrossan, Michael N., E-mail: m.macrossan@uq.edu.au

    The ‘Restricted Collision List’ (RCL) method for speeding up the calculation of DSMC Variable Soft Sphere collisions, with Borgnakke–Larsen (BL) energy exchange, is presented. The method cuts down considerably on the number of random collision parameters which must be calculated (deflection and azimuthal angles, and the BL energy exchange factors). A relatively short list of these parameters is generated and the parameters required in any cell are selected from this list. The list is regenerated at intervals approximately equal to the smallest mean collision time in the flow, and the chance of any particle re-using the same collision parameters in two successive collisions is negligible. The results using this method are indistinguishable from those obtained with standard DSMC. The CPU time saving depends on how much of a DSMC calculation is devoted to collisions and how much is devoted to other tasks, such as moving particles and calculating particle interactions with flow boundaries. For 1-dimensional calculations of flow in a tube, the new method saves 20% of the CPU time per collision for VSS scattering with no energy exchange. With RCL applied to rotational energy exchange, the CPU saving can be greater; for small values of the rotational collision number, for which most collisions involve some rotational energy exchange, the CPU may be reduced by 50% or more.
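
    The core idea of this abstract, generating a short list of collision parameters once and selecting from it per collision, can be sketched as below. This is an illustrative reading of the abstract, not the author's code: the VSS deflection rule cos(chi) = 2 R^(1/alpha) - 1 is the commonly cited sampling formula and is assumed here, the BL energy exchange factors are omitted, and the list size, exponent, and regeneration interval are hypothetical.

```python
import math
import random

def make_collision_list(size, alpha, rng):
    """Pre-compute (cos_chi, phi) pairs for VSS scattering.

    cos_chi follows the assumed VSS rule cos(chi) = 2 R**(1/alpha) - 1;
    phi is a uniformly distributed azimuthal angle.
    """
    return [
        (2.0 * rng.random() ** (1.0 / alpha) - 1.0, 2.0 * math.pi * rng.random())
        for _ in range(size)
    ]

def pick_parameters(collision_list, rng):
    """Select stored parameters for one collision instead of computing new ones."""
    return collision_list[rng.randrange(len(collision_list))]

rng = random.Random(0)
LIST_SIZE, ALPHA, REGEN_EVERY = 1000, 1.4, 50  # hypothetical values

rcl = make_collision_list(LIST_SIZE, ALPHA, rng)
used = 0
for step in range(200):
    if step > 0 and step % REGEN_EVERY == 0:
        # Regenerate at intervals (the abstract ties this to the mean collision time).
        rcl = make_collision_list(LIST_SIZE, ALPHA, rng)
    cos_chi, phi = pick_parameters(rcl, rng)
    used += 1

print(f"collisions processed: {used}, list size: {len(rcl)}")
```

    Selecting an entry still costs one random index per collision; the saving in this sketch comes from avoiding the power and trigonometric evaluations for every collision.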

  1. Bayer image parallel decoding based on GPU

    NASA Astrophysics Data System (ADS)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In photoelectrical tracking systems, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example, 2K×2K×16bit. To accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processor Unit (GPU), which supports the CUDA architecture. The decoding procedure can be divided into three parts: a serial part, a task-parallelism part, and a data-parallelism part that includes inverse quantization, the inverse discrete wavelet transform (IDWT), and image post-processing. To reduce the execution time, the task-parallelism part is optimized with OpenMP techniques. The data-parallelism part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization, and texture memory optimization. In particular, the IDWT can be sped up significantly by rewriting the serial 2D (two-dimensional) IDWT as a parallel 1D IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallelism part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3- to 5-fold speed increase compared to the CPU serial method.
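
    The step of rewriting a 2D IDWT as 1D transforms relies on the transform being separable: a 2D inverse can be done as one sweep of independent 1D inverses over columns and another over rows, and each 1D call is a candidate for its own GPU thread. The abstract does not name the wavelet, so the one-level Haar transform below is an assumption chosen for brevity, and the plain-Python loops only stand in for the parallel kernels.

```python
def haar_forward_1d(v):
    """One-level Haar analysis: pairwise averages first, then pairwise differences."""
    half = len(v) // 2
    avg = [(v[2 * i] + v[2 * i + 1]) / 2.0 for i in range(half)]
    dif = [(v[2 * i] - v[2 * i + 1]) / 2.0 for i in range(half)]
    return avg + dif

def haar_inverse_1d(c):
    """One-level Haar synthesis, the inverse of haar_forward_1d."""
    half = len(c) // 2
    out = []
    for i in range(half):
        out += [c[i] + c[half + i], c[i] - c[half + i]]
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward_2d(image):
    """2D analysis: 1D transform of every row, then of every column."""
    rows = [haar_forward_1d(r) for r in image]
    return transpose([haar_forward_1d(c) for c in transpose(rows)])

def inverse_2d(coeffs):
    """2D synthesis as two sweeps of independent 1D inverses (columns, then rows)."""
    cols = transpose([haar_inverse_1d(c) for c in transpose(coeffs)])
    return [haar_inverse_1d(r) for r in cols]

# Small hypothetical 4x4 image; the round trip should reproduce it.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
restored = inverse_2d(forward_2d(image))
print(restored == image)
```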

  2. SU-E-T-423: Fast Photon Convolution Calculation with a 3D-Ideal Kernel On the GPU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moriya, S; Sato, M; Tachibana, H

    Purpose: The calculation time is a trade-off for improving the accuracy of convolution dose calculation with fine calculation spacing of the KERMA kernel. We investigated accelerating the convolution calculation using an ideal kernel on the graphics processing unit (GPU). Methods: The calculation was performed on AMD graphics hardware (Dual FirePro D700), and our algorithm was implemented using Aparapi, which converts Java bytecode to OpenCL. The dose calculation process was separated into the TERMA and KERMA steps. The dose deposited at the coordinate (x, y, z) was determined in the process. In the dose calculation running on the central processing unit (CPU), an Intel Xeon E5, the calculation loops were performed for all calculation points. In the GPU computation, all of the calculation processes for the points were sent to the GPU and the computation was multi-threaded. In this study, the dose calculation was performed in a water-equivalent homogeneous phantom with 150³ voxels (2 mm calculation grid), and the calculation speed on the GPU relative to that on the CPU and the accuracy of the PDD were compared. Results: The calculation times for the GPU and the CPU were 3.3 sec and 4.4 hours, respectively. The calculation speed for the GPU was 4800 times faster than that for the CPU. The PDD curve for the GPU matched that for the CPU perfectly. Conclusion: The convolution calculation with the ideal kernel on the GPU was clinically acceptable in terms of time and may be more accurate in an inhomogeneous region. Intensity modulated arc therapy needs dose calculations for different gantry angles at many control points. Thus, it would be more practical for the kernel to use a coarse spacing technique if the calculation is faster while keeping accuracy similar to that of a current treatment planning system.
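
    The reported speedup factor follows directly from the two run times in the abstract. The short Python check below uses only those two figures; the helper name is hypothetical.

```python
def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU run time to GPU run time."""
    return cpu_seconds / gpu_seconds

cpu_seconds = 4.4 * 3600.0   # 4.4 hours, as reported for the CPU
gpu_seconds = 3.3            # 3.3 seconds, as reported for the GPU
factor = speedup(cpu_seconds, gpu_seconds)
print(round(factor))         # prints 4800
```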

  3. The numerical study and comparison of radial basis functions in applications of the dual reciprocity boundary element method to convection-diffusion problems

    NASA Astrophysics Data System (ADS)

    Chanthawara, Krittidej; Kaennakham, Sayan; Toutip, Wattana

    2016-02-01

    The Dual Reciprocity Boundary Element Method (DRBEM) is applied to convection-diffusion problems, and investigating its performance is the first objective of this work. Seven types of Radial Basis Functions (RBF), namely Linear, Thin-Plate Spline, Cubic, Compactly Supported, Inverse Multiquadric, Quadratic, and that proposed by [12], were closely investigated in order to compare numerically their effectiveness, drawbacks, etc.; this is our second objective. A sufficient number of simulations were performed, covering as many aspects as possible. Validated against both exact solutions and other numerical works, the final results strongly imply that the Thin-Plate Spline and Linear types of RBF are superior to the others in terms of both solution quality and CPU time spent, while the Inverse Multiquadric seems to yield poor results. It is also found that DRBEM performs relatively well at moderate levels of convective force and, as anticipated, becomes unstable when the problem becomes more convection-dominated, as is normally found in all classical mesh-dependent methods.

  4. An evaluation of superminicomputers for thermal analysis

    NASA Technical Reports Server (NTRS)

    Storaasli, O. O.; Vidal, J. B.; Jones, G. K.

    1982-01-01

    The feasibility and cost effectiveness of solving thermal analysis problems on superminicomputers is demonstrated. Conventional thermal analysis and the changing computer environment, computer hardware and software used, six thermal analysis test problems, performance of superminicomputers (CPU time, accuracy, turnaround, and cost) and comparison with large computers are considered. Although the CPU times for superminicomputers were 15 to 30 times greater than the fastest mainframe computer, the minimum cost to obtain the solutions on superminicomputers was from 11 percent to 59 percent of the cost of mainframe solutions. The turnaround (elapsed) time is highly dependent on the computer load, but for large problems, superminicomputers produced results in less elapsed time than a typically loaded mainframe computer.

  5. Airloads on Bluff Bodies, with Application to the Rotor-Induced Downloads on Tilt-Rotor Aircraft.

    DTIC Science & Technology

    1983-09-01

    interference aerodynamics would be tion on hover performance (Ref. (11). to study the two-dimensional section characteristics of a wing in the wake of a ... resources for large numbers of vortices; a typical case requires 10-15 min CPU time on the Ames Cray 1S computer. Figure 6 shows a typical result. Here ... CPU time per case on a Prime 550 computer to converge to a steady solution; this would be equivalent to one or two seconds on

  6. Dosimetric comparison of helical tomotherapy treatment plans for total marrow irradiation created using GPU and CPU dose calculation engines.

    PubMed

    Nalichowski, Adrian; Burmeister, Jay

    2013-07-01

    To compare optimization characteristics, plan quality, and treatment delivery efficiency between total marrow irradiation (TMI) plans using the new TomoTherapy graphic processing unit (GPU) based dose engine and CPU/cluster based dose engine. Five TMI plans created on an anthropomorphic phantom were optimized and calculated with both dose engines. The planning treatment volume (PTV) included all the bones from head to mid femur except for upper extremities. Evaluated organs at risk (OAR) consisted of lung, liver, heart, kidneys, and brain. The following treatment parameters were used to generate the TMI plans: field widths of 2.5 and 5 cm, modulation factors of 2 and 2.5, and pitch of either 0.287 or 0.43. The optimization parameters were chosen based on the PTV and OAR priorities and the plans were optimized with a fixed number of iterations. The PTV constraint was selected to ensure that at least 95% of the PTV received the prescription dose. The plans were evaluated based on D80 and D50 (dose to 80% and 50% of the OAR volume, respectively) and hotspot volumes within the PTVs. Gamma indices (Γ) were also used to compare planar dose distributions between the two modalities. The optimization and dose calculation times were compared between the two systems. The treatment delivery times were also evaluated. The results showed very good dosimetric agreement between the GPU and CPU calculated plans for any of the evaluated planning parameters indicating that both systems converge on nearly identical plans. All D80 and D50 parameters varied by less than 3% of the prescription dose with an average difference of 0.8%. A gamma analysis Γ(3%, 3 mm) < 1 of the GPU plan resulted in over 90% of calculated voxels satisfying Γ < 1 criterion as compared to baseline CPU plan. The average number of voxels meeting the Γ < 1 criterion for all the plans was 97%. 
    In terms of dose optimization/calculation efficiency, there was a 20-fold reduction in planning time with the new GPU system. The average optimization/dose calculation time utilizing the traditional CPU/cluster based system was 579 vs 26.8 min for the GPU based system. There was no difference in the calculated treatment delivery time per fraction. Beam-on time varied based on field width and pitch and ranged between 15 and 28 min. The TomoTherapy GPU based dose engine is capable of calculating TMI treatment plans with plan quality nearly identical to plans calculated using the traditional CPU/cluster based system, while significantly reducing the time required for optimization and dose calculation.

  7. A CPU benchmark for protein crystallographic refinement.

    PubMed

    Bourne, P E; Hendrickson, W A

    1990-01-01

    The CPU time required to complete a cycle of restrained least-squares refinement of a protein structure from X-ray crystallographic data using the FORTRAN codes PROTIN and PROLSQ is reported for 48 different processors, ranging from single-user workstations to supercomputers. Sequential, vector, VLIW, multiprocessor, and RISC hardware architectures are compared using both a small and a large protein structure. Representative compile times for each hardware type are also given, and the improvement in run-time when coding for a specific hardware architecture is considered. The benchmarks involve scalar integer and vector floating point arithmetic and are representative of the calculations performed in many scientific disciplines.

  8. Adaptive real-time methodology for optimizing energy-efficient computing

    DOEpatents

    Hsu, Chung-Hsing [Los Alamos, NM]; Feng, Wu-Chun [Blacksburg, VA]

    2011-06-28

    Dynamic voltage and frequency scaling (DVFS) is an effective way to reduce energy and power consumption in microprocessor units. Current implementations of DVFS suffer from inaccurate modeling of power requirements and usage, and from inaccurate characterization of the relationships between the applicable variables. A system and method is proposed that adjusts CPU frequency and voltage based on run-time calculations of the workload processing time, as well as a calculation of performance sensitivity with respect to CPU frequency. The system and method are processor independent, and can be applied to either an entire system as a unit, or individually to each process running on a system.
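
    A minimal Python sketch shows how a performance-sensitivity figure could drive frequency selection. The abstract gives no formulas, so everything here is an assumption: the linear slowdown model T(f)/T(f_max) = 1 + beta (f_max/f - 1), the sensitivity parameter beta in [0, 1], the tolerated slowdown, and all function names and frequency values are hypothetical.

```python
def predicted_slowdown(beta, f, f_max):
    """Assumed model: relative increase in run time when running at f instead of f_max."""
    return beta * (f_max / f - 1.0)

def choose_frequency(beta, frequencies, max_slowdown):
    """Pick the lowest frequency whose predicted slowdown stays within the bound."""
    f_max = max(frequencies)
    for f in sorted(frequencies):
        if predicted_slowdown(beta, f, f_max) <= max_slowdown:
            return f
    return f_max  # fallback, reached only if the bound is negative

freqs_mhz = [800, 1200, 1600, 2000]            # hypothetical frequency steps
print(choose_frequency(0.02, freqs_mhz, 0.05)) # weakly frequency-sensitive workload
print(choose_frequency(1.0, freqs_mhz, 0.05))  # fully CPU-bound workload
```

    Under this assumed model, a workload whose run time barely depends on clock speed can be run at a low frequency at little cost, while a CPU-bound workload stays at the highest step.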

  9. Real time display Fourier-domain OCT using multi-thread parallel computing with data vectorization

    NASA Astrophysics Data System (ADS)

    Eom, Tae Joong; Kim, Hoon Seop; Kim, Chul Min; Lee, Yeung Lak; Choi, Eun-Seo

    2011-03-01

    We demonstrate a real-time display of processed OCT images using multi-thread parallel computing with a quad-core CPU of a personal computer. The data of each A-line are treated as one vector to maximize the data translation rate between the cores of the CPU and RAM stored image data. A display rate of 29.9 frames/sec for processed OCT data (4096 FFT-size x 500 A-scans) is achieved in our system using a wavelength swept source with 52-kHz swept frequency. The data processing times of the OCT image and a Doppler OCT image with a 4-time average are 23.8 msec and 91.4 msec.

  10. A fast three-dimensional gamma evaluation using a GPU utilizing texture memory for on-the-fly interpolations.

    PubMed

    Persoon, Lucas C G G; Podesta, Mark; van Elmpt, Wouter J C; Nijsten, Sebastiaan M J J G; Verhaegen, Frank

    2011-07-01

    A widely accepted method to quantify differences in dose distributions is the gamma (γ) evaluation. Currently, almost all gamma implementations utilize the central processing unit (CPU). Recently, the graphics processing unit (GPU) has become a powerful platform for specific computing tasks. In this study, we describe the implementation of a 3D gamma evaluation using a GPU to improve calculation time. The gamma evaluation algorithm was implemented on an NVIDIA Tesla C2050 GPU using the compute unified device architecture (CUDA). First, several cubic virtual phantoms were simulated. These phantoms were tested with varying dose cube sizes and set-ups, introducing artificial dose differences. Second, to show applicability in clinical practice, five patient cases have been evaluated using the 3D dose distribution from a treatment planning system as the reference and the delivered dose determined during treatment as the comparison. A calculation time comparison between the CPU and GPU was made with varying thread-block sizes including the option of using texture or global memory. A GPU over CPU speed-up of 66 +/- 12 was achieved for the virtual phantoms. For the patient cases, a speed-up of 57 +/- 15 using the GPU was obtained. A thread-block size of 16 x 16 performed best in all cases. The use of texture memory improved the total calculation time, especially when interpolation was applied. Differences between the CPU and GPU gammas were negligible. The GPU and its features, such as texture memory, decreased the calculation time for gamma evaluations considerably without loss of accuracy.
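
    For readers unfamiliar with the gamma evaluation, the brute-force 1D Python sketch below uses the commonly used definition: for each reference point, gamma is the minimum over evaluated points of sqrt((distance / distance tolerance)^2 + (dose difference / dose tolerance)^2). It is a CPU illustration only, not the 3D GPU implementation described in the abstract. It performs no interpolation, and the dose values, the absolute dose tolerance, and the function name are hypothetical.

```python
import math

def gamma_1d(ref_pos, ref_dose, eval_pos, eval_dose, dose_tol, dist_tol):
    """Brute-force gamma index for every reference point (no interpolation)."""
    gammas = []
    for r_pos, r_dose in zip(ref_pos, ref_dose):
        best = math.inf
        for e_pos, e_dose in zip(eval_pos, eval_dose):
            dist_term = ((e_pos - r_pos) / dist_tol) ** 2
            dose_term = ((e_dose - r_dose) / dose_tol) ** 2
            best = min(best, math.sqrt(dist_term + dose_term))
        gammas.append(best)
    return gammas

positions_mm = [0.0, 1.0, 2.0, 3.0, 4.0]
reference = [1.00, 0.98, 0.95, 0.90, 0.80]   # hypothetical relative doses
evaluated = [1.01, 0.98, 0.96, 0.90, 0.85]   # hypothetical relative doses
# 3% of a maximum dose of 1.0, and 3 mm
gammas = gamma_1d(positions_mm, reference, positions_mm, evaluated,
                  dose_tol=0.03, dist_tol=3.0)
pass_rate = sum(g <= 1.0 for g in gammas) / len(gammas)
```

    The nested loop is quadratic in the number of points, which suggests why a 3D evaluation is a natural candidate for GPU acceleration.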

  11. GPU based contouring method on grid DEM data

    NASA Astrophysics Data System (ADS)

    Tan, Liheng; Wan, Gang; Li, Feng; Chen, Xiaohui; Du, Wenlong

    2017-08-01

    This paper presents a novel method to generate contour lines from grid DEM data based on the programmable GPU pipeline. Previous contouring approaches often use the CPU to construct a finite element mesh from the raw DEM data and then extract contour segments from the elements. They also need a tracing or sorting strategy to generate the final continuous contours. These approaches can be CPU-intensive and time-consuming, and the generated contours are not smooth if the raw data is sparsely distributed. Unlike the CPU approaches, we employ the GPU's vertex shader to generate a triangular mesh with arbitrary user-defined density, in which the height of each vertex is calculated through a third-order Cardinal spline function. Then, in the same frame, segments are extracted from the triangles by the geometry shader and transferred to the CPU side with an internal order in the GPU's transform feedback stage. Finally, we propose a "Grid Sorting" algorithm to obtain continuous contour lines by traversing the segments only once. Our method makes use of multiple stages of the GPU pipeline for computation, generates smooth contour lines, and is significantly faster than the previous CPU approaches. The algorithm can be easily implemented with the OpenGL 3.3 API or higher on consumer-level PCs.

  12. Upwind relaxation methods for the Navier-Stokes equations using inner iterations

    NASA Technical Reports Server (NTRS)

    Taylor, Arthur C., III; Ng, Wing-Fai; Walters, Robert W.

    1992-01-01

    A subsonic and a supersonic problem are respectively treated by an upwind line-relaxation algorithm for the Navier-Stokes equations using inner iterations to accelerate steady-state solution convergence and thereby minimize CPU time. While the ability of the inner iterative procedure to mimic the quadratic convergence of the direct solver method is attested to in both test problems, some of the nonquadratic inner iterative results are noted to have been more efficient than the quadratic. In the more successful, supersonic test case, inner iteration required only about 65 percent of the line-relaxation method-entailed CPU time.

  13. Method and apparatus for measuring spatial uniformity of radiation

    DOEpatents

    Field, Halden

    2002-01-01

    A method and apparatus for measuring the spatial uniformity of the intensity of a radiation beam from a radiation source based on a single sampling time and/or a single pulse of radiation. The measuring apparatus includes a plurality of radiation detectors positioned on a planar mounting plate to form a radiation receiving area that has a shape and size approximating the size and shape of the cross section of the radiation beam. The detectors concurrently receive portions of the radiation beam and transmit electrical signals representative of the intensity of impinging radiation to a signal processor circuit connected to each of the detectors and adapted to concurrently receive the electrical signals from the detectors and process the signals with a central processing unit (CPU) to determine intensities of the radiation impinging at each detector location. The CPU displays the determined intensities and relative intensity values corresponding to each detector location to an operator of the measuring apparatus on an included data display device. Concurrent sampling of each detector is achieved by connecting to each detector a sample-and-hold circuit that is configured to track the signal and store it upon receipt of a "capture" signal. A switching device then selectively retrieves the signals and transmits the signals to the CPU through a single analog to digital (A/D) converter. The "capture" signal is then removed from the sample-and-hold circuits. Alternatively, concurrent sampling is achieved by providing an A/D converter for each detector, each of which transmits a corresponding digital signal to the CPU. The sampling or reading of the detector signals can be controlled by the CPU or by a level-detection and timing circuit.

  14. CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions.

    PubMed

    Liu, Yongchao; Wirawan, Adrianto; Schmidt, Bertil

    2013-04-04

    The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to a quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases. We present CUDASW++ 3.0, a fast Smith-Waterman protein database search algorithm, which couples CPU and GPU SIMD instructions and carries out concurrent CPU and GPU computations. For the CPU computation, this algorithm employs SSE-based vector execution units as accelerators. For the GPU computation, we have investigated for the first time a GPU SIMD parallelization, which employs CUDA PTX SIMD video instructions to gain more data parallelism beyond the SIMT execution model. Moreover, sequence alignment workloads are automatically distributed over CPUs and GPUs based on their respective compute capabilities. Evaluation on the Swiss-Prot database shows that CUDASW++ 3.0 gains a performance improvement over CUDASW++ 2.0 up to 2.9 and 3.2, with a maximum performance of 119.0 and 185.6 GCUPS, on a single-GPU GeForce GTX 680 and a dual-GPU GeForce GTX 690 graphics card, respectively. In addition, our algorithm has demonstrated significant speedups over other top-performing tools: SWIPE and BLAST+. CUDASW++ 3.0 is written in CUDA C++ and PTX assembly languages, targeting GPUs based on the Kepler architecture. This algorithm obtains significant speedups over its predecessor: CUDASW++ 2.0, by benefiting from the use of CPU and GPU SIMD instructions as well as the concurrent execution on CPUs and GPUs. The source code and the simulated data are available at http://cudasw.sourceforge.net.
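
    The quadratic time complexity mentioned in the abstract comes from filling a dynamic-programming table with one cell per residue pair. A minimal Python sketch of the Smith-Waterman local alignment score follows. It assumes a simple match/mismatch score and a linear gap penalty, which are hypothetical simplifications; protein searches usually use a substitution matrix and affine gap penalties, and this is not the CUDASW++ code.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("TTACGTT", "GGACGGG"))  # prints 6 (shared "ACG")
```

    Each cell depends only on its left, upper, and upper-left neighbours, so cells along an anti-diagonal, and alignments against different database sequences, are independent. That independence is what SIMD and GPU implementations exploit.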

  15. Time Well Spent? Relating Television Use to Children’s Free-Time Activities

    PubMed Central

    Vandewater, Elizabeth A.; Bickham, David S.; Lee, June H.

    2010-01-01

    OBJECTIVES: This study assessed the claim that children’s television use interferes with time spent in more developmentally appropriate activities. METHODS: Data came from the first wave of the Child Development Supplement, a nationally representative sample of children aged 0 to 12 in 1997 (N = 1712). Twenty-four-hour time-use diaries from 1 randomly chosen weekday and 1 randomly chosen weekend day were used to assess children’s time spent watching television, time spent with parents, time spent with siblings, time spent reading (or being read to), time spent doing homework, time spent in creative play, and time spent in active play. Ordinary least squares multiple regression was used to assess the relationship between children’s television use and time spent pursuing other activities. RESULTS: Results indicated that time spent watching television both with and without parents or siblings was negatively related to time spent with parents or siblings, respectively, in other activities. Television viewing also was negatively related to time spent doing homework for 7- to 12-year-olds and negatively related to creative play, especially among very young children (younger than 5 years). There was no relationship between time spent watching television and time spent reading (or being read to) or to time spent in active play. CONCLUSIONS: The results of this study are among the first to provide empirical support for the assumptions made by the American Academy of Pediatrics in their screen time recommendations. Time spent viewing television both with and without parents and siblings present was strongly negatively related to time spent interacting with parents or siblings. Television viewing was associated with decreased homework time and decreased time in creative play. Conversely, there was no support for the widespread belief that television interferes with time spent reading or in active play. PMID:16452327

  16. Performance and accuracy of criticality calculations performed using WARP – A framework for continuous energy Monte Carlo neutron transport in general 3D geometries on GPUs

    DOE PAGES

    Bergmann, Ryan M.; Rowland, Kelly L.; Radnović, Nikola; ...

    2017-05-01

    In this companion paper to "Algorithmic Choices in WARP - A Framework for Continuous Energy Monte Carlo Neutron Transport in General 3D Geometries on GPUs" (doi:10.1016/j.anucene.2014.10.039), the WARP Monte Carlo neutron transport framework for graphics processing units (GPUs) is benchmarked against production-level central processing unit (CPU) Monte Carlo neutron transport codes for both performance and accuracy. We compare neutron flux spectra, multiplication factors, runtimes, speedup factors, and costs of various GPU and CPU platforms running either WARP, Serpent 2.1.24, or MCNP 6.1. WARP compares well with the results of the production-level codes, and it is shown that on the newest hardware considered, GPU platforms running WARP are between 0.8 and 7.6 times as fast as CPU platforms running production codes. Also, the GPU platforms running WARP were between 15% and 50% as expensive to purchase and between 80% and 90% as expensive to operate as equivalent CPU platforms performing at an equal simulation rate.

  17. Performance and accuracy of criticality calculations performed using WARP – A framework for continuous energy Monte Carlo neutron transport in general 3D geometries on GPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bergmann, Ryan M.; Rowland, Kelly L.; Radnović, Nikola

    In this companion paper to "Algorithmic Choices in WARP - A Framework for Continuous Energy Monte Carlo Neutron Transport in General 3D Geometries on GPUs" (doi:10.1016/j.anucene.2014.10.039), the WARP Monte Carlo neutron transport framework for graphics processing units (GPUs) is benchmarked against production-level central processing unit (CPU) Monte Carlo neutron transport codes for both performance and accuracy. We compare neutron flux spectra, multiplication factors, runtimes, speedup factors, and costs of various GPU and CPU platforms running either WARP, Serpent 2.1.24, or MCNP 6.1. WARP compares well with the results of the production-level codes, and it is shown that on the newest hardware considered, GPU platforms running WARP are between 0.8 and 7.6 times as fast as CPU platforms running production codes. Also, the GPU platforms running WARP were between 15% and 50% as expensive to purchase and between 80% and 90% as expensive to operate as equivalent CPU platforms performing at an equal simulation rate.

  18. Adaptive real-time methodology for optimizing energy-efficient computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hsu, Chung-Hsing; Feng, Wu-Chun

    Dynamic voltage and frequency scaling (DVFS) is an effective way to reduce energy and power consumption in microprocessor units. Current implementations of DVFS suffer from inaccurate modeling of power requirements and usage, and from inaccurate characterization of the relationships between the applicable variables. A system and method is proposed that adjusts CPU frequency and voltage based on run-time calculations of the workload processing time, as well as a calculation of performance sensitivity with respect to CPU frequency. The system and method are processor independent, and can be applied to either an entire system as a unit, or individually to each process running on a system.

  19. AMITIS: A 3D GPU-Based Hybrid-PIC Model for Space and Plasma Physics

    NASA Astrophysics Data System (ADS)

    Fatemi, Shahab; Poppe, Andrew R.; Delory, Gregory T.; Farrell, William M.

    2017-05-01

    We have developed, for the first time, an advanced modeling infrastructure in space simulations (AMITIS) with an embedded three-dimensional self-consistent grid-based hybrid model of plasma (kinetic ions and fluid electrons) that runs entirely on graphics processing units (GPUs). The model uses NVIDIA GPUs and their associated parallel computing platform, CUDA, developed for general purpose processing on GPUs. The model uses a single CPU-GPU pair, where the CPU transfers data between the system and GPU memory, executes CUDA kernels, and writes simulation outputs on the disk. All computations, including moving particles, calculating macroscopic properties of particles on a grid, and solving hybrid model equations are processed on a single GPU. We explain various computing kernels within AMITIS and compare their performance with an already existing well-tested hybrid model of plasma that runs in parallel using multi-CPU platforms. We show that AMITIS runs ∼10 times faster than the parallel CPU-based hybrid model. We also introduce an implicit solver for computation of Faraday’s Equation, resulting in an explicit-implicit scheme for the hybrid model equation. We show that the proposed scheme is stable and accurate. We examine the AMITIS energy conservation and show that the energy is conserved with an error < 0.2% after 500,000 timesteps, even when a very low number of particles per cell is used.

  20. Enhanced round robin CPU scheduling with burst time based time quantum

    NASA Astrophysics Data System (ADS)

    Indusree, J. R.; Prabadevi, B.

    2017-11-01

    Process scheduling is a very important function of the operating system. The best-known process-scheduling algorithms are the First Come First Serve (FCFS) algorithm, the Round Robin (RR) algorithm, the Priority scheduling algorithm, and the Shortest Job First (SJF) algorithm. Compared to its peers, the Round Robin (RR) algorithm has the advantage that it gives a fair share of the CPU to the processes that are already in the ready queue. The effectiveness of the RR algorithm depends greatly on the chosen time quantum value. In this research paper, we propose an enhanced process scheduling algorithm called Enhanced Round Robin with Burst-time based Time Quantum (ERRBTQ), which calculates the time quantum from the burst times of the processes already in the ready queue. The experimental results and analysis of the ERRBTQ algorithm clearly indicate improved performance when compared with conventional RR and its variants.
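
    The abstract does not state how ERRBTQ derives the quantum from the burst times, so the Python sketch below illustrates the general idea under an assumed rule: at the start of each round, the quantum is the mean remaining burst time of the queued processes, rounded up. The rule, the function name, and the simultaneous-arrival setup are hypothetical and should not be read as the published algorithm.

```python
import math
from collections import deque

def round_robin_dynamic_quantum(bursts):
    """Completion time of each process; all processes arrive at t = 0."""
    remaining = dict(enumerate(bursts))
    queue = deque(remaining)
    completion = {}
    clock = 0
    while queue:
        # Assumed rule: quantum = mean remaining burst time in the ready queue
        quantum = math.ceil(sum(remaining[p] for p in queue) / len(queue))
        for _ in range(len(queue)):          # one full round over the queue
            pid = queue.popleft()
            run = min(quantum, remaining[pid])
            clock += run
            remaining[pid] -= run
            if remaining[pid] == 0:
                completion[pid] = clock
            else:
                queue.append(pid)            # unfinished: back of the queue
    return [completion[i] for i in range(len(bursts))]

print(round_robin_dynamic_quantum([5, 10, 15]))  # prints [5, 15, 30]
```

    With bursts of 5, 10, and 15 the first-round quantum is 10, so the two shorter processes finish without being preempted and only the longest one needs a second round.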

  1. Benchmarking hardware architecture candidates for the NFIRAOS real-time controller

    NASA Astrophysics Data System (ADS)

    Smith, Malcolm; Kerley, Dan; Herriot, Glen; Véran, Jean-Pierre

    2014-07-01

    As a part of the trade study for the Narrow Field Infrared Adaptive Optics System, the adaptive optics system for the Thirty Meter Telescope, we investigated the feasibility of performing real-time control computation using a Linux operating system and Intel Xeon E5 CPUs. We also investigated a Xeon Phi based architecture which allows higher levels of parallelism. This paper summarizes both the CPU based real-time controller architecture and the Xeon Phi based RTC. The Intel Xeon E5 CPU solution meets the requirements and performs the computation for one AO cycle in an average of 767 microseconds. The Xeon Phi solution did not meet the 1200 microsecond time requirement and also suffered from unpredictable execution times. More detailed benchmark results are reported for both architectures.

  2. Double dissociation of the anterior and posterior dorsomedial caudate-putamen in the acquisition and expression of associative learning with the nicotine stimulus.

    PubMed

    Charntikov, Sergios; Pittenger, Steven T; Swalve, Natashia; Li, Ming; Bevins, Rick A

    2017-07-15

    Tobacco use is the leading cause of preventable deaths worldwide. This habit is not only debilitating to individual users but also to those around them (second-hand smoking). Nicotine is the main addictive component of tobacco products and is a moderate stimulant and a mild reinforcer. Importantly, besides its unconditional effects, nicotine also has conditioned stimulus effects that may contribute to the tenacity of the smoking habit. Because the neurobiological substrates underlying these processes are virtually unexplored, the present study investigated the functional involvement of the dorsomedial caudate putamen (dmCPu) in learning processes with nicotine as an interoceptive stimulus. Rats were trained using the discriminated goal-tracking task where nicotine injections (0.4 mg/kg; SC), on some days, were paired with intermittent (36 per session) sucrose deliveries; sucrose was not available on interspersed saline days. Pre-training excitotoxic or post-training transient lesions of anterior or posterior dmCPu were used to elucidate the role of these areas in acquisition or expression of associative learning with nicotine stimulus. Pre-training lesion of p-dmCPu inhibited acquisition while post-training lesions of p-dmCPu attenuated the expression of associative learning with the nicotine stimulus. On the other hand, post-training lesions of a-dmCPu evoked nicotine-like responding following saline treatment indicating the role of this area in disinhibition of learned motor behaviors. These results, for the first time, show functionally distinct involvement of a- and p-dmCPu in various stages of associative learning using nicotine stimulus and provide an initial account of neural plasticity underlying these learning processes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. On the Finite Element Implementation of the Generalized Method of Cells Micromechanics Constitutive Model

    NASA Technical Reports Server (NTRS)

    Wilt, T. E.

    1995-01-01

    The Generalized Method of Cells (GMC), a micromechanics-based constitutive model, is implemented in the finite element code MARC using the user subroutine HYPELA. Comparisons in terms of transverse deformation response, micro stress and strain distributions, and required CPU time are presented for GMC and finite element models of a fiber/matrix unit cell. GMC is shown to provide comparable predictions of the composite behavior and to require significantly less CPU time than a finite element analysis of the unit cell. Details of the organization of the HYPELA code are provided, with the actual HYPELA code included in the appendix.

  4. Sequence search on a supercomputer.

    PubMed

    Gotoh, O; Tagashira, Y

    1986-01-10

    A set of programs was developed for searching nucleic acid and protein sequence data bases for sequences similar to a given sequence. The programs, written in FORTRAN 77, were optimized for vector processing on a Hitachi S810-20 supercomputer. A search of a 500-residue protein sequence against the entire PIR data base Ver. 1.0 (1) (0.5 M residues) is carried out in a CPU time of 45 sec. About 4 min is required for an exhaustive search of a 1500-base nucleotide sequence against all mammalian sequences (1.2M bases) in Genbank Ver. 29.0. The CPU time is reduced to about a quarter with a faster version.

  5. Measured energy savings and performance of power-managed personal computers and monitors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nordman, B.; Piette, M.A.; Kinney, K.

    1996-08-01

    Personal computers and monitors are estimated to use 14 billion kWh/year of electricity, with power management potentially saving $600 million/year by the year 2000. The effort to capture these savings is led by the US Environmental Protection Agency's Energy Star program, which specifies a 30W maximum demand for the computer and for the monitor when in a "sleep" or idle mode. In this paper the authors discuss measured energy use and estimated savings for power-managed (Energy Star compliant) PCs and monitors. They collected electricity use measurements of six power-managed PCs and monitors in their office and five from two other research projects. The devices are diverse in machine type, use patterns, and context. The analysis method estimates the time spent in each system operating mode (off, low-, and full-power) and combines these with real power measurements to derive hours of use per mode, energy use, and energy savings. Three schedules are explored in the "As-operated," "Standardized," and "Maximum" savings estimates. Energy savings are established by comparing the measurements to a baseline with power management disabled. As-operated energy savings for the eleven PCs and monitors ranged from zero to 75 kWh/year. Under the standard operating schedule (on 20% of nights and weekends), the savings are about 200 kWh/year. An audit of power management features and configurations for several dozen Energy Star machines found that only 11% of CPUs were fully enabled and about two thirds of monitors were successfully power managed. The highest priority for greater power management savings is to enable monitors, as opposed to CPUs, since they are generally easier to configure, less likely to interfere with system operation, and have greater savings. The difficulty of properly configuring PCs and monitors is the largest current barrier to achieving the savings potential from power management.
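
    The per-mode accounting described in this abstract (hours in each operating mode combined with measured power) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the wattages, and the hours per mode are hypothetical and are not taken from the study.

```python
# Hypothetical illustration: every number below is invented, not from the study.
HOURS_PER_YEAR = 8760


def annual_energy_kwh(hours_by_mode, watts_by_mode):
    """Sum hours x power over all operating modes, then convert Wh to kWh."""
    total_wh = sum(hours_by_mode[mode] * watts_by_mode[mode] for mode in hours_by_mode)
    return total_wh / 1000.0


# Assumed power draw (W) in each operating mode.
watts = {"off": 2.0, "low": 25.0, "full": 60.0}

# Assumed hours per year in each mode, with and without power management.
managed = {"off": 4000.0, "low": 2760.0, "full": 2000.0}
baseline = {"off": 4000.0, "low": 0.0, "full": 4760.0}

# Savings are the baseline (power management disabled) minus the managed case.
savings_kwh = annual_energy_kwh(baseline, watts) - annual_energy_kwh(managed, watts)
print(f"Estimated savings: {savings_kwh:.1f} kWh/year")
```

    Keeping the baseline and managed schedules as separate dictionaries mirrors the abstract's comparison against a baseline with power management disabled.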

  6. Finite difference numerical method for the superlattice Boltzmann transport equation and case comparison of CPU(C) and GPU(CUDA) implementations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Priimak, Dmitri

    2014-12-01

    We present a finite difference numerical algorithm for solving two dimensional spatially homogeneous Boltzmann transport equation which describes electron transport in a semiconductor superlattice subject to crossed time dependent electric and constant magnetic fields. The algorithm is implemented both in C language targeted to CPU and in CUDA C language targeted to commodity NVidia GPU. We compare performances and merits of one implementation versus another and discuss various software optimisation techniques.

  7. The development of an interim generalized gate logic software simulator

    NASA Technical Reports Server (NTRS)

    Mcgough, J. G.; Nemeroff, S.

    1985-01-01

    A proof-of-concept computer program called IGGLOSS (Interim Generalized Gate Logic Software Simulator) was developed and is discussed. The simulator engine was designed to perform stochastic estimation of self test coverage (fault-detection latency times) of digital computers or systems. A major attribute of the IGGLOSS is its high-speed simulation: 9.5 x 10^6 gates/CPU sec for nonfaulted circuits and 4.4 x 10^6 gates/CPU sec for faulted circuits on a VAX 11/780 host computer.

  8. Acoustic reverse-time migration using GPU card and POSIX thread based on the adaptive optimal finite-difference scheme and the hybrid absorbing boundary condition

    NASA Astrophysics Data System (ADS)

    Cai, Xiaohui; Liu, Yang; Ren, Zhiming

    2018-06-01

    Reverse-time migration (RTM) is a powerful tool for imaging geologically complex structures such as steep-dip and subsalt. However, its implementation is quite computationally expensive. Recently, as a low-cost solution, the graphic processing unit (GPU) was introduced to improve the efficiency of RTM. In the paper, we develop three ameliorative strategies to implement RTM on GPU card. First, given the high accuracy and efficiency of the adaptive optimal finite-difference (FD) method based on least squares (LS) on central processing unit (CPU), we study the optimal LS-based FD method on GPU. Second, we develop the CPU-based hybrid absorbing boundary condition (ABC) to the GPU-based one by addressing two issues of the former when introduced to GPU card: time-consuming and chaotic threads. Third, for large-scale data, the combinatorial strategy for optimal checkpointing and efficient boundary storage is introduced for the trade-off between memory and recomputation. To save the time of communication between host and disk, the portable operating system interface (POSIX) thread is utilized to create the other CPU core at the checkpoints. Applications of the three strategies on GPU with the compute unified device architecture (CUDA) programming language in RTM demonstrate their efficiency and validity.

  9. Association between problematic cellular phone use and suicide: the moderating effect of family function and depression.

    PubMed

    Wang, Peng-Wei; Liu, Tai-Ling; Ko, Chih-Hung; Lin, Huang-Chi; Huang, Mei-Feng; Yeh, Yi-Chun; Yen, Cheng-Fang

    2014-02-01

    Suicidal ideation and attempt among adolescents are risk factors for eventual completed suicide. Cellular phone use (CPU) has markedly changed the everyday lives of adolescents. Issues about how cellular phone use relates to adolescent mental health, such as suicidal ideation and attempts, are important because of the high rate of cellular phone usage among children in that age group. This study explored the association between problematic CPU and suicidal ideation and attempts among adolescents and investigated how family function and depression influence the association between problematic CPU and suicidal ideation and attempts. A total of 5051 (2872 girls and 2179 boys) adolescents who owned at least one cellular phone completed the research questionnaires. We collected data on participants' CPU and suicidal behavior (ideation and attempts) during the past month as well as information on family function and history of depression. Five hundred thirty-two adolescents (10.54%) had problematic CPU. The rates of suicidal ideation were 23.50% and 11.76% in adolescents with problematic CPU and without problematic CPU, respectively. The rates of suicidal attempts in both groups were 13.70% and 5.45%, respectively. Family function, but not depression, had a moderating effect on the association between problematic CPU and suicidal ideation and attempt. This study highlights the association between problematic CPU and suicidal ideation as well as attempts and indicates that good family function may have a more significant role on reducing the risks of suicidal ideation and attempts in adolescents with problematic CPU than in those without problematic CPU. © 2014.

  10. High-performance computing on GPUs for resistivity logging of oil and gas wells

    NASA Astrophysics Data System (ADS)

    Glinskikh, V.; Dudaev, A.; Nechaev, O.; Surodina, I.

    2017-10-01

    We developed and implemented into software an algorithm for high-performance simulation of electrical logs from oil and gas wells using high-performance heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving a system of linear algebraic equations (SLAE). Software implementations of the algorithm used the NVIDIA CUDA technology and computing libraries are made, allowing us to perform decomposition of SLAE and find its solution on central processor unit (CPU) and graphics processor unit (GPU). The calculation time is analyzed depending on the matrix size and number of its non-zero elements. We estimated the computing speed on CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.

  11. Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

    NASA Astrophysics Data System (ADS)

    Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

    2018-03-01

    A simulation of breaking waves over a closed domain is presented, using the Navier-Stokes equations solved with the moving particle semi-implicit (MPS) method. The results show that parallel computing on a multicore architecture with OpenMP can reduce the computational time to almost half of the serial time. Two computer architectures (AMD and Intel) are compared. The Intel architecture gives a better CPU time than the AMD architecture; in efficiency, however, the AMD machine scores slightly higher than the Intel machine. For the simulation with 1512 particles, the CPU times on Intel and AMD are 12662.47 and 28282.30, respectively. For the same number of particles, the efficiency is 50.09% on AMD and up to 49.42% on Intel.

  12. GPU Computing in Bayesian Inference of Realized Stochastic Volatility Model

    NASA Astrophysics Data System (ADS)

    Takaishi, Tetsuya

    2015-01-01

    The realized stochastic volatility (RSV) model that utilizes the realized volatility as additional information has been proposed to infer volatility of financial time series. We consider the Bayesian inference of the RSV model by the Hybrid Monte Carlo (HMC) algorithm. The HMC algorithm can be parallelized and thus performed on the GPU for speedup. The GPU code is developed with CUDA Fortran. We compare the computational time in performing the HMC algorithm on GPU (GTX 760) and CPU (Intel i7-4770 3.4GHz) and find that the GPU can be up to 17 times faster than the CPU. We also code the program with OpenACC and find that appropriate coding can achieve the similar speedup with CUDA Fortran.

  13. Shadow: Running Tor in a Box for Accurate and Efficient Experimentation

    DTIC Science & Technology

    2011-09-23

    Modeling the speed of a target CPU is done by running an OpenSSL [31] speed test on a real CPU of that type. This provides us with the raw CPU processing...rate, but we are also interested in the processing speed of an application. By running application 5 benchmarks on the same CPU as the OpenSSL speed test...simulation, saving CPU cycles on our simulation host machine. Shadow removes cryptographic processing by preloading the main OpenSSL [31] functions used

  14. Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications

    NASA Astrophysics Data System (ADS)

    Francés, J.; Otero, B.; Bleda, S.; Gallego, S.; Neipp, C.; Márquez, A.; Beléndez, A.

    2015-06-01

    The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to study the propagation of longitudinal and transversal waves in stratified media. The potential of the scheme and the relevance of each acceleration strategy for massive computations in FDTD are demonstrated in this work. In this paper, we propose two new specific implementations of the bi-dimensional scheme of the FDTD method using multi-CPU and multi-GPU, respectively. In the first implementation, an open source message passing interface (OMPI) has been included in order to massively exploit the resources of a biprocessor station with two Intel Xeon processors. Moreover, regarding the CPU code version, the streaming SIMD extensions (SSE) and also the advanced vector extensions (AVX) have been included with shared memory approaches that take advantage of multi-core platforms. On the other hand, the second implementation, called the multi-GPU code version, is based on Peer-to-Peer communications available in CUDA on two GPUs (NVIDIA GTX 670). Subsequently, this paper presents an accurate analysis of the influence of the different code versions including shared memory approaches, vector instructions and multi-processors (both CPU and GPU) and compares them in order to delimit the degree of improvement of using distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed and it has been demonstrated that the addition of shared memory schemes to CPU computing substantially improves the performance of vector instructions, enlarging the simulation sizes that efficiently use the cache memory of CPUs. In this case GPU computing is about twice as fast as the fine-tuned CPU version, for both one and two nodes. However, for massive computations explicit vector instructions are not worthwhile, since the memory bandwidth is the limiting factor and the performance tends to be the same as that of the sequential version with auto-vectorisation and the shared memory approach. In this scenario GPU computing is the best option since it provides a homogeneous behaviour. More specifically, the speedup of GPU computing reaches an upper limit of 12 for both one and two GPUs, whereas the performance reaches peak values of 80 GFlops and 146 GFlops for one GPU and two GPUs respectively. Finally, the method is applied to an earth crust profile in order to demonstrate the potential of our approach and the necessity of applying acceleration strategies in this type of application.

  15. Heterogeneous CPU-GPU moving targets detection for UAV video

    NASA Astrophysics Data System (ADS)

    Li, Maowen; Tang, Linbo; Han, Yuqi; Yu, Chunlei; Zhang, Chao; Fu, Huiquan

    2017-07-01

    Moving target detection is gaining popularity in civilian and military applications. On some motion-detection monitoring platforms, low-resolution stationary cameras are being replaced by moving HD cameras mounted on UAVs. The pixels of moving targets in HD video taken by a UAV are always in a minority, and the background of the frame is usually moving because of the motion of the UAV. The high computational cost of the algorithm prevents running it on higher-resolution frames. Hence, to solve the problem of moving target detection in UAV video, we propose a heterogeneous CPU-GPU moving target detection algorithm for UAV video. More specifically, we use background registration to eliminate the impact of the moving background and frame difference to detect small moving targets. In order to achieve real-time processing, we design a heterogeneous CPU-GPU framework for our method. The experimental results show that our method can detect the main moving targets from the HD video taken by UAV, and the average processing time is 52.16 ms per frame, which is fast enough to solve the problem.

  16. The association between problematic cellular phone use and risky behaviors and low self-esteem among Taiwanese adolescents.

    PubMed

    Yang, Yuan-Sheng; Yen, Ju-Yu; Ko, Chih-Hung; Cheng, Chung-Ping; Yen, Cheng-Fang

    2010-04-28

    Cellular phone use (CPU) is an important part of life for many adolescents. However, problematic CPU may complicate physiological and psychological problems. The aim of our study was to examine the associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. A total of 11,111 adolescent students in Southern Taiwan were randomly selected into this study. We used the Problematic Cellular Phone Use Questionnaire to identify the adolescents with problematic CPU. Meanwhile, a series of risky behaviors and self-esteem were evaluated. Multilevel logistic regression analyses were employed to examine the associations between problematic CPU and risky behaviors and low self-esteem regarding gender and age. The results indicated that positive associations were found between problematic CPU and aggression, insomnia, smoking cigarettes, suicidal tendencies, and low self-esteem in all groups with different sexes and ages. However, gender and age differences existed in the associations between problematic CPU and suspension from school, criminal records, tattooing, short nocturnal sleep duration, unprotected sex, illicit drugs use, drinking alcohol and chewing betel nuts. There were positive associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. It is worthy for parents and mental health professionals to pay attention to adolescents' problematic CPU.

  17. Accelerated event-by-event Monte Carlo microdosimetric calculations of electrons and protons tracks on a multi-core CPU and a CUDA-enabled GPU.

    PubMed

    Kalantzis, Georgios; Tachibana, Hidenobu

    2014-01-01

    For microdosimetric calculations event-by-event Monte Carlo (MC) methods are considered the most accurate. The main shortcoming of those methods is the extensive requirement for computational time. In this work we present an event-by-event MC code of low projectile energy electron and proton tracks for accelerated microdosimetric MC simulations on a graphic processing unit (GPU). Additionally, a hybrid implementation scheme was realized by employing OpenMP and CUDA in such a way that both GPU and multi-core CPU were utilized simultaneously. The two implementation schemes have been tested and compared with the sequential single threaded MC code on the CPU. Performance comparison was established on the speed-up for a set of benchmarking cases of electron and proton tracks. A maximum speedup of 67.2 was achieved for the GPU-based MC code, while a further improvement of the speedup up to 20% was achieved for the hybrid approach. The results indicate the capability of our CPU-GPU implementation for accelerated MC microdosimetric calculations of both electron and proton tracks without loss of accuracy. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  18. CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment.

    PubMed

    Chen, Xi; Wang, Chen; Tang, Shanjiang; Yu, Ce; Zou, Quan

    2017-06-24

    The multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, the existing approaches are either insufficient or contain some implicit assumptions that limit the generality of usage. First, the information of users' sequences, including the sizes of datasets and the lengths of sequences, can be of arbitrary values and is generally unknown before submission, which is unfortunately ignored by previous work. Second, the center star strategy is suited for aligning similar sequences. But its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider the MSA parallelization on GPU devices only, making the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by enabling the workload computation on both CPU and GPU simultaneously. This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn^2) to O(mn). The experimental results show that CMSA achieves an up to 11× speedup and outperforms the state-of-the-art software. CMSA focuses on the multiple similar RNA/DNA sequence alignment and proposes a novel bitmap based algorithm to improve the center star strategy. 
We can conclude that harvesting the high performance of modern GPU is a promising approach to accelerate multiple sequence alignment. Besides, adopting the co-run computation model can maximize the entire system utilization significantly. The source code is available at https://github.com/wangvsa/CMSA .

  19. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.

    PubMed

    Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

    2016-04-01

    Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the gate/geant4 Monte Carlo application for homogeneous and heterogeneous geometries. Comparisons with gate/geant4 for various geometries show deviations within 2%-1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus be used also for in vivo range verification.

  20. Fast Simulation of Dynamic Ultrasound Images Using the GPU.

    PubMed

    Storve, Sigurd; Torp, Hans

    2017-10-01

    Simulated ultrasound data is a valuable tool for development and validation of quantitative image analysis methods in echocardiography. Unfortunately, simulation time can become prohibitive for phantoms consisting of a large number of point scatterers. The COLE algorithm by Gao et al. is a fast convolution-based simulator that trades simulation accuracy for improved speed. We present highly efficient parallelized CPU and GPU implementations of the COLE algorithm with an emphasis on dynamic simulations involving moving point scatterers. We argue that it is crucial to minimize the amount of data transfers from the CPU to achieve good performance on the GPU. We achieve this by storing the complete trajectories of the dynamic point scatterers as spline curves in the GPU memory. This leads to good efficiency when simulating sequences consisting of a large number of frames, such as B-mode and tissue Doppler data for a full cardiac cycle. In addition, we propose a phase-based subsample delay technique that efficiently eliminates flickering artifacts seen in B-mode sequences when COLE is used without enough temporal oversampling. To assess the performance, we used a laptop computer and a desktop computer, each equipped with a multicore Intel CPU and an NVIDIA GPU. Running the simulator on a high-end TITAN X GPU, we observed two orders of magnitude speedup compared to the parallel CPU version, three orders of magnitude speedup compared to simulation times reported by Gao et al. in their paper on COLE, and a speedup of 27000 times compared to the multithreaded version of Field II, using numbers reported in a paper by Jensen. We hope that by releasing the simulator as an open-source project we will encourage its use and further development.

  1. Prevalence and socio-demographic correlates of time spent cooking by adults in the 2005 UK Time Use Survey. Cross-sectional analysis

    PubMed Central

    Adams, Jean; White, Martin

    2015-01-01

    This study aimed to document the prevalence and socio-demographic correlates of time spent cooking by adults in the 2005 UK Time-Use Survey. Respondents reported their main activities, in 10 minute slots, throughout one 24 hour period. Activities were coded into 30 pre-defined codes, including ‘cooking, washing up’. Four measures of time spent cooking were calculated: any time spent cooking, 30 continuous minutes spent cooking, total time spent cooking, and longest continuous time spent cooking. Socio-demographic correlates were: age, employment, social class, education, and number of adults and children in the household. Analyses were stratified by gender. Data from 4214 participants were included. 85% of women and 60% of men spent any time cooking; 60% of women and 33% of men spent 30 continuous minutes cooking. Amongst women, older age, not being in employment, lower social class, greater education, and living with other adults or children were positively associated with time cooking. Few differences in time spent cooking were seen in men. Socio-economic differences in time spent cooking may have been overstated as a determinant of socio-economic differences in diet, overweight and obesity. Gender was a stronger determinant of time spent cooking than other socio-demographic variables. PMID:26004671

  2. 3D Kirchhoff depth migration algorithm: A new scalable approach for parallelization on multicore CPU based cluster

    NASA Astrophysics Data System (ADS)

    Rastogi, Richa; Londhe, Ashutosh; Srivastava, Abhishek; Sirasala, Kirannmayi M.; Khonde, Kiran

    2017-03-01

    In this article, a new scalable 3D Kirchhoff depth migration algorithm is presented on a state-of-the-art multicore CPU based cluster. Parallelization of 3D Kirchhoff depth migration is challenging due to its high demand for compute time, memory, storage and I/O, along with the need for their effective management. The most resource-intensive modules of the algorithm are traveltime calculations and migration summation, which exhibit an inherent trade-off between compute time and other resources. The parallelization strategy of the algorithm largely depends on the storage of calculated traveltimes and its feeding mechanism to the migration process. The presented work is an extension of our previous work, wherein a 3D Kirchhoff depth migration application for a multicore CPU based parallel system had been developed. Recently, we have worked on improving the parallel performance of this application by re-designing the parallelization approach. The new algorithm can efficiently migrate both prestack and poststack 3D data. It exhibits flexibility for migrating a large number of traces within the available node memory and with minimal requirement of storage, I/O and inter-node communication. The resultant application is tested using 3D Overthrust data on PARAM Yuva II, which is a Xeon E5-2670 based multicore CPU cluster with 16 cores/node and 64 GB shared memory. Parallel performance of the algorithm is studied using different numerical experiments and the scalability results show striking improvement over its previous version. An impressive 49.05X speedup with 76.64% efficiency is achieved for 3D prestack data and 32.00X speedup with 50.00% efficiency for 3D poststack data, using 64 nodes. The results also demonstrate the effectiveness and robustness of the improved algorithm with high scalability and efficiency on a multicore CPU cluster.
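
    The efficiency figures quoted in this abstract are consistent with the usual definition of parallel efficiency as speedup divided by the number of workers. The short check below assumes that definition and takes the 64 nodes as the worker count; the function name is illustrative only.

```python
def parallel_efficiency(speedup, workers):
    """Parallel efficiency as the fraction of ideal linear speedup achieved."""
    return speedup / workers


NODES = 64  # node count stated in the abstract

# Speedups reported in the abstract for the two data types.
reported_speedups = {"prestack": 49.05, "poststack": 32.00}

for label, speedup in reported_speedups.items():
    efficiency = parallel_efficiency(speedup, NODES)
    print(f"{label}: {efficiency:.2%}")
```

    Both values reproduce the reported 76.64% and 50.00% when rounded to two decimals.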

  3. A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dong, Tingzing Tim; Tomov, Stanimire Z; Luszczek, Piotr R

    As modern hardware keeps evolving, an increasingly effective approach to developing energy efficient and high-performance solvers is to design them to work on many small size and independent problems. Many applications already need this functionality, especially for GPUs, which are currently known to be about four to five times more energy efficient than multicore CPUs. We describe the development of one-sided factorizations that work for a set of small dense matrices in parallel, and we illustrate our techniques on the QR factorization based on Householder transformations. We refer to this mode of operation as a batched factorization. Our approach is based on representing the algorithms as a sequence of batched BLAS routines for GPU-only execution. This is in contrast to the hybrid CPU-GPU algorithms that rely heavily on using the multicore CPU for specific parts of the workload. But for a system to benefit fully from the GPU's significantly higher energy efficiency, avoiding the use of the multicore CPU must be a primary design goal, so the system can rely more heavily on the more efficient GPU. Additionally, this will result in the removal of the costly CPU-to-GPU communication. Furthermore, we do not use a single symmetric multiprocessor (on the GPU) to factorize a single problem at a time. We illustrate how our performance analysis, and the use of profiling and tracing tools, guided the development and optimization of our batched factorization to achieve up to a 2-fold speedup and a 3-fold energy efficiency improvement compared to our highly optimized batched CPU implementations based on the MKL library (when using two sockets of Intel Sandy Bridge CPUs). Compared to a batched QR factorization featured in the CUBLAS library for GPUs, we achieved up to 5x speedup on the K40 GPU.

  4. A parallel method of atmospheric correction for multispectral high spatial resolution remote sensing images

    NASA Astrophysics Data System (ADS)

    Zhao, Shaoshuai; Ni, Chen; Cao, Jing; Li, Zhengqiang; Chen, Xingfeng; Ma, Yan; Yang, Leiku; Hou, Weizhen; Qie, Lili; Ge, Bangyu; Liu, Li; Xing, Jin

    2018-03-01

    Remote sensing images are usually contaminated by atmospheric components, especially aerosol particles. For quantitative remote sensing applications, atmospheric correction based on a radiative transfer model is used to retrieve the reflectance by decoupling the atmosphere and the surface, at the cost of a long computational time. Parallel computing is one way to accelerate the process. A parallel strategy that uses multiple CPUs working simultaneously is designed to perform atmospheric correction for a multispectral remote sensing image. The parallel framework's flow and the main parallel body of atmospheric correction are described. Then, a multispectral remote sensing image from the Chinese Gaofen-2 satellite is used to test the acceleration efficiency. As the number of CPUs increases from 1 to 8, the computational speed also increases. The largest acceleration ratio is 6.5. With 8 CPUs working, atmospheric correction of the whole image takes 4 minutes.

  5. Evaluating Mobile Graphics Processing Units (GPUs) for Real-Time Resource Constrained Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meredith, J; Conger, J; Liu, Y

    2005-11-11

    Modern graphics processing units (GPUs) can provide tremendous performance boosts for some applications beyond what a single CPU can accomplish, and their performance is growing at a rate faster than CPUs as well. Mobile GPUs available for laptops have the small form factor and low power requirements suitable for use in embedded processing. We evaluated several desktop and mobile GPUs and CPUs on traditional and non-traditional graphics tasks, as well as on the most time consuming pieces of a full hyperspectral imaging application. Accuracy remained high despite small differences in arithmetic operations like rounding. Performance improvements are summarized here relative to a desktop Pentium 4 CPU.

  6. Convolution of large 3D images on GPU and its decomposition

    NASA Astrophysics Data System (ADS)

    Karas, Pavel; Svoboda, David

    2011-12-01

    In this article, we propose a method for computing convolution of large 3D images. The convolution is performed in a frequency domain using a convolution theorem. The algorithm is accelerated on a graphics card by means of the CUDA parallel computing model. Convolution is decomposed in a frequency domain using the decimation in frequency algorithm. We pay attention to keeping our approach efficient in terms of both time and memory consumption and also in terms of memory transfers between CPU and GPU which have a significant influence on overall computational time. We also study the implementation on multiple GPUs and compare the results between the multi-GPU and multi-CPU implementations.

  7. The association between problematic cellular phone use and risky behaviors and low self-esteem among Taiwanese adolescents

    PubMed Central

    2010-01-01

    Background: Cellular phone use (CPU) is an important part of life for many adolescents. However, problematic CPU may complicate physiological and psychological problems. The aim of our study was to examine the associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. Methods: A total of 11,111 adolescent students in Southern Taiwan were randomly selected into this study. We used the Problematic Cellular Phone Use Questionnaire to identify the adolescents with problematic CPU. Meanwhile, a series of risky behaviors and self-esteem were evaluated. Multilevel logistic regression analyses were employed to examine the associations between problematic CPU and risky behaviors and low self-esteem regarding gender and age. Results: The results indicated that positive associations were found between problematic CPU and aggression, insomnia, smoking cigarettes, suicidal tendencies, and low self-esteem in all groups with different sexes and ages. However, gender and age differences existed in the associations between problematic CPU and suspension from school, criminal records, tattooing, short nocturnal sleep duration, unprotected sex, illicit drugs use, drinking alcohol and chewing betel nuts. Conclusions: There were positive associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. It is worthy for parents and mental health professionals to pay attention to adolescents' problematic CPU. PMID:20426807

  8. CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications.

    PubMed

    Lei, Guoqing; Dou, Yong; Wan, Wen; Xia, Fei; Li, Rongchun; Ma, Meng; Zou, Dan

    2012-01-01

    Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction. Thus far, few studies have been reported on the acceleration of the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate-Array (FPGA) and Graphics Processing Units (GPU). To the best of our knowledge, no implementation combines both CPU and extra accelerators, such as GPUs, to accelerate the Zuker algorithm applications. In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperate execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for CPU and GPU architecture. Speedup of 15.93× over optimized multi-core SIMD CPU implementation and performance advantage of 16% over optimized GPU implementation are shown in the experimental results. More than 14% of the sequences are executed on CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm is proven to be promising and can be applied to other bioinformatics applications.

  9. General approach to boat simulation in virtual reality systems

    NASA Astrophysics Data System (ADS)

    Aranov, Vladislav Y.; Belyaev, Sergey Y.

    2002-02-01

    The paper is dedicated to real-time simulation of sport boats, particularly a kayak and a high-speed skimming boat, for training purposes. Such training is a topical issue, since kayaking and riding a high-speed skimming boat are both extreme sports. Participating in such competitions puts athletes in danger, particularly due to rapids, waterfalls, different water streams, and other obstacles. In order to make the simulation realistic, it is necessary to calculate data for at least 30 frames per second. These calculations may take no more than 5% of CPU time, because the very time-consuming 3D rendering process takes the remaining 95%. This paper describes an approach for creating minimal boat simulator models that satisfy these requirements. In addition, this approach can be used for other watercraft models of this kind.
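
    The timing constraint stated in this abstract translates into a per-frame budget. The short calculation below is only a sketch: the helper name `per_frame_budget_ms` is hypothetical, and it assumes a single core running at exactly 30 frames per second.

```python
def per_frame_budget_ms(frames_per_second, cpu_share):
    """Milliseconds of CPU time available per frame for a given share."""
    frame_time_ms = 1000.0 / frames_per_second
    return frame_time_ms * cpu_share


simulation_ms = per_frame_budget_ms(30, 0.05)  # physics share from the abstract
rendering_ms = per_frame_budget_ms(30, 0.95)   # rendering share from the abstract
print(f"simulation budget: {simulation_ms:.2f} ms per frame")  # 1.67 ms
print(f"rendering budget:  {rendering_ms:.2f} ms per frame")   # 31.67 ms
```

    Under these assumptions the boat model has well under 2 ms per frame, which is why the record stresses minimal simulator models.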

  10. A fast sequence assembly method based on compressed data structures.

    PubMed

    Liang, Peifeng; Zhang, Yancong; Lin, Kui; Hu, Jinglu

    2014-01-01

    Assembling a large genome using next generation sequencing reads requires large computer memory and a long execution time. To reduce these requirements, a memory- and time-efficient assembler, called FMJ-Assembler, is presented by applying the FM-index in JR-Assembler, where FM stands for the FMR-index derived from the FM-index and BWT, and J stands for jumping extension. The FMJ-Assembler uses an expanded FM-index and BWT to compress read data to save memory, and the jumping extension method makes it faster in CPU time. An extensive comparison of the FMJ-Assembler with current assemblers shows that the FMJ-Assembler achieves a better or comparable overall assembly quality and requires lower memory use and less CPU time. All these advantages indicate that the FMJ-Assembler will be an efficient assembly method in next generation sequencing technology.

  11. Jobs masonry in LHCb with elastic Grid Jobs

    NASA Astrophysics Data System (ADS)

    Stagni, F.; Charpentier, Ph

    2015-12-01

    In any distributed computing infrastructure, a job is normally forbidden to run for an indefinite amount of time. This limitation is implemented using different technologies, the most common one being the CPU time limit implemented by batch queues. It is therefore important to have a good estimate of how much CPU work a job will require: otherwise, it might be killed by the batch system, or by whatever system is controlling the jobs’ execution. In many modern interwares, the jobs are actually executed by pilot jobs, that can use the whole available time in running multiple consecutive jobs. If at some point the available time in a pilot is too short for the execution of any job, it should be released, while it could have been used efficiently by a shorter job. Within LHCbDIRAC, the LHCb extension of the DIRAC interware, we developed a simple way to fully exploit computing capabilities available to a pilot, even for resources with limited time capabilities, by adding elasticity to production MonteCarlo (MC) simulation jobs. With our approach, independently of the time available, LHCbDIRAC will always have the possibility to execute a MC job, whose length will be adapted to the available amount of time: therefore the same job, running on different computing resources with different time limits, will produce different amounts of events. The decision on the number of events to be produced is made just in time at the start of the job, when the capabilities of the resource are known. In order to know how many events a MC job will be instructed to produce, LHCbDIRAC simply requires three values: the CPU-work per event for that type of job, the power of the machine it is running on, and the time left for the job before being killed. Knowing these values, we can estimate the number of events the job will be able to simulate with the available CPU time. This paper will demonstrate that, using this simple but effective solution, LHCb manages to make a more efficient use of the available resources, and that it can easily use new types of resources. An example is represented by resources provided by batch queues, where low-priority MC jobs can be used as "masonry" jobs in multi-jobs pilots. A second example is represented by opportunistic resources with limited available time.
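
    The three values named in this abstract suggest a simple estimate of the number of events. The sketch below is not LHCbDIRAC code: the function name `estimate_events`, the parameter names, and the units (CPU work in normalized power-unit seconds, power in the same normalized units, time in seconds) are all assumptions made for illustration.

```python
import math


def estimate_events(cpu_work_per_event, cpu_power, time_left_s):
    """Estimate how many events fit in the remaining time.

    events = floor(time_left * power / work_per_event)
    Units are assumed: work in (power-unit * s), power in power-units.
    """
    if cpu_work_per_event <= 0 or cpu_power <= 0:
        raise ValueError("work per event and power must be positive")
    usable_time = max(0.0, float(time_left_s))
    return math.floor(usable_time * cpu_power / cpu_work_per_event)


# Hypothetical numbers: 1000 work units per event, a machine rated at
# 10 power-units, and 10 hours (36000 s) left before the job is killed.
print(estimate_events(1000.0, 10.0, 36000.0))  # 360
```

    Rounding down keeps the estimate conservative, so the job finishes before the limit rather than being killed partway through an event.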

  12. Optimization of Selected Remote Sensing Algorithms for Embedded NVIDIA Kepler GPU Architecture

    NASA Technical Reports Server (NTRS)

    Riha, Lubomir; Le Moigne, Jacqueline; El-Ghazawi, Tarek

    2015-01-01

    This paper evaluates the potential of the embedded Graphics Processing Unit in NVIDIA's Tegra K1 for onboard processing. The performance is compared to a general-purpose multi-core CPU and a full-fledged GPU accelerator. This study uses two algorithms: Wavelet Spectral Dimension Reduction of Hyperspectral Imagery and the Automated Cloud-Cover Assessment (ACCA) algorithm. Tegra K1 achieved 51 for ACCA algorithm and 20 for the dimension reduction algorithm, as compared to the performance of the high-end 8-core server Intel Xeon CPU with 13.5 times higher power consumption.

  13. Real-time image reconstruction and display system for MRI using a high-speed personal computer.

    PubMed

    Haishi, T; Kose, K

    1998-09-01

    A real-time NMR image reconstruction and display system was developed using a high-speed personal computer and optimized for the 32-bit multitasking Microsoft Windows 95 operating system. The system was operated at various CPU clock frequencies by changing the motherboard clock frequency and the processor/bus frequency ratio. When the Pentium CPU was used at the 200 MHz clock frequency, the reconstruction time for one 128 x 128 pixel image was 48 ms and that for the image display on the enlarged 256 x 256 pixel window was about 8 ms. NMR imaging experiments were performed with three fast imaging sequences (FLASH, multishot EPI, and one-shot EPI) to demonstrate the ability of the real-time system. It was concluded that in most cases, high-speed PC would be the best choice for the image reconstruction and display system for real-time MRI. Copyright 1998 Academic Press.

  14. QR-decomposition based SENSE reconstruction using parallel architecture.

    PubMed

    Ullah, Irfan; Nisar, Habab; Raza, Haseeb; Qasim, Malik; Inam, Omair; Omer, Hammad

    2018-04-01

    Magnetic Resonance Imaging (MRI) is a powerful medical imaging technique that provides essential clinical information about the human body. One major limitation of MRI is its long scan time. Implementation of advanced MRI algorithms on a parallel architecture (to exploit inherent parallelism) has great potential to reduce the scan time. Sensitivity Encoding (SENSE) is a Parallel Magnetic Resonance Imaging (pMRI) algorithm that utilizes receiver coil sensitivities to reconstruct MR images from the acquired under-sampled k-space data. At the heart of SENSE lies inversion of a rectangular encoding matrix. This work presents a novel implementation of a GPU-based SENSE algorithm, which employs QR decomposition for the inversion of the rectangular encoding matrix. For a fair comparison, the performance of the proposed GPU-based SENSE reconstruction is evaluated against single and multicore CPU using OpenMP. Several experiments against various acceleration factors (AFs) are performed using multichannel (8, 12 and 30) phantom and in-vivo human head and cardiac datasets. Experimental results show that the GPU significantly reduces the computation time of SENSE reconstruction as compared to multi-core CPU (approximately 12x speedup) and single-core CPU (approximately 53x speedup) without any degradation in the quality of the reconstructed images. Copyright © 2018 Elsevier Ltd. All rights reserved.

  15. Prevalence and socio-demographic correlates of time spent cooking by adults in the 2005 UK Time Use Survey. Cross-sectional analysis.

    PubMed

    Adams, Jean; White, Martin

    2015-09-01

    This study aimed to document the prevalence and socio-demographic correlates of time spent cooking by adults in the 2005 UK Time-Use Survey. Respondents reported their main activities, in 10 minute slots, throughout one 24 hour period. Activities were coded into 30 pre-defined codes, including 'cooking, washing up'. Four measures of time spent cooking were calculated: any time spent cooking, 30 continuous minutes spent cooking, total time spent cooking, and longest continuous time spent cooking. Socio-demographic correlates were: age, employment, social class, education, and number of adults and children in the household. Analyses were stratified by gender. Data from 4214 participants were included. 85% of women and 60% of men spent any time cooking; 60% of women and 33% of men spent 30 continuous minutes cooking. Amongst women, older age, not being in employment, lower social class, greater education, and living with other adults or children were positively associated with time cooking. Few differences in time spent cooking were seen in men. Socio-economic differences in time spent cooking may have been overstated as a determinant of socio-economic differences in diet, overweight and obesity. Gender was a stronger determinant of time spent cooking than other socio-demographic variables. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  16. GPU Linear Algebra Libraries and GPGPU Programming for Accelerating MOPAC Semiempirical Quantum Chemistry Calculations.

    PubMed

    Maia, Julio Daniel Carvalho; Urquiza Carvalho, Gabriel Aires; Mangueira, Carlos Peixoto; Santana, Sidney Ramos; Cabral, Lucidio Anjos Formiga; Rocha, Gerd B

    2012-09-11

    In this study, we present some modifications in the semiempirical quantum chemistry MOPAC2009 code that accelerate single-point energy calculations (1SCF) of medium-size (up to 2500 atoms) molecular systems using GPU coprocessors and multithreaded shared-memory CPUs. Our modifications consisted of using a combination of highly optimized linear algebra libraries for both CPU (LAPACK and BLAS from Intel MKL) and GPU (MAGMA and CUBLAS) to hasten time-consuming parts of MOPAC such as the pseudodiagonalization, full diagonalization, and density matrix assembling. We have shown that it is possible to obtain large speedups just by using CPU serial linear algebra libraries in the MOPAC code. As a special case, we show a speedup of up to 14 times for a methanol simulation box containing 2400 atoms and 4800 basis functions, with even greater gains in performance when using multithreaded CPUs (2.1 times in relation to the single-threaded CPU code using linear algebra libraries) and GPUs (3.8 times). This degree of acceleration opens new perspectives for modeling larger structures which appear in inorganic chemistry (such as zeolites and MOFs), biochemistry (such as polysaccharides, small proteins, and DNA fragments), and materials science (such as nanotubes and fullerenes). In addition, we believe that this parallel (GPU-GPU) MOPAC code will make it feasible to use semiempirical methods in lengthy molecular simulations using both hybrid QM/MM and QM/QM potentials.

  17. Symptoms of Problematic Cellular Phone Use, Functional Impairment and Its Association with Depression among Adolescents in Southern Taiwan

    ERIC Educational Resources Information Center

    Yen, Cheng-Fang; Tang, Tze-Chun; Yen, Ju-Yu; Lin, Huang-Chi; Huang, Chi-Fen; Liu, Shu-Chun; Ko, Chih-Hung

    2009-01-01

    The aims of this study were: (1) to examine the prevalence of symptoms of problematic cellular phone use (CPU); (2) to examine the associations between the symptoms of problematic CPU, functional impairment caused by CPU and the characteristics of CPU; (3) to establish the optimal cut-off point of the number of symptoms for functional impairment…

  18. Derivative free Davidon-Fletcher-Powell (DFP) for solving symmetric systems of nonlinear equations

    NASA Astrophysics Data System (ADS)

    Mamat, M.; Dauda, M. K.; Mohamed, M. A. bin; Waziri, M. Y.; Mohamad, F. S.; Abdullah, H.

    2018-03-01

    Problems arising from the work of engineers, economists, and scientists, and in modelling, industry, and computing, are mostly nonlinear equations in nature. Numerical solutions to such systems are widely applied in those areas of mathematics. Over the years, there has been significant theoretical study to develop methods for solving such systems; despite these efforts, the methods developed still have deficiencies. As a contribution to solving systems of the form $F(x) = 0$, $x \in \mathbb{R}^n$, a derivative-free method via the classical Davidon-Fletcher-Powell (DFP) update is presented. This is achieved by simply approximating the inverse Hessian matrix $Q_{k+1}^{-1}$ by $\theta_k I$. The modified method satisfies the descent condition and possesses local superlinear convergence properties. Interestingly, without computing any derivative, the proposed method never failed to converge throughout the numerical experiments. The output is based on number of iterations and CPU time; different initial starting points were used to solve 40 benchmark test problems. With the aid of the squared norm merit function and a derivative-free line search technique, the approach yields a method of solving symmetric systems of nonlinear equations that is capable of significantly reducing the CPU time and number of iterations, as compared to its counterparts. A comparison between the proposed method and the classical DFP update was made, and it was found that the proposed method is the top performer and outperformed the existing method in almost all cases. In terms of number of iterations, out of the 40 problems solved, the proposed method solved 38 successfully (95%), while classical DFP solved 2 problems (5%). In terms of CPU time, the proposed method solved 29 out of the 40 problems given (72.5%) successfully, whereas classical DFP solved 11 (27.5%). The method is valid in terms of derivation, reliable in terms of number of iterations and accurate in terms of CPU time. Thus, it is suitable and achieved the objective.

  19. Synthesis and characterization of conductive, biodegradable, elastomeric polyurethanes for biomedical applications.

    PubMed

    Xu, Cancan; Yepez, Gerardo; Wei, Zi; Liu, Fuqiang; Bugarin, Alejandro; Hong, Yi

    2016-09-01

    Biodegradable conductive polymers are currently of significant interest in tissue repair and regeneration, drug delivery, and bioelectronics. However, biodegradable materials exhibiting both conductive and elastic properties have rarely been reported to date. To that end, an electrically conductive polyurethane (CPU) was synthesized from polycaprolactone diol, hexadiisocyanate, and aniline trimer and subsequently doped with (1S)-(+)-10-camphorsulfonic acid (CSA). All CPU films showed good elasticity within a 30% strain range. The electrical conductivity of the CPU films, as enhanced with increasing amounts of CSA, ranged from 2.7 ± 0.9 × 10^-10 to 4.4 ± 0.6 × 10^-7 S/cm in a dry state and 4.2 ± 0.5 × 10^-8 to 7.3 ± 1.5 × 10^-5 S/cm in a wet state. The redox peaks of a CPU1.5 film (molar ratio CSA:aniline trimer = 1.5:1) in the cyclic voltammogram confirmed the desired good electroactivity. The doped CPU film exhibited good electrical stability (87% of initial conductivity after 150 hours charge) as measured in a cell culture medium. The degradation rates of CPU films increased with increasing CSA content in both phosphate-buffered solution (PBS) and lipase/PBS solutions. After 7 days of enzymatic degradation, the conductivity of all CSA-doped CPU films had decreased to that of the undoped CPU film. Mouse 3T3 fibroblasts proliferated and spread on all CPU films. This developed biodegradable CPU with good elasticity, electrical stability, and biocompatibility may find potential applications in tissue engineering, smart drug release, and electronics. © 2016 Wiley Periodicals, Inc. J Biomed Mater Res Part A: 104A: 2305-2314, 2016.

  20. General-purpose interface bus for multiuser, multitasking computer system

    NASA Technical Reports Server (NTRS)

    Generazio, Edward R.; Roth, Don J.; Stang, David B.

    1990-01-01

    The architecture of a multiuser, multitasking, virtual-memory computer system intended for use by a medium-size research group is described. There are three central processing units (CPU) in the configuration, each with 16 MB memory, and two 474 MB hard disks attached. CPU 1 is designed for data analysis and contains an array processor for fast-Fourier transformations. In addition, CPU 1 shares display images viewed with the image processor. CPU 2 is designed for image analysis and display. CPU 3 is designed for data acquisition and contains 8 GPIB channels and an analog-to-digital conversion input/output interface with 16 channels. Up to 9 users can access the third CPU simultaneously for data acquisition. Focus is placed on the optimization of hardware interfaces and software, facilitating instrument control, data acquisition, and processing.

  1. CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications

    PubMed Central

    2012-01-01

    Background Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction. Thus far, few studies have been reported on the acceleration of the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate-Array (FPGA) and Graphics Processing Units (GPU). To the best of our knowledge, no implementation combines both CPU and extra accelerators, such as GPUs, to accelerate the Zuker algorithm applications. Results In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperate execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for CPU and GPU architecture. Conclusions Speedup of 15.93× over optimized multi-core SIMD CPU implementation and performance advantage of 16% over optimized GPU implementation are shown in the experimental results. More than 14% of the sequences are executed on CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm is proven to be promising and can be applied to other bioinformatics applications. PMID:22369626

  2. Running climate model on a commercial cloud computing environment: A case study using Community Earth System Model (CESM) on Amazon AWS

    NASA Astrophysics Data System (ADS)

    Chen, Xiuhong; Huang, Xianglei; Jiao, Chaoyi; Flanner, Mark G.; Raeker, Todd; Palen, Brock

    2017-01-01

    The suites of numerical models used for simulating the climate of our planet are usually run on dedicated high-performance computing (HPC) resources. This study investigates an alternative to the usual approach, i.e. carrying out climate model simulations on a commercially available cloud computing environment. We test the performance and reliability of running the CESM (Community Earth System Model), a flagship climate model in the United States developed by the National Center for Atmospheric Research (NCAR), on Amazon Web Services (AWS) EC2, the cloud computing environment by Amazon.com, Inc. StarCluster is used to create a virtual computing cluster on AWS EC2 for the CESM simulations. The wall-clock time for one year of CESM simulation on the AWS EC2 virtual cluster is comparable to the time spent for the same simulation on a local dedicated high-performance computing cluster with InfiniBand connections. The CESM simulation can be efficiently scaled with the number of CPU cores on the AWS EC2 virtual cluster environment up to 64 cores. For the standard configuration of the CESM at a spatial resolution of 1.9° latitude by 2.5° longitude, increasing the number of cores from 16 to 64 reduces the wall-clock running time by more than 50% and the scaling is nearly linear. Beyond 64 cores, the communication latency starts to outweigh the benefit of distributed computing and the parallel speedup becomes nearly unchanged.
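
    The scaling behaviour described in this abstract is usually summarized as speedup and parallel efficiency relative to a baseline core count. The sketch below shows that bookkeeping only; the helper `scaling_report` is hypothetical and the timings are invented for illustration, not taken from the study.

```python
def scaling_report(timings):
    """Return (cores, speedup, efficiency) tuples from {cores: seconds}.

    Speedup and efficiency are relative to the smallest core count.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    report = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        ideal = cores / base_cores          # perfect linear scaling
        report.append((cores, speedup, speedup / ideal))
    return report


# Invented wall-clock times (seconds) for one simulated year.
hypothetical_timings = {16: 4000.0, 32: 2100.0, 64: 1150.0, 128: 1100.0}
for cores, speedup, efficiency in scaling_report(hypothetical_timings):
    print(f"{cores:4d} cores  speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

    With numbers shaped like these, efficiency stays high up to 64 cores and drops sharply beyond, which is the pattern a plateau in parallel speedup produces.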

  3. Increases in cytoplasmic dopamine compromise the normal resistance of the nucleus accumbens to methamphetamine neurotoxicity

    PubMed Central

    Thomas, David M.; Francescutti-Verbeem, Dina M.; Kuhn, Donald M.

    2016-01-01

    Methamphetamine (METH) is a neurotoxic drug of abuse that damages the dopamine (DA) neuronal system in a highly delimited manner. The brain structure most affected by METH is the caudate–putamen (CPu) where long-term DA depletion and microglial activation are most evident. Even damage within the CPu is remarkably heterogenous with lateral and ventral aspects showing the greatest deficits. The nucleus accumbens (NAc) is largely spared of the damage that accompanies binge METH intoxication. Increases in cytoplasmic DA produced by reserpine, L-DOPA or clorgyline prior to METH uncover damage in the NAc as evidenced by microglial activation and depletion of DA, tyrosine hydroxylase (TH), and the DA transporter. These effects do not occur in the NAc after treatment with METH alone. In contrast to the CPu where DA, TH, and DA transporter levels remain depleted chronically, DA nerve ending alterations in the NAc show a partial recovery over time. None of the treatments that enhance METH toxicity in the NAc and CPu lead to losses of TH protein or DA cell bodies in the substantia nigra or the ventral tegmentum. These data show that increases in cytoplasmic DA dramatically broaden the neurotoxic profile of METH to include brain structures not normally targeted for damage by METH alone. The resistance of the NAc to METH-induced neurotoxicity and its ability to recover reveal a fundamentally different neuroplasticity by comparison to the CPu. Recruitment of the NAc as a target of METH neurotoxicity by alterations in DA homeostasis is significant in light of the important roles played by this brain structure. PMID:19457119

  4. Increases in cytoplasmic dopamine compromise the normal resistance of the nucleus accumbens to methamphetamine neurotoxicity.

    PubMed

    Thomas, David M; Francescutti-Verbeem, Dina M; Kuhn, Donald M

    2009-06-01

    Methamphetamine (METH) is a neurotoxic drug of abuse that damages the dopamine (DA) neuronal system in a highly delimited manner. The brain structure most affected by METH is the caudate-putamen (CPu) where long-term DA depletion and microglial activation are most evident. Even damage within the CPu is remarkably heterogenous with lateral and ventral aspects showing the greatest deficits. The nucleus accumbens (NAc) is largely spared of the damage that accompanies binge METH intoxication. Increases in cytoplasmic DA produced by reserpine, L-DOPA or clorgyline prior to METH uncover damage in the NAc as evidenced by microglial activation and depletion of DA, tyrosine hydroxylase (TH), and the DA transporter. These effects do not occur in the NAc after treatment with METH alone. In contrast to the CPu where DA, TH, and DA transporter levels remain depleted chronically, DA nerve ending alterations in the NAc show a partial recovery over time. None of the treatments that enhance METH toxicity in the NAc and CPu lead to losses of TH protein or DA cell bodies in the substantia nigra or the ventral tegmentum. These data show that increases in cytoplasmic DA dramatically broaden the neurotoxic profile of METH to include brain structures not normally targeted for damage by METH alone. The resistance of the NAc to METH-induced neurotoxicity and its ability to recover reveal a fundamentally different neuroplasticity by comparison to the CPu. Recruitment of the NAc as a target of METH neurotoxicity by alterations in DA homeostasis is significant in light of the important roles played by this brain structure.

  5. Analysis of cache for streaming tape drive

    NASA Technical Reports Server (NTRS)

    Chinnaswamy, V.

    1993-01-01

    A tape subsystem consists of a controller and a tape drive. Tapes are used for backup, data interchange, and software distribution. The backup operation is addressed. During a backup operation, data is read from disk, processed in the CPU, and then sent to tape. The processing speeds of a disk subsystem, CPU, and a tape subsystem are likely to be different. A powerful CPU can read data from a fast disk, process it, and supply the data to the tape subsystem at a faster rate than the tape subsystem can handle. On the other hand, a slow disk drive and a slow CPU may not be able to supply data fast enough to keep a tape drive busy all the time. The backup process may supply data to the tape drive in bursts. Each burst may be followed by an idle period. Depending on the nature of the file distribution on the disk, the input stream to the tape subsystem may vary significantly during backup. To compensate for these differences and optimize the utilization of a tape subsystem, a cache or buffer is introduced in the tape controller. Most tape drives today are streaming tape drives. A streaming tape drive goes into reposition when there is no data from the controller. Once the drive goes into reposition, the controller can receive data, but it cannot supply data to the tape drive until the drive completes its reposition. A controller can also receive data from the host and send data to the tape drive at the same time. The relationship of cache size, host transfer rate, drive transfer rate, reposition, and ramp-up times for optimal performance of the tape subsystem is investigated. The formulas developed also show the advantages of cache watermarks for increasing the streaming time of the tape drive, the maximum loss due to insufficient cache, the tradeoffs between cache and reposition times, and the effectiveness of cache on a streaming tape drive subject to idle times or interruptions in host transfers. Several mathematical formulas are developed to predict the performance of the tape drive. Some examples are given illustrating the usefulness of these formulas. Finally, a summary and some conclusions are provided.

  6. An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

    NASA Astrophysics Data System (ADS)

    Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

    2018-02-01

    De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we propose a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we design an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopt an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) are adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improves the efficiency. A numerical example employing real large-scale, 3D seismic data shows that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.

  7. GPU-accelerated automatic identification of robust beam setups for proton and carbon-ion radiotherapy

    NASA Astrophysics Data System (ADS)

    Ammazzalorso, F.; Bednarz, T.; Jelen, U.

    2014-03-01

    We demonstrate acceleration on graphic processing units (GPU) of automatic identification of robust particle therapy beam setups, minimizing negative dosimetric effects of Bragg peak displacement caused by treatment-time patient positioning errors. Our particle therapy research toolkit, RobuR, was extended with OpenCL support and used to implement calculation on GPU of the Port Homogeneity Index, a metric scoring irradiation port robustness through analysis of tissue density patterns prior to dose optimization and computation. Results were benchmarked against an independent native CPU implementation. Numerical results were in agreement between the GPU implementation and native CPU implementation. For 10 skull base cases, the GPU-accelerated implementation was employed to select beam setups for proton and carbon ion treatment plans, which proved to be dosimetrically robust, when recomputed in presence of various simulated positioning errors. From the point of view of performance, average running time on the GPU decreased by at least one order of magnitude compared to the CPU, rendering the GPU-accelerated analysis a feasible step in a clinical treatment planning interactive session. In conclusion, selection of robust particle therapy beam setups can be effectively accelerated on a GPU and become an unintrusive part of the particle therapy treatment planning workflow. Additionally, the speed gain opens new usage scenarios, like interactive analysis manipulation (e.g. constraining of some setup) and re-execution. Finally, through OpenCL portable parallelism, the new implementation is suitable also for CPU-only use, taking advantage of multiple cores, and can potentially exploit types of accelerators other than GPUs.

  8. Improving the performance of heterogeneous multi-core processors by modifying the cache coherence protocol

    NASA Astrophysics Data System (ADS)

    Fang, Juan; Hao, Xiaoting; Fan, Qingwen; Chang, Zeqing; Song, Shuying

    2017-05-01

    In heterogeneous multi-core architectures, the CPU and GPU are integrated on the same chip, which poses a new challenge for last-level cache management. In this architecture, CPU and GPU applications execute concurrently and share the last-level cache (LLC). The CPU and GPU have different memory access characteristics, so they differ in their sensitivity to LLC capacity. For many CPU applications, a reduced share of the LLC could lead to significant performance degradation. In contrast, GPU applications can tolerate increased memory access latency when there is sufficient thread-level parallelism. Taking the memory latency tolerance of GPU programs into account, this paper presents a method that lets GPU applications access memory directly, leaving more LLC space for CPU applications and thereby improving the performance of CPU applications without affecting the performance of GPU applications. When the CPU application is cache sensitive and the GPU application is insensitive to the cache, the overall performance of the system is improved significantly.

  9. GPU-based stochastic-gradient optimization for non-rigid medical image registration in time-critical applications

    NASA Astrophysics Data System (ADS)

    Bhosale, Parag; Staring, Marius; Al-Ars, Zaid; Berendsen, Floris F.

    2018-03-01

    Currently, non-rigid image registration algorithms are too computationally intensive to use in time-critical applications. Existing implementations that focus on speed typically address this by either parallelization on GPU hardware, or by introducing methodically novel techniques into CPU-oriented algorithms. Stochastic gradient descent (SGD) optimization and variations thereof have proven to drastically reduce the computational burden for CPU-based image registration, but have not been successfully applied on GPU hardware due to their stochastic nature. This paper proposes 1) NiftyRegSGD, an SGD optimization for the GPU-based image registration tool NiftyReg, and 2) the random chunk sampler, a new random sampling strategy that better utilizes the memory bandwidth of GPU hardware. Experiments were performed on 3D lung CT data of 19 patients, comparing NiftyRegSGD (with and without the random chunk sampler) with CPU-based elastix Fast Adaptive SGD (FASGD) and NiftyReg. The registration runtime was 21.5s, 4.4s and 2.8s for elastix-FASGD, NiftyRegSGD without, and NiftyRegSGD with random chunk sampling, respectively, while similar accuracy was obtained. Our method is publicly available at https://github.com/SuperElastix/NiftyRegSGD.

  10. Implementation of GPU accelerated SPECT reconstruction with Monte Carlo-based scatter correction.

    PubMed

    Bexelius, Tobias; Sohlberg, Antti

    2018-06-01

    Statistical SPECT reconstruction can be very time-consuming, especially when compensations for collimator and detector response, attenuation, and scatter are included in the reconstruction. This work proposes an accelerated SPECT reconstruction algorithm based on graphics processing unit (GPU) processing. The ordered subset expectation maximization (OSEM) algorithm with CT-based attenuation modelling, depth-dependent Gaussian convolution-based collimator-detector response modelling, and Monte Carlo-based scatter compensation was implemented using OpenCL. The OpenCL implementation was compared against the existing multi-threaded OSEM implementation running on a central processing unit (CPU) in terms of scatter-to-primary ratios, standardized uptake values (SUVs), and processing speed using mathematical phantoms and clinical multi-bed bone SPECT/CT studies. The difference in scatter-to-primary ratios, visual appearance, and SUVs between GPU and CPU implementations was minor. On the other hand, at its best, the GPU implementation was found to be 24 times faster than the multi-threaded CPU version on a normal 128 × 128 matrix size 3 bed bone SPECT/CT data set when compensations for collimator and detector response, attenuation, and scatter were included. GPU SPECT reconstructions show great promise as an everyday clinical reconstruction tool.

  11. Near-realtime simulations of bioelectric activity in small mammalian hearts using graphical processing units

    PubMed Central

    Vigmond, Edward J.; Boyle, Patrick M.; Leon, L. Joshua; Plank, Gernot

    2014-01-01

    Simulations of cardiac bioelectric phenomena remain a significant challenge despite continual advancements in computational machinery. Spanning large temporal and spatial ranges demands millions of nodes to accurately depict geometry, and a comparable number of timesteps to capture dynamics. This study explores a new hardware computing paradigm, the graphics processing unit (GPU), to accelerate cardiac models, and analyzes results in the context of simulating a small mammalian heart in real time. The ODEs associated with membrane ionic flow were computed on traditional CPU and compared to GPU performance, for one to four parallel processing units. The scalability of solving the PDE responsible for tissue coupling was examined on a cluster using up to 128 cores. Results indicate that the GPU implementation was between 9 and 17 times faster than the CPU implementation and scaled similarly. Solving the PDE was still 160 times slower than real time. PMID:19964295

  12. Vector computer memory bank contention

    NASA Technical Reports Server (NTRS)

    Bailey, D. H.

    1985-01-01

    A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
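
    The Monte Carlo approach mentioned in this abstract can be illustrated with a minimal sketch. This is not the model from the report: it assumes one memory request is issued per CPU tick to a uniformly random bank, and that a bank stays reserved for a fixed number of ticks after each access. The function name `simulate` and all parameter values are hypothetical.

```python
import random

def simulate(num_banks, busy_ticks, num_requests=20000, seed=0):
    """Estimate requests completed per CPU tick under random bank access.

    Assumed toy model (not taken from the report): one request per tick,
    uniformly random bank, fixed reservation time of `busy_ticks` ticks.
    """
    rng = random.Random(seed)
    free_at = [0] * num_banks          # tick at which each bank becomes free
    tick = 0
    for _ in range(num_requests):
        bank = rng.randrange(num_banks)
        tick = max(tick, free_at[bank])    # stall while the bank is reserved
        free_at[bank] = tick + busy_ticks  # reserve the bank
        tick += 1                          # issuing the request costs one tick
    return num_requests / tick

if __name__ == "__main__":
    for banks in (8, 16, 64, 256):
        rate = simulate(banks, busy_ticks=8)
        print(f"{banks:4d} banks: {rate:.3f} requests per tick")
```

    Under these assumptions, a longer reservation time or a smaller number of banks lowers the sustained request rate, which is the qualitative trade-off the abstract describes.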

  13. New Focal Plane Array Controller for the Instruments of the Subaru Telescope

    NASA Astrophysics Data System (ADS)

    Nakaya, Hidehiko; Komiyama, Yutaka; Miyazaki, Satoshi; Yamashita, Takuya; Yagi, Masafumi; Sekiguchi, Maki

    2006-03-01

    We have developed a next-generation data acquisition system, MESSIA5 (Modularized Extensible System for Image Acquisition), which comprises the digital part of a focal plane array controller. The new data acquisition system was constructed based on a 64 bit, 66 MHz PCI (peripheral component interconnect) bus architecture and runs on an x86 CPU computer with (non-real-time) Linux. The system, including the CPU board, is placed at the telescope focus, and standard gigabit Ethernet is adopted for the data transfer, as opposed to a dedicated fiber link. During the summer of 2002, we installed the new system for the first time on the Subaru prime-focus camera Suprime-Cam and successfully improved the observing performance.

  14. Vector computer memory bank contention

    NASA Technical Reports Server (NTRS)

    Bailey, David H.

    1987-01-01

    A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.

  15. Behavior of Cackling Canada Geese during brood rearing

    USGS Publications Warehouse

    Fowler, Ada C.; Ely, Craig R.

    1997-01-01

    We studied behavior of Cackling Canada Goose (Branta canadensis minima, cacklers) broods between 1992 and 1996 on the Yukon Delta National Wildlife Refuge in western Alaska. An increase in time spent foraging by goslings during our study was weakly correlated with an increase in the size of the local breeding population. Amount of time spent feeding by adults and goslings increased throughout the brood rearing period. Overall, goslings spent more time feeding than either adult females or males, and adult males spent the most time alert. Time alert varied among brood rearing areas and increased with brood size, but there was no variation in time spent alert among years. Increases in feeding or alert behaviors were at a cost to time spent in all other behaviors. We suggest that there is not a simple trade-off between feeding and alert behavior in cacklers, but instead that time spent feeding and alert are optimized against all other behaviors. We suggest that forage quality and availability determines the amount of time spent feeding, whereas the threat of predation or disturbance determines the amount of time spent alert.

  16. A numerical code for the simulation of non-equilibrium chemically reacting flows on hybrid CPU-GPU clusters

    NASA Astrophysics Data System (ADS)

    Kudryavtsev, Alexey N.; Kashkovsky, Alexander V.; Borisov, Semyon P.; Shershnev, Anton A.

    2017-10-01

    In the present work, a computer code, RCFS, is developed for the numerical simulation of chemically reacting compressible flows on hybrid CPU/GPU supercomputers. It solves the 3D unsteady Euler equations for multispecies chemically reacting flows in general curvilinear coordinates using shock-capturing TVD schemes. Time advancement is carried out using explicit Runge-Kutta TVD schemes. The implementation uses the CUDA application programming interface to perform GPU computations. Data are distributed between GPUs via a domain decomposition technique. The developed code is verified on a number of test cases, including supersonic flow over a cylinder.

  17. Newmark local time stepping on high-performance computing architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rietmann, Max, E-mail: max.rietmann@erdw.ethz.ch; Institute of Geophysics, ETH Zurich; Grote, Marcus, E-mail: marcus.grote@unibas.ch

    In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100x). Capable of running on large CPU and GPU clusters, we present both synthetic validation examples and large scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.

  18. Generalized conjugate-gradient methods for the Navier-Stokes equations

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing

    1991-01-01

    A generalized conjugate-gradient method is used to solve the two-dimensional, compressible Navier-Stokes equations of fluid flow. The equations are discretized with an implicit, upwind finite-volume formulation. Preconditioning techniques are incorporated into the new solver to accelerate convergence of the overall iterative method. The superiority of the new solver is demonstrated by comparisons with a conventional line Gauss-Seidel Relaxation solver. Computational test results for transonic flow (trailing edge flow in a transonic turbine cascade) and hypersonic flow (M = 6.0 shock-on-shock phenomena on a cylindrical leading edge) are presented. When applied to the transonic cascade case, the new solver is 4.4 times faster in terms of number of iterations and 3.1 times faster in terms of CPU time than the Relaxation solver. For the hypersonic shock case, the new solver is 3.0 times faster in terms of number of iterations and 2.2 times faster in terms of CPU time than the Relaxation solver.

  19. CPU-GPU mixed implementation of virtual node method for real-time interactive cutting of deformable objects using OpenCL.

    PubMed

    Jia, Shiyu; Zhang, Weizhong; Yu, Xiaokang; Pan, Zhenkuan

    2015-09-01

    Surgical simulators need to simulate interactive cutting of deformable objects in real time. The goal of this work was to design an interactive cutting algorithm that eliminates traditional cutting state classification and can work simultaneously with real-time GPU-accelerated deformation without affecting its numerical stability. A modified virtual node method for cutting is proposed. Deformable object is modeled as a real tetrahedral mesh embedded in a virtual tetrahedral mesh, and the former is used for graphics rendering and collision, while the latter is used for deformation. Cutting algorithm first subdivides real tetrahedrons to eliminate all face and edge intersections, then splits faces, edges and vertices along cutting tool trajectory to form cut surfaces. Next virtual tetrahedrons containing more than one connected real tetrahedral fragments are duplicated, and connectivity between virtual tetrahedrons is updated. Finally, embedding relationship between real and virtual tetrahedral meshes is updated. Co-rotational linear finite element method is used for deformation. Cutting and collision are processed by CPU, while deformation is carried out by GPU using OpenCL. Efficiency of GPU-accelerated deformation algorithm was tested using block models with varying numbers of tetrahedrons. Effectiveness of our cutting algorithm under multiple cuts and self-intersecting cuts was tested using a block model and a cylinder model. Cutting of a more complex liver model was performed, and detailed performance characteristics of cutting, deformation and collision were measured and analyzed. Our cutting algorithm can produce continuous cut surfaces when traditional minimal element creation algorithm fails. Our GPU-accelerated deformation algorithm remains stable with constant time step under multiple arbitrary cuts and works on both NVIDIA and AMD GPUs. GPU-CPU speed ratio can be as high as 10 for models with 80,000 tetrahedrons. Forty to sixty percent real-time performance and 100-200 Hz simulation rate are achieved for the liver model with 3,101 tetrahedrons. Major bottlenecks for simulation efficiency are cutting, collision processing and CPU-GPU data transfer. Future work needs to improve on these areas.

  20. Exploring the use of I/O nodes for computation in a MIMD multiprocessor

    NASA Technical Reports Server (NTRS)

    Kotz, David; Cai, Ting

    1995-01-01

    As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between 'compute' and 'I/O' nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O. Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.

  1. System for processing an encrypted instruction stream in hardware

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Griswold, Richard L.; Nickless, William K.; Conrad, Ryan C.

    A system and method of processing an encrypted instruction stream in hardware is disclosed. Main memory stores the encrypted instruction stream and unencrypted data. A central processing unit (CPU) is operatively coupled to the main memory. A decryptor is operatively coupled to the main memory and located within the CPU. The decryptor decrypts the encrypted instruction stream upon receipt of an instruction fetch signal from a CPU core. Unencrypted data is passed through to the CPU core without decryption upon receipt of a data fetch signal.

  2. Execution of a parallel edge-based Navier-Stokes solver on commodity graphics processor units

    NASA Astrophysics Data System (ADS)

    Corral, Roque; Gisbert, Fernando; Pueblas, Jesus

    2017-02-01

    The implementation of an edge-based three-dimensional Reynolds-averaged Navier-Stokes solver for unstructured grids able to run on multiple graphics processing units (GPUs) is presented. Loops over edges, which are the most time-consuming part of the solver, have been written to exploit the massively parallel capabilities of GPUs. Non-blocking communications between parallel processes and between the GPU and the central processing unit (CPU) have been used to enhance code scalability. The code is written using a mixture of C++ and OpenCL, to allow the execution of the source code on GPUs. The Message Passing Interface (MPI) library is used to allow the parallel execution of the solver on multiple GPUs. A comparative study of the solver parallel performance is carried out using a cluster of CPUs and another of GPUs. It is shown that a single GPU is up to 64 times faster than a single CPU core. The parallel scalability of the solver is mainly degraded due to the loss of computing efficiency of the GPU when the size of the case decreases. However, for large enough grid sizes, the scalability is strongly improved. A cluster featuring commodity GPUs and a high bandwidth network is ten times less costly and consumes 33% less energy than a CPU-based cluster with an equivalent computational power.

  3. A comparison of native GPU computing versus OpenACC for implementing flow-routing algorithms in hydrological applications

    NASA Astrophysics Data System (ADS)

    Rueda, Antonio J.; Noguera, José M.; Luque, Adrián

    2016-02-01

    In recent years GPU computing has gained wide acceptance as a simple low-cost solution for speeding up computationally expensive processing in many scientific and engineering applications. However, in most cases accelerating a traditional CPU implementation for a GPU is a non-trivial task that requires a thorough refactoring of the code and specific optimizations that depend on the architecture of the device. OpenACC is a promising technology that aims at reducing the effort required to accelerate C/C++/Fortran code on an attached multicore device. With this technology, the CPU code needs little more than a few compiler directives to identify the areas to be accelerated and the way in which data has to be moved between the CPU and GPU. Its potential benefits are multiple: better code readability, less development time, lower risk of errors and less dependency on the underlying architecture and future evolution of the GPU technology. Our aim with this work is to evaluate the pros and cons of using OpenACC against native GPU implementations in computationally expensive hydrological applications, using the classic D8 algorithm of O'Callaghan and Mark for river network extraction as a case study. We implemented the flow accumulation step of this algorithm on the CPU, with OpenACC, and in two different CUDA versions, comparing the length and complexity of the code and its performance with different datasets. In summary, although OpenACC cannot match the performance of an optimized CUDA implementation (×3.5 slower on average), it provides a significant performance improvement over a CPU implementation (×2-6) with far simpler code and less implementation effort.

  4. Algorithms of GPU-enabled reactive force field (ReaxFF) molecular dynamics.

    PubMed

    Zheng, Mo; Li, Xiaoxia; Guo, Li

    2013-04-01

    Reactive force field (ReaxFF), a recent and novel bond order potential, allows for reactive molecular dynamics (ReaxFF MD) simulations for modeling larger and more complex molecular systems involving chemical reactions when compared with computation intensive quantum mechanical methods. However, ReaxFF MD can be approximately 10-50 times slower than classical MD due to its explicit modeling of bond forming and breaking, the dynamic charge equilibration at each time-step, and its time-step being one order of magnitude smaller than that of classical MD, all of which pose significant computational challenges in reaching spatio-temporal scales of nanometers and nanoseconds. The very recent advances in graphics processing units (GPUs) provide not only highly favorable performance for GPU-enabled MD programs compared with CPU implementations but also an opportunity to cope with the computing power and memory demands that ReaxFF MD imposes on computer hardware. In this paper, we present the algorithms of GMD-Reax, the first GPU-enabled ReaxFF MD program with significantly improved performance surpassing CPU implementations on desktop workstations. The performance of GMD-Reax has been benchmarked on a PC equipped with an NVIDIA C2050 GPU for coal pyrolysis simulation systems with atoms ranging from 1378 to 27,283. GMD-Reax achieved speedups as high as 12 times faster than Duin et al.'s FORTRAN codes in Lammps on 8 CPU cores and 6 times faster than the Lammps' C codes based on PuReMD in terms of the simulation time per time-step averaged over 100 steps. GMD-Reax could be used as a new and efficient computational tool for exploring very complex molecular reactions via ReaxFF MD simulation on desktop workstations.

  5. A survey of CPU-GPU heterogeneous computing techniques

    DOE PAGES

    Mittal, Sparsh; Vetter, Jeffrey S.

    2015-07-04

    As both CPU and GPU become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs) such as workload-partitioning which enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at runtime, algorithm, programming, compiler and application level. Further, we review both discrete and fused CPU-GPU systems; and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). Furthermore, we believe that this paper will provide insights into working and scope of applications of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.

  6. A survey of CPU-GPU heterogeneous computing techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh; Vetter, Jeffrey S.

    As both CPU and GPU become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs) such as workload-partitioning which enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at runtime, algorithm, programming, compiler and application level. Further, we review both discrete and fused CPU-GPU systems; and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). Furthermore, we believe that this paper will provide insights into working and scope of applications of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.

  7. Evaluation of user input methods for manipulating a tablet personal computer in sterile techniques.

    PubMed

    Yamada, Akira; Komatsu, Daisuke; Suzuki, Takeshi; Kurozumi, Masahiro; Fujinaga, Yasunari; Ueda, Kazuhiko; Kadoya, Masumi

    2017-02-01

    To determine a quick and accurate user input method for manipulating tablet personal computers (PCs) in sterile techniques. We evaluated three different manipulation methods, (1) Computer mouse and sterile system drape, (2) Fingers and sterile system drape, and (3) Digitizer stylus and sterile ultrasound probe cover with a pinhole, in terms of the central processing unit (CPU) performance, manipulation performance, and contactlessness. A significant decrease in CPU score ([Formula: see text]) and an increase in CPU temperature ([Formula: see text]) were observed when a system drape was used. The respective mean times taken to select a target image from an image series (ST) and the mean times for measuring points on an image (MT) were [Formula: see text] and [Formula: see text] s for the computer mouse method, [Formula: see text] and [Formula: see text] s for the finger method, and [Formula: see text] and [Formula: see text] s for the digitizer stylus method, respectively. The ST for the finger method was significantly longer than for the digitizer stylus method ([Formula: see text]). The MT for the computer mouse method was significantly longer than for the digitizer stylus method ([Formula: see text]). The mean success rate for measuring points on an image was significantly lower for the finger method when the diameter of the target was equal to or smaller than 8 mm than for the other methods. No significant difference in the adenosine triphosphate amount at the surface of the tablet PC was observed before, during, or after manipulation via the digitizer stylus method while wearing starch-powdered sterile gloves ([Formula: see text]). Quick and accurate manipulation of tablet PCs in sterile techniques without CPU load is feasible using a digitizer stylus and sterile ultrasound probe cover with a pinhole.

  8. Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

    NASA Astrophysics Data System (ADS)

    Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

    2017-08-01

    We present an application of massively parallel processing of quantitative flow measurements data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150-fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements.

    Catalogue identifier: AFBT_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GNU GPLv3
    No. of lines in distributed program, including test data, etc.: 913552
    No. of bytes in distributed program, including test data, etc.: 270876249
    Distribution format: tar.gz
    Programming language: CUDA/C, MATLAB.
    Computer: Intel x64 CPU, GPU supporting CUDA technology.
    Operating system: 64-bit Windows 7 Professional.
    Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized.
    RAM: Dependent on user's parameters, typically between several gigabytes and several tens of gigabytes
    Classification: 6.5, 18.
    Nature of problem: Speed up of data processing in optical coherence microscopy
    Solution method: Utilization of GPU for massively parallel data processing
    Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data)
    Running time: 1.8 s for one B-scan (150 × faster in comparison to the CPU data processing time)

  9. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Souris, Kevin, E-mail: kevin.souris@uclouvain.be; Lee, John Aldo; Sterpin, Edmond

    2016-04-15

    Purpose: Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. Methods: A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Results: Comparisons with GATE/GEANT4 for various geometries show deviations within 2%–1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. Conclusions: MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus be used also for in vivo range verification.

  10. Continuous piecewise-linear, reduced-order electrochemical model for lithium-ion batteries in real-time applications

    NASA Astrophysics Data System (ADS)

    Farag, Mohammed; Fleckenstein, Matthias; Habibi, Saeid

    2017-02-01

    Model-order reduction and minimization of the CPU run-time while maintaining the model accuracy are critical requirements for real-time implementation of lithium-ion electrochemical battery models. In this paper, an isothermal, continuous, piecewise-linear, electrode-average model is developed by using an optimal knot placement technique. The proposed model reduces the univariate nonlinear function of the electrode's open circuit potential dependence on the state of charge to continuous piecewise regions. The parameterization experiments were chosen to provide a trade-off between extensive experimental characterization techniques and purely identifying all parameters using optimization techniques. The model is then parameterized in each continuous, piecewise-linear, region. Applying the proposed technique cuts down the CPU run-time by around 20%, compared to the reduced-order, electrode-average model. Finally, the model validation against real-time driving profiles (FTP-72, WLTP) demonstrates the ability of the model to predict the cell voltage accurately with less than 2% error.

  11. The Effect of Multigrid Parameters in a 3D Heat Diffusion Equation

    NASA Astrophysics Data System (ADS)

    Oliveira, F. De; Franco, S. R.; Pinto, M. A. Villela

    2018-02-01

    The aim of this paper is to reduce the CPU time needed to solve the three-dimensional heat diffusion equation with Dirichlet boundary conditions. The finite difference method (FDM) is used to discretize the differential equations with a second-order accurate central difference scheme (CDS). The systems of algebraic equations are solved using the lexicographical and red-black Gauss-Seidel methods, associated with the geometric multigrid method with a correction scheme (CS) and V-cycle. Comparisons are made between two types of restriction: injection and full weighting. The prolongation operator is trilinear interpolation. This work studies the influence of the smoothing value (v), the number of mesh levels (L) and the number of unknowns (N) on the CPU time, and analyses the algorithm complexity.
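
    As a rough illustration of the red-black Gauss-Seidel smoother named in the abstract above, the following Python sketch applies it to the steady 3D Laplace equation on a uniform grid with Dirichlet boundary values. It is not the authors' code: the multigrid cycle, restriction and prolongation are omitted, and the names `red_black_gauss_seidel` and `linear_test_problem` are hypothetical.

```python
import numpy as np


def red_black_gauss_seidel(u, sweeps):
    """Smooth u in place for the 7-point Laplace stencil (unit grid spacing).

    The outermost layer of u holds the fixed Dirichlet boundary values.
    """
    i, j, k = np.indices(u.shape)
    parity = (i + j + k) % 2          # 0 = "red" points, 1 = "black" points
    interior = np.zeros(u.shape, dtype=bool)
    interior[1:-1, 1:-1, 1:-1] = True

    for _ in range(sweeps):
        for colour in (0, 1):
            # Average of the six face neighbours at every interior point.
            avg = np.zeros_like(u)
            avg[1:-1, 1:-1, 1:-1] = (
                u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1]
                + u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1]
                + u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]
            ) / 6.0
            # Points of one colour depend only on the other colour, so a
            # whole colour can be updated at once.
            mask = interior & (parity == colour)
            u[mask] = avg[mask]
    return u


def linear_test_problem(n):
    """Return (initial guess, exact solution) for u = x + y + z on an n^3 grid."""
    i, j, k = np.indices((n, n, n))
    exact = (i + j + k).astype(float)
    u = exact.copy()
    u[1:-1, 1:-1, 1:-1] = 0.0         # keep boundary values, reset the interior
    return u, exact


u, exact = linear_test_problem(8)
red_black_gauss_seidel(u, sweeps=200)
print(np.max(np.abs(u - exact)))      # maximum error after smoothing
```

    The two-colour ordering is what makes the sweep easy to vectorize or parallelize; in a multigrid solver only a few such sweeps would be applied per grid level.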

  12. An incomplete assembly with thresholding algorithm for systems of reaction-diffusion equations in three space dimensions IAT for reaction-diffusion systems

    NASA Astrophysics Data System (ADS)

    Moore, Peter K.

    2003-07-01

    Solving systems of reaction-diffusion equations in three space dimensions can be prohibitively expensive both in terms of storage and CPU time. Herein, I present a new incomplete assembly procedure that is designed to reduce storage requirements. Incomplete assembly is analogous to incomplete factorization in that only a fixed number of nonzero entries are stored per row and a drop tolerance is used to discard small values. The algorithm is incorporated in a finite element method-of-lines code and tested on a set of reaction-diffusion systems. The effect of incomplete assembly on CPU time and storage and on the performance of the temporal integrator DASPK, algebraic solver GMRES and preconditioner ILUT is studied.
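
    The thresholding rule described in the abstract above (a fixed number of nonzero entries per row plus a drop tolerance) can be sketched in Python as follows. This is an illustrative guess at the rule only, not the paper's assembly procedure: the function name `threshold_row` is hypothetical, and measuring the drop tolerance relative to the Euclidean norm of the row is an assumption borrowed from ILUT-style preconditioners.

```python
import math


def threshold_row(row, max_nnz, drop_tol):
    """Keep at most max_nnz entries of a sparse row, dropping small values.

    row maps column index -> value. An entry is dropped when its magnitude
    is below drop_tol times the Euclidean norm of the row (an assumed,
    ILUT-style relative criterion).
    """
    norm = math.sqrt(sum(v * v for v in row.values()))
    survivors = [(c, v) for c, v in row.items() if abs(v) >= drop_tol * norm]
    # Of the survivors, retain only the max_nnz largest in magnitude.
    survivors.sort(key=lambda cv: abs(cv[1]), reverse=True)
    return dict(sorted(survivors[:max_nnz]))


row = {0: 4.0, 1: -0.001, 2: 1.5, 5: -2.0, 7: 0.3}
print(threshold_row(row, max_nnz=3, drop_tol=0.01))
# {0: 4.0, 2: 1.5, 5: -2.0}
```

    Capping the entries per row bounds storage in advance, while the tolerance discards values too small to matter.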

  13. Analysis and improvements of Adaptive Particle Refinement (APR) through CPU time, accuracy and robustness considerations

    NASA Astrophysics Data System (ADS)

    Chiron, L.; Oger, G.; de Leffe, M.; Le Touzé, D.

    2018-02-01

    While smoothed-particle hydrodynamics (SPH) simulations are usually performed using uniform particle distributions, local particle refinement techniques have been developed to concentrate fine spatial resolutions in identified areas of interest. Although the formalism of this method is relatively easy to implement, its robustness at coarse/fine interfaces can be problematic. Analysis performed in [16] shows that the radius of refined particles should be greater than half the radius of unrefined particles to ensure robustness. In this article, the basics of an Adaptive Particle Refinement (APR) technique, inspired by AMR in mesh-based methods, are presented. This approach ensures robustness with alleviated constraints. Simulations applying the new formalism proposed achieve accuracy comparable to fully refined spatial resolutions, together with robustness, low CPU times and maintained parallel efficiency.

  14. Fast polyenergetic forward projection for image formation using OpenCL on a heterogeneous parallel computing platform.

    PubMed

    Zhou, Lili; Clifford Chao, K S; Chang, Jenghwa

    2012-11-01

    Simulated projection images of digital phantoms constructed from CT scans have been widely used for clinical and research applications but their quality and computation speed are not optimal for real-time comparison with the radiography acquired with an x-ray source of different energies. In this paper, the authors performed polyenergetic forward projections using open computing language (OpenCL) in a parallel computing ecosystem consisting of CPU and general purpose graphics processing unit (GPGPU) for fast and realistic image formation. The proposed polyenergetic forward projection uses a lookup table containing the NIST published mass attenuation coefficients (μ/ρ) for different tissue types and photon energies ranging from 1 keV to 20 MeV. The CT images of the sites of interest are first segmented into different tissue types based on the CT numbers and converted to a three-dimensional attenuation phantom by linking each voxel to the corresponding tissue type in the lookup table. The x-ray source can be a radioisotope or an x-ray generator with a known spectrum described as weight w(n) for energy bin E(n). The Siddon method is used to compute the x-ray transmission line integral for E(n) and the x-ray fluence is the weighted sum of the exponential of the line integral for all energy bins with added Poisson noise. To validate this method, a digital head and neck phantom constructed from the CT scan of a Rando head phantom was segmented into three (air, gray/white matter, and bone) regions for calculating the polyenergetic projection images for the Mohan 4 MV energy spectrum. To accelerate the calculation, the authors partitioned the workloads using task parallelism and data parallelism and scheduled them in a parallel computing ecosystem consisting of CPU and GPGPU (NVIDIA Tesla C2050) using OpenCL only. The authors explored the task overlapping strategy and the sequential method for generating the first and subsequent digitally reconstructed radiographs (DRRs). A dispatcher was designed to drive the high-degree parallelism of the task overlapping strategy. Numerical experiments were conducted to compare the performance of the OpenCL/GPGPU-based implementation with the CPU-based implementation. The projection images were similar to typical portal images obtained with a 4 or 6 MV x-ray source. For a phantom size of 512 × 512 × 223, the time for calculating the line integrals for a 512 × 512 image panel was 16.2 ms on GPGPU for one energy bin in comparison to 8.83 s on CPU. The total computation time for generating one polyenergetic projection image of 512 × 512 was 0.3 s (141 s for CPU). The relative difference between the projection images obtained with the CPU-based and OpenCL/GPGPU-based implementations was on the order of 10^-6, making them virtually indistinguishable. The task overlapping strategy was 5.84 and 1.16 times faster than the sequential method for the first and the subsequent DRRs, respectively. The authors have successfully built digital phantoms using anatomic CT images and NIST μ/ρ tables for simulating realistic polyenergetic projection images and optimized the processing speed with parallel computing using a GPGPU/OpenCL-based implementation. The computation time (0.3 s per projection image) was fast enough for real-time IGRT (image-guided radiotherapy) applications.
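
    The fluence formula in the abstract above (a spectrum-weighted sum over energy bins of the exponential of the line integral, with Poisson noise) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' OpenCL code: the function name `polyenergetic_fluence` and the `photons` parameter are hypothetical, the weights are assumed to sum to 1, the negative sign in the exponent follows the usual Beer-Lambert attenuation law, and the line integrals are taken as given rather than computed with the Siddon method.

```python
import numpy as np


def polyenergetic_fluence(weights, line_integrals, photons=None, rng=None):
    """Weighted sum over energy bins of exp(-line integral).

    weights:        shape (n_bins,), spectrum weights w(n), assumed to sum to 1
    line_integrals: shape (n_bins, ...), attenuation line integral per bin
    photons:        if given, incident photon count used to add Poisson noise
    """
    w = np.asarray(weights, dtype=float)
    li = np.asarray(line_integrals, dtype=float)
    # Contract the energy-bin axis: sum_n w(n) * exp(-L(n)).
    fluence = np.tensordot(w, np.exp(-li), axes=(0, 0))
    if photons is not None:
        rng = np.random.default_rng() if rng is None else rng
        # Draw detected counts, then normalize back to relative fluence.
        fluence = rng.poisson(photons * fluence) / photons
    return fluence


# Two energy bins, one detector pixel: 0.5 * exp(0) + 0.5 * exp(-ln 2)
print(polyenergetic_fluence([0.5, 0.5], [0.0, np.log(2.0)]))
```

    Because each energy bin is independent until the final weighted sum, the per-bin line integrals are a natural unit of parallel work.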

  15. 5 CFR 551.422 - Time spent traveling.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 5 Administrative Personnel 1 2010-01-01 2010-01-01 false Time spent traveling. 551.422 Section 551... Activities § 551.422 Time spent traveling. (a) Time spent traveling shall be considered hours of work if: (1... who is permitted to use an alternative mode of transportation, or an employee who travels at a time...

  16. 5 CFR 551.422 - Time spent traveling.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 5 Administrative Personnel 1 2013-01-01 2013-01-01 false Time spent traveling. 551.422 Section 551... Activities § 551.422 Time spent traveling. (a) Time spent traveling shall be considered hours of work if: (1... who is permitted to use an alternative mode of transportation, or an employee who travels at a time...

  17. 5 CFR 551.422 - Time spent traveling.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 5 Administrative Personnel 1 2011-01-01 2011-01-01 false Time spent traveling. 551.422 Section 551... Activities § 551.422 Time spent traveling. (a) Time spent traveling shall be considered hours of work if: (1... who is permitted to use an alternative mode of transportation, or an employee who travels at a time...

  18. 5 CFR 551.422 - Time spent traveling.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 5 Administrative Personnel 1 2012-01-01 2012-01-01 false Time spent traveling. 551.422 Section 551... Activities § 551.422 Time spent traveling. (a) Time spent traveling shall be considered hours of work if: (1... who is permitted to use an alternative mode of transportation, or an employee who travels at a time...

  19. 5 CFR 551.422 - Time spent traveling.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 5 Administrative Personnel 1 2014-01-01 2014-01-01 false Time spent traveling. 551.422 Section 551... Activities § 551.422 Time spent traveling. (a) Time spent traveling shall be considered hours of work if: (1... who is permitted to use an alternative mode of transportation, or an employee who travels at a time...

  20. The applicability of turbulence models to aerodynamic and propulsion flowfields at McDonnell-Douglas Aerospace

    NASA Technical Reports Server (NTRS)

    Kral, Linda D.; Ladd, John A.; Mani, Mori

    1995-01-01

    The objective of this viewgraph presentation is to evaluate turbulence models for integrated aircraft components such as the forebody, wing, inlet, diffuser, nozzle, and afterbody. The one-equation models have replaced the algebraic models as the baseline turbulence models. The Spalart-Allmaras one-equation model consistently performs better than the Baldwin-Barth model, particularly in the log-layer and free shear layers. Also, unlike the Baldwin-Barth model, the Spalart-Allmaras model is not grid dependent. No general turbulence model exists for all engineering applications. The Spalart-Allmaras one-equation model and the Chien k-epsilon models are the preferred turbulence models. Although the two-equation models often better predict the flow field, they may take from two to five times the CPU time. Future directions are in further benchmarking the Menter blended k-omega/k-epsilon model and in algorithmic improvements to reduce the CPU time of the two-equation model.

  1. GPU Particle Tracking and MHD Simulations with Greatly Enhanced Computational Speed

    NASA Astrophysics Data System (ADS)

    Ziemba, T.; O'Donnell, D.; Carscadden, J.; Cash, M.; Winglee, R.; Harnett, E.

    2008-12-01

    GPUs are intrinsically highly parallelized systems that provide more than an order of magnitude greater computing speed than CPU-based systems, for less cost than a high-end workstation. Recent advancements in GPU technologies allow for full IEEE float specifications with performance up to several hundred GFLOPs per GPU, and new software architectures have recently become available to ease the transition from graphics-based to scientific applications. This allows for a cheap alternative to standard supercomputing methods and should shorten the time to discovery. 3-D particle tracking and MHD codes have been developed using NVIDIA's CUDA and have demonstrated a speedup of nearly a factor of 20 over equivalent CPU versions of the codes. Such a speedup enables new applications, including real-time radiation belt simulations and real-time global magnetospheric simulations, both of which could provide important space weather prediction tools.

  2. RTOS kernel in portable electrocardiograph

    NASA Astrophysics Data System (ADS)

    Centeno, C. A.; Voos, J. A.; Riva, G. G.; Zerbini, C.; Gonzalez, E. A.

    2011-12-01

    This paper presents the use of a Real Time Operating System (RTOS) on a portable electrocardiograph based on a microcontroller platform. All digital functions of the medical device are performed by the microcontroller. The electrocardiograph CPU is based on the 18F4550 microcontroller, in which a uCOS-II RTOS can be embedded. The decision to use the kernel is based on its benefits, its license for educational use, and its intrinsic time control and peripherals management. The feasibility of its use on the electrocardiograph is evaluated based on the minimum memory requirements due to the kernel structure. The kernel's own tools were used for time estimation and evaluation of the resources used by each process. After this feasibility analysis, the cyclic code was migrated to a structure based on separate processes or tasks able to synchronize events, resulting in an electrocardiograph running on one Central Processing Unit (CPU) based on an RTOS.

  3. Efficient Scalable Median Filtering Using Histogram-Based Operations.

    PubMed

    Green, Oded

    2018-05-01

    Median filtering is a smoothing technique for noise removal in images. While there are various implementations of median filtering for a single-core CPU, there are few implementations for accelerators and multi-core systems. Many parallel implementations of median filtering use a sorting algorithm for rearranging the values within a filtering window and taking the median of the sorted values. While using sorting algorithms allows for simple parallel implementations, the cost of the sorting becomes prohibitive as the filtering windows grow. This makes such algorithms, sequential and parallel alike, inefficient. In this work, we introduce the first software parallel median filtering that is non-sorting-based. The new algorithm uses efficient histogram-based operations. These reduce the computational requirements of the new algorithm while also accessing the image fewer times. We show an implementation of our algorithm for both the CPU and NVIDIA's CUDA supported graphics processing unit (GPU). The new algorithm is compared with several other leading CPU and GPU implementations. The CPU implementation has near perfect linear scaling with a speedup on a quad-core system. The GPU implementation is several orders of magnitude faster than the other GPU implementations for mid-size median filters. For small kernels, comparison-based approaches are preferable as fewer operations are required. Lastly, the new algorithm is open-source and can be found in the OpenCV library.
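
    To show what a non-sorting, histogram-based median looks like in general, the Python sketch below implements the classic sliding-histogram idea on a 1D 8-bit signal. It is not the algorithm of the paper above, whose details the abstract does not give; the function name `histogram_median_1d` and the edge-replication padding are assumptions made for illustration.

```python
import numpy as np


def histogram_median_1d(signal, radius):
    """Median-filter an 8-bit 1D signal with a window of 2 * radius + 1 samples.

    A 256-bin histogram of the current window is updated incrementally
    (one sample leaves, one enters), so no window is ever sorted.
    """
    x = np.asarray(signal, dtype=np.uint8)
    window = 2 * radius + 1
    padded = np.pad(x, radius, mode="edge")   # replicate the end samples
    target = radius + 1                        # 1-based rank of the median

    hist = np.zeros(256, dtype=np.int64)
    for v in padded[:window]:
        hist[v] += 1

    out = np.empty(x.size, dtype=np.uint8)
    for i in range(x.size):
        if i > 0:
            hist[padded[i - 1]] -= 1           # sample leaving the window
            hist[padded[i + window - 1]] += 1  # sample entering the window
        # Walk the cumulative histogram until the median rank is reached.
        count = 0
        for level in range(256):
            count += hist[level]
            if count >= target:
                out[i] = level
                break
    return out


print(histogram_median_1d([10, 200, 12, 11, 13, 250, 12], radius=1))
```

    The cost per output sample depends on the number of grey levels rather than on the window size, which is why histogram methods become attractive as windows grow.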

  4. 5 CFR 551.425 - Time spent receiving medical attention.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 5 Administrative Personnel 1 2010-01-01 2010-01-01 false Time spent receiving medical attention... Relation to Other Activities § 551.425 Time spent receiving medical attention. (a) Time spent waiting for and receiving medical attention for illness or injury shall be considered hours of work if: (1) The...

  5. 5 CFR 551.425 - Time spent receiving medical attention.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 5 Administrative Personnel 1 2011-01-01 2011-01-01 false Time spent receiving medical attention... REGULATIONS PAY ADMINISTRATION UNDER THE FAIR LABOR STANDARDS ACT Hours of Work Application of Principles in Relation to Other Activities § 551.425 Time spent receiving medical attention. (a) Time spent waiting for...

  6. 5 CFR 551.425 - Time spent receiving medical attention.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 5 Administrative Personnel 1 2012-01-01 2012-01-01 false Time spent receiving medical attention... REGULATIONS PAY ADMINISTRATION UNDER THE FAIR LABOR STANDARDS ACT Hours of Work Application of Principles in Relation to Other Activities § 551.425 Time spent receiving medical attention. (a) Time spent waiting for...

  7. 5 CFR 551.425 - Time spent receiving medical attention.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 5 Administrative Personnel 1 2014-01-01 2014-01-01 false Time spent receiving medical attention... REGULATIONS PAY ADMINISTRATION UNDER THE FAIR LABOR STANDARDS ACT Hours of Work Application of Principles in Relation to Other Activities § 551.425 Time spent receiving medical attention. (a) Time spent waiting for...

  8. 5 CFR 551.425 - Time spent receiving medical attention.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 5 Administrative Personnel 1 2013-01-01 2013-01-01 false Time spent receiving medical attention... REGULATIONS PAY ADMINISTRATION UNDER THE FAIR LABOR STANDARDS ACT Hours of Work Application of Principles in Relation to Other Activities § 551.425 Time spent receiving medical attention. (a) Time spent waiting for...

  9. 47 CFR 15.102 - CPU boards and power supplies used in personal computers.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... computers. 15.102 Section 15.102 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL RADIO FREQUENCY DEVICES Unintentional Radiators § 15.102 CPU boards and power supplies used in personal computers. (a... modifications that must be made to a personal computer, peripheral device, CPU board or power supply during...

  10. 47 CFR 15.102 - CPU boards and power supplies used in personal computers.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... computers. 15.102 Section 15.102 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL RADIO FREQUENCY DEVICES Unintentional Radiators § 15.102 CPU boards and power supplies used in personal computers. (a... modifications that must be made to a personal computer, peripheral device, CPU board or power supply during...

  11. 47 CFR 15.102 - CPU boards and power supplies used in personal computers.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... computers. 15.102 Section 15.102 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL RADIO FREQUENCY DEVICES Unintentional Radiators § 15.102 CPU boards and power supplies used in personal computers. (a... modifications that must be made to a personal computer, peripheral device, CPU board or power supply during...

  12. 47 CFR 15.102 - CPU boards and power supplies used in personal computers.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... computers. 15.102 Section 15.102 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL RADIO FREQUENCY DEVICES Unintentional Radiators § 15.102 CPU boards and power supplies used in personal computers. (a... modifications that must be made to a personal computer, peripheral device, CPU board or power supply during...

  13. 47 CFR 15.102 - CPU boards and power supplies used in personal computers.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... computers. 15.102 Section 15.102 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL RADIO FREQUENCY DEVICES Unintentional Radiators § 15.102 CPU boards and power supplies used in personal computers. (a... modifications that must be made to a personal computer, peripheral device, CPU board or power supply during...

  14. Online performance evaluation of RAID 5 using CPU utilization

    NASA Astrophysics Data System (ADS)

    Jin, Hai; Yang, Hua; Zhang, Jiangling

    1998-09-01

    Redundant arrays of independent disks (RAID) technology is an efficient way to relieve the bottleneck between CPU processing ability and the I/O subsystem. From the system point of view, the most important metric of online performance is CPU utilization. This paper first presents a way to calculate the CPU utilization of a system connected to a RAID level 5 subsystem using a statistical average method. The simulation results for CPU utilization show that using multiple disks as an array to access data in parallel is an efficient way to enhance the online performance of a disk storage system. Using high-end disk drives to compose the disk array is the key to enhancing the online performance of the system.

  15. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards

    PubMed Central

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G.

    2012-01-01

    In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses summed absolute difference (SAD) error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a non-integer search grid. The additional speedup for non-integer search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable. In addition we compared execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and Simplified Unsymmetrical multi-Hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards. PMID:22347787
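
    The block-matching scheme described in the abstract above (summed absolute difference criterion with a full grid search) can be sketched on the CPU in Python as follows. This is a minimal single-block illustration restricted to integer displacements, not the authors' CUDA implementation; the function name `full_search_sad` and its parameters are hypothetical.

```python
import numpy as np


def full_search_sad(ref, cur, top, left, block, search):
    """Find the displacement of one block by exhaustive SAD search.

    The block cur[top:top+block, left:left+block] is compared with every
    candidate position in ref within +/- search pixels. Returns
    (dy, dx, sad) for the best integer displacement.
    """
    target = cur[top:top + block, left:left + block].astype(np.int64)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference image.
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue
            cand = ref[y:y + block, x:x + block].astype(np.int64)
            sad = int(np.abs(target - cand).sum())
            if best is None or sad < best[2]:
                best = (dy, dx, sad)
    return best


rng = np.random.default_rng(1)
ref = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))   # synthetic motion
print(full_search_sad(ref, cur, top=12, left=12, block=8, search=4))
```

    Every candidate displacement is evaluated independently, which is what makes the full search expensive on a CPU and well suited to GPU parallelism.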

  16. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards.

    PubMed

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G

    2011-07-01

    In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses summed absolute difference (SAD) error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a non-integer search grid. The additional speedup for non-integer search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable. In addition we compared execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and Simplified Unsymmetrical multi-Hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards.

  17. Comparison of Conjugate Gradient Density Matrix Search and Chebyshev Expansion Methods for Avoiding Diagonalization in Large-Scale Electronic Structure Calculations

    NASA Technical Reports Server (NTRS)

    Bates, Kevin R.; Daniels, Andrew D.; Scuseria, Gustavo E.

    1998-01-01

    We report a comparison of two linear-scaling methods which avoid the diagonalization bottleneck of traditional electronic structure algorithms. The Chebyshev expansion method (CEM) is implemented for carbon tight-binding calculations of large systems and its memory and timing requirements compared to those of our previously implemented conjugate gradient density matrix search (CG-DMS). Benchmark calculations are carried out on icosahedral fullerenes from C60 to C8640 and the linear scaling memory and CPU requirements of the CEM demonstrated. We show that the CPU requisites of the CEM and CG-DMS are similar for calculations with comparable accuracy.

  18. GPU accelerated Monte-Carlo simulation of SEM images for metrology

    NASA Astrophysics Data System (ADS)

    Verduin, T.; Lokhorst, S. R.; Hagen, C. W.

    2016-03-01

    In this work we address the computation times of numerical studies in dimensional metrology. In particular, full Monte-Carlo simulation programs for scanning electron microscopy (SEM) image acquisition are known to be notoriously slow. Our quest to reduce the computation time of SEM image simulation has led us to investigate the use of graphics processing units (GPUs) for metrology. We have succeeded in creating a full Monte-Carlo simulation program for SEM images, which runs entirely on a GPU. The physical scattering models of this GPU simulator are identical to a previous CPU-based simulator, which includes the dielectric function model for inelastic scattering and also refinements for low-voltage SEM applications. As a case study for the performance, we considered the simulated exposure of a complex feature: an isolated silicon line with rough sidewalls located on a flat silicon substrate. The surface of the rough feature is decomposed into 408 012 triangles. We have used an exposure dose of 6 mC/cm^2, which corresponds to 6 553 600 primary electrons on average (Poisson distributed). We repeat the simulation for various primary electron energies: 300 eV, 500 eV, 800 eV, 1 keV, 3 keV and 5 keV. At first we run the simulation on a GeForce GTX480 from NVIDIA. The very same simulation is duplicated on our CPU-based program, for which we have used an Intel Xeon X5650. Apart from statistics in the simulation, no difference is found between the CPU and GPU simulated results. The GTX480 generates the images (depending on the primary electron energy) 350 to 425 times faster than a single threaded Intel X5650 CPU. Although this is a tremendous speedup, we actually have not reached the maximum throughput because of the limited amount of available memory on the GTX480. Nevertheless, the speedup enables the fast acquisition of simulated SEM images for metrology. We now have the potential to investigate case studies in CD-SEM metrology, which otherwise would take unreasonable amounts of computation time.

  19. Time Spent on Social Network Sites and Psychological Well-Being: A Meta-Analysis.

    PubMed

    Huang, Chiungjung

    2017-06-01

    This meta-analysis examines the relationship between time spent on social networking sites and psychological well-being factors, namely self-esteem, life satisfaction, loneliness, and depression. Sixty-one studies consisting of 67 independent samples involving 19,652 participants were identified. The mean correlation between time spent on social networking sites and psychological well-being was low at r = -0.07. The correlations between time spent on social networking sites and positive indicators (self-esteem and life satisfaction) were close to 0, whereas those between time spent on social networking sites and negative indicators (depression and loneliness) were weak. The effects of publication outlet, site on which users spent time, scale of time spent, and participant age and gender were not significant. As most included studies used student samples, future research should be conducted to examine this relationship for adults.

  20. The GPU implementation of micro - Doppler period estimation

    NASA Astrophysics Data System (ADS)

    Yang, Liyuan; Wang, Junling; Bi, Ran

    2018-03-01

    To address the high computational complexity and poor real-time performance of wideband radar echo signal processing, this paper designs a program based on a CPU-GPU heterogeneous parallel structure to improve the real-time extraction of micro-motion features. First, we discuss the principle of the micro-Doppler effect generated by the rolling of the scattering points on an orbiting satellite, and analyse how to use a Kalman filter to compensate for the translational motion of a tumbling satellite and how to use joint time-frequency analysis and the inverse Radon transform to extract the micro-motion features from the compensated echo. Second, the advantages of the GPU for real-time processing and the working principle of CPU-GPU heterogeneous parallelism are analysed, and a GPU-based program flow for extracting the micro-motion features from the radar echo signal of a rolling satellite is designed. Finally, the extraction results are given to verify the correctness of the program and algorithm.

  1. An optimized and low-cost FPGA-based DNA sequence alignment--a step towards personal genomics.

    PubMed

    Shah, Hurmat Ali; Hasan, Laiq; Ahmad, Nasir

    2013-01-01

    DNA sequence alignment is a cardinal process in computational biology, but it is computationally expensive when performed on traditional computational platforms such as CPUs. Of the many off-the-shelf platforms explored for speeding up the computation, the FPGA stands out as the best candidate due to its performance per dollar spent and performance per watt. These two advantages make the FPGA the most appropriate choice for realizing the aim of personal genomics. The previous implementation of DNA sequence alignment did not take into consideration the price of the device on which optimization was performed. This paper presents optimizations over the previous FPGA implementation that improve both the overall speed-up achieved and the price incurred by the platform that was optimized. The optimizations are: (1) the array of processing elements is made to run on a change in input value and not on the clock, eliminating the need for tight clock synchronization; (2) the implementation is unrestrained by the size of the sequences to be aligned; (3) the waiting time required for the sequences to load onto the FPGA is reduced to the minimum possible; and (4) an efficient method is devised to store the output matrix that makes it possible to save the diagonal elements to be used in the next pass, in parallel with the computation of the output matrix. Implemented on a Spartan3 FPGA, this implementation achieved a 20-fold performance improvement in terms of CUPS over a GPP implementation.

  2. Regional variations in pedal cyclist injuries in New Zealand: safety in numbers or risk in scarcity?

    PubMed

    Tin, Sandar Tin; Woodward, Alistair; Thornley, Simon; Ameratunga, Shanthi

    2011-08-01

    To assess regional variations in rates of traffic injuries to pedal cyclists resulting in death or hospital inpatient treatment, in relation to time spent cycling and time spent travelling in a car. Cycling injuries were identified from the Mortality Collection and the National Minimum Dataset. Time spent cycling and time spent travelling as a driver or passenger in a car/van/ute/SUV were computed from National Household Travel Surveys. There are 16 census regions in New Zealand, some of which were combined for this analysis to ensure an adequate sample size, resulting in eight regional groups. Analyses were undertaken for 1996-99 and 2003-07. Injury rates, per million hours spent cycling, varied widely across regions (11 to 33 injuries during 1996-99 and 12 to 78 injuries during 2003-07). The injury rate increased with decreasing per capita time spent cycling. The rate also increased with increasing per capita time spent travelling in a car. There was an inverse association between the injury rate and the ratio of time spent cycling to time spent travelling in a car. The expected number of cycling injuries increased with increasing total time spent cycling but at a decreasing rate particularly after adjusting for total time spent travelling in a car. The findings indicate a 'risk in scarcity' effect for New Zealand cyclists such that risk profiles of cyclists are likely to deteriorate if fewer people use a bicycle and more use a car. Cooperative efforts to promote cycling and its safety and to restrict car use may reverse the risk in scarcity effect. © 2011 The Authors. ANZJPH © 2011 Public Health Association of Australia.

  3. Agglomeration Multigrid for an Unstructured-Grid Flow Solver

    NASA Technical Reports Server (NTRS)

    Frink, Neal; Pandya, Mohagna J.

    2004-01-01

    An agglomeration multigrid scheme has been implemented into the sequential version of the NASA code USM3Dns, a tetrahedral cell-centered finite volume Euler/Navier-Stokes flow solver. Efficiency and robustness of the multigrid-enhanced flow solver have been assessed for three configurations assuming an inviscid flow and one configuration assuming a viscous fully turbulent flow. The inviscid studies include a transonic flow over the ONERA M6 wing and a generic business jet with flow-through nacelles and a low subsonic flow over a high-lift trapezoidal wing. The viscous case includes a fully turbulent flow over the RAE 2822 rectangular wing. The multigrid solutions converged with 12%-33% of the Central Processing Unit (CPU) time required by the solutions obtained without multigrid. For all of the inviscid cases, multigrid in conjunction with an explicit time-stepping scheme performed the best with regard to the run time memory and CPU time requirements. However, for the viscous case multigrid had to be used with an implicit backward Euler time-stepping scheme that increased the run time memory requirement by 22% as compared to the run made without multigrid.

  4. Use of a graphics processing unit (GPU) to facilitate real-time 3D graphic presentation of the patient skin-dose distribution during fluoroscopic interventional procedures

    PubMed Central

    Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R.

    2012-01-01

    We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient’s skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures. PMID:24027616

  5. Real-time unmanned aircraft systems surveillance video mosaicking using GPU

    NASA Astrophysics Data System (ADS)

    Camargo, Aldo; Anderson, Kyle; Wang, Yi; Schultz, Richard R.; Fevig, Ronald A.

    2010-04-01

    Digital video mosaicking from Unmanned Aircraft Systems (UAS) is being used for many military and civilian applications, including surveillance, target recognition, border protection, forest fire monitoring, traffic control on highways, and monitoring of transmission lines, among others. Additionally, NASA is using digital video mosaicking to explore the moon and planets such as Mars. In order to compute a "good" mosaic from video captured by a UAS, the algorithm must deal with motion blur, frame-to-frame jitter associated with an imperfectly stabilized platform, perspective changes as the camera tilts in flight, as well as a number of other factors. The most suitable algorithms use SIFT (Scale-Invariant Feature Transform) to detect the features consistent between video frames. Utilizing these features, the next step is to estimate the homography between two consecutive video frames, perform warping to properly register the image data, and finally blend the video frames, resulting in a seamless video mosaic. All this processing takes a great deal of CPU resources, so it is almost impossible to compute a real-time video mosaic on a single processor. Modern graphics processing units (GPUs) offer computational performance that far exceeds current CPU technology, allowing for real-time operation. This paper presents the development of a GPU-accelerated digital video mosaicking implementation and compares it with CPU performance. Our tests are based on two sets of real video captured by a small UAS aircraft; one video comes from Infrared (IR) and Electro-Optical (EO) cameras. Our results show that we can obtain a speed-up of more than 50 times using GPU technology, so real-time operation at a video capture of 30 frames per second is feasible.

  6. Use of a graphics processing unit (GPU) to facilitate real-time 3D graphic presentation of the patient skin-dose distribution during fluoroscopic interventional procedures.

    PubMed

    Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R

    2012-02-23

    We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient's skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures.

  7. Fast-GPU-PCC: A GPU-Based Technique to Compute Pairwise Pearson's Correlation Coefficients for Time Series Data-fMRI Study.

    PubMed

    Eslami, Taban; Saeed, Fahad

    2018-04-20

    Functional magnetic resonance imaging (fMRI) is a non-invasive brain imaging technique that has been regularly used in the past few years for studying the brain’s functional activities. A widely used measure for capturing functional associations in the brain is Pearson’s correlation coefficient. Pearson’s correlation is widely used for constructing functional networks and studying dynamic functional connectivity of the brain. These are useful measures for understanding the effects of brain disorders on connectivity among brain regions. fMRI scanners produce a huge number of voxels, and computing pairwise correlations with traditional central processing unit (CPU)-based techniques is very time consuming, especially when a large number of subjects are being studied. In this paper, we propose a graphics processing unit (GPU)-based algorithm called Fast-GPU-PCC for computing pairwise Pearson’s correlation coefficients. Based on the symmetric property of Pearson’s correlation, this approach returns the N(N − 1)/2 correlation coefficients located in the strictly upper triangular part of the correlation matrix. Storing the correlations in a one-dimensional array in the order proposed in this paper is useful for further processing. Our experiments on real and synthetic fMRI data for different numbers of voxels and varying lengths of time series show that the proposed approach outperformed state-of-the-art GPU-based techniques as well as the sequential CPU-based versions. We show that Fast-GPU-PCC runs 62 times faster than the CPU-based version and about 2 to 3 times faster than two other state-of-the-art GPU-based methods.
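    The idea of keeping only the N(N − 1)/2 strictly upper triangular correlations in a one-dimensional array can be illustrated with a small CPU-only sketch. This is not the Fast-GPU-PCC algorithm: the row-major ordering and the names `pearson`, `flat_index`, and `pairwise_upper` are assumptions made for illustration, since the abstract does not specify the ordering the paper uses.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flat_index(i, j, n):
    """Position of pair (i, j), with i < j, in a row-major strict upper triangle.

    Assumed ordering (hypothetical): (0,1), (0,2), ..., (0,n-1), (1,2), ...
    Rows 0..i-1 contribute i*(2n - i - 1)/2 entries before row i starts.
    """
    return i * (2 * n - i - 1) // 2 + (j - i - 1)


def pairwise_upper(series):
    """Return the N(N-1)/2 pairwise correlations as a flat list."""
    n = len(series)
    out = [0.0] * (n * (n - 1) // 2)
    for i in range(n):
        for j in range(i + 1, n):
            out[flat_index(i, j, n)] = pearson(series[i], series[j])
    return out


# Three toy "voxel" time series: the second is a scaled copy of the first,
# the third is the first reversed.
toy_series = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
toy_result = pairwise_upper(toy_series)
```

    Because the matrix is symmetric with a unit diagonal, the flat array holds every distinct value, and `flat_index` recovers the position of any pair without materializing the full N × N matrix.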

  8. Multigrid direct numerical simulation of the whole process of flow transition in 3-D boundary layers

    NASA Technical Reports Server (NTRS)

    Liu, Chaoqun; Liu, Zhining

    1993-01-01

    A new technology was developed in this study which provides a successful numerical simulation of the whole process of flow transition in 3-D boundary layers, including linear growth, secondary instability, breakdown, and transition at relatively low CPU cost. Most other spatial numerical simulations require high CPU cost and blow up at the stage of flow breakdown. A fourth-order finite difference scheme on stretched and staggered grids, a fully implicit time marching technique, a semi-coarsening multigrid based on the so-called approximate line-box relaxation, and a buffer domain for the outflow boundary conditions were all used for high-order accuracy, good stability, and fast convergence. A new fine-coarse-fine grid mapping technique was developed to keep the code running after the laminar flow breaks down. The computational results are in good agreement with linear stability theory, secondary instability theory, and some experiments. The cost for a typical case with 162 x 34 x 34 grid is around 2 CRAY-YMP CPU hours for 10 T-S periods.

  9. An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

    NASA Astrophysics Data System (ADS)

    Lyakh, Dmitry I.

    2015-04-01

    An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). Particular emphasis is placed on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on NVidia Tesla K20X GPU with respect to the naïve scattering algorithm (no memory access optimization). The tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).

  10. Short-term dopaminergic regulation of GABA release in dopamine deafferented caudate-putamen is not directly associated with glutamic acid decarboxylase gene expression.

    PubMed

    O'Connor, W T; Lindefors, N; Brené, S; Herrera-Marschitz, M; Persson, H; Ungerstedt, U

    1991-07-08

    In vivo microdialysis and in situ hybridization were combined to study dopaminergic regulation of gamma-amino butyric acid (GABA) neurons in rat caudate-putamen (CPu). Potassium-stimulated GABA release in CPu was elevated following a dopamine deafferentation. Local perfusion with exogenous dopamine (50 microM) for 3 h via the microdialysis probe attenuated the potassium-stimulated increase in extracellular GABA in CPu. Expression of glutamic acid decarboxylase (GAD) mRNA was also increased in the dopamine deafferented CPu. However, local perfusion with dopamine had no significant attenuating effect on the increased GAD mRNA expression. These findings indicate that dopaminergic regulation of GABA neurons in the dopamine deafferented CPu includes both a short-term effect at the level of GABA release independent of changes in GAD mRNA expression and a long-term modulation at the level of GAD gene expression.

  11. The DISTO data acquisition system at SATURNE

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balestra, F.; Bedfer, Y.; Bertini, R.

    1998-06-01

    The DISTO collaboration has built a large-acceptance magnetic spectrometer designed to provide broad kinematic coverage of multiparticle final states produced in pp scattering. The spectrometer has been installed in the polarized proton beam of the Saturne accelerator in Saclay to study polarization observables in the $\vec{p}p \rightarrow pK^{+}\vec{Y}$ ($Y = \Lambda$, $\Sigma^{0}$ or $Y^{*}$) reaction and vector meson production ($\psi$, $\omega$ and $\rho$) in pp collisions. The data acquisition system is based on a VME 68030 CPU running the OS/9 operating system, housed in a single VME crate together with the CAMAC interface, the triple port ECL memories, and four RISC R3000 CPU. The digitization of signals from the detectors is made by PCOS III and FERA front-end electronics. Data of several events belonging to a single Saturne extraction are stored in VME triple-port ECL memories using a hardwired fast sequencer. The buffer, optionally filtered by the RISC R3000 CPU, is recorded on a DLT cassette by DAQ CPU using the on-board SCSI interface during the acceleration cycle. Two UNIX workstations are connected to the VME CPUs through a fast parallel bus and the Local Area Network. They analyze a subset of events for on-line monitoring. The data acquisition system is able to read and record 3,500 ev/burst in the present configuration with a dead time of 15%.

  12. Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives

    NASA Astrophysics Data System (ADS)

    Eriksen, Janus J.

    2017-09-01

    It is demonstrated how the non-proprietary OpenACC standard of compiler directives may be used to compactly and efficiently accelerate the rate-determining steps of two of the most routinely applied many-body methods of electronic structure theory, namely the second-order Møller-Plesset (MP2) model in its resolution-of-the-identity approximated form and the (T) triples correction to the coupled cluster singles and doubles model (CCSD(T)). By means of compute directives as well as the use of optimised device math libraries, the operations involved in the energy kernels have been ported to graphics processing unit (GPU) accelerators, and the associated data transfers correspondingly optimised to such a degree that the final implementations (using either double and/or single precision arithmetics) are capable of scaling to as large systems as allowed for by the capacity of the host central processing unit (CPU) main memory. The performance of the hybrid CPU/GPU implementations is assessed through calculations on test systems of alanine amino acid chains using one-electron basis sets of increasing size (ranging from double- to pentuple-ζ quality). For all but the smallest problem sizes of the present study, the optimised accelerated codes (using a single multi-core CPU host node in conjunction with six GPUs) are found to be capable of reducing the total time-to-solution by at least an order of magnitude over optimised, OpenMP-threaded CPU-only reference implementations.

  13. Time spent in physical activity and sedentary behaviors on the working day: the American time use survey.

    PubMed

    Tudor-Locke, Catrine; Leonardi, Claudia; Johnson, William D; Katzmarzyk, Peter T

    2011-12-01

    To determine time spent on the working day in sleep, work, sedentary behaviors, and light-, moderate-, and vigorous-intensity behaviors by occupation intensity. Data came from 30,758 working respondents to the 2003 to 2009 American Time Use Survey. Mean ± SEM time spent in work, sedentary behaviors, light-, moderate-, and vigorous-intensity activities, and sleep were computed by occupations classified as sedentary, light, moderate, and vigorous intensity. On average, approximately 32% of the 24-hour day was spent sleeping and approximately 31% was spent at work. Time spent in sedentary behaviors outside of work was higher, and light-intensity time was lower, with higher levels of intensity-defined occupation. Those employed in sedentary occupations were sedentary for approximately 11 hours per day, leaving little time to achieve recommended levels of physical activity for overall health.

  14. A new nonlinear conjugate gradient coefficient under strong Wolfe-Powell line search

    NASA Astrophysics Data System (ADS)

    Mohamed, Nur Syarafina; Mamat, Mustafa; Rivaie, Mohd

    2017-08-01

    The nonlinear conjugate gradient (CG) method plays an important role in solving large-scale unconstrained optimization problems. This method is widely used due to its simplicity. The method is known to possess the sufficient descent condition and global convergence properties. In this paper, a new nonlinear CG coefficient βk is presented by employing the strong Wolfe-Powell inexact line search. The performance of the new βk is tested based on the number of iterations and central processing unit (CPU) time using MATLAB software on an Intel Core i7-3470 CPU. Numerical experimental results show that the new βk converges rapidly compared to other classical CG methods.

  15. Hypermatrix scheme for finite element systems on CDC STAR-100 computer

    NASA Technical Reports Server (NTRS)

    Noor, A. K.; Voigt, S. J.

    1975-01-01

    A study is made of the adaptation of the hypermatrix (block matrix) scheme for solving large systems of finite element equations to the CDC STAR-100 computer. Discussion is focused on the organization of the hypermatrix computation using Cholesky decomposition and the mode of storage of the different submatrices to take advantage of the STAR pipeline (streaming) capability. Consideration is also given to the associated data handling problems and the means of balancing the I/O and CPU times in the solution process. Numerical examples are presented showing the anticipated gain in CPU speed over the CDC 6600 to be obtained by using the proposed algorithms on the STAR computer.

  16. An efficient implementation of semi-numerical computation of the Hartree-Fock exchange on the Intel Phi processor

    NASA Astrophysics Data System (ADS)

    Liu, Fenglai; Kong, Jing

    2018-07-01

    Unique technical challenges and their solutions for implementing semi-numerical Hartree-Fock exchange on the Phi processor are discussed, especially concerning the single-instruction-multiple-data type of processing and small cache size. Benchmark calculations on a series of buckyball molecules with various Gaussian basis sets on a Phi processor and a six-core CPU show that the Phi processor provides a speedup of as much as 12 times with large basis sets compared with the conventional four-center electron repulsion integration approach performed on the CPU. The accuracy of the semi-numerical scheme is also evaluated and found to be comparable to that of the resolution-of-identity approach.

  17. Real-Time Agent-Based Modeling Simulation with in-situ Visualization of Complex Biological Systems: A Case Study on Vocal Fold Inflammation and Healing.

    PubMed

    Seekhao, Nuttiiya; Shung, Caroline; JaJa, Joseph; Mongeau, Luc; Li-Jessen, Nicole Y K

    2016-05-01

    We present an efficient and scalable scheme for implementing agent-based modeling (ABM) simulation with In Situ visualization of large complex systems on heterogeneous computing platforms. The scheme is designed to make optimal use of the resources available on a heterogeneous platform consisting of a multicore CPU and a GPU, resulting in minimal to no resource idle time. Furthermore, the scheme was implemented under a client-server paradigm that enables remote users to visualize and analyze simulation data as it is being generated at each time step of the model. Performance of a simulation case study of vocal fold inflammation and wound healing with 3.8 million agents shows 35× and 7× speedup in execution time over single-core and multi-core CPU respectively. Each iteration of the model took less than 200 ms to simulate, visualize and send the results to the client. This enables users to monitor the simulation in real-time and modify its course as needed.

  18. Fast in-memory elastic full-waveform inversion using consumer-grade GPUs

    NASA Astrophysics Data System (ADS)

    Sivertsen Bergslid, Tore; Birger Raknes, Espen; Arntsen, Børge

    2017-04-01

    Full-waveform inversion (FWI) is a technique to estimate subsurface properties by using the recorded waveform produced by a seismic source and applying inverse theory. This is done through an iterative optimization procedure, where each iteration requires solving the wave equation many times, then trying to minimize the difference between the modeled and the measured seismic data. Having to model many of these seismic sources per iteration means that this is a highly computationally demanding procedure, which usually involves writing a lot of data to disk. We have written code that does forward modeling and inversion entirely in memory. A typical HPC cluster has many more CPUs than GPUs. Since FWI involves modeling many seismic sources per iteration, the obvious approach is to parallelize the code on a source-by-source basis, where each core of the CPU performs one modeling, and do all modelings simultaneously. With this approach, the GPU is already at a major disadvantage in pure numbers. Fortunately, GPUs can more than make up for this hardware disadvantage by performing each modeling much faster than a CPU. Another benefit of parallelizing each individual modeling is that it lets each modeling use a lot more RAM. If one node has 128 GB of RAM and 20 CPU cores, each modeling can use only 6.4 GB RAM if one is running the node at full capacity with source-by-source parallelization on the CPU. A parallelized per-source code using GPUs can use 64 GB RAM per modeling. Whenever a modeling uses more RAM than is available and has to start using regular disk space the runtime increases dramatically, due to slow file I/O. The extremely high computational speed of the GPUs combined with the large amount of RAM available for each modeling lets us do high frequency FWI for fairly large models very quickly. For a single modeling, our GPU code outperforms the single-threaded CPU-code by a factor of about 75. 
    Successful inversions have been run on data with frequencies up to 40 Hz for a model of 2001 by 600 grid points with 5 m grid spacing and 5000 time steps, in less than 2.5 minutes per source. In practice, using 15 nodes (30 GPUs) to model 101 sources, each iteration took approximately 9 minutes. For reference, the same inversion run with our CPU code uses two hours per iteration. This was done using only a very simple wavefield interpolation technique, saving every second timestep. Using a more sophisticated checkpointing or wavefield reconstruction method would allow us to increase this model size significantly. Our results show that ordinary gaming GPUs are a viable alternative to the expensive professional GPUs often used today, when performing large scale modeling and inversion in geophysics.
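    The memory and timing figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the scheduling model (each GPU handles one source at a time, so sources are processed in rounds) and all variable names are assumptions for illustration, not details given by the authors.

```python
import math

# Figures quoted in the abstract.
node_ram_gb = 128
cpu_cores_per_node = 20
nodes = 15
gpus = 30
sources = 101
max_minutes_per_source = 2.5   # "less than 2.5 minutes per source"
gpu_minutes_per_iteration = 9  # "approximately 9 minutes"
cpu_minutes_per_iteration = 120  # "two hours per iteration"

# RAM available to each modeling when every CPU core runs its own source.
ram_per_cpu_modeling_gb = node_ram_gb / cpu_cores_per_node

# 30 GPUs on 15 nodes implies 2 GPUs per node, each running one modeling.
gpus_per_node = gpus // nodes
ram_per_gpu_modeling_gb = node_ram_gb / gpus_per_node

# Assumed scheduling: sources are handled in rounds of one source per GPU.
rounds = math.ceil(sources / gpus)
iteration_upper_bound_min = rounds * max_minutes_per_source

# Per-iteration ratio of the two reported run times.
iteration_speedup = cpu_minutes_per_iteration / gpu_minutes_per_iteration

print(f"RAM per CPU modeling: {ram_per_cpu_modeling_gb:.1f} GB")
print(f"RAM per GPU modeling: {ram_per_gpu_modeling_gb:.0f} GB")
print(f"Rounds per iteration: {rounds}")
print(f"Upper bound on iteration time: {iteration_upper_bound_min:.1f} min")
print(f"CPU/GPU iteration time ratio: {iteration_speedup:.1f}")
```

    Under these assumptions, four rounds of at most 2.5 minutes give an upper bound of 10 minutes per iteration, which is consistent with the reported figure of approximately 9 minutes. The abstract does not say how much CPU hardware the two-hour reference run used, so the final ratio should not be read as a per-device speedup.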

  19. Two-dimensional Euler and Navier-Stokes Time accurate simulations of fan rotor flows

    NASA Technical Reports Server (NTRS)

    Boretti, A. A.

    1990-01-01

    Two numerical methods are presented which describe the unsteady flow field in the blade-to-blade plane of an axial fan rotor. These methods solve the compressible, time-dependent, Euler and the compressible, turbulent, time-dependent, Navier-Stokes conservation equations for mass, momentum, and energy. The Navier-Stokes equations are written in Favre-averaged form and are closed with an approximate two-equation turbulence model with low Reynolds number and compressibility effects included. The unsteady aerodynamic component is obtained by superposing inflow or outflow unsteadiness to the steady conditions through time-dependent boundary conditions. The integration in space is performed by using a finite volume scheme, and the integration in time is performed by using k-stage Runge-Kutta schemes, k = 2,5. The numerical integration algorithm allows the reduction of the computational cost of an unsteady simulation involving high frequency disturbances in both CPU time and memory requirements. Less than 200 sec of CPU time are required to advance the Euler equations in a computational grid made up of about 2000 grid during 10,000 time steps on a CRAY Y-MP computer, with a required memory of less than 0.3 megawords.

  20. Developing infrared array controller with software real time operating system

    NASA Astrophysics Data System (ADS)

    Sako, Shigeyuki; Miyata, Takashi; Nakamura, Tomohiko; Motohara, Kentaro; Uchimoto, Yuka Katsuno; Onaka, Takashi; Kataza, Hirokazu

    2008-07-01

    Real-time capabilities are required for a controller of a large format array to reduce the dead time attributable to readout and data transfer. Real-time processing has so far been achieved with dedicated processors including DSP, CPLD, and FPGA devices. However, the dedicated processors have problems with memory resources, inflexibility, and high cost. Meanwhile, a recent PC has sufficient CPU and memory resources to control the infrared array and to process a large amount of frame data in real time. In this study, we have developed an infrared array controller with a software real-time operating system (RTOS) instead of the dedicated processors. A Linux PC equipped with an RTAI extension and a dual-core CPU is used as the main computer, and one of the CPU cores is allocated to the real-time processing. A digital I/O board with DMA functions is used for the I/O interface. The signal-processing cores are integrated in the OS kernel as a real-time driver module, which is composed of two virtual devices for the clock processor and the frame processor tasks. The array controller with the RTOS realizes complicated operations easily, flexibly, and at a low cost.

  1. Time Spent in Home Production Activities by Married Couples and Single Adults with Children.

    ERIC Educational Resources Information Center

    Douthitt, Robin A.

    1988-01-01

    A study found that, over time, married women employed full time have not decreased the time spent working in the home. Married men with young children have increased the time spent on home work. Single parents' time most closely resembled that of married women. (JOW)

  2. GPU accelerated generation of digitally reconstructed radiographs for 2-D/3-D image registration.

    PubMed

    Dorgham, Osama M; Laycock, Stephen D; Fisher, Mark H

    2012-09-01

    Recent advances in programming languages for graphics processing units (GPUs) provide developers with a convenient way of implementing applications which can be executed on the CPU and GPU interchangeably. GPUs are becoming relatively cheap, powerful, and widely available hardware components, which can be used to perform intensive calculations. The last decade of hardware performance developments shows that GPU-based computation is progressing significantly faster than CPU-based computation, particularly if one considers the execution of highly parallelisable algorithms. Future predictions illustrate that this trend is likely to continue. In this paper, we introduce a way of accelerating 2-D/3-D image registration by developing a hybrid system which executes on the CPU and utilizes the GPU for parallelizing the generation of digitally reconstructed radiographs (DRRs). Based on the advancements of the GPU over the CPU, it is timely to exploit the benefits of many-core GPU technology by developing algorithms for DRR generation. Although some previous work has investigated the rendering of DRRs using the GPU, this paper investigates approximations which reduce the computational overhead while still maintaining a quality consistent with that needed for 2-D/3-D registration with sufficient accuracy to be clinically acceptable in certain applications of radiation oncology. Furthermore, by comparing implementations of 2-D/3-D registration on the CPU and GPU, we investigate current performance and propose an optimal framework for PC implementations addressing the rigid registration problem. Using this framework, we are able to render DRR images from a 256×256×133 CT volume in ~24 ms using an NVidia GeForce 8800 GTX and in ~2 ms using NVidia GeForce GTX 580. 
    In addition to applications requiring fast automatic patient setup, these levels of performance suggest image-guided radiation therapy at video frame rates is technically feasible using relatively low cost PC architecture.

  3. Time Spent, Workload, and Student and Faculty Perceptions in a Blended Learning Environment

    PubMed Central

    Schumacher, Christie; Arif, Sally

    2016-01-01

    Objective. To evaluate student perception and time spent on asynchronous online lectures in a blended learning environment (BLE) and to assess faculty workload and perception. Methods. Students' (n=427) time spent viewing online lectures was measured in three courses. Students and faculty members completed a survey to assess perceptions of a BLE. Faculty members recorded time spent creating BLEs. Results. Total time spent in the BLE was less than the allocated time for two of the three courses by 3-15%. Students preferred online lectures for their flexibility, students’ ability to apply information learned, and congruence with their learning styles. Faculty members reported the BLE facilitated higher levels of learning during class sessions but noted an increase in workload. Conclusion. A BLE increased faculty workload but was well received by students. Time spent viewing online lectures was less than what was allocated in two of the three courses. PMID:27667839

  4. Reduced order model based on principal component analysis for process simulation and optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lang, Y.; Malacina, A.; Biegler, L.

    2009-01-01

    It is well-known that distributed parameter computational fluid dynamics (CFD) models provide more accurate results than conventional, lumped-parameter unit operation models used in process simulation. Consequently, the use of CFD models in process/equipment co-simulation offers the potential to optimize overall plant performance with respect to complex thermal and fluid flow phenomena. Because solving CFD models is time-consuming compared to the overall process simulation, we consider the development of fast reduced order models (ROMs) based on CFD results to closely approximate the high-fidelity equipment models in the co-simulation. By considering process equipment items with complicated geometries and detailed thermodynamic property models, this study proposes a strategy to develop ROMs based on principal component analysis (PCA). Taking advantage of commercial process simulation and CFD software (for example, Aspen Plus and FLUENT), we are able to develop systematic CFD-based ROMs for equipment models in an efficient manner. In particular, we show that the validity of the ROM is more robust within well-sampled input domain and the CPU time is significantly reduced. Typically, it takes at most several CPU seconds to evaluate the ROM compared to several CPU hours or more to solve the CFD model. Two case studies, involving two power plant equipment examples, are described and demonstrate the benefits of using our proposed ROM methodology for process simulation and optimization.

  5. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    PubMed

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    It is very time consuming to solve fractional differential equations. The computational complexity of the two-dimensional time fractional differential equation (2D-TFDE) with an iterative implicit finite difference method is $O(M_x M_y N^2)$. In this paper, we present a parallel algorithm for 2D-TFDE and give an in-depth discussion about this algorithm. A task distribution model and data layout with virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We do think that the parallel computing technology will become a very basic method for the computational intensive fractional applications in the near future.

  6. Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

    NASA Astrophysics Data System (ADS)

    Laracuente, Nicholas; Grossman, Carl

    2013-03-01

    We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are more suited for emulating hardware autocorrelators than traditional CPU-based software applications by emphasizing parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics pci card in a 3.2 GHz Intel i5 CPU based computer running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College
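
    The multi-tau binning mentioned in the abstract can be illustrated with a small CPU-only sketch. This is not the authors' GPU implementation: the function names, the default of 8 points per level, and the random test signal are all assumptions made for illustration. Each level correlates the binned counts at a few lags, then doubles the bin width, so lag times grow roughly geometrically while the number of computed points stays small.

```python
import random

def correlate(counts, lags):
    """Return (lag, g) pairs with g = <n(t) n(t+lag)> / <n>^2."""
    n = len(counts)
    mean = sum(counts) / n
    out = []
    for lag in lags:
        pairs = n - lag
        acc = sum(counts[i] * counts[i + lag] for i in range(pairs))
        out.append((lag, acc / pairs / (mean * mean)))
    return out

def rebin(counts):
    """Sum adjacent pairs of bins, doubling the bin width."""
    return [counts[i] + counts[i + 1] for i in range(0, len(counts) - 1, 2)]

def multi_tau(counts, points_per_level=8, levels=4):
    """Hypothetical multi-tau scheme: lags are reported in units of the base bin."""
    data = list(counts)
    width = 1
    result = []
    for level in range(levels):
        if len(data) <= points_per_level:
            break  # not enough data left to correlate at this level
        # After the first level, only the upper half of the lags is new.
        first = 1 if level == 0 else points_per_level // 2 + 1
        for lag, g in correlate(data, range(first, points_per_level + 1)):
            result.append((lag * width, g))
        data = rebin(data)
        width *= 2
    return result

if __name__ == "__main__":
    random.seed(0)
    # Made-up photon counts per bin; uncorrelated, so g should hover near 1.
    signal = [random.randint(0, 5) for _ in range(4096)]
    for lag, g in multi_tau(signal):
        print(lag, round(g, 3))
```

    On a GPU, the inner sums over `i` are the natural candidates for parallel execution, which is the kind of throughput-oriented work the abstract describes.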

  7. SU-E-J-60: Efficient Monte Carlo Dose Calculation On CPU-GPU Heterogeneous Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xiao, K; Chen, D. Z; Hu, X. S

    Purpose: It is well-known that the performance of GPU-based Monte Carlo dose calculation implementations is bounded by memory bandwidth. One major cause of this bottleneck is the random memory writing patterns in dose deposition, which leads to several memory efficiency issues on GPU such as un-coalesced writing and atomic operations. We propose a new method to alleviate such issues on CPU-GPU heterogeneous systems, which achieves overall performance improvement for Monte Carlo dose calculation. Methods: Dose deposition is to accumulate dose into the voxels of a dose volume along the trajectories of radiation rays. Our idea is to partition this procedure into the following three steps, which are fine-tuned for CPU or GPU: (1) each GPU thread writes dose results with location information to a buffer on GPU memory, which achieves fully-coalesced and atomic-free memory transactions; (2) the dose results in the buffer are transferred to CPU memory; (3) the dose volume is constructed from the dose buffer on CPU. We organize the processing of all radiation rays into streams. Since the steps within a stream use different hardware resources (i.e., GPU, DMA, CPU), we can overlap the execution of these steps for different streams by pipelining. Results: We evaluated our method using a Monte Carlo Convolution Superposition (MCCS) program and tested our implementation for various clinical cases on a heterogeneous system containing an Intel i7 quad-core CPU and an NVIDIA TITAN GPU. Comparing with a straightforward MCCS implementation on the same system (using both CPU and GPU for radiation ray tracing), our method gained 2-5X speedup without losing dose calculation accuracy. Conclusion: The results show that our new method improves the effective memory bandwidth and overall performance for MCCS on the CPU-GPU systems. Our proposed method can also be applied to accelerate other Monte Carlo dose calculation approaches. 
This research was supported in part by NSF under Grants CCF-1217906, and also in part by a research contract from the Sandia National Laboratories.

  8. Exploring compression techniques for ROOT IO

    NASA Astrophysics Data System (ADS)

    Zhang, Z.; Bockelman, B.

    2017-10-01

    ROOT provides a flexible format used throughout the HEP community. The number of use cases - from an archival data format to end-stage analysis - has required a number of tradeoffs to be exposed to the user. For example, a high “compression level” in the traditional DEFLATE algorithm will result in a smaller file (saving disk space) at the cost of slower decompression (costing CPU time when read). At the scale of the LHC experiment, poor design choices can result in terabytes of wasted space or wasted CPU time. We explore and attempt to quantify some of these tradeoffs. Specifically, we explore: the use of alternate compressing algorithms to optimize for read performance; an alternate method of compressing individual events to allow efficient random access; and a new approach to whole-file compression. Quantitative results are given, as well as guidance on how to make compression decisions for different use cases.
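
    The size-versus-CPU-time tradeoff described above can be measured directly. The sketch below uses Python's standard `zlib` module (a DEFLATE implementation) rather than ROOT itself; the `measure` helper and the synthetic payload are assumptions made for illustration, and the numbers it prints depend heavily on the data and the machine.

```python
import time
import zlib

def measure(data: bytes, level: int) -> dict:
    """Compress at the given DEFLATE level and time the decompression."""
    compressed = zlib.compress(data, level)
    start = time.process_time()  # CPU time, not wall-clock time
    restored = zlib.decompress(compressed)
    cpu_seconds = time.process_time() - start
    if restored != data:
        raise ValueError("round trip failed")
    return {
        "level": level,
        "compressed_bytes": len(compressed),
        "decompress_cpu_s": cpu_seconds,
    }

if __name__ == "__main__":
    # Made-up payload standing in for event data; real files behave differently.
    payload = b"event:" + bytes(range(256)) * 4000
    for level in (1, 6, 9):
        print(measure(payload, level))
```

    Running the same measurement on representative files, rather than on synthetic data, is what would actually inform a choice of compression level.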

  9. Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

    NASA Astrophysics Data System (ADS)

    Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

    2018-01-01

    This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.

  10. Fast data reconstructed method of Fourier transform imaging spectrometer based on multi-core CPU

    NASA Astrophysics Data System (ADS)

    Yu, Chunchao; Du, Debiao; Xia, Zongze; Song, Li; Zheng, Weijian; Yan, Min; Lei, Zhenggang

    2017-10-01

    An imaging spectrometer acquires a two-dimensional spatial image and a one-dimensional spectrum at the same time, which makes it highly useful in color and spectral measurement, true-color image synthesis, military reconnaissance and so on. To reconstruct Fourier transform imaging spectrometer data quickly, the paper designs an optimized reconstruction algorithm using OpenMP parallel computing, which was then applied to optimize processing for the HyperSpectral Imager of the Chinese 'HJ-1' satellite. The results show that the method based on multi-core parallel computing makes effective use of multi-core CPU hardware resources and significantly improves the efficiency of spectrum reconstruction. If the technique is applied to workstations with more cores, it will be possible to process Fourier transform imaging spectrometer data in real time on a single computer.

  11. A proximity algorithm accelerated by Gauss-Seidel iterations for L1/TV denoising models

    NASA Astrophysics Data System (ADS)

    Li, Qia; Micchelli, Charles A.; Shen, Lixin; Xu, Yuesheng

    2012-09-01

    Our goal in this paper is to improve the computational performance of the proximity algorithms for the L1/TV denoising model. This leads us to a new characterization of all solutions to the L1/TV model via fixed-point equations expressed in terms of the proximity operators. Based upon this observation we develop an algorithm for solving the model and establish its convergence. Furthermore, we demonstrate that the proposed algorithm can be accelerated through the use of the componentwise Gauss-Seidel iteration so that the CPU time consumed is significantly reduced. Numerical experiments using the proposed algorithm for impulsive noise removal are included, with a comparison to three recently developed algorithms. The numerical results show that while the proposed algorithm enjoys a high quality of the restored images, as the other three known algorithms do, it performs significantly better in terms of computational efficiency measured in the CPU time consumed.

  12. Classification of hyperspectral imagery using MapReduce on a NVIDIA graphics processing unit (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Ramirez, Andres; Rahnemoonfar, Maryam

    2017-04-01

    A hyperspectral image is a multidimensional, data-rich figure consisting of hundreds of spectral dimensions. Analyzing the spectral and spatial information of such an image with linear and non-linear algorithms results in high computational time. To overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that can help analyze a hyperspectral image through the use of parallel hardware and a parallel programming model, which is simpler to handle than other low-level parallel programming models. Additionally, Hadoop was used as an open-source version of the MapReduce parallel programming model. This research compared classification accuracy results and timing results between the Hadoop and GPU system and tested it against the following test cases: the CPU and GPU test case, a CPU test case and a test case where no dimensional reduction was applied.

  13. Accelerated Monte Carlo Simulation on the Chemical Stage in Water Radiolysis using GPU

    PubMed Central

    Tian, Zhen; Jiang, Steve B.; Jia, Xun

    2018-01-01

    The accurate simulation of water radiolysis is an important step to understand the mechanisms of radiobiology and quantitatively test some hypotheses regarding radiobiological effects. However, the simulation of water radiolysis is highly time consuming, taking hours or even days to be completed by a conventional CPU processor. This time limitation hinders cell-level simulations for a number of research studies. We recently initiated efforts to develop gMicroMC, a GPU-based fast microscopic MC simulation package for water radiolysis. The first step of this project focused on accelerating the simulation of the chemical stage, the most time consuming stage in the entire water radiolysis process. A GPU-friendly parallelization strategy was designed to address the highly correlated many-body simulation problem caused by the mutual competitive chemical reactions between the radiolytic molecules. Two cases were tested, using a 750 keV electron and a 5 MeV proton incident in pure water, respectively. The time-dependent yields of all the radiolytic species during the chemical stage were used to evaluate the accuracy of the simulation. The relative differences between our simulation and the Geant4-DNA simulation were on average 5.3% and 4.4% for the two cases. Our package, executed on an Nvidia Titan black GPU card, successfully completed the chemical stage simulation of the two cases within 599.2 s and 489.0 s. As compared with Geant4-DNA that was executed on an Intel i7-5500U CPU processor and needed 28.6 h and 26.8 h for the two cases using a single CPU core, our package achieved a speed-up factor of 171.1-197.2. PMID:28323637
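
    The reported speed-up factor can be checked from the run times given in the abstract. The short sketch below assumes that the two CPU times and the two GPU times are listed in the same case order (electron first, proton second); the `speedup` helper is introduced here for illustration only.

```python
def speedup(cpu_hours: float, gpu_seconds: float) -> float:
    """Ratio of CPU run time to GPU run time, with both expressed in seconds."""
    return cpu_hours * 3600.0 / gpu_seconds

# Run times quoted in the abstract; the pairing of cases is an assumption.
cases = {
    "750 keV electron": (28.6, 599.2),
    "5 MeV proton": (26.8, 489.0),
}

if __name__ == "__main__":
    for name, (cpu_hours, gpu_seconds) in cases.items():
        print(f"{name}: {speedup(cpu_hours, gpu_seconds):.1f}x")
```

    The ratios land close to the quoted range of 171.1-197.2; small differences are to be expected because the abstract reports the CPU times rounded to a tenth of an hour.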

  14. Quasi-elastic light scattering: Signal storage, correlation, and spectrum analysis under control of an 8-bit microprocessor

    NASA Astrophysics Data System (ADS)

    Glatter, Otto; Fuchs, Heribert; Jorde, Christian; Eigner, Wolf-Dieter

    1987-03-01

    The microprocessor of an 8-bit PC system is used as a central control unit for the acquisition and evaluation of data from quasi-elastic light scattering experiments. Data are sampled with a width of 8 bits under control of the CPU. This limits the minimum sample time to 20 μs. Shorter sample times would need a direct memory access channel. The 8-bit CPU can address a 64-kbyte RAM without additional paging. Up to 49 000 sample points can be measured without interruption. After storage, a correlation function or a power spectrum can be calculated from such a primary data set. Furthermore access is provided to the primary data for stability control, statistical tests, and for comparison of different evaluation methods for the same experiment. A detailed analysis of the signal (histogram) and of the effect of overflows is possible and shows that the number of pulses but not the number of overflows determines the error in the result. The correlation function can be computed with reasonable accuracy from data with a mean pulse rate greater than one, the power spectrum needs a three times higher pulse rate for convergence. The statistical accuracy of the results from 49 000 sample points is of the order of a few percent. Additional averages are necessary to improve their quality. The hardware extensions for the PC system are inexpensive. The main disadvantage of the present system is the high minimum sampling time of 20 μs and the fact that the correlogram or the power spectrum cannot be computed on-line as it can be done with hardware correlators or spectrum analyzers. These shortcomings and the storage size restrictions can be removed with a faster 16/32-bit CPU.

  15. Accelerated Monte Carlo simulation on the chemical stage in water radiolysis using GPU

    NASA Astrophysics Data System (ADS)

    Tian, Zhen; Jiang, Steve B.; Jia, Xun

    2017-04-01

    The accurate simulation of water radiolysis is an important step to understand the mechanisms of radiobiology and quantitatively test some hypotheses regarding radiobiological effects. However, the simulation of water radiolysis is highly time consuming, taking hours or even days to be completed by a conventional CPU processor. This time limitation hinders cell-level simulations for a number of research studies. We recently initiated efforts to develop gMicroMC, a GPU-based fast microscopic MC simulation package for water radiolysis. The first step of this project focused on accelerating the simulation of the chemical stage, the most time consuming stage in the entire water radiolysis process. A GPU-friendly parallelization strategy was designed to address the highly correlated many-body simulation problem caused by the mutual competitive chemical reactions between the radiolytic molecules. Two cases were tested, using a 750 keV electron and a 5 MeV proton incident in pure water, respectively. The time-dependent yields of all the radiolytic species during the chemical stage were used to evaluate the accuracy of the simulation. The relative differences between our simulation and the Geant4-DNA simulation were on average 5.3% and 4.4% for the two cases. Our package, executed on an Nvidia Titan black GPU card, successfully completed the chemical stage simulation of the two cases within 599.2 s and 489.0 s. As compared with Geant4-DNA that was executed on an Intel i7-5500U CPU processor and needed 28.6 h and 26.8 h for the two cases using a single CPU core, our package achieved a speed-up factor of 171.1-197.2.

  16. Accelerated Monte Carlo simulation on the chemical stage in water radiolysis using GPU.

    PubMed

    Tian, Zhen; Jiang, Steve B; Jia, Xun

    2017-04-21

    The accurate simulation of water radiolysis is an important step to understand the mechanisms of radiobiology and quantitatively test some hypotheses regarding radiobiological effects. However, the simulation of water radiolysis is highly time consuming, taking hours or even days to be completed by a conventional CPU processor. This time limitation hinders cell-level simulations for a number of research studies. We recently initiated efforts to develop gMicroMC, a GPU-based fast microscopic MC simulation package for water radiolysis. The first step of this project focused on accelerating the simulation of the chemical stage, the most time consuming stage in the entire water radiolysis process. A GPU-friendly parallelization strategy was designed to address the highly correlated many-body simulation problem caused by the mutual competitive chemical reactions between the radiolytic molecules. Two cases were tested, using a 750 keV electron and a 5 MeV proton incident in pure water, respectively. The time-dependent yields of all the radiolytic species during the chemical stage were used to evaluate the accuracy of the simulation. The relative differences between our simulation and the Geant4-DNA simulation were on average 5.3% and 4.4% for the two cases. Our package, executed on an Nvidia Titan black GPU card, successfully completed the chemical stage simulation of the two cases within 599.2 s and 489.0 s. As compared with Geant4-DNA that was executed on an Intel i7-5500U CPU processor and needed 28.6 h and 26.8 h for the two cases using a single CPU core, our package achieved a speed-up factor of 171.1-197.2.

  17. hybridMANTIS: a CPU-GPU Monte Carlo method for modeling indirect x-ray detectors with columnar scintillators

    NASA Astrophysics Data System (ADS)

    Sharma, Diksha; Badal, Andreu; Badano, Aldo

    2012-04-01

    The computational modeling of medical imaging systems often requires obtaining a large number of simulated images with low statistical uncertainty which translates into prohibitive computing times. We describe a novel hybrid approach for Monte Carlo simulations that maximizes utilization of CPUs and GPUs in modern workstations. We apply the method to the modeling of indirect x-ray detectors using a new and improved version of the code MANTIS, an open source software tool used for the Monte Carlo simulations of indirect x-ray imagers. We first describe a GPU implementation of the physics and geometry models in fastDETECT2 (the optical transport model) and a serial CPU version of the same code. We discuss its new features like on-the-fly column geometry and columnar crosstalk in relation to the MANTIS code, and point out areas where our model provides more flexibility for the modeling of realistic columnar structures in large area detectors. Second, we modify PENELOPE (the open source software package that handles the x-ray and electron transport in MANTIS) to allow direct output of location and energy deposited during x-ray and electron interactions occurring within the scintillator. This information is then handled by optical transport routines in fastDETECT2. A load balancer dynamically allocates optical transport showers to the GPU and CPU computing cores. Our hybridMANTIS approach achieves a significant speed-up factor of 627 when compared to MANTIS and of 35 when compared to the same code running only in a CPU instead of a GPU. Using hybridMANTIS, we successfully hide hours of optical transport time by running it in parallel with the x-ray and electron transport, thus shifting the computational bottleneck from optical to x-ray transport. 
The new code requires much less memory than MANTIS and, as a result, allows us to efficiently simulate large area detectors.

  18. Synthesis and Characterization of Biodegradable Polyurethane for Hypopharyngeal Tissue Engineering

    PubMed Central

    Shen, Zhisen; Lu, Dakai; Li, Qun; Zhang, Zongyong

    2015-01-01

    Biodegradable crosslinked polyurethane (cPU) was synthesized using polyethylene glycol (PEG), L-lactide (L-LA), and hexamethylene diisocyanate (HDI), with iron acetylacetonate (Fe(acac)3) as the catalyst and PEG as the extender. Chemical components of the obtained polymers were characterized by FTIR spectroscopy, 1H NMR spectra, and Gel Permeation Chromatography (GPC). The thermodynamic properties, mechanical behaviors, surface hydrophilicity, degradability, and cytotoxicity were tested via differential scanning calorimetry (DSC), tensile tests, contact angle measurements, and cell culture. The results show that the synthesized cPU possessed good flexibility with quite low glass transition temperature (Tg, −22°C) and good wettability. Water uptake measured as high as 229.7 ± 18.7%. These properties make cPU a good candidate material for engineering soft tissues such as the hypopharynx. In vitro and in vivo tests showed that cPU has the ability to support the growth of human hypopharyngeal fibroblasts and angiogenesis was observed around cPU after it was implanted subcutaneously in SD rats. PMID:25839041

  19. Is our medical school socially accountable? The case of Faculty of Medicine, Suez Canal University.

    PubMed

    Hosny, Somaya; Ghaly, Mona; Boelen, Charles

    2015-04-01

    Faculty of Medicine, Suez Canal University (FOM/SCU) was established as a community-oriented school with innovative educational strategies. Social accountability represents the commitment of the medical school towards the community it serves. To assess FOM/SCU compliance to social accountability using the "Conceptualization, Production, Usability" (CPU) model. FOM/SCU's practice was reviewed against CPU model parameters. CPU consists of three domains, 11 sections and 31 parameters. Data were collected through unstructured interviews with the main stakeholders and document review from 2005 to 2013. FOM/SCU shows general compliance to the three domains of the CPU. Very good compliance was shown to the "P" domain of the model through FOM/SCU's innovative educational system, students and faculty members. More work is needed on the "C" and "U" domains. FOM/SCU complies with many parameters of the CPU model; however, more work should be accomplished to comply with some items in the C and U domains so that FOM/SCU can be recognized as a proactive socially accountable school.

  20. Synthesis and characterization of biodegradable polyurethane for hypopharyngeal tissue engineering.

    PubMed

    Shen, Zhisen; Lu, Dakai; Li, Qun; Zhang, Zongyong; Zhu, Yabin

    2015-01-01

    Biodegradable crosslinked polyurethane (cPU) was synthesized using polyethylene glycol (PEG), L-lactide (L-LA), and hexamethylene diisocyanate (HDI), with iron acetylacetonate (Fe(acac)3) as the catalyst and PEG as the extender. Chemical components of the obtained polymers were characterized by FTIR spectroscopy, 1H NMR spectra, and Gel Permeation Chromatography (GPC). The thermodynamic properties, mechanical behaviors, surface hydrophilicity, degradability, and cytotoxicity were tested via differential scanning calorimetry (DSC), tensile tests, contact angle measurements, and cell culture. The results show that the synthesized cPU possessed good flexibility with quite low glass transition temperature (Tg, -22°C) and good wettability. Water uptake measured as high as 229.7 ± 18.7%. These properties make cPU a good candidate material for engineering soft tissues such as the hypopharynx. In vitro and in vivo tests showed that cPU has the ability to support the growth of human hypopharyngeal fibroblasts and angiogenesis was observed around cPU after it was implanted subcutaneously in SD rats.

  1. Seasonal variation in time budgets and milk yield for Jersey, Friesland and crossbred cows raised in a pasture-based system.

    PubMed

    Dodzi, Madodana S; Muchenje, Voster

    2012-10-01

    The time budgets and daily milk yield of Jersey and Friesland cows and their crosses were compared in a pasture-based system by recording the time spent grazing, drinking, lying, standing and walking in four seasons of the year (cool-dry, hot-dry, hot-wet and post-rainy). Observations were made from 0800 to 1400 hours on seven cows per breed. Seven observers monitored the cows at 10-min intervals for 6 h using stop watches. Time spent standing was higher (P < 0.05) for Friesland compared to Jersey cows and the crossbred cows during the hot-wet season. Time spent walking differed among the three genotypes with the Jersey spending more time (P < 0.05) in both hot-wet and cool-dry seasons. No differences were noted on time spent lying down (P > 0.05) across the genotypes in the hot-wet season. In the cool-dry season, differences in time spent grazing (P < 0.05) were noted with the Jersey cows spending more time. The Friesland and the crossbred spent more time lying down (P < 0.05) than the Jersey cows in the cool-dry season. No time differences were noted for time spent standing (P > 0.05) in the same season. The Jersey cows spent the longest time walking (P < 0.05) during the cool-dry period. There were seasonal differences in time spent in all activities (P < 0.05). Time spent on grazing was longest in post-rainy season and lowest in hot-wet season. Differences were observed in the time spent lying down (P < 0.05). The longest period was observed in the hot-dry season and lowest in the hot-wet season. Daily milk yield varied (P < 0.05) with breed with the Friesland and Jersey producing higher yields than the crosses. The highest amount was produced in hot-dry and the least in hot-wet season. Milk yield and lying down were positively correlated (P < 0.05) in Jersey and Friesland cows. Standing was negatively correlated with milk yield (P < 0.05) in both Friesland and Jersey cows. No significant relationship was observed for the crossbred cows. 
It was concluded that the genotypes show different levels of sensitivity to seasons and that a relationship exists between milk yield and time budgets.

  2. Associations between maternal employment and time spent in nutrition-related behaviours among German children and mothers.

    PubMed

    Möser, Anke; Chen, Susan E; Jilcott, Stephanie B; Nayga, Rodolfo M

    2012-07-01

    To examine associations between maternal employment and time spent engaging in nutrition-related behaviours among mothers and children using a nationally representative sample of households in West and East Germany. A cross-sectional analysis was performed using time-use data for a sample of mother-child dyads. Associations between maternal employment and time spent in nutrition-related activities such as eating at home, eating away from home and food preparation were estimated using a double-hurdle model. German Time Budget Survey 2001/02. The overall sample included 1071 households with a child between 10 and 17 years of age. The time-use data were collected for a 3 d period of observation (two weekdays and one weekend day). Maternal employment was associated with the time children spent on nutrition-related behaviours. In households with employed mothers, children spent more time eating alone at home and less time eating meals with their mothers. Moreover, employed mothers spent less time on meal preparation compared with non-employed mothers. There were regional differences in time spent on nutrition-related behaviours, such that East German children were more likely to eat at home alone than West German children. Maternal employment was associated with less time spent eating with children and preparing food, which may be related to the increasing childhood obesity rates in Germany. Future national surveys that collect both time-use data and health outcomes could yield further insight into mechanisms by which maternal time use might be associated with health outcomes among children.

  3. The Reel Deal In 3D: The Spatio-Temporal Evolution of YSO Jets

    NASA Astrophysics Data System (ADS)

    Frank, Adam

    2014-10-01

    Jets are a ubiquitous phenomenon in astrophysics, though in most cases their central engines are unresolvable. Thus the structure of the jets often acts as a proxy for understanding the objects creating them. Jets are also of interest in their own right, serving as critical examples of rapidly evolving astrophysical magnetized plasma systems. And while millions of CPU hours (at least) have been spent simulating the kinds of astrophysical plasma dynamics that occur routinely in jets, we rarely have had the chance to study their real-time evolution. In this proposal we seek to use a unique multi-epoch HST dataset of protostellar jets to carry forward an innovative theoretical, numerical and laboratory-based study of magnetized outflows and the plasma processes which determine their evolution. Our work will make direct and detailed contact with these HST data sets and will articulate newly-observed features of jet dynamics that have not been possible to explore before. Using numerical simulations and laboratory plasma studies we seek to articulate the full 3-D nature of new behaviors seen in the HST data. Our collaboration includes the use of scaled laboratory plasma experiments with hypersonic magnetized radiative jets. The MHD experiments have explored how jets break up into clumps via kink-mode instabilities. Therefore such experiments are directly relevant to the initial conditions in our models.

  4. Time Spent Walking and Risk of Diabetes in Japanese Adults: The Japan Public Health Center-Based Prospective Diabetes Study.

    PubMed

    Kabeya, Yusuke; Goto, Atsushi; Kato, Masayuki; Matsushita, Yumi; Takahashi, Yoshihiko; Isogawa, Akihiro; Inoue, Manami; Mizoue, Tetsuya; Tsugane, Shoichiro; Kadowaki, Takashi; Noda, Mitsuhiko

    2016-01-01

    The association between time spent walking and risk of diabetes was investigated in a Japanese population-based cohort. Data from the Japan Public Health Center-based Prospective Diabetes cohort were analyzed. The surveys of diabetes were performed at baseline and at the 5-year follow-up. Time spent walking per day was assessed using a self-reported questionnaire (<30 minutes, 30 minutes to <1 hour, 1 to <2 hours, or ≥2 hours). A cross-sectional analysis was performed among 26 488 adults in the baseline survey. Logistic regression was used to examine the association between time spent walking and the presence of unrecognized diabetes. We then performed a longitudinal analysis that was restricted to 11 101 non-diabetic adults who participated in both the baseline and 5-year surveys. The association between time spent walking and the incidence of diabetes during the 5 years was examined. In the cross-sectional analysis, 1058 participants had unrecognized diabetes. Those with time spent walking of <30 minutes per day had increased odds of having diabetes in relation to those with time spent walking of ≥2 hours (adjusted odds ratio [OR] 1.23; 95% CI, 1.02-1.48). In the longitudinal analysis, 612 participants developed diabetes during the 5 years of follow-up. However, a significant association between time spent walking and the incidence of diabetes was not observed. Increased risk of diabetes was implied in those with time spent walking of <30 minutes per day, although the longitudinal analysis failed to show a significant result.

  5. FastGCN: A GPU Accelerated Tool for Fast Gene Co-Expression Networks

    PubMed Central

    Liang, Meimei; Zhang, Futao; Jin, Gulei; Zhu, Jun

    2015-01-01

    Gene co-expression networks comprise one type of valuable biological networks. Many methods and tools have been published to construct gene co-expression networks; however, most of these tools and methods are inconvenient and time consuming for large datasets. We have developed a user-friendly, accelerated and optimized tool for constructing gene co-expression networks that can fully harness the parallel nature of GPU (Graphic Processing Unit) architectures. Genetic entropies were exploited to filter out genes with no or small expression changes in the raw data preprocessing step. Pearson correlation coefficients were then calculated. After that, we normalized these coefficients and employed the False Discovery Rate to control the multiple tests. Finally, module identification was conducted to construct the co-expression networks. All of these calculations were implemented on a GPU. We also compressed the coefficient matrix to save space. We compared the performance of the GPU implementation with those of multi-core CPU implementations with 16 CPU threads, single-thread C/C++ implementation and single-thread R implementation. Our results show that GPU implementation largely outperforms single-thread C/C++ implementation and single-thread R implementation, and GPU implementation outperforms multi-core CPU implementation when the number of genes increases. With the test dataset containing 16,000 genes and 590 individuals, we can achieve greater than 63 times the speed using a GPU implementation compared with a single-thread R implementation when 50 percent of genes were filtered out and about 80 times the speed when no genes were filtered out. PMID:25602758
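
    The correlation and multiple-testing steps named in this abstract can be sketched on the CPU as follows. This is only an illustration under stated assumptions, not the FastGCN implementation: the function names are hypothetical, the Benjamini-Hochberg procedure is assumed as the False Discovery Rate method (the abstract does not say which one was used), and the p-values are taken as given rather than derived from the coefficients.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def coexpression_pairs(expression):
    """Correlate every pair of genes; `expression` maps gene name -> profile."""
    return {
        (g1, g2): pearson(expression[g1], expression[g2])
        for g1, g2 in combinations(sorted(expression), 2)
    }


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of tests kept at FDR level alpha (assumed BH step-up rule)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # largest rank that passes its threshold
    return sorted(order[:cutoff])
```

    Each gene pair is independent of the others, which is the property that makes this step a natural fit for the GPU parallelism the abstract describes.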

  6. FastGCN: a GPU accelerated tool for fast gene co-expression networks.

    PubMed

    Liang, Meimei; Zhang, Futao; Jin, Gulei; Zhu, Jun

    2015-01-01

    Gene co-expression networks comprise one type of valuable biological networks. Many methods and tools have been published to construct gene co-expression networks; however, most of these tools and methods are inconvenient and time consuming for large datasets. We have developed a user-friendly, accelerated and optimized tool for constructing gene co-expression networks that can fully harness the parallel nature of GPU (Graphic Processing Unit) architectures. Genetic entropies were exploited to filter out genes with no or small expression changes in the raw data preprocessing step. Pearson correlation coefficients were then calculated. After that, we normalized these coefficients and employed the False Discovery Rate to control the multiple tests. Finally, module identification was conducted to construct the co-expression networks. All of these calculations were implemented on a GPU. We also compressed the coefficient matrix to save space. We compared the performance of the GPU implementation with those of multi-core CPU implementations with 16 CPU threads, single-thread C/C++ implementation and single-thread R implementation. Our results show that GPU implementation largely outperforms single-thread C/C++ implementation and single-thread R implementation, and GPU implementation outperforms multi-core CPU implementation when the number of genes increases. With the test dataset containing 16,000 genes and 590 individuals, we can achieve greater than 63 times the speed using a GPU implementation compared with a single-thread R implementation when 50 percent of genes were filtered out and about 80 times the speed when no genes were filtered out.

  7. GPU-Q-J, a fast method for calculating root mean square deviation (RMSD) after optimal superposition

    PubMed Central

    2011-01-01

    Background Calculation of the root mean square deviation (RMSD) between the atomic coordinates of two optimally superposed structures is a basic component of structural comparison techniques. We describe a quaternion based method, GPU-Q-J, that is stable with single precision calculations and suitable for graphics processor units (GPUs). The application was implemented on an ATI 4770 graphics card in C/C++ and Brook+ in Linux where it was 260 to 760 times faster than existing unoptimized CPU methods. Source code is available from the Compbio website http://software.compbio.washington.edu/misc/downloads/st_gpu_fit/ or from the author LHH. Findings The Nutritious Rice for the World Project (NRW) on World Community Grid predicted de novo, the structures of over 62,000 small proteins and protein domains returning a total of 10 billion candidate structures. Clustering ensembles of structures on this scale requires calculation of large similarity matrices consisting of RMSDs between each pair of structures in the set. As a real-world test, we calculated the matrices for 6 different ensembles from NRW. The GPU method was 260 times faster than the fastest existing CPU based method and over 500 times faster than the method that had been previously used. Conclusions GPU-Q-J is a significant advance over previous CPU methods. It relieves a major bottleneck in the clustering of large numbers of structures for NRW. It also has applications in structure comparison methods that involve multiple superposition and RMSD determination steps, particularly when such methods are applied on a proteome and genome wide scale. PMID:21453553

  8. Determinants of children's use of and time spent in fast-food and full-service restaurants.

    PubMed

    McIntosh, Alex; Kubena, Karen S; Tolle, Glen; Dean, Wesley; Kim, Mi-Jeong; Jan, Jie-Sheng; Anding, Jenna

    2011-01-01

    Identify parental and children's determinants of children's use of and time spent in fast-food (FF) and full-service (FS) restaurants. Analysis of cross-sectional data. Parents were interviewed by phone; children were interviewed in their homes. Parents and children ages 9-11 or 13-15 from 312 families were obtained via random-digit dialing. Dependent variables were the use of and the time spent in FF and FS restaurants by children. Determinants included parental work schedules, parenting style, and family meal ritual perceptions. Logistic regression was used for multivariate analysis of use of restaurants. Least squares regression was used for multivariate analysis of time spent in restaurants. Significance set at P < .05. Factors related to use of and time spent in FF and FS restaurants included parental work schedules, fathers' use of such restaurants, and children's time spent in the family automobile. Parenting style, parental work, parental eating habits and perceptions of family meals, and children's other uses of their time influence children's use of and time spent in FF and FS restaurants. Copyright © 2011 Society for Nutrition Education. Published by Elsevier Inc. All rights reserved.

  9. Preliminary Study of Image Reconstruction Algorithm on a Digital Signal Processor

    DTIC Science & Technology

    2014-03-01

    5.2 Comparison of CPU-GPU, CPU-FPGA, and CPU-DSP Designs The work for implementing VHDL description of the back-projection algorithm on a physical...FPGA was not complete. Hence, the DSP implementation results are compared with the simulated results for the VHDL design. Simulating VHDL provides an...rather than at the software level. Depending on an application’s characteristics, FPGA implementations can provide a significant performance

  10. Places where children are active: A longitudinal examination of children's physical activity.

    PubMed

    Perry, Cynthia K; Ackert, Elizabeth; Sallis, James F; Glanz, Karen; Saelens, Brian E

    2016-12-01

    Using two-year longitudinal data, we examined locations where children spent time and were active, whether location patterns were stable, and relationships between spending time in their home neighborhood and moderate to vigorous physical activity (MVPA). At two time points (2007-2009 and 2009-2011), children living in the metropolitan areas of either San Diego, CA or Seattle, WA wore an accelerometer, and parents recorded their child's locations for seven days. Across two years, global average proportion of time spent in each location was stable, but total time and proportion of time in each location spent in MVPA decreased significantly across all locations. Children spent the largest proportion of time in MVPA in their home neighborhood at both time points, although they spent little time in their home neighborhood. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Performance and scalability of Fourier domain optical coherence tomography acceleration using graphics processing units.

    PubMed

    Li, Jian; Bloch, Pavel; Xu, Jing; Sarunic, Marinko V; Shannon, Lesley

    2011-05-01

    Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been used as coprocessors to accelerate FD-OCT by leveraging their relatively simple programming model to exploit thread-level parallelism. Unfortunately, GPUs do not "share" memory with their host processors, requiring additional data transfers between the GPU and CPU. In this paper, we implement a complete FD-OCT accelerator on a consumer grade GPU/CPU platform. Our data acquisition system uses spectrometer-based detection and a dual-arm interferometer topology with numerical dispersion compensation for retinal imaging. We demonstrate that the maximum line rate is dictated by the memory transfer time and not the processing time due to the GPU platform's memory model. Finally, we discuss how the performance trends of GPU-based accelerators compare to the expected future requirements of FD-OCT data rates.

  12. Source parameter inversion of compound earthquakes on GPU/CPU hybrid platform

    NASA Astrophysics Data System (ADS)

    Wang, Y.; Ni, S.; Chen, W.

    2012-12-01

    The determination of earthquake source parameters is an essential problem in seismology. Accurate and timely determination of the earthquake parameters (such as moment, depth, strike, dip and rake of fault planes) is significant for both rupture dynamics and ground motion prediction or simulation. The study of the rupture process, especially for moderate and large earthquakes, is also essential, as more detailed kinematic studies have become the routine work of seismologists. However, some of these events behave unusually and intrigue seismologists. These earthquakes usually consist of two sub-events of similar size that occur within a very short time interval, such as the mb4.5 event of Dec. 9, 2003 in Virginia. Studying these special events, including determining the source parameters of each sub-event, will help in understanding earthquake dynamics. However, the seismic signals of the two distinct sources are mixed, which makes inversion difficult. For common events, the Cut and Paste (CAP) method has proven effective for resolving source parameters; it jointly uses body waves and surface waves with independent time shifts and weights. CAP can resolve fault orientation and focal depth using a grid search algorithm. Based on this method, we developed an algorithm (MUL_CAP) to simultaneously acquire the parameters of two distinct events. However, the simultaneous inversion of both sub-events makes the computation very time consuming, so we developed a hybrid GPU and CPU version of CAP (HYBRID_CAP) to improve the computational efficiency. Thanks to the advantages of multi-dimensional storage and processing on the GPU, we obtain excellent performance of the revised code on the combined GPU-CPU architecture, and the speedup factors can be as high as 40x-90x compared to classical CAP on a traditional CPU architecture. As a benchmark, we take synthetics as observations and invert for the source parameters of two given sub-events; the inversion results are very consistent with the true parameters. For the event in Virginia, USA on 9 Dec, 2003, we re-invert the source parameters, and detailed analysis of regional waveforms indicates that the Virginia earthquake included two sub-events of Mw4.05 and Mw4.25 at the same depth of 10km with a focal mechanism of strike65/dip32/rake135, which is consistent with previous studies. Moreover, compared to the traditional two-source model method, MUL_CAP is more automatic, with no need for human intervention.

  13. Compared to Canadians, U.S. physicians spend nearly four times as much money interacting with payers.

    PubMed

    Zimmerman, Christina

    2011-11-01

    (1) In Canadian office practices, physicians spent 2.2 hours per week interacting with payers, nurses spent 2.5 hours, and clerical staff spent 15.9 hours. In U.S. practices, physicians spent 3.4 hours per week interacting with payers, nurses spent 20.6 hours, and clerical staff spent 53.1 hours. (2) Canadian physician practices spent $22,205 per physician per year on interactions with health plans. U.S. physician practices spent $82,975 per physician per year. (3) U.S. physician practices spend $60,770 per physician per year more (approximately four times as much) than their Canadian counterparts.
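
    The arithmetic behind points (2) and (3) can be checked directly from the figures quoted in the abstract. The snippet below is only a worked check; the variable names are illustrative.

```python
# Annual cost of payer interactions per physician, as quoted in the abstract.
canada_cost = 22205
us_cost = 82975

difference = us_cost - canada_cost  # extra U.S. spending per physician
ratio = us_cost / canada_cost       # U.S. spending as a multiple of Canadian

# Weekly hours spent interacting with payers: (Canada, U.S.).
hours = {
    "physicians": (2.2, 3.4),
    "nurses": (2.5, 20.6),
    "clerical staff": (15.9, 53.1),
}
hour_ratios = {role: us / ca for role, (ca, us) in hours.items()}

print(f"difference: ${difference:,}")
print(f"cost ratio: {ratio:.2f}")
for role, r in hour_ratios.items():
    print(f"{role}: {r:.1f}x the Canadian hours")
```

    The difference matches the $60,770 stated in point (3), and the cost ratio comes out somewhat below four, consistent with the abstract's "approximately four times as much".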

  14. The effects of automated scatter feeders on captive grizzly bear activity budgets.

    PubMed

    Andrews, Nathan L P; Ha, James C

    2014-01-01

    Although captive bears are popular zoo attractions, they are known to exhibit high levels of repetitive behaviors (RBs). These behaviors have also made them particularly popular subjects for welfare research. To date, most research on ursid welfare has focused on various feeding methods that seek to increase time spent searching for, extracting, or consuming food. Prior research indicates an average of a 50% reduction in RBs when attempts are successful and, roughly, a 50% success rate across studies. This research focused on decreasing time spent in an RB while increasing the time spent active by increasing time spent searching for, extracting, and consuming food. The utility of timed, automated scatter feeders was examined for use with captive grizzly bears (Ursus arctos horribilis). Findings include a significant decrease in time spent in RB and a significant increase in time spent active while the feeders were in use. Further, the bears exhibited a wider range of behaviors and a greater use of their enclosure.

  15. The composition of intern work while on call.

    PubMed

    Fletcher, Kathlyn E; Visotcky, Alexis M; Slagle, Jason M; Tarima, Sergey; Weinger, Matthew B; Schapira, Marilyn M

    2012-11-01

    The work of house staff is being increasingly scrutinized as duty hours continue to be restricted. To describe the distribution of work performed by internal medicine interns while on call. Prospective time motion study on general internal medicine wards at a VA hospital affiliated with a tertiary care medical center and internal medicine residency program. Internal medicine interns. Trained observers followed interns during a "call" day. The observers continuously recorded the tasks performed by interns, using customized task analysis software. We measured the amount of time spent on each task. We calculated means and standard deviations for the amount of time spent on six categories of tasks: clinical computer work (e.g., writing orders and notes), non-patient communication, direct patient care (work done at the bedside), downtime, transit and teaching/learning. We also calculated means and standard deviations for time spent on specific tasks within each category. We compared the amount of time spent on the top three categories using analysis of variance. The largest proportion of intern time was spent in clinical computer work (40 %). Thirty percent of time was spent on non-patient communication. Only 12 % of intern time was spent at the bedside. Downtime activities, transit and teaching/learning accounted for 11 %, 5 % and 2 % of intern time, respectively. Our results suggest that during on call periods, relatively small amounts of time are spent on direct patient care and teaching/learning activities. As intern duty hours continue to decrease, attention should be directed towards preserving time with patients and increasing time in education.

  16. Many-integrated core (MIC) technology for accelerating Monte Carlo simulation of radiation transport: A study based on the code DPM

    NASA Astrophysics Data System (ADS)

    Rodriguez, M.; Brualla, L.

    2018-04-01

    Monte Carlo simulation of radiation transport is computationally demanding to obtain reasonably low statistical uncertainties of the estimated quantities. Therefore, it can benefit to a large extent from high-performance computing. This work is aimed at assessing the performance of the first generation of the many-integrated core architecture (MIC) Xeon Phi coprocessor with respect to that of a CPU consisting of a double 12-core Xeon processor in Monte Carlo simulation of coupled electron-photon showers. The comparison was twofold: first, through a suite of basic tests including parallel versions of the random number generators Mersenne Twister and a modified implementation of RANECU. These tests were intended to establish a baseline comparison between both devices. Second, through the pDPM code developed in this work. pDPM is a parallel version of the Dose Planning Method (DPM) program for fast Monte Carlo simulation of radiation transport in voxelized geometries. A variety of techniques aimed at obtaining large scalability on the Xeon Phi were implemented in pDPM. Maximum scalabilities of 84.2× and 107.5× were obtained in the Xeon Phi for simulations of electron and photon beams, respectively. Nevertheless, in none of the tests involving radiation transport did the Xeon Phi perform better than the CPU. The disadvantage of the Xeon Phi with respect to the CPU is due to the low performance of the single core of the former. A single core of the Xeon Phi was more than 10 times less efficient than a single core of the CPU for all radiation transport simulations.

  17. Research on fast Fourier transforms algorithm of huge remote sensing image technology with GPU and partitioning technology.

    PubMed

    Yang, Xue; Li, Xue-You; Li, Jia-Guo; Ma, Jun; Zhang, Li; Yang, Jan; Du, Quan-Ye

    2014-02-01

    Fast Fourier transforms (FFT) is a basic approach to remote sensing image processing. With the improvement of capacity of remote sensing image capture with the features of hyperspectrum, high spatial resolution and high temporal resolution, how to use FFT technology to efficiently process huge remote sensing image becomes the critical step and research hot spot of current image processing technology. FFT algorithm, one of the basic algorithms of image processing, can be used for stripe noise removal, image compression, image registration, etc. in processing remote sensing image. CUFFT function library is the FFT algorithm library based on CPU and FFTW. FFTW is a FFT algorithm developed based on CPU in PC platform, and is currently the fastest CPU based FFT algorithm function library. However there is a common problem that once the available memory or memory is less than the capacity of image, there will be out of memory or memory overflow when using the above two methods to realize image FFT arithmetic. To address this problem, a CPU and partitioning technology based Huge Remote Fast Fourier Transform (HRFFT) algorithm is proposed in this paper. By improving the FFT algorithm in CUFFT function library, the problem of out of memory and memory overflow is solved. Moreover, this method is proved rational by experiment combined with the CCD image of HJ-1A satellite. When applied to practical image processing, it improves effect of the image processing, speeds up the processing, which saves the time of computation and achieves sound result.

  18. GPU: the biggest key processor for AI and parallel processing

    NASA Astrophysics Data System (ADS)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market. One is the conventional CPU and the other is the Graphic Processor Unit (GPU). A typical CPU is composed of 1 to 8 cores while a GPU has thousands of cores. A CPU is good for sequential processing, while a GPU is good at accelerating software with heavy parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to use general-purpose cores, it was noticed that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs started to be used widely in workstations and supercomputers. Recently two key technologies have been highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massively parallel operations to train many-layered neural networks. With a CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization and path planning processing, and again an SoC with an integrated GPU will play a key role there. In this paper, the evolution of the GPU, which is one of the biggest commercial devices requiring state-of-the-art fabrication technology, will be introduced. An overview of the key GPU-demanding applications like the ones described above will also be introduced.

  19. What keeps family physicians busy in Portugal? A multicentre observational study of work other than direct patient contacts

    PubMed Central

    Granja, Mónica; Ponte, Carla; Cavadas, Luís Filipe

    2014-01-01

    Objectives To quantify the time spent by family physicians (FP) on tasks other than direct patient contact, to evaluate job satisfaction, to analyse the association between time spent on tasks and physician characteristics, the association between the number of tasks performed and physician characteristics and the association between time spent on tasks and job satisfaction. Design Cross-sectional, using time-and-motion techniques. Two workdays were documented by direct observation. A significance level of 0.05 was adopted. Setting Multicentric in 104 Portuguese family practices. Participants A convenience sample of FP, with lists of over 1000 patients, teaching senior medical students and first-year family medicine residents in 2012, was obtained. Of the 217 FP invited to participate, 155 completed the study. Main outcomes measured Time spent on tasks other than direct patient contact and on the performance of more than one task simultaneously, the number of direct patient contacts in the office, the number of indirect patient contacts, job satisfaction, demographic and professional characteristics associated with time spent on tasks and the number of different tasks performed, and the association between time spent on tasks and job satisfaction. Results FP (n=155) spent a mean of 143.6 min/day (95% CI 135.2 to 152.0) performing tasks such as prescription refills, teaching, meetings, management and communication with other professionals (33.4% of their workload). FP with larger patient lists spent less time on these tasks (p=0.002). Older FP (p=0.021) and those with larger lists (p=0.011) performed fewer tasks. The mean job satisfaction score was 3.5 (out of 5). No association was found between job satisfaction and time spent on tasks. Conclusions FP spent one-third of their workday in coordinating care, teaching and managing. Time devoted to these tasks decreases with increasing list size and physician age. PMID:24934208

  20. Parallel Scaling Characteristics of Selected NERSC User Project Codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skinner, David; Verdier, Francesca; Anand, Harsh

    This report documents parallel scaling characteristics of NERSC user project codes between Fiscal Year 2003 and the first half of Fiscal Year 2004 (Oct 2002-March 2004). The codes analyzed cover 60% of all the CPU hours delivered during that time frame on seaborg, a 6080 CPU IBM SP and the largest parallel computer at NERSC. The scale in terms of concurrency and problem size of the workload is analyzed. Drawing on batch queue logs, performance data and feedback from researchers we detail the motivations, benefits, and challenges of implementing highly parallel scientific codes on current NERSC High Performance Computing systems. An evaluation and outlook of the NERSC workload for Allocation Year 2005 is presented.

  1. Event- and Time-Driven Techniques Using Parallel CPU-GPU Co-processing for Spiking Neural Networks

    PubMed Central

    Naveros, Francisco; Garrido, Jesus A.; Carrillo, Richard R.; Ros, Eduardo; Luque, Niceto R.

    2017-01-01

    Modeling and simulating the neural structures which make up our central neural system is instrumental for deciphering the computational neural cues beneath. Higher levels of biological plausibility usually impose higher levels of complexity in mathematical modeling, from neural to behavioral levels. This paper focuses on overcoming the simulation problems (accuracy and performance) derived from using higher levels of mathematical complexity at a neural level. This study proposes different techniques for simulating neural models that hold incremental levels of mathematical complexity: leaky integrate-and-fire (LIF), adaptive exponential integrate-and-fire (AdEx), and Hodgkin-Huxley (HH) neural models (ranged from low to high neural complexity). The studied techniques are classified into two main families depending on how the neural-model dynamic evaluation is computed: the event-driven or the time-driven families. Whilst event-driven techniques pre-compile and store the neural dynamics within look-up tables, time-driven techniques compute the neural dynamics iteratively during the simulation time. We propose two modifications for the event-driven family: a look-up table recombination to better cope with the incremental neural complexity together with a better handling of the synchronous input activity. Regarding the time-driven family, we propose a modification in computing the neural dynamics: the bi-fixed-step integration method. This method automatically adjusts the simulation step size to better cope with the stiffness of the neural model dynamics running in CPU platforms. One version of this method is also implemented for hybrid CPU-GPU platforms. Finally, we analyze how the performance and accuracy of these modifications evolve with increasing levels of neural complexity. We also demonstrate how the proposed modifications which constitute the main contribution of this study systematically outperform the traditional event- and time-driven techniques under increasing levels of neural complexity. PMID:28223930
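
    A time-driven simulation in the sense used above, where the dynamics are computed iteratively at every step, can be sketched for the simplest of the three models (LIF). This is a plain fixed-step forward-Euler loop, not the bi-fixed-step method proposed in the paper, and every parameter value and name below is an assumption chosen for illustration.

```python
def simulate_lif(input_current, dt=1e-4, tau=0.02, v_rest=-0.065,
                 v_thresh=-0.050, v_reset=-0.065, resistance=1e7):
    """Time-driven LIF neuron: advance the membrane voltage once per step.

    input_current: injected current (amperes) for each time step.
    Returns the list of spike times in seconds.
    All defaults are illustrative values, not taken from the paper.
    """
    v = v_rest
    spike_times = []
    for step, current in enumerate(input_current):
        # Forward-Euler update of tau * dv/dt = -(v - v_rest) + R * I
        dv_dt = (-(v - v_rest) + resistance * current) / tau
        v += dt * dv_dt
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset  # reset the membrane voltage after a spike
    return spike_times


# 0.5 s of constant 2 nA input, strong enough to cross the threshold.
spikes = simulate_lif([2e-9] * 5000)
print(len(spikes), "spikes")
```

    An event-driven variant would instead precompute the voltage trajectory into a look-up table and jump from one input event to the next, which is the trade-off the abstract contrasts.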

  2. 5 CFR 551.426 - Time spent in charitable activities.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... working hours is not hours of work. Special Situations ... PAY ADMINISTRATION UNDER THE FAIR LABOR STANDARDS ACT Hours of Work Application of Principles in Relation to Other Activities § 551.426 Time spent in charitable activities. Time spent working for public...

  3. An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

    DOE PAGES

    Lyakh, Dmitry I.

    2015-01-05

    An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). A particular accent is made on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on NVidia Tesla K20X GPU with respect to the naive scattering algorithm (no memory access optimization). Furthermore, the tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).
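
    For reference, the kind of naive scattering baseline that the abstract compares against can be sketched as below: elements are read in storage order and written to their permuted position, with no attention to cache behavior. This is an illustrative sketch assuming row-major storage in a flat list; it is not code from TAL-SH, and the function names are hypothetical.

```python
from itertools import product


def row_major_strides(shape):
    """Strides (in elements) of a dense row-major tensor of the given shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def naive_transpose(data, shape, perm):
    """Permute tensor dimensions: output dimension k is input dimension perm[k].

    Reads are sequential, writes are scattered across the output buffer.
    """
    out_shape = [shape[p] for p in perm]
    in_strides = row_major_strides(shape)
    out_strides = row_major_strides(out_shape)
    out = [None] * len(data)
    for index in product(*(range(n) for n in shape)):
        src = sum(i * s for i, s in zip(index, in_strides))
        dst = sum(index[p] * s for p, s in zip(perm, out_strides))
        out[dst] = data[src]
    return out, out_shape


# A 2x3 matrix stored row by row, transposed to 3x2.
transposed, new_shape = naive_transpose([0, 1, 2, 3, 4, 5], [2, 3], (1, 0))
print(new_shape, transposed)  # [3, 2] [0, 3, 1, 4, 2, 5]
```

    The scattered writes are what an optimized version would reorganize, for example by processing the tensor in tiles sized to fit in cache or GPU shared memory.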

  4. Time Spent Outdoors, Depressive Symptoms, and Variation by Race and Ethnicity.

    PubMed

    Beyer, Kirsten M M; Szabo, Aniko; Nattinger, Ann B

    2016-09-01

    Numerous studies have explored neighborhood environmental correlates of mental illnesses, presuming that the time individuals spend in their environment can confer benefit or harm based on environmental characteristics. However, few population-based studies have directly examined the relationship between time spent outdoors and mental health, and little work has been done to explore how experiences differ by race and ethnicity. Though some have proposed "doses of outdoor time" to improve health, the absence of information about the benefits conferred by particular "doses," and expected baseline levels of outdoor time, are needed to inform the development of recommendations and interventions. This study examined the relationship between time spent outdoors and depression among a population-based sample of American adults, characterized current levels of time spent outdoors by race and ethnicity, and examined how the relationship between time spent outdoors and depression varies by race and ethnicity. Descriptive statistics and survey regression models were used to examine data from the National Health and Nutrition Examination Survey for 2009-2012. Findings provide evidence that time spent outdoors is associated with fewer depressive symptoms, but this benefit may not be equally distributed by race and ethnicity. Descriptive analyses also reveal differences in time spent outdoors among different racial and ethnic groups. Study findings support the notion that increasing time spent outdoors may result in mental health benefits. However, this study questions whether that benefit is experienced equally among different groups, particularly given differences in occupational experiences and environmental characteristics of neighborhoods. Copyright © 2016 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  5. Time Spent Walking and Risk of Diabetes in Japanese Adults: The Japan Public Health Center-Based Prospective Diabetes Study

    PubMed Central

    Kabeya, Yusuke; Goto, Atsushi; Kato, Masayuki; Matsushita, Yumi; Takahashi, Yoshihiko; Isogawa, Akihiro; Inoue, Manami; Mizoue, Tetsuya; Tsugane, Shoichiro; Kadowaki, Takashi; Noda, Mitsuhiko

    2016-01-01

    Background: The association between time spent walking and risk of diabetes was investigated in a Japanese population-based cohort. Methods: Data from the Japan Public Health Center-based Prospective Diabetes cohort were analyzed. The surveys of diabetes were performed at baseline and at the 5-year follow-up. Time spent walking per day was assessed using a self-reported questionnaire (<30 minutes, 30 minutes to <1 hour, 1 to <2 hours, or ≥2 hours). A cross-sectional analysis was performed among 26 488 adults in the baseline survey. Logistic regression was used to examine the association between time spent walking and the presence of unrecognized diabetes. We then performed a longitudinal analysis that was restricted to 11 101 non-diabetic adults who participated in both the baseline and 5-year surveys. The association between time spent walking and the incidence of diabetes during the 5 years was examined. Results: In the cross-sectional analysis, 1058 participants had unrecognized diabetes. Those with time spent walking of <30 minutes per day had increased odds of having diabetes in relation to those with time spent walking of ≥2 hours (adjusted odds ratio [OR] 1.23; 95% CI, 1.02–1.48). In the longitudinal analysis, 612 participants developed diabetes during the 5 years of follow-up. However, a significant association between time spent walking and the incidence of diabetes was not observed. Conclusions: Increased risk of diabetes was implied in those with time spent walking of <30 minutes per day, although the longitudinal analysis failed to show a significant result. PMID:26725285

  6. Time Investment in Drug Supply Problems by Flemish Community Pharmacies.

    PubMed

    De Weerdt, Elfi; Simoens, Steven; Casteels, Minne; Huys, Isabelle

    2017-01-01

    Introduction: Drug supply problems are a known problem for pharmacies. Community and hospital pharmacies do everything they can to minimize impact on patients. This study aims to quantify the time spent by Flemish community pharmacies on drug supply problems. Materials and Methods: During 18 weeks, employees of 25 community pharmacies filled in a template with the total time spent on drug supply problems. The template stated all the steps community pharmacies could undertake to manage drug supply problems. Results: Considering the median over the study period, the median time spent on drug supply problems was 25 min per week, with a minimum of 14 min per week and a maximum of 38 min per week. After calculating the median of each pharmacy, large differences were observed between pharmacies: about 25% spent less than 15 min per week and one-fifth spent more than 1 h per week. The steps on which community pharmacists spent most time are: (i) "check missing products from orders," (ii) "contact wholesaler/manufacturers regarding potential drug shortages," and (iii) "communicating to patients." These three steps account for about 50% of the total time spent on drug supply problems during the study period. Conclusion: Community pharmacies spend about half an hour per week on drug supply problems. Although 25 min per week does not seem that much, the time spent is not delineated and community pharmacists are constantly confronted with drug supply problems.

  7. Combined Effects of Time Spent in Physical Activity, Sedentary Behaviors and Sleep on Obesity and Cardio-Metabolic Health Markers: A Novel Compositional Data Analysis Approach

    PubMed Central

    Chastin, Sebastien F. M.; Palarea-Albaladejo, Javier; Dontje, Manon L.; Skelton, Dawn A.

    2015-01-01

    The associations between time spent in sleep, sedentary behaviors (SB) and physical activity with health are usually studied without taking into account that time is finite during the day, so time spent in each of these behaviors are codependent. Therefore, little is known about the combined effect of time spent in sleep, SB and physical activity, that together constitute a composite whole, on obesity and cardio-metabolic health markers. Cross-sectional analysis of NHANES 2005–6 cycle on N = 1937 adults, was undertaken using a compositional analysis paradigm, which accounts for this intrinsic codependence. Time spent in SB, light intensity (LIPA) and moderate to vigorous activity (MVPA) was determined from accelerometry and combined with self-reported sleep time to obtain the 24 hour time budget composition. The distribution of time spent in sleep, SB, LIPA and MVPA is significantly associated with BMI, waist circumference, triglycerides, plasma glucose, plasma insulin (all p<0.001), and systolic (p<0.001) and diastolic blood pressure (p<0.003), but not HDL or LDL. Within the composition, the strongest positive effect is found for the proportion of time spent in MVPA. Strikingly, the effects of MVPA replacing another behavior and of MVPA being displaced by another behavior are asymmetric. For example, re-allocating 10 minutes of SB to MVPA was associated with a lower waist circumference by 0.001% but if 10 minutes of MVPA is displaced by SB this was associated with a 0.84% higher waist circumference. The proportion of time spent in LIPA and SB were detrimentally associated with obesity and cardiovascular disease markers, but the association with SB was stronger. For diabetes risk markers, replacing SB with LIPA was associated with more favorable outcomes. Time spent in MVPA is an important target for intervention and preventing transfer of time from LIPA to SB might lessen the negative effects of physical inactivity. PMID:26461112

  8. 45 CFR 1635.3 - Timekeeping requirement.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... out in accordance with 45 CFR part 1630. (b) Time spent by attorneys and paralegals must be documented by time records which record the amount of time spent on each case, matter, or supporting activity... which compensation is paid by the recipient. (2) Each record of time spent must contain: for a case, a...

  9. Contact to Nature Benefits Health: Mixed Effectiveness of Different Mechanisms.

    PubMed

    Hofmann, Mathias; Young, Christopher; Binz, Tina M; Baumgartner, Markus R; Bauer, Nicole

    2017-12-25

    How can urban nature contribute to the reduction of chronic stress? We twice measured the concentration of the "stress hormone" cortisol in the hair of 85 volunteer gardeners (six months apart), relating cortisol level change to (self-reported) characteristics of their recreational activities. Both time spent in nature and physical activity led to decreases in cortisol, while time spent being idle led to an increase. At high levels of present stressors, however, the relationship for time spent in nature and for idleness was reversed. Time spent with social interaction had no effect on cortisol levels. Our results indicate that physical activity is an effective means of mitigating the negative effects of chronic stress. The results regarding the time spent in nature and time spent being idle are less conclusive, suggesting the need for more research. We conclude that if chronic stress cannot be abolished by eradicating its sources, public health may take measures to reduce it, with the provision of urban nature being one effective possibility.

  10. Contact to Nature Benefits Health: Mixed Effectiveness of Different Mechanisms

    PubMed Central

    2017-01-01

    How can urban nature contribute to the reduction of chronic stress? We twice measured the concentration of the “stress hormone” cortisol in the hair of 85 volunteer gardeners (six months apart), relating cortisol level change to (self-reported) characteristics of their recreational activities. Both time spent in nature and physical activity led to decreases in cortisol, while time spent being idle led to an increase. At high levels of present stressors, however, the relationship for time spent in nature and for idleness was reversed. Time spent with social interaction had no effect on cortisol levels. Our results indicate that physical activity is an effective means of mitigating the negative effects of chronic stress. The results regarding the time spent in nature and time spent being idle are less conclusive, suggesting the need for more research. We conclude that if chronic stress cannot be abolished by eradicating its sources, public health may take measures to reduce it, with the provision of urban nature being one effective possibility. PMID:29295586

  11. Time spent on home food preparation and indicators of healthy eating.

    PubMed

    Monsivais, Pablo; Aggarwal, Anju; Drewnowski, Adam

    2014-12-01

    The amount of time spent on food preparation and cooking may have implications for diet quality and health. However, little is known about how food-related time use relates to food consumption and spending, either at restaurants or for food consumed at home. To quantitatively assess the associations among the amount of time habitually spent on food preparation and patterns of self-reported food consumption, food spending, and frequency of restaurant use. This was a cross-sectional study of 1,319 adults in a population-based survey conducted in 2008-2009. The sample was stratified into those who spent <1 hour/day, 1-2 hours/day, and >2 hours/day on food preparation and cleanup. Descriptive statistics and multivariable regression models examined differences between time-use groups. Analyses were conducted in 2011-2013. Individuals who spent the least amount of time on food preparation tended to be working adults who placed a high priority on convenience. Greater amount of time spent on home food preparation was associated with indicators of higher diet quality, including significantly more frequent intake of vegetables, salads, fruits, and fruit juices. Spending <1 hour/day on food preparation was associated with significantly more money spent on food away from home and more frequent use of fast food restaurants compared to those who spent more time on food preparation. The findings indicate that time might be an essential ingredient in the production of healthier eating habits among adults. Further research should investigate the determinants of spending time on food preparation. Copyright © 2014 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.

  12. Application of high-performance computing to numerical simulation of human movement

    NASA Technical Reports Server (NTRS)

    Anderson, F. C.; Ziegler, J. M.; Pandy, M. G.; Whalen, R. T.

    1995-01-01

    We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.

  13. Disk-based k-mer counting on a PC

    PubMed Central

    2013-01-01

    Background: The k-mer counting problem, which is to build the histogram of occurrences of every k-symbol long substring in a given text, is important for many bioinformatics applications. They include developing de Bruijn graph genome assemblers, fast multiple sequence alignment and repeat detection. Results: We propose a simple, yet efficient, parallel disk-based algorithm for counting k-mers. Experiments show that it usually offers the fastest solution to the considered problem, while demanding a relatively small amount of memory. In particular, it is capable of counting the statistics for short-read human genome data, in input gzipped FASTQ file, in less than 40 minutes on a PC with 16 GB of RAM and 6 CPU cores, and for long-read human genome data in less than 70 minutes. On a more powerful machine, using 32 GB of RAM and 32 CPU cores, the tasks are accomplished in less than half the time. No other algorithm for most tested settings of this problem and mammalian-size data can accomplish this task in comparable time. Our solution also belongs to memory-frugal ones; most competitive algorithms cannot efficiently work on a PC with 16 GB of memory for such massive data. Conclusions: By making use of cheap disk space and exploiting CPU and I/O parallelism we propose a very competitive k-mer counting procedure, called KMC. Our results suggest that judicious resource management may make it possible to solve at least some bioinformatics problems with massive data on a commodity personal computer. PMID:23679007
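
    The k-mer counting problem defined in this record can be illustrated with a minimal in-memory sketch. This is not the disk-based KMC algorithm the abstract describes; the function name `count_kmers` and the toy input are assumptions made purely for illustration, and the approach only works when the whole histogram fits in RAM.

```python
from collections import Counter


def count_kmers(text: str, k: int) -> Counter:
    """Return a histogram of every k-symbol substring of `text`.

    Hypothetical helper for illustration only: it keeps all counts in
    memory and does none of the disk staging or parallelism of KMC.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Slide a window of width k over the text; if the text is shorter
    # than k, the range is empty and the histogram is empty.
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


if __name__ == "__main__":
    counts = count_kmers("ACGTACGT", 3)
    for kmer, n in sorted(counts.items()):
        print(kmer, n)
```

    For genome-scale input the number of distinct k-mers can exceed available memory, which is the situation the disk-based approach in the abstract is aimed at.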

  14. Effect of Fiber Orientation on Dynamic Compressive Properties of an Ultra-High Performance Concrete

    DTIC Science & Technology

    2017-08-01

    measurements for LSFfiberOrient function for multiple cores. Elapsed time is the total time taken to run; CPU time is the number of cores times the...Superscripts Maximum value during a test Measured value from a calibration run...movement left or right. Before cutting, the Cor-Tuf Baseline beam was placed on the table and squared with the blade. The blade was then moved into
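
    The general distinction between elapsed (wall-clock) time and CPU time mentioned in this snippet can be sketched as follows. This is a generic illustration using Python's standard `time` module, not the report's measurement method; the helper name `measure` is an assumption, and the truncated multi-core definition in the snippet is not reproduced here.

```python
import time


def measure(func, *args):
    """Run func(*args) and return (result, elapsed_seconds, cpu_seconds).

    Hypothetical helper: elapsed time comes from a wall clock, while CPU
    time counts only the processor time used by the current process.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu_seconds = time.process_time() - cpu_start
    elapsed_seconds = time.perf_counter() - wall_start
    return result, elapsed_seconds, cpu_seconds


if __name__ == "__main__":
    # Sleeping advances the wall clock but uses almost no CPU time.
    _, elapsed, cpu = measure(time.sleep, 0.2)
    print(f"sleep: elapsed={elapsed:.3f}s cpu={cpu:.3f}s")

    # A busy loop uses CPU for roughly as long as it runs.
    _, elapsed, cpu = measure(sum, range(2_000_000))
    print(f"sum:   elapsed={elapsed:.3f}s cpu={cpu:.3f}s")
```

    In a multi-threaded or multi-process run, the CPU time summed over all cores can exceed the elapsed time, so the two figures should be reported separately.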

  15. Rt-Space: A Real-Time Stochastically-Provisioned Adaptive Container Environment

    DTIC Science & Technology

    2017-08-04

    This project was directed at component-based soft real-time (SRT) systems implemented on multicore platforms. To facilitate...upon average-case or near-average-case task execution times. The main intellectual contribution of this project was the development of methods for...allocating CPU time to components and associated analysis for validating SRT correctness.

  16. Acceleration for 2D time-domain elastic full waveform inversion using a single GPU card

    NASA Astrophysics Data System (ADS)

    Jiang, Jinpeng; Zhu, Peimin

    2018-05-01

    Full waveform inversion (FWI) is a challenging procedure due to the high computational cost related to the modeling, especially for the elastic case. The graphics processing unit (GPU) has become a popular device for the high-performance computing (HPC). To reduce the long computation time, we design and implement the GPU-based 2D elastic FWI (EFWI) in time domain using a single GPU card. We parallelize the forward modeling and gradient calculations using the CUDA programming language. To overcome the limitation of relatively small global memory on GPU, the boundary saving strategy is exploited to reconstruct the forward wavefield. Moreover, the L-BFGS optimization method used in the inversion increases the convergence of the misfit function. A multiscale inversion strategy is performed in the workflow to obtain the accurate inversion results. In our tests, the GPU-based implementations using a single GPU device achieve >15 times speedup in forward modeling, and about 12 times speedup in gradient calculation, compared with the eight-core CPU implementations optimized by OpenMP. The test results from the GPU implementations are verified to have enough accuracy by comparing the results obtained from the CPU implementations.

  17. Fidelity Optimization of Microprocessor System Simulations.

    DTIC Science & Technology

    1981-03-01

    effort feasible in terms of required CPU time would be to employ a separate clock with an artificially compressed time base in the serial...

  18. Evaluation of Time Spent by Pharmacists and Nurses Based on the Location of Pharmacist Involvement in Medication History Collection.

    PubMed

    Chhabra, Anmol; Quinn, Andrea; Ries, Amanda

    2018-01-01

    Accurate history collection is integral to medication reconciliation. Studies support pharmacy involvement in the process, but assessment of global time spent is limited. The authors hypothesized the location of a medication-focused interview would impact time spent. The objective was to compare time spent by pharmacists and nurses based on the location of a medication-focused interview. Time spent by the interviewing pharmacist, admitting nurse, and centralized pharmacist verifying admission orders was collected. Patient groups were based on whether the interview was conducted in the emergency department (ED) or medical floor. The primary end point was a composite of the 3 time points. Secondary end points were individual time components and number and types of transcription discrepancies identified during medical floor interviews. Pharmacists and nurses spent an average of ten fewer minutes per ED patient versus a medical floor patient (P = .028). Secondary end points were not statistically significant. Transcription discrepancies were identified at a rate of 1 in 4 medications. Post hoc analysis revealed the time spent by pharmacists and nurses was 2.4 minutes shorter per medication when interviewed in the ED (P < .001). The primary outcome was statistically and clinically significant. Limitations included inability to blind and lack of cost-saving analysis. Pharmacist involvement in ED medication reconciliation leads to time savings during the admission process.

  19. The Relationship between Five Aspects of the Home Environment and Students Reading above Grade Level.

    ERIC Educational Resources Information Center

    Wynstra, Jennifer E.

    A study investigated five aspects of the home environment (time spent viewing television, time spent doing homework, time involved in recreational reading, time spent with a non-parental caregiver, and bedtime) of first- through fifth-grade students to see if any common experience existed among those students reading above grade level. Subjects…

  20. Generational Differences in Children's Externalizing Behavior Problems

    PubMed Central

    Hofferth, Sandra L.

    2016-01-01

    This study examines the effects of time spent with parents and peers on generational differences in children's externalizing behavior problems in immigrant families. Using the Child Development Supplement and Time Diaries from the Panel Study of Income Dynamics, we found that first and second generation children exhibited fewer externalizing behavior problems than did third generation children, despite their lower socioeconomic status. First and second generation children spent more time with either one or both parents, and less time with peers, on the weekend day than did third generation children. We found a marginal but beneficial effect of time spent with fathers on the weekday, but not on the weekend day. The implications are that time spent with fathers on weekdays differs from time spent with fathers on the weekend, and that promoting immigrant father involvement on the weekday through school or community programs could benefit immigrant children. PMID:27350766

  1. 5 CFR 337.101 - Rating applicants.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... factor in determining eligibility, OPM shall credit a preference eligible with: (1) Time spent in the military service (i) as an extension of time spent in the position in which he was employed immediately... military service, or (iii) as a combination of both methods. OPM shall credit time spent in the military...

  2. 5 CFR 337.101 - Rating applicants.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... factor in determining eligibility, OPM shall credit a preference eligible with: (1) Time spent in the military service (i) as an extension of time spent in the position in which he was employed immediately... military service, or (iii) as a combination of both methods. OPM shall credit time spent in the military...

  3. 5 CFR 337.101 - Rating applicants.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... factor in determining eligibility, OPM shall credit a preference eligible with: (1) Time spent in the military service (i) as an extension of time spent in the position in which he was employed immediately... military service, or (iii) as a combination of both methods. OPM shall credit time spent in the military...

  4. 5 CFR 337.101 - Rating applicants.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... factor in determining eligibility, OPM shall credit a preference eligible with: (1) Time spent in the military service (i) as an extension of time spent in the position in which he was employed immediately... military service, or (iii) as a combination of both methods. OPM shall credit time spent in the military...

  5. 5 CFR 337.101 - Rating applicants.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... factor in determining eligibility, OPM shall credit a preference eligible with: (1) Time spent in the military service (i) as an extension of time spent in the position in which he was employed immediately... military service, or (iii) as a combination of both methods. OPM shall credit time spent in the military...

  6. The effects of employment status and daily stressors on time spent on daily household chores in middle-aged and older adults.

    PubMed

    Wong, Jen D; Almeida, David M

    2013-02-01

    This study examines how employment status (worker vs. retiree) and life course influences (age, gender, and marital status) are associated with time spent on daily household chores. Second, this study assesses whether the associations between daily stressors and time spent on daily household chores differ as a function of employment status and life course influences. Men and women aged 55-74 from the National Study of Daily Experiences (N = 268; 133 workers and 135 retirees), a part of the National Survey of Midlife in the United States (MIDUS), completed telephone interviews regarding their daily experiences across 8 consecutive evenings. Working women spent more than double the amount of time on daily household chores than working men. Unmarried retirees spent the most time on daily household chores in comparison to their counterparts. There was a trend toward significance for the association between home stressors from the previous day and time spent on daily household chores as a function of employment and marital status. These findings highlight the importance of gender and marital status in the associations between employment status and time spent on daily household chores and the role that daily stressors, in particular home stressful events, have on daily household chore participation.

  7. After-school time use in Taiwan: effects on educational achievement and well-being.

    PubMed

    Chen, Su Yen; Lu, Luo

    2009-01-01

    Western studies have linked adolescents' time spent on homework, structured activities, various kinds of leisure involvement, and part-time employment with their academic achievement and psychological adjustment, but little is known about the after-school pursuits of Chinese students and their associations with adolescents' development. Using a nationally representative sample in Taiwan, this study investigated how time spent on nine after-school activities during the eleventh grade helped predict educational achievement and depression symptoms during the twelfth grade, in addition to previous achievement and depression level and background variables. The findings of this study confirmed and extended the extant literature that time spent on homework, after-class academic-enrichment programs, and private cram schools positively affected adolescents' educational achievement; however, time spent on private cram schools was negatively associated with their psychological well-being. In addition, inconsistent with the findings of many Western studies, this study's results did not support a positive effect of participating in school-based extracurricular activities on educational achievement and psychological well-being. Finally, time spent on working part-time and watching TV was found to be detrimental to achievement, but time spent playing Internet games appeared to be negatively associated with depression symptoms.

  8. New core-reflector boundary conditions for transient nodal reactor calculations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, E.K.; Kim, C.H.; Joo, H.K.

    1995-09-01

    New core-reflector boundary conditions designed for the exclusion of the reflector region in transient nodal reactor calculations are formulated. Spatially flat frequency approximations for the temporal neutron behavior and two types of transverse leakage approximations in the reflector region are introduced to solve the transverse-integrated time-dependent one-dimensional diffusion equation and then to obtain relationships between net current and flux at the core-reflector interfaces. To examine the effectiveness of new core-reflector boundary conditions in transient nodal reactor computations, nodal expansion method (NEM) computations with and without explicit representation of the reflector are performed for Laboratorium fuer Reaktorregelung und Anlagen (LRA) boiling water reactor (BWR) and Nuclear Energy Agency Committee on Reactor Physics (NEACRP) pressurized water reactor (PWR) rod ejection kinetics benchmark problems. Good agreement between two NEM computations is demonstrated in all the important transient parameters of two benchmark problems. A significant amount of CPU time saving is also demonstrated with the boundary condition model with transverse leakage (BCMTL) approximations in the reflector region. In the three-dimensional LRA BWR, the BCMTL and the explicit reflector model computations differ by approximately 4% in transient peak power density while the BCMTL results in >40% of CPU time saving by excluding both the axial and the radial reflector regions from explicit computational nodes. In the NEACRP PWR problem, which includes six different transient cases, the largest difference is 24.4% in the transient maximum power in the one-node-per-assembly B1 transient results. This difference in the transient maximum power of the B1 case is shown to reduce to 11.7% in the four-node-per-assembly computations. As for the computing time, BCMTL is shown to reduce the CPU time >20% in all six transient cases of the NEACRP PWR.

  9. Thirteen Hundred and Thirty Days. A Pilot Study of Teacher Time in Key Stage 1. Final Report.

    ERIC Educational Resources Information Center

    Campbell, R. J.; Neill, S. St. J.

    Ninety-five teachers in Key Stage 1 in England and Wales completed a questionnaire and records of time spent on work over a period of 14 consecutive days, resulting in detailed records of 1,330 days of teachers' time. The data are analyzed in terms of overall time spent on work; time distribution; and time spent specifically on teaching,…

  10. Ground Shock Effects from Accidental Explosions

    DTIC Science & Technology

    1976-11-01

    ...Horizontal: $D_h = D_v \tan[\sin^{-1}(c_p/U)]$, $V_h = V_v \tan[\sin^{-1}(c_p/U)]$...explosive are not included in the present analysis. This effect will limit the credibility of the direct-induced ground shock predictions, but if the...analysis. Dr. D. R. Richmond of Lovelace Foundation provided data on human shock tolerances. REFERENCES 1. "Structures to Resist the Effects of

  11. Sedentary behaviours among Australian adolescents.

    PubMed

    Hardy, Louise L; Dobbins, Timothy; Booth, Michael L; Denney-Wilson, Elizabeth; Okely, Anthony D

    2006-12-01

    To describe the prevalence and distribution (by demographic characteristics and body mass index [BMI] category) of sedentary behaviour among Australian adolescents aged 11-15 years. Cross-sectional representative population survey of school students (n = 2,750) in New South Wales, conducted in 2004. Students' self-reported time spent during a usual week in five categories of sedentary behaviour (small screen recreation [SSR], education, cultural, social and non-active travel). Height and weight were measured. Grade 6, 8 and 10 students spent approximately 34 hours, 41 hours and 45 hours/week of their discretionary time, respectively, engaged in sedentary behaviour. Urban students and students from Asian-speaking backgrounds spent significantly more time sedentary than students from rural areas or other cultural backgrounds. SSR accounted for 60% and 54% of sedentary behaviour among primary and high school students, respectively. Overweight and obese students spent more time in SSR than healthy weight students. Out-of-school hours educational activities accounted for approximately 20% of sedentary behaviour and increased with age. Girls spent twice the time in social activities compared with boys. Time spent in cultural activities declined with age. Sedentary behaviours among young people differ according to sex, age and cultural background. At least half of all time spent in sedentary behaviours was spent engaged in SSR. BMI was significantly associated with sedentary behaviour among some children, but not consistently across age groups. A clear understanding of young people's patterns of sedentary behaviour is required to develop effective and sustainable intervention programs to promote healthy living.

  12. Time Investment in Drug Supply Problems by Flemish Community Pharmacies

    PubMed Central

    De Weerdt, Elfi; Simoens, Steven; Casteels, Minne; Huys, Isabelle

    2017-01-01

    Introduction: Drug supply problems are a known problem for pharmacies. Community and hospital pharmacies do everything they can to minimize impact on patients. This study aims to quantify the time spent by Flemish community pharmacies on drug supply problems. Materials and Methods: During 18 weeks, employees of 25 community pharmacies filled in a template with the total time spent on drug supply problems. The template stated all the steps community pharmacies could undertake to manage drug supply problems. Results: Considering the median over the study period, the median time spent on drug supply problems was 25 min per week, with a minimum of 14 min per week and a maximum of 38 min per week. After calculating the median of each pharmacy, large differences were observed between pharmacies: about 25% spent less than 15 min per week and one-fifth spent more than 1 h per week. The steps on which community pharmacists spent most time are: (i) “check missing products from orders,” (ii) “contact wholesaler/manufacturers regarding potential drug shortages,” and (iii) “communicating to patients.” These three steps account for about 50% of the total time spent on drug supply problems during the study period. Conclusion: Community pharmacies spend about half an hour per week on drug supply problems. Although 25 min per week does not seem that much, the time spent is not delineated and community pharmacists are constantly confronted with drug supply problems. PMID:28878679

  13. The effect of the 16-hour intern workday restriction on surgical residents' in-hospital activities.

    PubMed

    Dennis, Bradley M; Long, Eric L; Zamperini, Katherine M; Nakayama, Don K

    2013-01-01

    To observe the effects of the 2011 Accreditation Council on Graduate Medical Education 16-hour intern workday restrictions on surgical residents' clinical and educational activities. All the residents recorded the following weekly in-hospital activities during February and March 2011 (year before intern work restrictions) and 2012 (first year under new requirements): operating room (OR) and clinic; bedside procedures; rounds and ward work; on-call duties in hospital; communication (e.g., checkouts and family and patient discussions); education (conferences and study); and personal (rest and meals). Descriptive statistics were calculated in 3 resident groups (interns, first postgraduate year [PGY1]; junior, PGY2 and 3; and senior, PGY4 and 5). The unpaired t test was used to compare data between 2011 and 2012; significance was set at p< 0.05. Medical school affiliated hospital. Categorical resident trainees in surgery, PGY1-5, 4 residents per level, with all 20 residents participating in the study. From 2011 to 2012, time spent in the hospital by the intern did not change (all results in h/wk, mean±standard deviation: 68.5±13.8 to 72.8±15.8, respectively) but the time devoted to specific activities changed significantly. In-hospital personal time decreased by 50% (5.3±4.6 to 2.6±2.0, p = 0.004). Interns spent less time placing central lines (2.1±2.2 to 0.9±1.2, p = 0.006) and more on rounds (8.8±8.8 to 14.2±9.8, p = 0.027), which included supervision with upper level residents. There was no change in the total time spent in the OR, the clinic, performing bedside procedures, and educational activities. Changes in intern work did not affect the time junior and senior residents spent on bedside procedures, time spent in the clinic, and total time spent in the hospital. In 2012, junior residents spent less time in educational activities (11.4±8.5 to 7.0±4.5, p = 0.0007) and the seniors spent more time in the OR (13.7±7.5 to 20.6±10.7, p = 0.0002). 
    The 16-hour restriction preserved interns' educational activities and time spent in the OR and clinic, but changed resident work activities at all levels. The time spent on rounds increased, time spent by the juniors on conferences decreased, and time spent by senior residents in the OR increased. Duty restrictions in general and intern supervision requirements demand ongoing adjustments in resident work schedules. Copyright © 2013 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.

  14. GPU-Accelerated Voxelwise Hepatic Perfusion Quantification

    PubMed Central

    Wang, H; Cao, Y

    2012-01-01

    Voxelwise quantification of hepatic perfusion parameters from dynamic contrast enhanced (DCE) imaging greatly contributes to assessment of liver function in response to radiation therapy. However, the efficiency of the estimation of hepatic perfusion parameters voxel-by-voxel in the whole liver using a dual-input single-compartment model requires substantial improvement for routine clinical applications. In this paper, we utilize the parallel computation power of a graphics processing unit (GPU) to accelerate the computation, while maintaining the same accuracy as the conventional method. Using CUDA-GPU, the hepatic perfusion computations over multiple voxels are run across the GPU blocks concurrently but independently. At each voxel, non-linear least squares fitting of the time series of the liver DCE data to the compartmental model is distributed to multiple threads in a block, and the computations of different time points are performed simultaneously and synchronically. An efficient fast Fourier transform in a block is also developed for the convolution computation in the model. The GPU computations of the voxel-by-voxel hepatic perfusion images are compared with ones by the CPU using the simulated DCE data and the experimental DCE MR images from patients. The computation speed is improved by 30 times using an NVIDIA Tesla C2050 GPU compared to a 2.67 GHz Intel Xeon CPU processor. To obtain liver perfusion maps with 626400 voxels in a patient’s liver, it takes 0.9 min with the GPU-accelerated voxelwise computation, compared to 110 min with the CPU, while both methods result in perfusion parameter differences of less than 10^-6. The method will be useful for generating liver perfusion images in clinical settings. PMID:22892645

  15. Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM

    NASA Astrophysics Data System (ADS)

    de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl

    2002-03-01

    We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 10^5-10^6) in reasonable CPU times. The performances of our implementation of the parallel code on an Origin 2000 supercomputer are presented and discussed. They exhibit very good speedup behavior and low load unbalancing. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.

  16. SU-F-BRD-02: Application of ARCHER_RT -- A GPU-Based Monte Carlo Dose Engine for Radiation Therapy -- to Tomotherapy and Patient-Independent IMRT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Su, L; Du, X; Liu, T

    Purpose: As a module of ARCHER -- Accelerated Radiation-transport Computations in Heterogeneous EnviRonments, ARCHER_RT is designed for RadioTherapy (RT) dose calculation. This paper describes the application of ARCHER_RT to patient-dependent TomoTherapy and patient-independent IMRT. It also conducts a 'fair' comparison of different GPUs and a multicore CPU. Methods: The source input used for patient-dependent TomoTherapy is a phase space file (PSF) generated from the optimized plan. For patient-independent IMRT, the open field PSF is used for different cases. The intensity modulation is simulated by a fluence map. The GEANT4 code is used as the benchmark. DVH and gamma index tests are employed to evaluate the accuracy of the ARCHER_RT code. Some previous studies reported misleading speedups by comparing GPU code with serial CPU code. To perform a fairer comparison, we write multi-thread code with OpenMP to fully exploit the computing potential of the CPU. The hardware involved in this study comprises a 6-core Intel E5-2620 CPU and 6 NVIDIA M2090 GPUs, a K20 GPU and a K40 GPU. Results: Dosimetric results from ARCHER_RT and GEANT4 show good agreement. The 2%/2mm gamma test pass rates for different clinical cases are 97.2% to 99.7%. A single M2090 GPU needs 50~79 seconds for the simulation to achieve a statistical error of 1% in the PTV. The K40 card is about 1.7∼1.8 times faster than the M2090 card. Using 6 M2090 cards, the simulation can be finished in about 10 seconds. For comparison, the Intel E5-2620 needs 507∼879 seconds for the same simulation. Conclusion: We successfully applied ARCHER_RT to Tomotherapy and patient-independent IMRT, and conducted a fair comparison between GPU and CPU performance. The ARCHER_RT code is both accurate and efficient and may be used towards clinical applications.

  17. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

    NASA Astrophysics Data System (ADS)

    Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; Fang, Jianbin; Wang, Guangxue; Jiang, Yi; Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua

    2014-12-01

    Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact definite difference schemes WCNS and HDCS that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, meanwhile the collaborative approach can improve the performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. 
    To our best knowledge, those are the largest-scale CPU-GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.

  18. Which factors predict the time spent answering queries to a drug information centre?

    PubMed Central

    Reppe, Linda A.; Spigset, Olav

    2010-01-01

    Objective To develop a model based upon factors able to predict the time spent answering drug-related queries to Norwegian drug information centres (DICs). Setting and method Drug-related queries received at 5 DICs in Norway from March to May 2007 were randomly assigned to 20 employees until each of them had answered a minimum of five queries. The employees reported the number of drugs involved, the type of literature search performed, and whether the queries were considered judgmental or not, using a specifically developed scoring system. Main outcome measures The scores of these three factors were added together to define a workload score for each query. Workload and its individual factors were subsequently related to the measured time spent answering the queries by simple or multiple linear regression analyses. Results Ninety-six query/answer pairs were analyzed. Workload significantly predicted the time spent answering the queries (adjusted R^2 = 0.22, P < 0.001). Literature search was the individual factor best predicting the time spent answering the queries (adjusted R^2 = 0.17, P < 0.001), and this variable also contributed the most in the multiple regression analyses. Conclusion The most important workload factor predicting the time spent handling the queries in this study was the type of literature search that had to be performed. The categorisation of queries as judgmental or not also affected the time spent answering the queries. The number of drugs involved did not significantly influence the time spent answering drug information queries. PMID:20922480

  19. Relation of Adolescent Video Game Play to Time Spent in Other Activities

    PubMed Central

    Cummings, Hope M.; Vandewater, Elizabeth A.

    2017-01-01

    Objective To examine the notion that playing video games is negatively related to the time adolescents spend in more developmentally appropriate activities. Design Nonexperimental study. Setting Survey data collected during the 2002–2003 school year. Participants A nationally representative sample of 1491 children aged 10 to 19 years. Main Outcome Measure Twenty-four–hour time-use diaries were collected on 1 weekday and 1 weekend day, both randomly chosen. Time-use diaries were used to determine adolescents’ time spent playing video games, with parents and friends, reading and doing homework, and in sports and active leisure. Results Differences in time spent between game players and nonplayers as well as the magnitude of the relationships among game time and activity time among adolescent game players were assessed. Thirty-six percent of adolescents (80% of boys and 20% of girls) played video games. On average, gamers played for an hour on the weekdays and an hour and a half on the weekends. Compared with nongamers, adolescent gamers spent 30% less time reading and 34% less time doing homework. Among gamers (both genders), time spent playing video games without parents or friends was negatively related to time spent with parents and friends in other activities. Conclusions Although gamers and nongamers did not differ in the amount of time they spent interacting with family and friends, concerns regarding gamers’ neglect of school responsibilities (reading and homework) are warranted. Although only a small percentage of girls played video games, our findings suggest that playing video games may have different social implications for girls than for boys. PMID:17606832

  20. It takes longer than you think: librarian time spent on systematic review tasks

    PubMed Central

    Bullers, Krystal; Howard, Allison M.; Hanson, Ardis; Kearns, William D.; Orriola, John J.; Polo, Randall L.; Sakmar, Kristen A.

    2018-01-01

    Introduction The authors examined the time that medical librarians spent on specific tasks for systematic reviews (SRs): interview process, search strategy development, search strategy translation, documentation, deliverables, search methodology writing, and instruction. We also investigated relationships among the time spent on SR tasks, years of experience, and number of completed SRs to gain a better understanding of the time spent on SR tasks from time, staffing, and project management perspectives. Methods A confidential survey and study description were sent to medical library directors who were members of the Association of Academic Health Sciences Libraries as well as librarians serving members of the Association of American Medical Colleges or American Osteopathic Association. Results Of the 185 participants, 143 (77%) had worked on an SR within the last 5 years. The number of SRs conducted by participants during their careers ranged from 1 to 500, with a median of 5. The major component of time spent was on search strategy development and translation. Average aggregated time for standard tasks was 26.9 hours, with a median of 18.5 hours. Task time was unrelated to the number of SRs but was positively correlated with years of SR experience. Conclusion The time required to conduct the librarian’s discrete tasks in an SR varies substantially, and there are no standard time frames. Librarians with more SR experience spent more time on instruction and interviews; time spent on all other tasks varied widely. Librarians also can expect to spend a significant amount of their time on search strategy development, translation, and writing. PMID:29632442

  1. The VLBA correlator: Real-time in the distributed era

    NASA Technical Reports Server (NTRS)

    Wells, D. C.

    1992-01-01

    The correlator is the signal processing engine of the Very Long Baseline Array (VLBA). Radio signals are recorded on special wideband (128 Mb/s) digital recorders at the 10 telescopes, with sampling times controlled by hydrogen maser clocks. The magnetic tapes are shipped to the Array Operations Center in Socorro, New Mexico, where they are played back simultaneously into the correlator. Real-time software and firmware controls the playback drives to achieve synchronization, compute models of the wavefront delay, control the numerous modules of the correlator, and record FITS files of the fringe visibilities at the back-end of the correlator. In addition to the more than 3000 custom VLSI chips which handle the massive data flow of the signal processing, the correlator contains a total of more than 100 programmable computers, 8-, 16- and 32-bit CPUs. Code is downloaded into front-end CPU's dependent on operating mode. Low-level code is assembly language, high-level code is C running under a RT OS. We use VxWorks on Motorola MVME147 CPU's. Code development is on a complex of SPARC workstations connected to the RT CPU's by Ethernet. The overall management of the correlation process is dependent on a database management system. We use Ingres running on a Sparcstation-2. We transfer logging information from the database of the VLBA Monitor and Control System to our database using Ingres/NET. Job scripts are computed and are transferred to the real-time computers using NFS, and correlation job execution logs and status flow back by the route. Operator status and control displays use windows on workstations, interfaced to the real-time processes by network protocols. The extensive network protocol support provided by VxWorks is invaluable. The VLBA Correlator's dependence on network protocols is an example of the radical transformation of the real-time world over the past five years. Real-time is becoming more like conventional computing. 
    Paradoxically, 'conventional' computing is also adopting practices from the real-time world: semaphores, shared memory, light-weight threads, and concurrency. This appears to be a convergence of thinking.

  2. 48 CFR 852.271-72 - Time spent by counselee in counseling process.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... counseling process. 852.271-72 Section 852.271-72 Federal Acquisition Regulations System DEPARTMENT OF... Clauses 852.271-72 Time spent by counselee in counseling process. As prescribed in 871.212, insert the following clause: Time Spent by Counselee in Counseling Process (APR 1984) The contractor agrees that no...

  3. 48 CFR 852.271-72 - Time spent by counselee in counseling process.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... counseling process. 852.271-72 Section 852.271-72 Federal Acquisition Regulations System DEPARTMENT OF... Clauses 852.271-72 Time spent by counselee in counseling process. As prescribed in 871.212, insert the following clause: Time Spent by Counselee in Counseling Process (APR 1984) The contractor agrees that no...

  4. 48 CFR 852.271-72 - Time spent by counselee in counseling process.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... counseling process. 852.271-72 Section 852.271-72 Federal Acquisition Regulations System DEPARTMENT OF... Clauses 852.271-72 Time spent by counselee in counseling process. As prescribed in 871.212, insert the following clause: Time Spent by Counselee in Counseling Process (APR 1984) The contractor agrees that no...

  5. 48 CFR 852.271-72 - Time spent by counselee in counseling process.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... counseling process. 852.271-72 Section 852.271-72 Federal Acquisition Regulations System DEPARTMENT OF... Clauses 852.271-72 Time spent by counselee in counseling process. As prescribed in 871.212, insert the following clause: Time Spent by Counselee in Counseling Process (APR 1984) The contractor agrees that no...

  6. 48 CFR 852.271-72 - Time spent by counselee in counseling process.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... counseling process. 852.271-72 Section 852.271-72 Federal Acquisition Regulations System DEPARTMENT OF... Clauses 852.271-72 Time spent by counselee in counseling process. As prescribed in 871.212, insert the following clause: Time Spent by Counselee in Counseling Process (APR 1984) The contractor agrees that no...

  7. 5 CFR 734.503 - Allocation and reimbursement of costs associated with political activities.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... employee covered under this subpart must apportion the costs of mixed travel based on the time spent on political activities and the time spent performing official duties. Prorating the cost of travel involves..., receptions, rallies, and similar activities. Time spent in actual travel, private study, or rest and...

  8. 5 CFR 551.412 - Preparatory or concluding activities.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... that activity, including the 10 minutes, as hours of work. (2) If the time spent in a preparatory or... employee to perform that activity. An employee shall be credited with the actual time spent in that..., and is indispensable to the performance of the principal activities, and that the total time spent in...

  9. Adolescent Depression and Time Spent with Parents and Siblings

    ERIC Educational Resources Information Center

    Desha, Laura N.; Nicholson, Jan M.; Ziviani, Jenny M.

    2011-01-01

    This study examines adolescent depressive symptoms and the quantity and quality of time spent by adolescents with their parents and siblings. We use measures of the quality of relationships with parents and siblings as proxy indicators for the quality of time spent with these social partners. The study emphasizes the salience of parent…

  10. Video Games, Adolescents, and the Displacement Effect

    ERIC Educational Resources Information Center

    Fisher, Carla Christine

    2012-01-01

    The displacement effect (the idea that time spent in one activity displaces time spent in other activities) was examined within the lens of adolescents' video game use and their time spent reading, doing homework, in physically active sports and activities, in creative play, and with parents and friends. Data were drawn from the Panel Study…

  11. Examining the architecture of cellular computing through a comparative study with a computer

    PubMed Central

    Wang, Degeng; Gribskov, Michael

    2005-01-01

    The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software–hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's ‘hardware’ equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the ‘bandwidth’ of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed. PMID:16849179

  12. Examining the architecture of cellular computing through a comparative study with a computer.

    PubMed

    Wang, Degeng; Gribskov, Michael

    2005-06-22

    The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software-hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's "hardware" equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the "bandwidth" of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed.

  13. Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization.

    PubMed

    Ruymgaart, A Peter; Elber, Ron

    2012-11-13

    We report Graphics Processing Unit (GPU) and Open-MP parallel implementations of water-specific force calculations and of bond constraints for use in Molecular Dynamics simulations. We focus on a typical laboratory computing-environment in which a CPU with a few cores is attached to a GPU. We discuss in detail the design of the code and we illustrate performance comparable to highly optimized codes such as GROMACS. Beside speed our code shows excellent energy conservation. Utilization of water-specific lists allows the efficient calculations of non-bonded interactions that include water molecules and results in a speed-up factor of more than 40 on the GPU compared to code optimized on a single CPU core for systems larger than 20,000 atoms. This is up four-fold from a factor of 10 reported in our initial GPU implementation that did not include a water-specific code. Another optimization is the implementation of constrained dynamics entirely on the GPU. The routine, which enforces constraints of all bonds, runs in parallel on multiple Open-MP cores or entirely on the GPU. It is based on Conjugate Gradient solution of the Lagrange multipliers (CG SHAKE). The GPU implementation is partially in double precision and requires no communication with the CPU during the execution of the SHAKE algorithm. The (parallel) implementation of SHAKE allows an increase of the time step to 2.0fs while maintaining excellent energy conservation. Interestingly, CG SHAKE is faster than the usual bond relaxation algorithm even on a single core if high accuracy is expected. The significant speedup of the optimized components transfers the computational bottleneck of the MD calculation to the reciprocal part of Particle Mesh Ewald (PME).
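    The abstract states only that the constraint routine is based on a conjugate gradient (CG) solution for the Lagrange multipliers. As a rough illustration of that building block, the sketch below runs a generic CG iteration on a small symmetric positive-definite system A x = b. It assumes the constraint equations have already been linearised into that form; the matrix, right-hand side, and function names are hypothetical and are not taken from the authors' code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [dot(row, v) for row in a]


def conjugate_gradient(a: Matrix, b: Vector,
                       tol: float = 1e-12, max_iter: int = 1000) -> Vector:
    """Solve a x = b for symmetric positive-definite a (hypothetical helper)."""
    x = [0.0] * len(b)          # initial guess x = 0
    r = list(b)                 # residual b - a x for the zero guess
    p = list(r)                 # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:  # residual norm small enough: converged
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)                      # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # direction update
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # Made-up 2x2 system standing in for the multiplier equations.
    print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

    Each iteration needs only one matrix-vector product and a few inner products, which is the kind of work that maps naturally onto parallel hardware; how the authors actually organise it on the GPU is not described in the abstract.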

  14. Radiation hardened microprocessor for small payloads

    NASA Technical Reports Server (NTRS)

    Shah, Ravi

    1993-01-01

    The RH-3000 program is developing a rad-hard space qualified 32-bit MIPS R-3000 RISC processor under the Naval Research Lab sponsorship. In addition, under IR&D Harris is developing RHC-3000 for embedded control applications where low cost and radiation tolerance are primary concerns. The development program leverages heavily from commercial development of the MIPS R-3000. The commercial R-3000 has a large installed user base and several foundry partners are currently producing a wide variety of R-3000 derivative products. One of the MIPS derivative products, the LR33000 from LSI Logic, was used as the basis for the design of the RH-3000 chipset. The RH-3000 chipset consists of three core chips and two support chips. The core chips include the CPU, which is the R-3000 integer unit and the FPA/MD chip pair, which performs the R-3010 floating point functions. The two support chips contain all the support functions required for fault tolerance support, real-time support, memory management, timers, and other functions. The Harris development effort had first-pass silicon success in June 1992 with the first rad-hard 32-bit RH-3000 CPU chip. The CPU device is 30 kgates, has a 508 mil by 503 mil die size and is fabricated at Harris Semiconductor on the rad-hard CMOS Silicon on Sapphire (SOS) process. The CPU device successfully passed testing against 600,000 test vectors derived directly from the LSI/MIPS test suite and has been operational as a single board computer running C code for the past year. In addition, the RH-3000 program has developed the methodology for converting commercially developed designs utilizing logic synthesis techniques based on a combination of VHDL and schematic data bases.

  15. Using all of your CPU's in HIPE

    NASA Astrophysics Data System (ADS)

    Jacobson, J. D.; Fadda, D.

    2012-09-01

    Modern computer architectures increasingly feature multi-core CPU's. For example, the MacBook Pro features the Intel quad-core i7 processors. Through the use of hyper-threading, where each core can execute two threads simultaneously, the quad-core i7 can support eight simultaneous processing threads. All this on your laptop! This CPU power can now be put into service by scientists to perform data reduction tasks, but only if the software has been designed to take advantage of the multiple processor architectures. Up to now, software written for Herschel data reduction (HIPE), written in Jython and JAVA, is single-threaded and can only utilize a single processor. Users of HIPE do not get any advantage from the additional processors. Why not put all of the CPU resources to work reducing your data? We present a multi-threaded software application that corrects long-term transients in the signal from the PACS unchopped spectroscopy line scan mode. In this poster, we present a multi-threaded software framework to achieve performance improvements from parallel execution. We will show how a task to correct transients in the PACS Spectroscopy Pipeline for the un-chopped line scan mode has been threaded. This computation-intensive task uses either a one-parameter or a three-parameter exponential function to characterize the transient. The task uses a JAVA implementation of Minpack, translated from the C (Moshier) and IDL (Markwardt) by the authors, to optimize the correction parameters. We also explain how to determine if a task can benefit from threading (Amdahl's Law), and if it is safe to thread. The design and implementation, using the JAVA concurrency package completion service, is described. Pitfalls, timing bugs, thread safety, resource control, testing and performance improvements are described and plotted.
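    The abstract cites Amdahl's Law as the test for whether a task is worth threading. In its usual form, a task whose parallelisable fraction is p can be sped up on n workers by at most 1 / ((1 - p) + p / n). The sketch below evaluates that bound; the function name and the sample fractions are illustrative assumptions and do not come from the HIPE work.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup from Amdahl's Law.

    parallel_fraction: share of the single-threaded run time that can be
        spread across workers (0.0 to 1.0).
    workers: number of threads or cores used for the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical fractions, evaluated for the eight hardware threads
    # of a hyper-threaded quad-core CPU.
    for p in (0.50, 0.90, 0.99):
        print(f"p = {p:.2f}: at most {amdahl_speedup(p, 8):.2f}x on 8 threads")
```

    The bound shows why the serial part matters: with p = 0.5 the speedup can never reach 2x, however many threads are added.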

  16. Computing the Density Matrix in Electronic Structure Theory on Graphics Processing Units.

    PubMed

    Cawkwell, M J; Sanville, E J; Mniszewski, S M; Niklasson, Anders M N

    2012-11-13

    The self-consistent solution of a Schrödinger-like equation for the density matrix is a critical and computationally demanding step in quantum-based models of interatomic bonding. This step was tackled historically via the diagonalization of the Hamiltonian. We have investigated the performance and accuracy of the second-order spectral projection (SP2) algorithm for the computation of the density matrix via a recursive expansion of the Fermi operator in a series of generalized matrix-matrix multiplications. We demonstrate that owing to its simplicity, the SP2 algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] is exceptionally well suited to implementation on graphics processing units (GPUs). The performance in double and single precision arithmetic of a hybrid GPU/central processing unit (CPU) and full GPU implementation of the SP2 algorithm exceeds that of a CPU-only implementation of the SP2 algorithm and traditional matrix diagonalization when the dimensions of the matrices exceed about 2000 × 2000. Padding schemes for arrays allocated in the GPU memory that optimize the performance of the CUBLAS implementations of the level 3 BLAS DGEMM and SGEMM subroutines for generalized matrix-matrix multiplications are described in detail. The analysis of the relative performance of the hybrid CPU/GPU and full GPU implementations indicates that the transfer of arrays between the GPU and CPU constitutes only a small fraction of the total computation time. The errors measured in the self-consistent density matrices computed using the SP2 algorithm are generally smaller than those measured in matrices computed via diagonalization. Furthermore, the errors in the density matrices computed using the SP2 algorithm do not exhibit any dependence on system size, whereas the errors increase linearly with the number of orbitals when diagonalization is employed.
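
    To illustrate why a recursive expansion built from matrix-matrix multiplications maps well onto GPUs, the following is a minimal pure-Python sketch of an SP2-style iteration. It is not the authors' implementation: the function names, the Gershgorin estimate of the spectral bounds, the trace-based choice between the two projection polynomials, and the fixed iteration count are all assumptions made for illustration, and the Hamiltonian is assumed to have a nonzero spectral width.

```python
def matmul(a, b):
    """Dense matrix product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))


def sp2_density_matrix(h, n_occ, iterations=40):
    """Approximate the zero-temperature density matrix of a symmetric matrix h.

    n_occ is the number of occupied states. Each iteration costs exactly one
    matrix-matrix multiplication, the operation the abstract offloads to GPUs.
    """
    n = len(h)
    # Gershgorin circles give cheap bounds on the eigenvalue spectrum (assumption).
    radii = [sum(abs(h[i][j]) for j in range(n) if j != i) for i in range(n)]
    e_min = min(h[i][i] - radii[i] for i in range(n))
    e_max = max(h[i][i] + radii[i] for i in range(n))
    scale = e_max - e_min  # assumed nonzero
    # Map the spectrum into [0, 1] with the lowest energies near 1.
    x = [[((e_max if i == j else 0.0) - h[i][j]) / scale for j in range(n)]
         for i in range(n)]
    for _ in range(iterations):
        x2 = matmul(x, x)
        tr_x, tr_x2 = trace(x), trace(x2)
        # X^2 lowers the trace, 2X - X^2 raises it; keep whichever candidate
        # lands closer to the target occupation.
        if abs(tr_x2 - n_occ) <= abs(2.0 * tr_x - tr_x2 - n_occ):
            x = x2
        else:
            x = [[2.0 * x[i][j] - x2[i][j] for j in range(n)] for i in range(n)]
    return x


if __name__ == "__main__":
    # Toy three-level system with one occupied state (illustrative values).
    hamiltonian = [[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    for row in sp2_density_matrix(hamiltonian, n_occ=1):
        print(row)
```

    In this sketch the only expensive operation per step is matmul, so replacing it with a GPU matrix-multiplication routine is the natural acceleration point.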

  17. Optimal endothelialisation of a new compliant poly(carbonate-urea)urethane vascular graft with effect of physiological shear stress.

    PubMed

    Salacinski, H J; Tai, N R; Punshon, G; Giudiceandrea, A; Hamilton, G; Seifalian, A M

    2000-10-01

    To define the optimal seeding conditions of a new stress-free poly(carbonate-urea)urethane (CPU) graft with compliance similar to that of human artery, with a honeycomb structure engineered during the manufacturing process to enhance adhesion and growth of endothelial cells. (111)Indium-oxine radiolabeled human umbilical vein endothelial cells (HUVEC) were seeded onto CPU grafts at (a) concentrations from 2-24x10(5) cells/cm(2) and (b) incubated for 0.5, 1, 2, 4 and 6 h. Following incubation, graft segments were subjected to three washing/gamma counting procedures and scanning electron microscopy (SEM). Cell viability was measured using a modified Alamar blue(TM) assay. To test physiological retention, a pulsatile flow phantom was used to subject optimally seeded (16x10(5), 4 h) CPU grafts to arterial shear stress for 6 h with real-time acquisition of scintigraphic images of seeded grafts using a nuclear medicine gamma camera system. A seeding efficiency of 54 ± 13% after three washes was achieved using 16x10(5) cells/cm(2). Similarly, in SEM micrographs a seeding density of 16x10(5) cells/cm(2) resulted in a confluent monolayer. Seeded CPU segments incubated for 4 h exhibited significantly higher resistance to wash-off than segments incubated for 30 min (p < 0.05). Exposure of seeded grafts to pulsatile shear stress resulted in some cell loss, with 67 ± 3% of cells adherent following 6 h of perfusion with ongoing metabolic activity. Thus, optimal conditions were 16x10(5) cells/cm(2) at 4 h. The optimal seeding conditions have been defined for a "tissue-engineered" vascular graft which allow complete endothelialisation and high cell-to-substrate strength that resists hydrodynamic stress. Copyright 2000 Harcourt Publishers Ltd.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cohen, J; Dossa, D; Gokhale, M

    Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems, and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063 Storage Intensive Supercomputing during the period 05/07-09/07. The following chapters describe: (1) a new file I/O monitoring tool iotrace developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared performance of software-only to GPU-accelerated. In addition to the work reported here, an additional text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the NAND Flash 40GB parallel disk array, the Fusion-io. The Fusion system specs are as follows: SuperMicro X7DBE Xeon Dual Socket Blackford Server Motherboard; 2 Intel Xeon Dual-Core 2.66 GHz processors; 1 GB DDR2 PC2-5300 RAM (2 x 512); 80GB Hard Drive (Seagate SATA II Barracuda). The Fusion board is presently capable of 4X in a PCIe slot. The image resampling benchmark was run on a dual Xeon workstation with NVIDIA graphics card (see Chapter 5 for full specification). An XtremeData Opteron+FPGA was used for the language classification application. We observed that these benchmarks are not uniformly I/O intensive. The only benchmark that showed greater than 50% of the time in I/O was the graph algorithm when it accessed data files over NFS. When local disk was used, the graph benchmark spent at most 40% of its time in I/O. The other benchmarks were CPU dominated. The image resampling benchmark and language classification showed an order-of-magnitude speedup over software by using co-processor technology to offload the CPU-intensive kernels. Our experiments to date suggest that emerging hardware technologies offer significant benefits for boosting the performance of data-intensive algorithms. Using GPU and FPGA co-processors, we were able to improve performance by more than an order of magnitude on the benchmark algorithms, eliminating the processor bottleneck of CPU-bound tasks. Experiments with a prototype solid state nonvolatile memory available today show 10X better throughput on random reads than disk, with a 2X speedup on a graph processing benchmark when compared to the use of local SATA disk.

  19. Symplectic multi-particle tracking on GPUs

    NASA Astrophysics Data System (ADS)

    Liu, Zhicong; Qiang, Ji

    2018-05-01

    A symplectic multi-particle tracking model is implemented on the Graphic Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) language. The symplectic tracking model can preserve phase space structure and reduce non-physical effects in long term simulation, which is important for beam property evaluation in particle accelerators. Though this model is computationally expensive, it is very suitable for parallelization and can be accelerated significantly by using GPUs. In this paper, we optimized the implementation of the symplectic tracking model on both single GPU and multiple GPUs. Using a single GPU processor, the code achieves a factor of 2-10 speedup for a range of problem sizes compared with the time on a single state-of-the-art Central Processing Unit (CPU) node with similar power consumption and semiconductor technology. It also shows good scalability on a multi-GPU cluster at Oak Ridge Leadership Computing Facility. In an application to beam dynamics simulation, the GPU implementation helps save more than a factor of two total computing time in comparison to the CPU implementation.

  20. RXIO: Design and implementation of high performance RDMA-capable GridFTP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tian, Yuan; Yu, Weikuan; Vetter, Jeffrey S.

    2011-12-21

    Owing to its low latency, high bandwidth, and low CPU utilization, Remote Direct Memory Access (RDMA) has established itself as an effective data movement technology in many networking environments. However, the transport protocols of grid run-time systems, such as GridFTP in Globus, are not yet capable of utilizing RDMA. In this study, we examine the architecture of GridFTP for the feasibility of enabling RDMA. An RDMA-capable XIO (RXIO) framework is designed and implemented to extend its XIO system and match the characteristics of RDMA. Our experimental results demonstrate that RDMA can significantly improve the performance of GridFTP, reducing the latency by 32% and increasing the bandwidth by more than three times. In achieving such performance improvements, RDMA dramatically cuts down CPU utilization of GridFTP clients and servers. In conclusion, these results demonstrate that RXIO can effectively exploit the benefits of RDMA for GridFTP. It offers a good prototype to further leverage GridFTP on wide-area RDMA networks.

  1. The density matrix renormalization group algorithm on kilo-processor architectures: Implementation and trade-offs

    NASA Astrophysics Data System (ADS)

    Nemes, Csaba; Barcza, Gergely; Nagy, Zoltán; Legeza, Örs; Szolgay, Péter

    2014-06-01

    In the numerical analysis of strongly correlated quantum lattice models one of the leading algorithms developed to balance the size of the effective Hilbert space and the accuracy of the simulation is the density matrix renormalization group (DMRG) algorithm, in which the run-time is dominated by the iterative diagonalization of the Hamilton operator. As the most time-dominant step of the diagonalization can be expressed as a list of dense matrix operations, the DMRG is an appealing candidate to fully utilize the computing power residing in novel kilo-processor architectures. In the paper a smart hybrid CPU-GPU implementation is presented, which exploits the power of both CPU and GPU and tolerates problems exceeding the GPU memory size. Furthermore, a new CUDA kernel has been designed for asymmetric matrix-vector multiplication to accelerate the rest of the diagonalization. Besides the evaluation of the GPU implementation, the practical limits of an FPGA implementation are also discussed.

  2. Time Costs of Fertility Care: The Hidden Hardship of Building a Family

    PubMed Central

    Wu, Alex K; Elliott, Peter; Katz, Patricia P.; Smith, James F.

    2013-01-01

    Objective To determine the time infertile couples spend seeking and utilizing fertility care. Design Prospective cohort. Setting 8 community and academic infertility practices. Patients 319 couples presenting for a fertility evaluation. Interventions Face-to-face and telephone interviews and questionnaires. Main Outcome Measures Participants recorded diaries of time spent on provider visits, travel, telephone, and miscellaneous activities. Participants also recorded time off of work due to the physical and mental stress related to fertility care. Linear regression was used to assess relationship between fertility characteristics and time spent pursuing care. Results Diaries were completed by 319 subjects. Over an 18 month time period, the average time spent on fertility care was 125 hours, equating to 15.6 days, assuming an 8 hour work day. For couples utilizing cycle-based treatments (CBT), overall time spent pursuing care averaged 142 hours versus 58 hours for couples using other therapies, with the majority of time spent on provider visits (73 hours). After multivariable adjustment for clinical and sociodemographic characteristics, possessing a college degree and intensity of fertility treatment were independently associated with increased time spent pursuing fertility care. Furthermore, couples that spent the most time on care were significantly more likely to experience fertility related stress. Conclusions Over the course of 18 months of observation, couples pursuing fertility treatment dedicated large amounts of time to attaining their family building goals. This burden on couples adds to the already significant financial and emotional burdens of fertility treatment and provides new insight into the difficulties these couples face. PMID:23454007

  3. Computer hardware for radiologists: Part I

    PubMed Central

    Indrajit, IK; Alam, A

    2010-01-01

    Computers are an integral part of modern radiology practice. They are used in different radiology modalities to acquire, process, and postprocess imaging data. They have had a dramatic influence on contemporary radiology practice. Their impact has extended further with the emergence of Digital Imaging and Communications in Medicine (DICOM), Picture Archiving and Communication System (PACS), Radiology information system (RIS) technology, and Teleradiology. A basic overview of computer hardware relevant to radiology practice is presented here. The key hardware components in a computer are the motherboard, central processor unit (CPU), the chipset, the random access memory (RAM), the memory modules, bus, storage drives, and ports. The personal computer (PC) has a rectangular case that contains important components called hardware, many of which are integrated circuits (ICs). The fiberglass motherboard is the main printed circuit board and has a variety of important hardware mounted on it, which are connected by electrical pathways called “buses”. The CPU is the largest IC on the motherboard and contains millions of transistors. Its principal function is to execute “programs”. A Pentium® 4 CPU has transistors that execute a billion instructions per second. The chipset is completely different from the CPU in design and function; it controls data and interaction of buses between the motherboard and the CPU. Memory (RAM) is fundamentally semiconductor chips storing data and instructions for access by a CPU. RAM is classified by storage capacity, access speed, data rate, and configuration. PMID:21042437

  4. Fast and high-order numerical algorithms for the solution of multidimensional nonlinear fractional Ginzburg-Landau equation

    NASA Astrophysics Data System (ADS)

    Mohebbi, Akbar

    2018-02-01

    In this paper we propose two fast and accurate numerical methods for the solution of multidimensional space fractional Ginzburg-Landau equation (FGLE). In the presented methods, to avoid solving a nonlinear system of algebraic equations and to increase the accuracy and efficiency of method, we split the complex problem into simpler sub-problems using the split-step idea. For a homogeneous FGLE, we propose a method which has fourth-order of accuracy in time component and spectral accuracy in space variable and for nonhomogeneous one, we introduce another scheme based on the Crank-Nicolson approach which has second-order of accuracy in time variable. Due to using the Fourier spectral method for fractional Laplacian operator, the resulting schemes are fully diagonal and easy to code. Numerical results are reported in terms of accuracy, computational order and CPU time to demonstrate the accuracy and efficiency of the proposed methods and to compare the results with the analytical solutions. The results show that the present methods are accurate and require low CPU time. It is illustrated that the numerical results are in good agreement with the theoretical ones.

  5. Far-field radiation patterns of aperture antennas by the Winograd Fourier transform algorithm

    NASA Technical Reports Server (NTRS)

    Heisler, R.

    1978-01-01

    A more time-efficient algorithm for computing the discrete Fourier transform, the Winograd Fourier transform (WFT), is described. The WFT algorithm is compared with other transform algorithms. Results indicate that antenna analysis is a very successful application of the WFT algorithm. Significant savings in CPU time will improve the computer turnaround time and circumvent the need to resort to weekend runs.

  6. The relationship between time spent communicating and communication outcomes on a hospital medicine service.

    PubMed

    Rothberg, Michael B; Steele, John R; Wheeler, John; Arora, Ashish; Priya, Aruna; Lindenauer, Peter K

    2012-02-01

    Quality care depends on effective communication between caregivers, but it is unknown whether time spent communicating is associated with communication outcomes. To assess the association between time spent communicating, agreement on plan of care, and patient satisfaction. Time-motion study with cross-sectional survey. Academic medical center. Physicians, patients, and nurses on a hospital medicine service. Hospitalists' forms of communication were timed with a stopwatch. Physician-nurse agreement on the plan of care and patient satisfaction with physician communication were assessed via survey. Eighteen hospitalists were observed caring for 379 patients. On average, physicians spent more time per patient on written than verbal communication (median: 9.2 min. vs. 6.3 min, p<0.001). Verbal communication was greatest with patients (mean time 5.3 min, range 0-37 min), then other physicians (1.4 min), families (1.1 min), nurses (1.1 min), and case managers (0.4 min). There was no verbal communication with nurses in 30% of cases. Nurses and physicians agreed most about planned procedures (87%), principal diagnosis (74%), tests ordered (73%), anticipated discharge date (69%) and least regarding medication changes (59%). There was no association between time spent communicating and agreement on plan of care. Among 123 patients who completed surveys (response rate 32%), time physicians spent talking to patients was not correlated with patients' satisfaction with physician communication (Pearson correlation coefficient = 0.09, p=0.30). Hospitalists vary in the amount of time they spend communicating, but we found no association between time spent and either patient satisfaction or nurse-physician agreement on plan of care.

  7. Naturally Occurring Changes in Time Spent Watching Television Are Inversely Related to Frequency of Physical Activity during Early Adolescence

    ERIC Educational Resources Information Center

    Motl, Robert W.; McAuley, Edward; Birnbaum, Amanda S.; Lytle, Leslie A.

    2006-01-01

    In this longitudinal study, we examined the relationship between changes in time spent watching television and playing video games with frequency of leisure-time physical activity across a 2-year period among adolescent boys and girls (N=4594). Latent growth modelling indicated that a decrease in time spent watching television was associated with…

  8. 5 CFR 551.431 - Time spent on standby duty or in an on-call status.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... § 551.431 Time spent on standby duty or in an on-call status. (a)(1) An employee is on duty, and time... 5 Administrative Personnel 1 2011-01-01 2011-01-01 false Time spent on standby duty or in an on-call status. 551.431 Section 551.431 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT CIVIL...

  9. 5 CFR 551.431 - Time spent on standby duty or in an on-call status.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... § 551.431 Time spent on standby duty or in an on-call status. (a)(1) An employee is on duty, and time... 5 Administrative Personnel 1 2010-01-01 2010-01-01 false Time spent on standby duty or in an on-call status. 551.431 Section 551.431 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT CIVIL...

  10. Measurements of neuron soma size and density in rat dorsal striatum, nucleus accumbens core and nucleus accumbens shell: differences between striatal region and brain hemisphere, but not sex.

    PubMed

    Meitzen, John; Pflepsen, Kelsey R; Stern, Christopher M; Meisel, Robert L; Mermelstein, Paul G

    2011-01-07

    Both hemispheric bias and sex differences exist in striatal-mediated behaviors and pathologies. The extent to which these dimorphisms can be attributed to an underlying neuroanatomical difference is unclear. We therefore quantified neuron soma size and density in the dorsal striatum (CPu) as well as the core (AcbC) and shell (AcbS) subregions of the nucleus accumbens to determine whether these anatomical measurements differ by region, hemisphere, or sex in adult Sprague-Dawley rats. Neuron soma size was larger in the CPu than the AcbC or AcbS. Neuron density was greatest in the AcbS, intermediate in the AcbC, and least dense in the CPu. CPu neuron density was greater in the left in comparison to the right hemisphere. No attribute was sexually dimorphic. These results provide the first evidence that hemispheric bias in the striatum and striatal-mediated behaviors can be attributed to a lateralization in neuronal density within the CPu. In contrast, sexual dimorphisms appear mediated by factors other than gross anatomical differences. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  11. 48 CFR 252.204-7011 - Alternative Line Item Structure.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... Unit Unit price Amount 0001 Computer, Desktop with CPU, Monitor, Keyboard and Mouse 20 EA Alternative... Unit Unit Price Amount 0001 Computer, Desktop with CPU, Keyboard and Mouse 20 EA 0002 Monitor 20 EA...

  12. 48 CFR 252.204-7011 - Alternative Line Item Structure.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... Unit Unit price Amount 0001 Computer, Desktop with CPU, Monitor, Keyboard and Mouse 20 EA Alternative... Unit Unit Price Amount 0001 Computer, Desktop with CPU, Keyboard and Mouse 20 EA 0002 Monitor 20 EA...

  13. 48 CFR 252.204-7011 - Alternative Line Item Structure.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... Unit Unit price Amount 0001 Computer, Desktop with CPU, Monitor, Keyboard and Mouse 20 EA Alternative... Unit Unit Price Amount 0001 Computer, Desktop with CPU, Keyboard and Mouse 20 EA 0002 Monitor 20 EA...

  14. 48 CFR 252.204-7011 - Alternative Line Item Structure.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... Unit Unit price Amount 0001 Computer, Desktop with CPU, Monitor, Keyboard and Mouse 20 EA Alternative... Unit Unit Price Amount 0001 Computer, Desktop with CPU, Keyboard and Mouse 20 EA 0002 Monitor 20 EA...

  15. Time Spent on Homework, Mathematics Anxiety and Mathematics Achievement: Evidence from a US Sample

    ERIC Educational Resources Information Center

    Cheema, Jehanzeb R.; Sheridan, Kimberly

    2015-01-01

    This study investigated the effect of time spent on homework and mathematics anxiety on mathematics achievement. Data from a nationally representative US sample consisting of 4,978 cases was used to predict mathematics achievement from time spent on homework and mathematics anxiety while controlling for demographic differences such as gender,…

  16. 29 CFR 778.318 - Productive and nonproductive hours of work.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 29 Labor 3 2011-07-01 2011-07-01 false Productive and nonproductive hours of work. 778.318 Section... nonproductive hours of work. (a) Failure to pay for nonproductive time worked. Some agreements provide for payment only for the hours spent in productive work; the work hours spent in waiting time, time spent in...

  17. Teacher Time Spent on Student Health Issues and School Nurse Presence

    ERIC Educational Resources Information Center

    Hill, Nina Jean; Hollis, Marianne

    2012-01-01

    Elementary school teacher time spent on student health issues and the relationship to school nurse services was the focus of this 2-year study. A cross-sectional design was used to survey traditional and exceptional (special needs) classroom teachers about the time they spent on health issues and their perception of school nurse presence. The…

  18. 5 CFR 551.423 - Time spent in training or attending a lecture, meeting, or conference.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... working hours shall be considered hours of work. (2) Time spent in training outside regular working hours... performance of the duties and responsibilities of his or her current position. (3) Time spent in... training under the Veterans Recruitment Act (5 CFR part 307) outside regular working hours shall not be...

  19. The Relationship between Video Game Use and a Performance-Based Measure of Persistence

    ERIC Educational Resources Information Center

    Ventura, Matthew; Shute, Valerie; Zhao, Weinan

    2013-01-01

    An online performance-based measure of persistence was developed using anagrams and riddles. Persistence was measured by recording the time spent on unsolved anagrams and riddles. Time spent on unsolved problems was correlated to a self-report measure of persistence. Additionally, frequent video game players spent longer times on unsolved problems…

  20. Parental Involvement, Child Temperament, and Parents' Work Hours: Differential Relations for Mothers and Fathers

    ERIC Educational Resources Information Center

    Brown, Geoffrey L.; McBride, Brent A.; Bost, Kelly K.; Shin, Nana

    2011-01-01

    This study examined how child temperament was related to parents' time spent accessible to and interacting with their 2-year-olds. Bivariate analyses indicated that both fathers and mothers spent more time with temperamentally challenging children than easier children on workdays, but fathers spent less time with challenging children than easier…

  1. Toward GPGPU accelerated human electromechanical cardiac simulations

    PubMed Central

    Vigueras, Guillermo; Roy, Ishani; Cookson, Andrew; Lee, Jack; Smith, Nicolas; Nordsletten, David

    2014-01-01

    In this paper, we look at the acceleration of weakly coupled electromechanics using the graphics processing unit (GPU). Specifically, we port to the GPU a number of components of Heart—a CPU-based finite element code developed for simulating multi-physics problems. On the basis of a criterion of computational cost, we implemented on the GPU the ODE and PDE solution steps for the electrophysiology problem and the Jacobian and residual evaluation for the mechanics problem. Performance of the GPU implementation is then compared with single core CPU (SC) execution as well as multi-core CPU (MC) computations with equivalent theoretical performance. Results show that for a human scale left ventricle mesh, GPU acceleration of the electrophysiology problem provided speedups of 164 × compared with SC and 5.5 times compared with MC for the solution of the ODE model. Speedup of up to 72 × compared with SC and 2.6 × compared with MC was also observed for the PDE solve. Using the same human geometry, the GPU implementation of mechanics residual/Jacobian computation provided speedups of up to 44 × compared with SC and 2.0 × compared with MC. © 2013 The Authors. International Journal for Numerical Methods in Biomedical Engineering published by John Wiley & Sons, Ltd. PMID:24115492

  2. Fast CPU-based Monte Carlo simulation for radiotherapy dose calculation.

    PubMed

    Ziegenhein, Peter; Pirner, Sven; Ph Kamerling, Cornelis; Oelfke, Uwe

    2015-08-07

    Monte-Carlo (MC) simulations are considered to be the most accurate method for calculating dose distributions in radiotherapy. Their clinical application, however, is still limited by the long runtimes that conventional implementations of MC algorithms require to deliver sufficiently accurate results on high resolution imaging data. In order to overcome this obstacle we developed the software-package PhiMC, which is capable of computing precise dose distributions in a sub-minute time-frame by leveraging the potential of modern many- and multi-core CPU-based computers. PhiMC is based on the well verified dose planning method (DPM). We could demonstrate that PhiMC delivers dose distributions which are in excellent agreement with DPM. The multi-core implementation of PhiMC scales well between different computer architectures and achieves a speed-up of up to 37× compared to the original DPM code executed on a modern system. Furthermore, we could show that our CPU-based implementation on a modern workstation is between 1.25× and 1.95× faster than a well-known GPU implementation of the same simulation method on an NVIDIA Tesla C2050. Since CPUs work on several hundreds of GB RAM, the typical GPU memory limitation does not apply to our implementation and high resolution clinical plans can be calculated.

  3. A Study on the Effectiveness of Lockup-Free Caches for a Reduced Instruction Set Computer (RISC) Processor

    DTIC Science & Technology

    1992-09-01

    to acquire or develop effective simulation tools to observe the behavior of a RISC implementation as it executes different types of programs. We choose...Performance Computer performance is measured by the amount of time required to execute a program. Performance encompasses two types of time, elapsed time...and CPU time. Elapsed time is the time required to execute a program from start to finish. It includes latency of input/output activities such as
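
    The distinction drawn above between elapsed time and CPU time can be observed directly. The sketch below is an illustration rather than anything from the report: it uses Python's standard time.perf_counter (wall-clock time) and time.process_time (CPU time of the current process), and the helper names measure, wait_for_io, and busy_loop are hypothetical. The sleep call stands in for input/output latency.

```python
import time


def measure(task):
    """Return (elapsed_seconds, cpu_seconds) for a single call to task()."""
    wall_start = time.perf_counter()   # wall-clock timer, includes waiting
    cpu_start = time.process_time()    # CPU time of this process, excludes sleep
    task()
    cpu_seconds = time.process_time() - cpu_start
    elapsed_seconds = time.perf_counter() - wall_start
    return elapsed_seconds, cpu_seconds


def wait_for_io():
    """Stand-in for I/O latency: the process waits without using the CPU."""
    time.sleep(0.2)


def busy_loop():
    """Stand-in for computation: the process keeps the CPU busy."""
    total = 0
    for i in range(500_000):
        total += i * i
    return total


if __name__ == "__main__":
    for task in (wait_for_io, busy_loop):
        elapsed, cpu = measure(task)
        print(f"{task.__name__}: elapsed {elapsed:.3f} s, CPU {cpu:.3f} s")
```

    For the waiting task, elapsed time is dominated by the sleep while CPU time stays near zero; for the busy loop the two values are typically close on an otherwise idle machine.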

  4. Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems

    PubMed Central

    Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.

    2014-01-01

    The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545

  5. Rapid and semi-analytical design and simulation of a toroidal magnet made with YBCO and MgB2 superconductors

    DOE PAGES

    Dimitrov, I. K.; Zhang, X.; Solovyov, V. F.; ...

    2015-07-07

    Recent advances in second-generation (YBCO) high-temperature superconducting wire could potentially enable the design of super high performance energy storage devices that combine the high energy density of chemical storage with the high power of superconducting magnetic storage. However, the high aspect ratio and the considerable filament size of these wires require the concomitant development of dedicated optimization methods that account for the critical current density in type-II superconductors. In this study, we report on the novel application and results of a CPU-efficient semianalytical computer code based on the Radia 3-D magnetostatics software package. Our algorithm is used to simulate and optimize the energy density of a superconducting magnetic energy storage device model, based on design constraints, such as overall size and number of coils. The rapid performance of the code hinges on analytical calculations of the magnetic field based on an efficient implementation of the Biot-Savart law for a large variety of 3-D “base” geometries in the Radia package. The significantly reduced CPU time and simple data input, in conjunction with the consideration of realistic input variables, such as material-specific, temperature, and magnetic-field-dependent critical current densities, have enabled the Radia-based algorithm to outperform finite-element approaches in CPU time at the same accuracy levels. Comparative simulations of MgB2 and YBCO-based devices are performed at 4.2 K, in order to ascertain the realistic efficiency of the design configurations.

  6. Robotic goalie with 3 ms reaction time at 4% CPU load using event-based dynamic vision sensor

    PubMed Central

    Delbruck, Tobi; Lang, Manuel

    2013-01-01

    Conventional vision-based robotic systems that must operate quickly require high video frame rates and consequently high computational costs. Visual response latencies are lower-bound by the frame period, e.g., 20 ms for 50 Hz frame rate. This paper shows how an asynchronous neuromorphic dynamic vision sensor (DVS) silicon retina is used to build a fast self-calibrating robotic goalie, which offers high update rates and low latency at low CPU load. Independent and asynchronous per pixel illumination change events from the DVS signify moving objects and are used in software to track multiple balls. Motor actions to block the most “threatening” ball are based on measured ball positions and velocities. The goalie also sees its single-axis goalie arm and calibrates the motor output map during idle periods so that it can plan open-loop arm movements to desired visual locations. Blocking capability is about 80% for balls shot from 1 m from the goal even with the fastest shots, and approaches 100% accuracy when the ball does not beat the limits of the servo motor to move the arm to the necessary position in time. Running with standard USB buses under a standard preemptive multitasking operating system (Windows), the goalie robot achieves median update rates of 550 Hz, with latencies of 2.2 ± 2 ms from ball movement to motor command at a peak CPU load of less than 4%. Practical observations and measurements of USB device latency are provided. PMID:24311999

  7. Brain and behaviour phenotyping of a mouse model of neurofibromatosis type-1: an MRI/DTI study on social cognition.

    PubMed

    Petrella, L I; Cai, Y; Sereno, J V; Gonçalves, S I; Silva, A J; Castelo-Branco, M

    2016-09-01

    Neurofibromatosis type-1 (NF1) is a common neurogenetic disorder and an important cause of intellectual disability. Brain-behaviour associations can be examined in vivo using morphometric magnetic resonance imaging (MRI) and diffusion tensor imaging (DTI) to study brain structure. Here, we studied structural and behavioural phenotypes in heterozygous Nf1 mice (Nf1(+/-)) using T2-weighted imaging MRI and DTI, with a focus on social recognition deficits. We found that Nf1(+/-) mice have larger volumes than wild-type (WT) mice in regions of interest involved in social cognition, the prefrontal cortex (PFC) and the caudate-putamen (CPu). Higher diffusivity was found across a distributed network of cortical and subcortical brain regions, within and beyond these regions. Significant differences were observed for the social recognition test. Most importantly, significant structure-function correlations were identified concerning social recognition performance and PFC volumes in Nf1(+/-) mice. Analyses of spatial learning corroborated the previously known deficits in the mutant mice, as corroborated by platform crossings, training quadrant time and average proximity measures. Moreover, linear discriminant analysis of spatial performance identified 2 separate sub-groups in Nf1(+/-) mice. A significant correlation between quadrant time and CPu volumes was found specifically for the sub-group of Nf1(+/-) mice with lower spatial learning performance, suggesting additional evidence for reorganization of this region. We found strong evidence that social and spatial cognition deficits can be associated with PFC/CPu structural changes and reorganization in NF1. © 2016 John Wiley & Sons Ltd and International Behavioural and Neural Genetics Society.

  8. SU-E-T-493: Accelerated Monte Carlo Methods for Photon Dosimetry Using a Dual-GPU System and CUDA.

    PubMed

    Liu, T; Ding, A; Xu, X

    2012-06-01

    To develop a Graphics Processing Unit (GPU) based Monte Carlo (MC) code that accelerates dose calculations on a dual-GPU system. We simulated a clinical case of prostate cancer treatment. A voxelized abdomen phantom derived from 120 CT slices was used containing 218×126×60 voxels, and a GE LightSpeed 16-MDCT scanner was modeled. A CPU version of the MC code was first developed in C++ and tested on an Intel Xeon X5660 2.8 GHz CPU, then it was translated into a GPU version using CUDA C 4.1 and run on a dual Tesla M2090 GPU system. The code featured automatic assignment of simulation tasks to multiple GPUs, as well as accurate calculation of energy- and material-dependent cross-sections. Double-precision floating point format was used for accuracy. Doses to the rectum, prostate, bladder and femoral heads were calculated. When running on a single GPU, the MC GPU code was found to be 19 times faster than the CPU code and 42 times faster than MCNPX. These speedup factors were doubled on the dual-GPU system. The dose result was benchmarked against MCNPX and a maximum difference of 1% was observed when the relative error is kept below 0.1%. A GPU-based MC code was developed for dose calculations using detailed patient and CT scanner models. Efficiency and accuracy were both guaranteed in this code. Scalability of the code was confirmed on the dual-GPU system. © 2012 American Association of Physicists in Medicine.

  9. GPU acceleration towards real-time image reconstruction in 3D tomographic diffractive microscopy

    NASA Astrophysics Data System (ADS)

    Bailleul, J.; Simon, B.; Debailleul, M.; Liu, H.; Haeberlé, O.

    2012-06-01

    Phase microscopy techniques regained interest in allowing for the observation of unprepared specimens with excellent temporal resolution. Tomographic diffractive microscopy is an extension of holographic microscopy which permits 3D observations with a finer resolution than incoherent light microscopes. Specimens are imaged by a series of 2D holograms: their accumulation progressively fills the range of frequencies of the specimen in Fourier space. A 3D inverse FFT eventually provides a spatial image of the specimen. Consequently, acquisition then reconstruction are mandatory to produce an image that could prelude real-time control of the observed specimen. The MIPS Laboratory has built a tomographic diffractive microscope with an unsurpassed 130 nm resolution but a low imaging speed - no less than one minute. Afterwards, a high-end PC reconstructs the 3D image in 20 seconds. We now expect an interactive system providing preview images during the acquisition for monitoring purposes. We first present a prototype implementing this solution on CPU: acquisition and reconstruction are tied in a producer-consumer scheme, sharing common data into CPU memory. Then we present a prototype dispatching some reconstruction tasks to GPU in order to take advantage of SIMD parallelization for FFT and higher bandwidth for filtering operations. The CPU scheme takes 6 seconds for a 3D image update while the GPU scheme can go down to 2 or > 1 seconds depending on the GPU class. This opens opportunities for 4D imaging of living organisms or crystallization processes. We also consider the relevance of GPU for 3D image interaction in our specific conditions.
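
    The producer-consumer coupling of acquisition and reconstruction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `run_pipeline`, the buffer size, and the integer stand-ins for holograms are assumptions made purely for illustration, and the FFT step is replaced by a simple counter.

```python
import queue
import threading


def run_pipeline(n_holograms, preview_every):
    """Acquire n_holograms (producer) while a consumer accumulates them and
    records a preview after every `preview_every` holograms."""
    buffer = queue.Queue(maxsize=8)  # shared in-memory buffer (size is an assumption)
    previews = []                    # hologram count at each preview update

    def acquire():
        for i in range(n_holograms):
            buffer.put(i)            # stand-in for one acquired 2D hologram
        buffer.put(None)             # sentinel: acquisition finished

    def reconstruct():
        accumulated = 0
        while True:
            item = buffer.get()
            if item is None:
                break
            accumulated += 1         # stand-in for filling Fourier space
            if accumulated % preview_every == 0:
                previews.append(accumulated)  # stand-in for a 3D inverse FFT preview

    producer = threading.Thread(target=acquire)
    consumer = threading.Thread(target=reconstruct)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return previews
```

    The bounded queue keeps the acquisition thread from running arbitrarily far ahead of the reconstruction thread, which is one common way to share data between the two stages in CPU memory.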

  10. Multi-Threaded Algorithms for GPGPU in the ATLAS High Level Trigger

    NASA Astrophysics Data System (ADS)

    Conde Muíño, P.; ATLAS Collaboration

    2017-10-01

    General purpose Graphics Processor Units (GPGPU) are being evaluated for possible future inclusion in an upgraded ATLAS High Level Trigger farm. We have developed a demonstrator including GPGPU implementations of Inner Detector and Muon tracking and Calorimeter clustering within the ATLAS software framework. ATLAS is a general purpose particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system consists of two levels, with Level-1 implemented in hardware and the High Level Trigger implemented in software running on a farm of commodity CPU. The High Level Trigger reduces the trigger rate from the 100 kHz Level-1 acceptance rate to 1.5 kHz for recording, requiring an average per-event processing time of ∼ 250 ms for this task. The selection in the high level trigger is based on reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Calorimeter. Performing this reconstruction within the available farm resources presents a significant challenge that will increase significantly with future LHC upgrades. During the LHC data taking period starting in 2021, luminosity will reach up to three times the original design value. Luminosity will increase further to 7.5 times the design value in 2026 following LHC and ATLAS upgrades. Corresponding improvements in the speed of the reconstruction code will be needed to provide the required trigger selection power within affordable computing resources. Key factors determining the potential benefit of including GPGPU as part of the HLT processor farm are: the relative speed of the CPU and GPGPU algorithm implementations; the relative execution times of the GPGPU algorithms and serial code remaining on the CPU; the number of GPGPU required, and the relative financial cost of the selected GPGPU. We give a brief overview of the algorithms implemented and present new measurements that compare the performance of various configurations exploiting GPGPU cards.
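
    As a rough illustration of the rates quoted above, the following sketch applies Little's law (mean number of events in the system = arrival rate × mean time in the system). The resulting figures are not stated in the abstract; they assume a farm in steady state, and the function names are hypothetical.

```python
def concurrent_events(input_rate_hz, mean_processing_time_s):
    """Little's law: mean number of events being processed at any instant."""
    return input_rate_hz * mean_processing_time_s


def rejection_factor(input_rate_hz, output_rate_hz):
    """Ratio of accepted Level-1 events to events kept for recording."""
    return input_rate_hz / output_rate_hz


# Values taken from the abstract: 100 kHz in, 1.5 kHz out, about 250 ms per event.
in_flight = concurrent_events(100e3, 0.250)
reduction = rejection_factor(100e3, 1.5e3)
print(f"events in flight: {in_flight:.0f}, rate reduction: {reduction:.1f}x")
```

    Under these assumptions, the number of events in flight gives a lower bound on the number of processing slots the farm must provide, which is why a shorter per-event time translates directly into fewer required resources.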

  11. WARP3D-Release 10.8: Dynamic Nonlinear Analysis of Solids using a Preconditioned Conjugate Gradient Software Architecture

    NASA Technical Reports Server (NTRS)

    Koppenhoefer, Kyle C.; Gullerud, Arne S.; Ruggieri, Claudio; Dodds, Robert H., Jr.; Healy, Brian E.

    1998-01-01

    This report describes theoretical background material and commands necessary to use the WARP3D finite element code. WARP3D is under continuing development as a research code for the solution of very large-scale, 3-D solid models subjected to static and dynamic loads. Specific features in the code oriented toward the investigation of ductile fracture in metals include a robust finite strain formulation, a general J-integral computation facility (with inertia, face loading), an element extinction facility to model crack growth, nonlinear material models including viscoplastic effects, and the Gurson-Tvergaard dilatant plasticity model for void growth. The nonlinear, dynamic equilibrium equations are solved using an incremental-iterative, implicit formulation with full Newton iterations to eliminate residual nodal forces. The history integration of the nonlinear equations of motion is accomplished with Newmark's Beta method. A central feature of WARP3D involves the use of a linear-preconditioned conjugate gradient (LPCG) solver implemented in an element-by-element format to replace a conventional direct linear equation solver. This software architecture dramatically reduces both the memory requirements and CPU time for very large, nonlinear solid models since formation of the assembled (dynamic) stiffness matrix is avoided. Analyses thus exhibit the numerical stability for large time (load) steps provided by the implicit formulation coupled with the low memory requirements characteristic of an explicit code. In addition to the much lower memory requirements of the LPCG solver, the CPU time required for solution of the linear equations during each Newton iteration is generally one-half or less of the CPU time required for a traditional direct solver. All other computational aspects of the code (element stiffnesses, element strains, stress updating, element internal forces) are implemented in the element-by-element, blocked architecture. This greatly improves vectorization of the code on uni-processor hardware and enables straightforward parallel-vector processing of element blocks on multi-processor hardware.

  12. Trends in Spouses’ Shared Time in the United States, 1965–2012

    PubMed Central

    Genadek, Katie R.; Flood, Sarah M.; Roman, Joan Garcia

    2016-01-01

    Despite major demographic changes over the past 50 years and strong evidence that time spent with a spouse is important for marriages, we know very little about how time with a spouse has changed—or not—in the United States. Using time diary data from 1965–2012, we examine trends in couples’ shared time in the United States during a period of major changes in American marriages and families. We find that couples without children spent more total time together and time alone together in 2012 than they did in 1965, with total time and time alone together both peaking in 1975. For parents, time spent together increased between 1965 and 2012, most dramatically for time spent with a spouse and children. Decomposition analyses show that changes in behavior rather than changing demographics explain these trends, and we find that the increases in couples’ shared time are primarily concentrated in leisure activities. PMID:27730493

  13. Comparison of Sedentary Behaviors between Children with Autism Spectrum Disorders and Typically Developing Children

    ERIC Educational Resources Information Center

    Must, Aviva; Phillips, Sarah M.; Curtin, Carol; Anderson, Sarah E.; Maslin, Melissa; Lividini, Keith; Bandini, Linda G.

    2014-01-01

    Time spent in sedentary behavior is largely due to time spent engaged with electronic screen media. Little is known about the extent to which sedentary behaviors for children with autism spectrum disorder differ from typically developing children. We used parental report to assess and compare time spent in sedentary behaviors for 53 children with…

  14. Comparison of combinations of sighting devices and target objects for establishing circular plots in the field

    Treesearch

    Sylvio Mannel; Mark A. Rumble; Maribeth Price; Thomas M. Juntti; Dong Hua

    2006-01-01

    Many aspects of ecological research require measurement of characteristics within plots. Often, the time spent establishing plots is small relative to the time spent collecting and recording data. However, some studies require larger numbers of plots, where the time spent establishing the plot is consequential to the field effort. In open habitats, circular plots are...

  15. Youth day in Los Angeles: connecting youth and nature with technology

    Treesearch

    Deborah J. Chavez

    2009-01-01

    In a statewide survey in Oregon, parents indicated how much time their child spent outdoors relative to their own outdoor childhood experiences. The results indicated children spent as much time as their parents did as children in structured outdoor activities (such as organized sports), but they spent much less time than their parents did as children in outdoor chores...

  16. Estimating the time for dissolution of spent fuel exposed to unlimited water

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leider, H.R.; Nguyen, S.N.; Stout, R.B.

    1991-12-01

    The release of radionuclides from spent fuel cannot be precisely predicted at this point because a satisfactory dissolution model based on specific chemical processes is not yet available. However, preliminary results on the dissolution rate of UO2 and spent fuel as a function of temperature and water composition have recently been reported. This information, together with data on fragment size distribution of spent fuel, are used to estimate the dissolution response of spent fuel in excess flowing water within the framework of a simple model. In this model, the reaction/dissolution front advances linearly with time and geometry is preserved. This also estimates the dissolution rate of the bulk of the fission products and higher actinides, which are uniformly distributed in the UO2 matrix and are presumed to dissolve congruently. We have used a fuel fragment distribution actually observed to calculate the time for total dissolution of spent fuel. A worst-case estimate was also made using the initial (maximum) rate of dissolution to predict the total dissolution time. The time for total dissolution of centimeter size particles is estimated to be 5.5 × 10^4 years at 25 °C.
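
    The "linearly advancing front with preserved geometry" model described above can be sketched as follows. This is an illustrative reading, not the report's actual code: it assumes a compact particle (for example a sphere or cube) attacked from all sides, so that the remaining mass scales as the cube of the remaining linear size, and the function names are hypothetical.

```python
def total_dissolution_time(half_size_cm, front_velocity_cm_per_year):
    """Time for a front moving at constant speed to reach the particle centre."""
    return half_size_cm / front_velocity_cm_per_year


def fraction_dissolved(t_years, t_total_years):
    """Mass fraction dissolved at time t when the particle shrinks uniformly.

    The linear size falls as (1 - t / t_total); with geometry preserved,
    the remaining mass is the cube of that ratio.
    """
    if t_total_years <= 0:
        raise ValueError("t_total_years must be positive")
    remaining_size = max(0.0, 1.0 - t_years / t_total_years)
    return 1.0 - remaining_size ** 3


# Using the abstract's estimate of 5.5e4 years for total dissolution:
halfway = fraction_dissolved(2.75e4, 5.5e4)
print(f"fraction dissolved at half the total time: {halfway:.3f}")
```

    Under these assumptions most of the mass is released early, because the outer layers of a particle hold most of its volume; this is consistent with the abstract's remark that the initial rate is the maximum rate.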

  17. Time spent in home meal preparation affects energy and food group intakes among midlife women.

    PubMed

    Chu, Yen Li; Addo, O Yaw; Perry, Courtney D; Sudo, Noriko; Reicks, Marla

    2012-04-01

    Time spent in meal preparation may be indicative of the healthfulness of meals and therefore with weight status. The purpose of this study was to examine the association between amount of time spent preparing meals and meal food group and nutrient content by meal occasion (breakfast, lunch, and dinner) among 1036 midlife women. Participants completed a 1-day food record and eating occasion questionnaires for each meal occasion. ANCOVA was used to identify possible associations. Approximately half of the participants reported spending <5 min preparing breakfast and lunch, and <20 min preparing dinner. Less time spent preparing breakfast was associated with lower energy and fat intakes (p<0.0001), while less time spent preparing lunch and dinner was associated with lower vegetable and sodium intakes (p<0.0001). There were no apparent differences in the association between time spent preparing meals and meal content by weight status. Nutrition education should encourage home meal preparation while stressing the selection of healthier options. The differing associations by meal occasion suggest that interventions should be tailored according to meal type. Copyright © 2011 Elsevier Ltd. All rights reserved.

  18. Managing Contention and Timing Constraints in a Real-Time Database System

    DTIC Science & Technology

    1995-01-01

    In order to realize many of these goals, StarBase is constructed on top of RT-Mach, a real-time operating system developed at Carnegie Mellon...University [11]. StarBase differs from previous RT-DBMS work [1, 2, 3] in that a) it relies on a real-time operating system which provides priority...CPU and resource scheduling provided by the underlying real-time operating system. Issues of data contention are dealt with by use of a priority

  19. Mental health and extracurricular education in Korean first graders: a school-based cross-sectional study.

    PubMed

    Hong, Hyun Ju; Kim, Young Shin; Jon, Duk-In; Soek, Jeong Ho; Hong, Narei; Harkavy-Friedman, Jill M; Miller, Ann M; Greenhill, Laurence L

    2011-06-01

    This study explores the results of mental health screening in Korean first graders in association with the amount of time the children spent in extracurricular education. The study included a community sample of 761 boys and girls, with a mean age of 6.6 years, collected from 5 elementary schools in Gunpo-si, South Korea, in July 2007. Primary caregivers completed a questionnaire that included information on demographic characteristics, the amount of time the children spent in extracurricular education and other activities, and an adapted form of the Behavior Assessment System for Children, Second Edition (BASC-2) to screen for mental health problems. These first graders spent a mean of a little over 2 hours each day in extracurricular education. Extracurricular education demonstrated positive correlations with 4 BASC-2 domains, including hyperactivity (r = 0.092, P < .05), aggression (r = 0.073, P < .05), conduct problems (r = 0.073, P < .05) and depression (r = 0.137, P < .01). A positive linear relationship between depression and extracurricular education was also evident in regression analyses (F = 2.25, R(2) = 0.022, P = .001). The relationship held true even when controlling for time spent with parents, time spent with friends, and time spent asleep. Post hoc analyses revealed that children receiving more than 4 hours of extracurricular education per day showed a sharp increase in depressive symptoms as well as a decrease in the amount of time spent with caregivers. Results of this study demonstrate that excessive amounts of time spent in extracurricular education (greater than 4 hours per day) may be associated with depression in school-aged children. These findings have relevance for mental health screening and educational policy. © Copyright 2011 Physicians Postgraduate Press, Inc.

  20. Accelerating next generation sequencing data analysis with system level optimizations.

    PubMed

    Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid

    2017-08-22

    Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer, CPU frequency scaling are some of the hardware features in the modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied the GATK-HaplotypeCaller which is part of common NGS workflows, that consume more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer (iv) the default 'on-demand' mode of CPU frequency is over-clocked by using 'performance-mode' to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.

  1. A graphics-card implementation of Monte-Carlo simulations for cosmic-ray transport

    NASA Astrophysics Data System (ADS)

    Tautz, R. C.

    2016-05-01

    A graphics card implementation of a test-particle simulation code is presented that is based on the CUDA extension of the C/C++ programming language. The original CPU version has been developed for the calculation of cosmic-ray diffusion coefficients in artificial Kolmogorov-type turbulence. In the new implementation, the magnetic turbulence generation, which is the most time-consuming part, is separated from the particle transport and is performed on a graphics card. In this article, the modification of the basic approach of integrating test particle trajectories to employ the SIMD (single instruction, multiple data) model is presented and verified. The efficiency of the new code is tested and several language-specific accelerating factors are discussed. For the example of isotropic magnetostatic turbulence, sample results are shown and a comparison to the results of the CPU implementation is performed.

  2. Mitigating the Insider Threat with High-Dimensional Anomaly Detection

    DTIC Science & Technology

    2004-12-01

    a more serious attack. Various systems such as NSM [56], GrIDS [57], snort [58], Emerald [59], and Spice [60] generate alerts for portscan...reboot etc. The user measurements include the user profiles such as time of login, duration of user session, cumulative CPU time, names of files...already been implemented in a real-time system for information retrieval [3]. A technique developed at SRI in the Emerald system [22] uses historical

  3. Meeting the Challenge of Distributed Real-Time & Embedded (DRE) Systems

    DTIC Science & Technology

    2012-05-10

    IP RTOS Middleware Middleware Services DRE Applications Operating Sys & Protocols Hardware & Networks Middleware Middleware Services DRE...Services COTS & standards-based middleware, language, OS, network, & hardware platforms • Real-time CORBA (TAO) middleware • ADAPTIVE Communication...SPLs) F-15 product variant A/V 8-B product variant F/A 18 product variant UCAV product variant Software Product-Line Hardware (CPU, Memory, I/O) OS

  4. The effect of food environments on fruit and vegetable intake as modified by time spent at home: a cross-sectional study.

    PubMed

    Chum, Antony; Farrell, Eddie; Vaivada, Tyler; Labetski, Anna; Bohnert, Arianne; Selvaratnam, Inthuja; Larsen, Kristian; Pinter, Theresa; O'Campo, Patricia

    2015-06-04

    There is a growing body of research that investigates how the residential neighbourhood context relates to individual diet. However, previous studies ignore participants' time spent in the residential environment and this may be a problem because time-use studies show that adults' time-use pattern can significantly vary. To better understand the role of exposure duration, we designed a study to examine 'time spent at home' as a moderator to the residential food environment-diet association. Cross-sectional observational study. City of Toronto, Ontario, Canada. 2411 adults aged 25-65. Frequency of vegetable and fruit intake (VFI) per day. To examine how time spent at home may moderate the relationship between residential food environment and VFI, the full sample was split into three equal subgroups--short, medium and long duration spent at home. We detected significant associations between density of food stores in the residential food environment and VFI for subgroups that spend medium and long durations at home (ie, spending a mean of 8.0 and 12.3 h at home, respectively--not including sleep time), but no associations exist for people who spend the lowest amount of time at home (mean=4.7 h). Also, no associations were detected in analyses using the full sample. Our study is the first to demonstrate that time spent at home may be an important variable to identify hidden population patterns regarding VFI. Time spent at home can impact the association between the residential food environment and individual VFI. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  5. Effect of rubber flooring in front of the feed bunk on the time budgets of dairy cattle.

    PubMed

    Fregonesi, Jose A; Tucker, Cassandra B; Weary, Daniel M; Flower, Frances C; Vittie, Tyler

    2004-05-01

    The objective of this experiment was to study the effect of rubber flooring in front of the feed bunk on the immediate behavioral response of dairy cattle. Four groups of 12 dairy cattle were alternately housed in sections of a free-stall barn with either 1.85 m of rubber flooring or grooved concrete in the area in front of the feed bunk. Rubber flooring did not affect time spent eating. However, animals showed a slight, but detectable, increase in time standing without eating on the rubber surface (5.5%) compared with concrete (4.8%). For reasons that are unclear, this increase in time spent standing was not limited to the area in front of the feed bunk; animals spent 11.0% of the available time standing elsewhere in the pen (outside of the free stall but not in front of the feed bunk) when they had access to the rubber flooring, compared with 9.0% when housed with access to only concrete floors. In addition, animals spent slightly less time lying in the free stall when they had access to rubber in front of the feed bunk (52.5 vs. 54.3%). Time spent engaged in behaviors such as standing elsewhere in the pen and eating were variable over time. For example, time spent eating declined from 23.1 to 17.4% over the 6-wk trial. In conclusion, dairy cattle with access to rubber flooring in front of the feeder showed small differences in where and how much time they spent standing, although the biological implications of these small changes are unclear.

  6. The effect of food environments on fruit and vegetable intake as modified by time spent at home: a cross-sectional study

    PubMed Central

    Chum, Antony; Farrell, Eddie; Vaivada, Tyler; Labetski, Anna; Selvaratnam, Inthuja; Larsen, Kristian; Pinter, Theresa; O'Campo, Patricia

    2015-01-01

    Objective There is a growing body of research that investigates how the residential neighbourhood context relates to individual diet. However, previous studies ignore participants’ time spent in the residential environment and this may be a problem because time-use studies show that adults’ time-use pattern can significantly vary. To better understand the role of exposure duration, we designed a study to examine ‘time spent at home’ as a moderator to the residential food environment-diet association. Design Cross-sectional observational study. Settings City of Toronto, Ontario, Canada. Participants 2411 adults aged 25–65. Primary outcome measure Frequency of vegetable and fruit intake (VFI) per day. Results To examine how time spent at home may moderate the relationship between residential food environment and VFI, the full sample was split into three equal subgroups—short, medium and long duration spent at home. We detected significant associations between density of food stores in the residential food environment and VFI for subgroups that spend medium and long durations at home (ie, spending a mean of 8.0 and 12.3 h at home, respectively—not including sleep time), but no associations exist for people who spend the lowest amount of time at home (mean=4.7 h). Also, no associations were detected in analyses using the full sample. Conclusions Our study is the first to demonstrate that time spent at home may be an important variable to identify hidden population patterns regarding VFI. Time spent at home can impact the association between the residential food environment and individual VFI. PMID:26044756

  7. Planning for youth days: planting the SEED to get youth outdoors in nature

    Treesearch

    Deborah J. Chavez; John D. Fehr

    2009-01-01

    In a statewide survey in Oregon, parents indicated how much time their child spent relative to their own outdoor childhood experiences. The results indicated children spent as much time as their parents at that age in structured outdoor activities, such as organized sports, but they spent much less time than their parents did at that age in outdoor chores and...

  8. The Effects of Employment Status and Daily Stressors on Time Spent on Daily Household Chores in Middle-Aged and Older Adults

    ERIC Educational Resources Information Center

    Wong, Jen D.; Almeida, David M.

    2013-01-01

    Purpose of the study: This study examines how employment status (worker vs. retiree) and life course influences (age, gender, and marital status) are associated with time spent on daily household chores. Second, this study assesses whether the associations between daily stressors and time spent on daily household chores differ as a function of…

  9. Application of queuing theory to patient satisfaction at a tertiary hospital in Nigeria

    PubMed Central

    Ameh, Nkeiruka; Sabo, B.; Oyefabi, M. O.

    2013-01-01

    Background: Queuing theory is the mathematical approach to the analysis of waiting lines in any setting where arrival rate of subjects is faster than the system can handle. It is applicable to healthcare settings where the systems have excess capacity to accommodate random variations. Materials and Methods: A cross-sectional descriptive survey was done. Questionnaires were administered to patients who attended the general outpatient department. Observations were also made on the queuing model and the service discipline at the clinic. Questions were meant to obtain demographic characteristics and the time spent on the queue by patients before being seen by a doctor, time spent with the doctor, their views about the time spent on the queue and useful suggestions on how to reduce the time spent on the queue. A total of 210 patients were surveyed. Results: Majority of the patients (164, 78.1%) spent 2 h or less on the queue before being seen by a doctor and less than 1 h to see the doctor. Majority of the patients (144, 68.5%) were satisfied with the time they spent on the queue before being seen by a doctor. Useful suggestions proffered by the patients to decrease the time spent on the queue before seeing a doctor at the clinic included: that more doctors be employed (46, 21.9%), that doctors should come to work on time (25, 11.9%), that first-come-first served be observed strictly (32, 15.2%) and others suggested that the records staff should desist from collecting bribes from patients in order to place their cards before others. The queuing method employed at the clinic is the multiple single channel type and the service discipline is priority service. The patients who spent less time on the queue (<1 h) before seeing the doctor were more satisfied than those who spent more time (P < 0.05). Conclusion: The study has revealed that majority of the patients were satisfied with the practice at the general outpatient department. However, there is a need to employ measures to respond to the suggestions given by the patients who are the beneficiaries of the hospital services. PMID:23661902

  10. Method for calculating the duration of vacuum drying of a metal-concrete container for spent nuclear fuel

    NASA Astrophysics Data System (ADS)

    Karyakin, Yu. E.; Nekhozhin, M. A.; Pletnev, A. A.

    2013-07-01

    A method for calculating the quantity of moisture in a metal-concrete container in the process of its charging with spent nuclear fuel is proposed. A computing method and results obtained by it for conservative estimation of the time of vacuum drying of a container charged with spent nuclear fuel by technologies with quantization and without quantization of the lower fuel element cluster are presented. It has been shown that the absence of quantization in loading spent fuel increases several times the time of vacuum drying of the metal-concrete container.

  11. Comparative analysis of LWR and FBR spent fuels for nuclear forensics evaluation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Permana, Sidik; Suzuki, Mitsutoshi; Su'ud, Zaki

    2012-06-06

    Some interesting issues are attributed to nuclide compositions of spent fuels from thermal reactors as well as fast reactors such as a potential to reuse as recycled fuel, and a possible capability to be managed as a fuel for destructive devices. In addition, analysis on nuclear forensics which is related to spent fuel compositions becomes one of the interesting topics to evaluate the origin and the composition of spent fuels from the spent fuel footprints. Spent fuel compositions of different fuel types give some typical spent fuel footprints, from which the origin of those spent fuel compositions can be estimated. Some techniques or methods have been developed based on some science and technological capability including experimental and modeling or theoretical aspects of analyses. Some footprints of nuclear forensics will identify the typical information of spent fuel compositions such as enrichment information, burnup or irradiation time, reactor types as well as the cooling time which is related to the age of spent fuels. This paper intends to evaluate the typical spent fuel compositions of light water (LWR) and fast breeder reactors (FBR) from the viewpoint of some footprints of nuclear forensics. An established depletion code of ORIGEN is adopted to analyze LWR spent fuel (SF) for several burnup constants and decay times. For analyzing some spent fuel compositions of FBR, some coupling codes such as SLAROM code, JOINT and CITATION codes including JFS-3-J-3.2R as nuclear data library have been adopted. Enriched U-235 fuel composition of oxide type is used for fresh fuel of LWR and a mixed oxide fuel (MOX) for FBR fresh fuel. Those MOX fuels of FBR come from the spent fuels of LWR. Some typical spent fuels from both LWR and FBR will be compared to distinguish some typical footprints of SF based on nuclear forensic analysis.

  12. Longitudinal associations between time spent using technology and sleep duration among adolescents.

    PubMed

    Mazzer, K; Bauducco, S; Linton, S J; Boersma, K

    2018-07-01

    Technology use has been the focus of much concern for adolescents' sleep health. However, few studies have investigated the bidirectional association between sleep duration and time spent using technology. The aim of this study was to test whether time spent using technology predicted shorter sleep duration, and/or vice versa using cross-lagged analyses over one year. Participants were 1620 high school students in the 8th and 9th grade at baseline from 17 public schools in three middle Sweden communities. Students completed questionnaires at school during the spring of 2015 and 2016. Time spent using technology was self-reported and sleep duration was calculated from reported bed-times, wake-times and sleep onset latency. Time spent using technology significantly predicted shorter subsequent sleep duration and vice versa. Public health advocates educating others about the negative impacts of technology on sleep must also be mindful of the opposite, that many young people may turn to technological devices when experiencing difficulty sleeping. Copyright © 2018 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.

  13. Expanding potential of radiofrequency nurse call systems to measure nursing time in patient rooms.

    PubMed

    Fahey, Linda; Dunn Lopez, Karen; Storfjell, Judith; Keenan, Gail

    2013-05-01

    The objective of this study was to determine the utility and feasibility of using data from a nurse call system equipped with radiofrequency identification data (RFID) to measure nursing time spent in patient rooms. Increasing the amount of time nurses spend with hospitalized patients has become a focus after several studies demonstrating that nurses spend most of their time in nondirect care activities rather than delivering patient care. Measurement of nursing time spent in direct care often involves labor-intensive time and motion studies, making frequent or continuous monitoring impractical. Mixed methods were used for this descriptive study. We used 30 days of data from an RFID nurse call system collected on 1 unit in a community hospital to examine nurses' time spent in patient rooms. Descriptive statistics were applied to calculate this percentage by role and shift. Data technologists were surveyed to assess how practical the access of data would be in a hospital setting for use in monitoring nursing time spent in patient rooms. The system captured 7393 staff hours. Of that time, 7% did not reflect actual patient care time, so these were eliminated from further analysis. The remaining 6880 hours represented 91% of expected worked time. RNs and nursing assistants spent 33% to 36% of their time in patient rooms, presumably providing direct care. Radiofrequency identification data technology was found to provide feasible and accurate means for capturing and evaluating nursing time spent in patient rooms. Depending on the outcomes per unit, leaders should work with staff to maximize patient care time.

  14. Japanese professional nurses spend unnecessarily long time doing nursing assistants' tasks.

    PubMed

    Kudo, Yasushi; Yoshimura, Emiko; Shahzad, Machiko Taruzuka; Shibuya, Akitaka; Aizawa, Yoshiharu

    2012-09-01

    In environments in which professional nurses do simple tasks, e.g., laundry, cleaning, and waste disposal, they cannot concentrate on technical jobs by utilizing their expertise to its fullest benefit. Particularly, in Japan, the nursing shortage is a serious problem. If professional nurses take their time to do any of these simple tasks, the tasks should be preferentially allocated to nursing assistants. Because there has been no descriptive study to investigate the amount of time Japanese professional nurses spent doing such simple tasks during their working time, their actual conditions remain unclear. Professional nurses recorded their total working time and the time they spent doing such simple tasks during the week of the survey period. The time an individual respondent spent doing one or more simple tasks during that week was summed up, as was their working time. Subsequently, the percentage of the summed time he or she spent doing any of those tasks in his or her summed working time was calculated. A total of 1,086 respondents in 19 hospitals that had 87 to 376 beds were analyzed (response rate: 53.3%). The average time (SD) that respondents spent doing those simple tasks and their total working time were 2.24 (3.35) hours and 37.48 (10.88) hours, respectively. The average percentage (SD) of the time they spent doing the simple tasks in their working time was 6.00% (8.39). Hospital administrators must decrease this percentage. Proper working environments in which professional nurses can concentrate more on their technical jobs must be created.
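
The two averages in this abstract can be cross-checked with a little arithmetic. Dividing the mean task time by the mean working time gives roughly 5.98%, whereas the abstract reports 6.00% as the mean of each respondent's own percentage; the two quantities need not agree. A minimal Python sketch follows, in which the three respondent records are invented purely for illustration and are not data from the study:

```python
# Means reported in the abstract (hours during the survey week).
mean_task_hours = 2.24
mean_work_hours = 37.48

# Ratio of the two reported means, expressed as a percentage.
ratio_of_means = 100 * mean_task_hours / mean_work_hours
print(f"ratio of reported means: {ratio_of_means:.2f}%")

# Hypothetical respondents as (task hours, working hours); NOT study data.
respondents = [(1.0, 40.0), (4.0, 30.0), (0.5, 45.0)]


def mean_of_percentages(rows):
    """Average each respondent's own percentage, as the abstract describes."""
    return sum(100 * task / work for task, work in rows) / len(rows)


def percentage_of_totals(rows):
    """Pool all hours first, then take a single percentage."""
    total_task = sum(task for task, _ in rows)
    total_work = sum(work for _, work in rows)
    return 100 * total_task / total_work


print(f"mean of percentages:  {mean_of_percentages(respondents):.2f}%")
print(f"percentage of totals: {percentage_of_totals(respondents):.2f}%")
```

With the made-up records, respondents who work fewer hours but spend more of them on simple tasks pull the mean of percentages above the pooled figure, which is one way a small gap like the one in the abstract can arise.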

  15. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    NASA Astrophysics Data System (ADS)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. 
    Program summary
    Program title: SWsolver
    Catalogue identifier: AEGY_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GPL v3
    No. of lines in distributed program, including test data, etc.: 59 168
    No. of bytes in distributed program, including test data, etc.: 453 409
    Distribution format: tar.gz
    Programming language: C, CUDA
    Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
    Operating system: Linux
    Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs.
    RAM: Tested on Problems requiring up to 4 GB per compute node.
    Classification: 12
    External routines: MPI, CUDA, IBM Cell SDK
    Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA.
    Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
    Additional comments: Sub-program numdiff is used for the test run.

  16. Very Low Levels of Physical Activity in Older Patients During Hospitalization at an Acute Geriatric Ward: A Prospective Cohort Study.

    PubMed

    Villumsen, Morten; Jorgensen, Martin Gronbech; Andreasen, Jane; Rathleff, Michael Skovdal; Mølgaard, Carsten Møller

    2015-10-01

    Lack of activity during hospitalization may contribute to functional decline. The purpose of this study was to investigate (1) the time spent walking during hospitalization by geriatric patients referred to physical and/or occupational therapy and (2) the development in time spent walking during hospitalization. In this observational study, 24-hr accelerometer data (ActivPal) were collected from inclusion to discharge in 124 patients at an acute geriatric ward. The median time spent walking was 7 min per day. During the first quartile of hospitalization, the patients spent 4 (IQR:1;11) min per day walking, increasing to 10 (IQR:1;29) min during the last quartile. Improvement in time spent walking was primarily observed in the group able to perform the Timed Up & Go task at admission. When walking only 7 min per day, patients could be classified as inactive and at risk for functional decline; nonetheless, the physical activity level increased significantly during hospitalization.

  17. Interactivity vs. fairness in networked linux systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Wenji; Crawford, Matt; /Fermilab

    In general, the Linux 2.6 scheduler can ensure fairness and provide excellent interactive performance at the same time. However, our experiments and mathematical analysis have shown that the current Linux interactivity mechanism tends to incorrectly categorize non-interactive network applications as interactive, which can lead to serious fairness or starvation issues. In the extreme, a single process can unjustifiably obtain up to 95% of the CPU! The root cause is due to the facts that: (1) network packets arrive at the receiver independently and discretely, and the 'relatively fast' non-interactive network process might frequently sleep to wait for packet arrival. Though each sleep lasts for a very short period of time, the wait-for-packet sleeps occur so frequently that they lead to interactive status for the process. (2) The current Linux interactivity mechanism provides the possibility that a non-interactive network process could receive a high CPU share, and at the same time be incorrectly categorized as 'interactive.' In this paper, we propose and test a possible solution to address the interactivity vs. fairness problems. Experiment results have proved the effectiveness of the proposed solution.

  18. GPU-Meta-Storms: computing the structure similarities among massive amount of microbial community samples using GPU.

    PubMed

    Su, Xiaoquan; Wang, Xuetao; Jing, Gongchao; Ning, Kang

    2014-04-01

    The number of microbial community samples is increasing with exponential speed. Data-mining among microbial community samples could facilitate the discovery of valuable biological information that is still hidden in the massive data. However, current methods for the comparison among microbial communities are limited by their ability to process large amount of samples each with complex community structure. We have developed an optimized GPU-based software, GPU-Meta-Storms, to efficiently measure the quantitative phylogenetic similarity among massive amount of microbial community samples. Our results have shown that GPU-Meta-Storms would be able to compute the pair-wise similarity scores for 10 240 samples within 20 min, which gained a speed-up of >17 000 times compared with single-core CPU, and >2600 times compared with 16-core CPU. Therefore, the high-performance of GPU-Meta-Storms could facilitate in-depth data mining among massive microbial community samples, and make the real-time analysis and monitoring of temporal or conditional changes for microbial communities possible. GPU-Meta-Storms is implemented by CUDA (Compute Unified Device Architecture) and C++. Source code is available at http://www.computationalbioenergy.org/meta-storms.html.
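
The figures quoted in this abstract imply a few further quantities that are easy to derive. The sketch below is only back-of-the-envelope arithmetic: the abstract gives bounds ("within 20 min", ">17 000 times", ">2600 times"), so the results are rough estimates, and the pair count assumes a symmetric similarity score with self-comparisons excluded, which the abstract does not state.

```python
# Figures quoted in the abstract (all are bounds, not exact measurements).
samples = 10_240
gpu_minutes = 20             # "within 20 min"
speedup_vs_1_core = 17_000   # ">17 000 times" versus a single-core CPU
speedup_vs_16_core = 2_600   # ">2600 times" versus a 16-core CPU

# Assumption: the score is symmetric, so each unordered pair is computed once.
pairs = samples * (samples - 1) // 2

# Rough CPU run times implied by combining the two kinds of bound.
minutes_per_day = 60 * 24
implied_1_core_days = gpu_minutes * speedup_vs_1_core / minutes_per_day
implied_16_core_days = gpu_minutes * speedup_vs_16_core / minutes_per_day

# How much the 16-core run gains over the single-core run.
implied_scaling = speedup_vs_1_core / speedup_vs_16_core

print(f"pairwise comparisons: {pairs:,}")
print(f"implied single-core time: about {implied_1_core_days:.0f} days")
print(f"implied 16-core time: about {implied_16_core_days:.0f} days")
print(f"implied 16-core gain over 1 core: about {implied_scaling:.1f}x")
```

Read this way, the two speed-up figures suggest that the 16-core CPU baseline was only about six to seven times faster than the single-core one, well short of a sixteen-fold gain.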

  19. Hotspot detection using image pattern recognition based on higher-order local auto-correlation

    NASA Astrophysics Data System (ADS)

    Maeda, Shimon; Matsunawa, Tetsuaki; Ogawa, Ryuji; Ichikawa, Hirotaka; Takahata, Kazuhiro; Miyairi, Masahiro; Kotani, Toshiya; Nojima, Shigeki; Tanaka, Satoshi; Nakagawa, Kei; Saito, Tamaki; Mimotogi, Shoji; Inoue, Soichi; Nosato, Hirokazu; Sakanashi, Hidenori; Kobayashi, Takumi; Murakawa, Masahiro; Higuchi, Tetsuya; Takahashi, Eiichi; Otsu, Nobuyuki

    2011-04-01

    Below the 40 nm design node, systematic variation due to lithography must be taken into consideration during the early stage of design. So far, litho-aware design using lithography simulation models has been widely applied to assure that designs are printed on silicon without any error. However, the lithography simulation approach is very time consuming, and under time-to-market pressure, repetitive redesign by this approach may result in missing the market window. This paper proposes a fast hotspot detection support method by flexible and intelligent vision system image pattern recognition based on Higher-Order Local Autocorrelation. Our method learns the geometrical properties of the given design data without any defects as normal patterns, and automatically detects the design patterns with hotspots from the test data as abnormal patterns. The Higher-Order Local Autocorrelation method can extract features from the graphic image of design pattern, and computational cost of the extraction is constant regardless of the number of design pattern polygons. This approach can reduce turnaround time (TAT) dramatically using only 1 CPU, compared with the conventional simulation-based approach, and by distributed processing, this has proven to deliver linear scalability with each additional CPU.

  20. Intact intracortical microstimulation (ICMS) representations of rostral and caudal forelimb areas in rats with quinolinic acid lesions of the medial or lateral caudate-putamen in an animal model of Huntington's disease.

    PubMed

    Karl, Jenni M; Sacrey, Lori-Ann R; McDonald, Robert J; Whishaw, Ian Q

    2008-09-05

    Neurotoxic, cell-specific lesions of the rat caudate-putamen (CPu) have been proposed as a model of human Huntington's disease and as such impair performance on many motor tasks, including skilled forelimb tasks such as reaching for food. Because the CPu and motor cortex share reciprocal connections, it has been proposed that the motor deficits are due in part to a secondary disruption of motor cortex. The purpose of the present study was to examine the functionality of the motor cortex using intracortical microstimulation (ICMS) following neurotoxic lesions of the CPu. ICMS maps have been shown to be sensitive indicators of motor skill, cortical injury, learning, and experience. Long-Evans hooded rats received a sham, a medial, or a lateral CPu lesion using the neurotoxin, quinolinic acid (2,3-pyridinedicarboxylic acid). Two weeks later the motor cortex was stimulated under light ketamine anesthesia. Neither lateral nor medial lesions of the CPu altered the stimulation threshold for eliciting forelimb movements, the type of movements elicited, or the size of the rostral forelimb (RFA) and caudal forelimb areas (CFA) from which movements were elicited. The preservation of ICMS forelimb movement representations (the forelimb map) in rats with cell-specific CPu lesions suggests motor impairments following lesions of the lateral striatum are not due to the disruption of the motor map. Therefore, the impairments that follow striatal cell loss are due either to alterations in circuitry that is independent of motor cortex or to alterations in circuitry afferent to the motor cortex projections.

  1. Habitual Physical Activity in Children With Cerebral Palsy Aged 4 to 5 Years Across All Functional Abilities.

    PubMed

    Keawutan, Piyapa; Bell, Kristie L; Oftedal, Stina; Davies, Peter S W; Ware, Robert S; Boyd, Roslyn N

    2017-01-01

    To compare ambulatory status in children with cerebral palsy aged 4 to 5 years with their habitual physical activity and time spent sedentary, and to compare their activity with physical activity guidelines. Sixty-seven participants (independently ambulant, marginally ambulant, and nonambulant) wore accelerometers for 3 days. Time spent sedentary as a percentage of wear time and activity counts were compared between groups. There were significant differences in time spent sedentary and activity counts between groups. Children who were independently ambulant were more likely to meet physical activity guidelines. Children with cerebral palsy spent more than half of their waking hours in sedentary time. Interventions to reduce sedentary behavior and increase habitual physical activity are needed in children with cerebral palsy at age 4 to 5 years.

  2. Collateral projections of nucleus raphe dorsalis neurones to the caudate-putamen and region around the nucleus raphe magnus and nucleus reticularis gigantocellularis pars alpha in the rat.

    PubMed

    Li, Y Q; Kaneko, T; Mizuno, N

    2001-02-16

    It was examined whether or not the nucleus raphe dorsalis (RD) neurons projecting to the caudate-putamen (CPu) might also project to the motor-controlling region around the nucleus raphe magnus (NRM) and nucleus reticularis gigantocellularis pars alpha (Gia) in the rat. Single RD neurons projecting to the CPu and NRM/Gia by way of axon collaterals were identified by the retrograde double-labeling method with fluorescent dyes, Fast Blue and Diamidino Yellow, which were injected respectively into the CPu and NRM/Gia. Then, serotonin (5-HT)-like immunoreactivity of the double-labeled RD neurons was examined immunohistochemically; approximately 60% of the double-labeled RD neurons showed 5-HT-like immunoreactivity. The results indicated that some of serotonergic and non-serotonergic RD neurons might control motor functions simultaneously at the levels of the CPu and NRM/Gia by way of axon collaterals.

  3. An evaluation of superminicomputers for thermal analysis

    NASA Technical Reports Server (NTRS)

    Storaasli, O. O.; Vidal, J. B.; Jones, G. K.

    1982-01-01

    The use of superminicomputers for solving a series of increasingly complex thermal analysis problems is investigated. The approach involved (1) installation and verification of the SPAR thermal analyzer software on superminicomputers at Langley Research Center and Goddard Space Flight Center, (2) solution of six increasingly complex thermal problems on this equipment, and (3) comparison of solution (accuracy, CPU time, turnaround time, and cost) with solutions on large mainframe computers.
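
CPU time and turnaround time, both listed among the comparison criteria above, measure different things: the first counts only the processor time a job consumes, the second the elapsed time until the job finishes. The following Python sketch illustrates the distinction on a modern system; `measure` and `workload` are hypothetical names, and the workload is a stand-in, not the SPAR thermal analyzer.

```python
import time


def measure(func, *args):
    """Return (result, cpu_seconds, wall_seconds) for one call of func."""
    wall_start = time.perf_counter()   # elapsed (wall-clock) timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    result = func(*args)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, cpu_seconds, wall_seconds


def workload(n):
    """Hypothetical stand-in for a solver: some arithmetic plus an idle wait."""
    total = sum(i * i for i in range(n))
    time.sleep(0.05)  # waiting adds to elapsed time but consumes no CPU time
    return total


result, cpu_s, wall_s = measure(workload, 100_000)
print(f"CPU time: {cpu_s:.3f} s, elapsed time: {wall_s:.3f} s")
```

The idle wait shows up only in the elapsed figure, which is why a job can be cheap in CPU time yet slow in turnaround, for example when it queues behind other work.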

  4. Real Time Control of the SSC String Magnets

    NASA Astrophysics Data System (ADS)

    Calvo, O.; Flora, R.; MacPherson, M.

    1987-08-01

    The system described in this paper, called SECAR, was designed to control the excitation of a test string of magnets for the proposed Superconducting Super Collider (SSC) and will be used to upgrade the present Tevatron Excitation, Control and Regulation (TECAR) hardware and software. It resides in a VME crate and is controlled by a 68020/68881 based CPU running the application software under a real time operating system named VRTX.

  5. Real time control of the SSC string magnets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Calvo, O.; Flora, R.; MacPherson, M.

    1987-08-01

    The system described in this paper, called SECAR, was designed to control the excitation of a test string of magnets for the proposed Superconducting Super Collider (SSC) and will be used to upgrade the present Tevatron Excitation, Control and Regulation (TECAR) hardware and software. It resides in a VME crate and is controlled by a 68020/68881 based CPU running the application software under a real time operating system named VRTX.

  6. Adaptive Multilevel Middleware for Object Systems

    DTIC Science & Technology

    2006-12-01

    the system at the system-call level or using the CORBA-standard Extensible Transport Framework (ETF). Transparent insertion is highly desirable from an...often as it needs to. This is remedied by using the real-time scheduling class in a stock Linux kernel. We used sched_setscheduler system call (with...real-time scheduling class (SCHED_FIFO) for all the ML-NFD programs, later experiments with CPU load indicate that a stock Linux kernel is not
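
The real-time scheduling class mentioned in this excerpt is requested on Linux through the `sched_setscheduler` system call with the `SCHED_FIFO` policy. The excerpt is truncated and does not show how the authors invoked it, so the following is only an illustrative Python sketch using the standard library wrapper `os.sched_setscheduler`; the function name `try_enable_fifo` and the priority value are assumptions.

```python
import os


def try_enable_fifo(priority=1):
    """Try to move the calling process into SCHED_FIFO; return True on success.

    The priority of 1 is an arbitrary illustrative choice, not a value
    taken from the report.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # the call is not exposed on this platform
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        # Typically a permission error: real-time policies usually need
        # elevated privileges.
        return False
    return True


if try_enable_fifo():
    print("running under SCHED_FIFO")
else:
    print("could not switch to SCHED_FIFO; staying on the default policy")
```

A process running under `SCHED_FIFO` is not time-sliced against ordinary processes, so a sketch like this should be tried with care on a shared machine.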

  7. GPU-based prompt gamma ray imaging from boron neutron capture therapy.

    PubMed

    Yoon, Do-Kun; Jung, Joo-Young; Jo Hong, Key; Sil Lee, Keum; Suk Suh, Tae

    2015-01-01

    The purpose of this research is to perform the fast reconstruction of a prompt gamma ray image using a graphics processing unit (GPU) computation from boron neutron capture therapy (BNCT) simulations. To evaluate the accuracy of the reconstructed image, a phantom including four boron uptake regions (BURs) was used in the simulation. After the Monte Carlo simulation of the BNCT, the modified ordered subset expectation maximization reconstruction algorithm using the GPU computation was used to reconstruct the images with fewer projections. The computation times for image reconstruction were compared between the GPU and the central processing unit (CPU). Also, the accuracy of the reconstructed image was evaluated by a receiver operating characteristic (ROC) curve analysis. The image reconstruction time using the GPU was 196 times faster than the conventional reconstruction time using the CPU. For the four BURs, the area under curve values from the ROC curve were 0.6726 (A-region), 0.6890 (B-region), 0.7384 (C-region), and 0.8009 (D-region). The tomographic image using the prompt gamma ray event from the BNCT simulation was acquired using the GPU computation in order to perform a fast reconstruction during treatment. The authors verified the feasibility of the prompt gamma ray image reconstruction using the GPU computation for BNCT simulations.

  8. Machine-Aided Indexing of Technical Literature

    ERIC Educational Resources Information Center

    Klingbiel, Paul H.

    1973-01-01

    To index at the Defense Documentation Center (DDC), an automated system must choose single words or phrases rapidly and economically. Automation of DDC's indexing has been machine-aided from its inception. A machine-aided indexing system is described that indexes one million words of text per hour of CPU time. (22 references) (Author/SJ)

  9. An Adaptive Priority Tuning System for Optimized Local CPU Scheduling using BOINC Clients

    NASA Astrophysics Data System (ADS)

    Mnaouer, Adel B.; Ragoonath, Colin

    2010-11-01

    Volunteer Computing (VC) is a Distributed Computing model which utilizes idle CPU cycles from computing resources donated by volunteers who are connected through the Internet to form a very large-scale, loosely coupled High Performance Computing environment. Distributed Volunteer Computing environments such as the BOINC framework are concerned mainly with the efficient scheduling of the available resources to the applications which require them. The BOINC framework thus contains a number of scheduling policies/algorithms both on the server-side and on the client which work together to maximize the available resources and to provide a degree of QoS in an environment which is highly volatile. This paper focuses on the BOINC client and introduces an adaptive priority tuning client side middleware application which improves the execution times of Work Units (WUs) while maintaining an acceptable Maximum Response Time (MRT) for the end user. We have conducted extensive experimentation of the proposed system and the results show clear speedup of BOINC applications using our optimized middleware as opposed to running using the original BOINC client.

  10. Fog computing job scheduling optimization based on bees swarm

    NASA Astrophysics Data System (ADS)

    Bitam, Salim; Zeadally, Sherali; Mellouk, Abdelhamid

    2018-04-01

    Fog computing is a new computing architecture, composed of a set of near-user edge devices called fog nodes, which collaborate together in order to perform computational services such as running applications, storing an important amount of data, and transmitting messages. Fog computing extends cloud computing by deploying digital resources at the premise of mobile users. In this new paradigm, management and operating functions, such as job scheduling aim at providing high-performance, cost-effective services requested by mobile users and executed by fog nodes. We propose a new bio-inspired optimization approach called Bees Life Algorithm (BLA) aimed at addressing the job scheduling problem in the fog computing environment. Our proposed approach is based on the optimized distribution of a set of tasks among all the fog computing nodes. The objective is to find an optimal tradeoff between CPU execution time and allocated memory required by fog computing services established by mobile users. Our empirical performance evaluation results demonstrate that the proposal outperforms the traditional particle swarm optimization and genetic algorithm in terms of CPU execution time and allocated memory.

  11. Transient dynamics capability at Sandia National Laboratories

    NASA Technical Reports Server (NTRS)

    Attaway, Steven W.; Biffle, Johnny H.; Sjaardema, G. D.; Heinstein, M. W.; Schoof, L. A.

    1993-01-01

    A brief overview of the transient dynamics capabilities at Sandia National Laboratories, with an emphasis on recent new developments and current research is presented. In addition, the Sandia National Laboratories (SNL) Engineering Analysis Code Access System (SEACAS), which is a collection of structural and thermal codes and utilities used by analysts at SNL, is described. The SEACAS system includes pre- and post-processing codes, analysis codes, database translation codes, support libraries, Unix shell scripts for execution, and an installation system. SEACAS is used at SNL on a daily basis as a production, research, and development system for the engineering analysts and code developers. Over the past year, approximately 190 days of CPU time were used by SEACAS codes on jobs running from a few seconds up to two and one-half days of CPU time. SEACAS is running on several different systems at SNL including Cray Unicos, Hewlett Packard HP-UX, Digital Equipment Ultrix, and Sun SunOS. An overview of SEACAS, including a short description of the codes in the system, is presented. Abstracts and references for the codes are listed at the end of the report.

  12. Television viewing, computer game playing, and Internet use and self-reported time to bed and time out of bed in secondary-school children.

    PubMed

    Van den Bulck, Jan

    2004-02-01

    To investigate the relationship between the presence of a television set, a gaming computer, and/or an Internet connection in the room of adolescents and television viewing, computer game playing, and Internet use on the one hand, and time to bed, time up, time spent in bed, and overall tiredness in first- and fourth-year secondary-school children on the other hand. A random sample of students from 15 schools in Flanders, Belgium, yielded 2546 children who completed a questionnaire with questions about media presence in bedrooms; volume of television viewing, computer game playing, and Internet use; time to bed and time up on average weekdays and average weekend days; and questions regarding the level of tiredness in the morning, at school, after a day at school, and after the weekend. Children with a television set in their rooms went to bed significantly later on weekdays and weekend days and got up significantly later on weekend days. Overall, they spent less time in bed on weekdays. Children with a gaming computer in their rooms went to bed significantly later on weekdays. On weekdays, they spent significantly less time in bed. Children who watched more television went to bed later on weekdays and weekend days and got up later on weekend days. They spent less time in bed on weekdays. They reported higher overall levels of being tired. Children who spent more time playing computer games went to bed later on weekdays and weekend days and got up later on weekend days. On weekdays, they actually got up significantly earlier. They spent less time in bed on weekdays and reported higher levels of tiredness. Children who spent more time using the Internet went to bed significantly later during the week and during the weekend. They got up later on weekend days. They spent less time in bed during the week and reported higher levels of tiredness. Going out was also significantly related to sleeping later and less. 
    Concerns about media use should not be limited to television. Computer game playing and Internet use are related to sleep behavior as well. Leisure activities that are unstructured seem to be negatively related to good sleep patterns. Imposing more structure (eg, end times) might reduce impact.

  13. A Time Study of Plastic Surgery Residents.

    PubMed

    Lau, Frank H; Sinha, Indranil; Jiang, Wei; Lipsitz, Stuart R; Eriksson, Elof

    2016-05-01

    Resident work hours are under scrutiny and have been subject to multiple restrictions. The studies supporting these changes have not included data on surgical residents. We studied the workday of a team of plastic surgery residents to establish prospective time-study data of plastic surgery (PRS) residents at a single tertiary-care academic medical center. Five trained research assistants observed all residents (n = 8) on a PRS service for 10 weeks and produced minute-by-minute activity logs. Data collection began when the team first met in the morning and continued until the resident being followed completed all non-call activities. We analyzed our data from 3 perspectives: 1) time spent in direct patient care (DPC), indirect patient care, and didactic activities; 2) time spent in high education-value activities (HEAs) versus low education-value activities (LEAs); and 3) resident efficiency. We defined HEAs as activities that surgeons must master; other activities were LEAs. We quantified resident efficiency in terms of time fragmentation and time spent waiting. A total of 642.4 hours of data across 50 workdays were collected. Excluding call, residents worked an average of 64.2 hours per week. Approximately 50.7% of surgical resident time was allotted to DPC, with surgery accounting for the largest segment of this time (34.8%). Time spent on HEAs trended upward with higher resident level (P = 0.086). Time spent in surgery was significantly associated with higher resident levels (P < 0.0001); 57.7% of activities required 4 minutes or less, suggesting that resident work was highly fragmented. Residents spent 10.7% of their workdays waiting for other services. In this first-time study of PRS residents, we found that compared with medicine trainees, surgical residents spent 3.23 times more time on DPC. High education-value activities comprised most of our residents' workdays. Surgery was the leading component of both DPC and HEAs.
    Our residents were highly efficient and fragmented, with the majority of all activities requiring 4 minutes or less. Residents spent a large portion of their time waiting for other services. In light of these data, we suggest that future changes to residency programs be pilot tested, with preimplementation and postimplementation time studies performed to quantify the changes' impact.

  14. Effect of acute and continuous morphine treatment on transcription factor expression in subregions of the rat caudate putamen. Marked modulation by D4 receptor activation.

    PubMed

    Gago, Belén; Suárez-Boomgaard, Diana; Fuxe, Kjell; Brené, Stefan; Reina-Sánchez, María Dolores; Rodríguez-Pérez, Luis M; Agnati, Luigi F; de la Calle, Adelaida; Rivera, Alicia

    2011-08-17

    Acute administration of the dopamine D(4) receptor (D(4)R) agonist PD168,077 induces a down-regulation of the μ opioid receptor (MOR) in the striosomal compartment of the rat caudate putamen (CPu), suggesting a striosomal D(4)R/MOR receptor interaction in line with their high co-distribution in this brain subregion. The present work was designed to explore if a D(4)R/MOR receptor interaction also occurs in the modulation of the expression pattern of several transcription factors in striatal subregions that play a central role in drug addiction. Thus, c-Fos, FosB/ΔFosB and P-CREB immunoreactive profiles were quantified in the rat CPu after either acute or continuous (6-day) administration of morphine and/or PD168,077. Acute and continuous administration of morphine induced different patterns of expression of these transcription factors, effects that were time-course and region dependent and fully blocked by PD168,077 co-administration. Moreover, this effect of the D(4)R agonist was counteracted by the D(4)R antagonist L745,870. Interestingly, at some time-points, combined treatment with morphine and PD168,077 substantially increased c-Fos, FosB/ΔFosB and P-CREB expression. The results of this study give indications for a general antagonistic D(4)R/MOR receptor interaction at the level of transcription factors. The change in the transcription factor expression by D(4)R/MOR interactions in turn suggests a modulation of neuronal activity in the CPu that could be of relevance for drug addiction. Copyright © 2011 Elsevier B.V. All rights reserved.

  15. Design and implementation of a UNIX based distributed computing system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Love, J.S.; Michael, M.W.

    1994-12-31

    We have designed, implemented, and are running a corporate-wide distributed processing batch queue on a large number of networked workstations using the UNIX® operating system. Atlas Wireline researchers and scientists have used the system for over a year. The large increase in available computer power has greatly reduced the time required for nuclear and electromagnetic tool modeling. Use of remote distributed computing has simultaneously reduced computation costs and increased usable computer time. The system integrates equipment from different manufacturers, using various CPU architectures, distinct operating system revisions, and even multiple processors per machine. Various differences between the machines have to be accounted for in the master scheduler. These differences include shells, command sets, swap spaces, memory sizes, CPU sizes, and OS revision levels. Remote processing across a network must be performed in a manner that is seamless from the users' perspective. The system currently uses IBM RISC System/6000®, SPARCstation™, HP9000s700, HP9000s800, and DEC Alpha AXP™ machines. Each CPU in the network has its own speed rating, allowed working hours, and workload parameters. The system is designed so that all of the computers in the network can be optimally scheduled without adversely impacting the primary users of the machines. The increase in the total usable computational capacity by means of distributed batch computing can change corporate computing strategy. The integration of disparate computer platforms eliminates the need to buy one type of computer for computations, another for graphics, and yet another for day-to-day operations. It might be possible, for example, to meet all research and engineering computing needs with existing networked computers.
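The abstract says each CPU carries a speed rating, allowed working hours, and workload parameters that the master scheduler takes into account. The sketch below illustrates how such a selection rule might look. It is not the authors' scheduler: the `Machine` fields, the `max_load` threshold, and the scoring formula are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # All fields are hypothetical stand-ins for the parameters named in the abstract.
    name: str
    speed_rating: float    # relative speed; higher means faster
    allowed_hours: range   # hours of the day (0-23) when batch jobs may run
    load: float            # current load as a fraction between 0.0 and 1.0
    max_load: float = 0.5  # protect the primary user: skip machines busier than this


def pick_machine(machines, hour):
    """Return the eligible machine with the most spare capacity, or None."""
    eligible = [
        m for m in machines
        if hour in m.allowed_hours and m.load < m.max_load
    ]
    if not eligible:
        return None
    # Spare capacity is approximated as the speed rating scaled by the idle fraction.
    return max(eligible, key=lambda m: m.speed_rating * (1.0 - m.load))
```

A `range` cannot wrap past midnight, so a real deployment would need a richer representation of working hours; the sketch keeps the simple form for readability.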

  16. P-Hint-Hunt: a deep parallelized whole genome DNA methylation detection tool.

    PubMed

    Peng, Shaoliang; Yang, Shunyun; Gao, Ming; Liao, Xiangke; Liu, Jie; Yang, Canqun; Wu, Chengkun; Yu, Wenqiang

    2017-03-14

    A growing number of studies use whole genome DNA methylation detection, one of the most important parts of epigenetics research, to find significant relationships between DNA methylation and several typical diseases, such as cancers and diabetes. In many of those studies, mapping the bisulfite-treated sequence to the whole genome has been the main method to study DNA cytosine methylation. However, most of today's related tools suffer from inaccuracy and long run times. In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve the problem. Using an optimal complex alignment computation and Smith-Waterman matrix dynamic programming, Hint-Hunt can analyze and predict the DNA methylation status. But when Hint-Hunt is applied to large-scale datasets, it still suffers from slow speed and low temporal-spatial efficiency. To address the cost of Smith-Waterman dynamic programming and the low temporal-spatial efficiency, we further designed a deep parallelized whole genome DNA methylation detection tool ("P-Hint-Hunt") on the Tianhe-2 (TH-2) supercomputer. To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up for processing large-scale datasets, and it can run on both CPUs and Intel Xeon Phi coprocessors. Moreover, we deploy and evaluate Hint-Hunt and P-Hint-Hunt on the TH-2 supercomputer at different scales. The experimental results show that our tools eliminate the deviation caused by bisulfite treatment in the mapping procedure and that the multi-level parallel program yields a 48-fold speed-up with 64 threads. P-Hint-Hunt gains deep acceleration on the CPU and Intel Xeon Phi heterogeneous platform, taking full advantage of both multi-core (CPU) and many-core (Phi) processors.
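The abstract names Smith-Waterman matrix dynamic programming as the alignment kernel. As a reminder of what that recurrence computes, here is a minimal serial sketch of the local-alignment score. The scoring values (match +2, mismatch -1, linear gap -1) are assumptions for illustration; the paper's actual scoring scheme, its handling of bisulfite conversion, and its parallelization are not shown.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b.

    Scoring parameters are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap     # gap in b
            left = H[i][j - 1] + gap   # gap in a
            # Clamping at zero is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

The cost is proportional to `len(a) * len(b)`, which is why whole-genome use motivates the parallel design described above.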

  17. GPU acceleration of Runge Kutta-Fehlberg and its comparison with Dormand-Prince method

    NASA Astrophysics Data System (ADS)

    Seen, Wo Mei; Gobithaasan, R. U.; Miura, Kenjiro T.

    2014-07-01

    There is a significant reduction of processing time and speedup of performance in computer graphics with the emergence of Graphic Processing Units (GPUs). GPUs have been developed to surpass the Central Processing Unit (CPU) in terms of performance and processing speed. This evolution has opened up a new area in computing and research where highly parallel GPUs have been used for non-graphical algorithms. Physical or phenomenal simulations and modelling can be accelerated through General Purpose Graphic Processing Units (GPGPU) and Compute Unified Device Architecture (CUDA) implementations. These phenomena can be represented with mathematical models in the form of Ordinary Differential Equations (ODEs), which encompass the gist of change rate between independent and dependent variables. ODEs are numerically integrated over time in order to simulate these behaviours. The classical Runge-Kutta (RK) scheme is the common method used to numerically solve ODEs. The Runge-Kutta-Fehlberg (RKF) scheme has been specially developed to provide an estimate of the principal local truncation error at each step, known as the embedding estimate technique. This paper delves into the implementation of the RKF scheme for GPU devices and compares its results with the Dormand-Prince method. A pseudo code is developed to show the implementation in detail. Hence, practitioners will be able to understand the data allocation in GPU, formation of RKF kernels and the flow of data to/from GPU-CPU upon RKF kernel evaluation. The pseudo code is then written in C Language and two ODE models are executed to show the achievable speedup as compared to CPU implementation. The accuracy and efficiency of the proposed implementation method are discussed in the final section of this paper.
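The embedded error estimate mentioned in the abstract comes from evaluating a fourth-order and a fifth-order solution from the same six stages and taking their difference. The following is a minimal CPU-only sketch for a scalar ODE using the standard RKF45 coefficients. It is not the paper's GPU kernel; the step-size controller (safety factor 0.9, growth capped at 2x) is one common choice and is an assumption here.

```python
import math


def rkf45_step(f, t, y, h):
    """One Runge-Kutta-Fehlberg step; returns (5th-order value, error estimate)."""
    k1 = h * f(t, y)
    k2 = h * f(t + h / 4, y + k1 / 4)
    k3 = h * f(t + 3 * h / 8, y + 3 * k1 / 32 + 9 * k2 / 32)
    k4 = h * f(t + 12 * h / 13,
               y + 1932 * k1 / 2197 - 7200 * k2 / 2197 + 7296 * k3 / 2197)
    k5 = h * f(t + h,
               y + 439 * k1 / 216 - 8 * k2 + 3680 * k3 / 513 - 845 * k4 / 4104)
    k6 = h * f(t + h / 2,
               y - 8 * k1 / 27 + 2 * k2 - 3544 * k3 / 2565
               + 1859 * k4 / 4104 - 11 * k5 / 40)
    y4 = y + 25 * k1 / 216 + 1408 * k3 / 2565 + 2197 * k4 / 4104 - k5 / 5
    y5 = (y + 16 * k1 / 135 + 6656 * k3 / 12825 + 28561 * k4 / 56430
          - 9 * k5 / 50 + 2 * k6 / 55)
    # The difference between the two embedded solutions estimates the local error.
    return y5, abs(y5 - y4)


def integrate(f, t0, y0, t_end, h=0.1, tol=1e-8):
    """Integrate y' = f(t, y) from t0 to t_end with adaptive step size."""
    t, y = t0, y0
    while t_end - t > 1e-12:
        h = min(h, t_end - t)
        y_new, err = rkf45_step(f, t, y, h)
        if err <= tol:
            t, y = t + h, y_new  # accept the step
        # Rescale h whether or not the step was accepted (assumed controller).
        scale = 2.0 if err == 0 else min(2.0, max(0.1, 0.9 * (tol / err) ** 0.2))
        h *= scale
    return y


if __name__ == "__main__":
    # Decay problem y' = -y, y(0) = 1; the exact value at t = 1 is exp(-1).
    print(integrate(lambda t, y: -y, 0.0, 1.0, 1.0), math.exp(-1.0))
```

Because every accepted or rejected step depends on the local error, different ODE instances may take different numbers of steps, which is the source of the thread divergence that GPU implementations have to manage.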

  18. Efficient sparse matrix multiplication scheme for the CYBER 203

    NASA Technical Reports Server (NTRS)

    Lambiotte, J. J., Jr.

    1984-01-01

    This work has been directed toward the development of an efficient algorithm for performing this computation on the CYBER-203. The desire to provide software which gives the user the choice between the often conflicting goals of minimizing central processing (CPU) time or storage requirements has led to a diagonal-based algorithm in which one of three types of storage is selected for each diagonal. For each storage type, an initialization sub-routine estimates the CPU and storage requirements based upon results from previously performed numerical experimentation. These requirements are adjusted by weights provided by the user which reflect the relative importance the user places on the resources. The three storage types employed were chosen to be efficient on the CYBER-203 for diagonals which are sparse, moderately sparse, or dense; however, for many densities, no diagonal type is most efficient with respect to both resource requirements. The user-supplied weights dictate the choice.
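To make the idea of a diagonal-based algorithm concrete, the sketch below multiplies a matrix stored by diagonals with a vector. It rests on two assumptions not confirmed by the abstract: that the computation in question is a matrix-vector product, and that each diagonal is stored densely. The report's three storage types and its cost-weighted selection between them are not reproduced, and the function and parameter names are hypothetical.

```python
def diag_matvec(n, diagonals, x):
    """Compute y = A @ x for an n-by-n matrix A stored by diagonals.

    diagonals maps an offset k to the list of entries A[i][i + k], ordered by
    increasing row i. Offset 0 is the main diagonal, k > 0 lies above it and
    k < 0 below it. Diagonals that are entirely zero are simply omitted.
    """
    y = [0.0] * n
    for k, entries in diagonals.items():
        first_row = max(0, -k)  # first row that this diagonal touches
        for p, value in enumerate(entries):
            i = first_row + p
            y[i] += value * x[i + k]
    return y
```

Each diagonal becomes one long, regular loop over contiguous data, which is the kind of access pattern that suits a vector machine.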

  19. GPU accelerated implementation of NCI calculations using promolecular density.

    PubMed

    Rubez, Gaëtan; Etancelin, Jean-Matthieu; Vigouroux, Xavier; Krajecki, Michael; Boisson, Jean-Charles; Hénon, Eric

    2017-05-30

    The NCI approach is a modern tool to reveal chemical noncovalent interactions. It is particularly attractive to describe ligand-protein binding. A custom implementation for NCI using promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The code performances of three versions are examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which reduces drastically the computational time. On a single compute node, the dual-GPU version leads to a 39-fold improvement for the biggest instance compared to the optimal OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc.

  20. The Effect of NUMA Tunings on CPU Performance

    NASA Astrophysics Data System (ADS)

    Hollowell, Christopher; Caramarcu, Costin; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr

    2015-12-01

    Non-Uniform Memory Access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to other CPUs' (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware architecture can help eliminate the memory performance reductions generally seen in SMP systems when multiple processors simultaneously attempt to access memory. The x86 CPU architecture has supported NUMA for a number of years. Modern operating systems such as Linux support NUMA-aware scheduling, where the OS attempts to schedule a process to the CPU directly attached to the majority of its RAM. In Linux, it is possible to further manually tune the NUMA subsystem using the numactl utility. With the release of Red Hat Enterprise Linux (RHEL) 6.3, the numad daemon became available in this distribution. This daemon monitors a system's NUMA topology and utilization, and automatically makes adjustments to optimize locality. As the number of cores in x86 servers continues to grow, efficient NUMA mappings of processes to CPUs/memory will become increasingly important. This paper gives a brief overview of NUMA, and discusses the effects of manual tunings and numad on the performance of the HEPSPEC06 benchmark, and ATLAS software.

  1. Accelerating finite-rate chemical kinetics with coprocessors: Comparing vectorization methods on GPUs, MICs, and CPUs

    NASA Astrophysics Data System (ADS)

    Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.

    2018-05-01

    Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7 ×) and Xeon Phi coprocessor (4.7-4.9 ×) compared to the baseline parallel code. 
    The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.

  2. Time diary and questionnaire assessment of factors associated with academic and personal success among university undergraduates.

    PubMed

    George, Darren; Dixon, Sinikka; Stansal, Emory; Gelb, Shannon Lund; Pheri, Tabitha

    2008-01-01

    A sample of 231 students attending a private liberal arts university in central Alberta, Canada, completed a 5-day time diary and a 71-item questionnaire assessing the influence of personal, cognitive, and attitudinal factors on success. The authors used 3 success measures: cumulative grade point average (GPA), Personal Success--each participant's rating of congruence between stated goals and progress toward those goals--and Total Success--a measure that weighted GPA and Personal Success equally. The greatest predictors of GPA were time-management skills, intelligence, time spent studying, computer ownership, less time spent in passive leisure, and a healthy diet. Predictors of Personal Success scores were clearly defined goals, overall health, personal spirituality, and time-management skills. Predictors of Total Success scores were clearly defined goals, time-management skills, less time spent in passive leisure, healthy diet, waking up early, computer ownership, and less time spent sleeping. Results suggest alternatives to traditional predictors of academic success.

  3. l-fenfluramine in tests of dominance and anxiety in the rat.

    PubMed

    File, S E; Guardiola-Lemaitre, B J

    1988-01-01

    l-Fenfluramine (1.25 and 2.5 mg/kg) significantly reduced the success of dominant rats competing with untreated middle rank rats for chocolate. In resident rats, l-fenfluramine (2.5 mg/kg) significantly increased the number of submissions, and the time spent submitting, to untreated rats intruding into their home-cage territory; it also significantly reduced the number of kicks directed at, and the time spent kicking, the intruder; and the incidence of, and time spent in, aggressively grooming the intruder. When the intruder rats were treated with l-fenfluramine the only significant change was a decrease in the number of wrestling bouts and the time spent wrestling. Since l-fenfluramine did not change other behaviours in this test (e.g. sniffing the opponent) the decrease in dominance behaviours was probably not secondary to nonspecific sedation. In the social interaction test of anxiety, l-fenfluramine (2.5 and 5 mg/kg) significantly reduced the time spent in active social interaction, and decreased motor activity. Analyses of covariance indicated that these were two independent effects. In the elevated plus-maze, l-fenfluramine (1.25-5 mg/kg) significantly decreased the percent number of entries made onto open arms, and (2.5 and 5 mg/kg) significantly decreased the percent of times spent on the open arms. The total number of arm entries was reduced by all doses (0.625-5 mg/kg). Analysis of covariance indicated that the decrease in percent of time spent on the open arms was secondary to the drop in overall activity. Thus there was no evidence of anxiolytic action in either of these tests, the changes indicating, if anything, anxiogenic effects.(ABSTRACT TRUNCATED AT 250 WORDS)

  4. Sedentary behaviours and its association with bone mass in adolescents: the HELENA cross-sectional study

    PubMed Central

    2012-01-01

    Background: We aimed to examine whether time spent on different sedentary behaviours is associated with bone mineral content (BMC) in adolescents, after controlling for relevant confounders such as lean mass and objectively measured physical activity (PA), and if so, whether extra-curricular participation in osteogenic sports could have a role in this association. Methods: Participants were 359 Spanish adolescents (12.5-17.5 yr, 178 boys) from the HELENA-CSS (2006–07). Relationships of sedentary behaviours with bone variables were analysed by linear regression. The prevalence of low BMC (at least 1SD below the mean) and time spent on sedentary behaviours according to extracurricular sport participation was analysed by Chi-square tests. Results: In boys, the use of internet for non-study was negatively associated with whole body BMC after adjustment for lean mass and moderate to vigorous PA (MVPA). In girls, the time spent studying was negatively associated with femoral neck BMC. Additional adjustment for lean mass slightly reduced the negative association between time spent studying and femoral neck BMC. The additional adjustment for MVPA did not change the results at this site. The percentage of girls having low femoral neck BMC was significantly smaller in those participating in osteogenic sports (≥ 3 h/week) than in the rest, independently of the cut-off selected for the time spent studying. Conclusions: The use of internet for non-study (in boys) and the time spent studying (in girls) are negatively associated with whole body and femoral neck BMC, respectively. In addition, at least 3 h/week of extra-curricular osteogenic sports may help to counteract the negative association of time spent studying on bone health in girls. PMID:23148760

  5. Are environmental influences on physical activity distinct for urban, suburban, and rural schools? A multilevel study among secondary school students in Ontario, Canada.

    PubMed

    Hobin, Erin P; Leatherdale, Scott; Manske, Steve; Dubin, Joel A; Elliott, Susan; Veugelers, Paul

    2013-05-01

    This study examined differences in students' time spent in physical activity (PA) across secondary schools in rural, suburban, and urban environments and identified the environment-level factors associated with these between-school differences in students' PA. Multilevel linear regression analyses were used to examine the environment- and student-level characteristics associated with time spent in PA among grades 9 to 12 students attending 76 secondary schools in Ontario, Canada, as part of the SHAPES-Ontario study. This approach was first conducted with the full data set testing for interactions between environment-level factors and school location. Then, school-location specific regression models were run separately. Statistically significant between-school variation was identified among students attending urban ($\sigma^2_{\mu 0} = 8959.63$ [372.46]), suburban ($\sigma^2_{\mu 0} = 8918.75$ [186.20]), and rural ($\sigma^2_{\mu 0} = 9403.17$ [203.69]) schools, where school-level differences accounted for 4.0%, 2.0%, and 2.1% of the variability in students' time spent in PA, respectively. Students attending an urban or suburban school that provided another room for PA or was located within close proximity to a shopping mall or fast food outlet spent more time in PA. Students' time spent in PA varies by school location and some features of the school environment have a different impact on students' time spent in PA by school location. Developing a better understanding of the environment-level characteristics associated with students' time spent in PA by school location may help public health and planning experts to tailor school programs and policies to the needs of students in different locations. © 2013, American School Health Association.

  6. Photoprotection by sunscreen depends on time spent on application.

    PubMed

    Heerfordt, Ida M; Torsnes, Linnea R; Philipsen, Peter A; Wulf, Hans Christian

    2018-03-01

    To be effective, sunscreens must be applied in a sufficient quantity and reapplication is recommended. No previous study has investigated whether time spent on sunscreen application is important for the achieved photoprotection. To determine whether time spent on sunscreen application is related to the amount of sunscreen used during a first and second application. Thirty-one volunteers wearing swimwear applied sunscreen twice in a laboratory environment. Time spent and the amount of sunscreen used during each application was measured. Subjects' body surface area accessible for sunscreen application (BSA) was estimated from their height, weight and swimwear worn. The average applied quantity of sunscreen after each application was calculated. Subjects spent on average 4 minutes and 15 seconds on the first application and approximately 85% of that time on the second application. There was a linear relationship between time spent on application and amount of sunscreen used during both the first and the second application (P < .0001). Participants applied 2.21 grams of sunscreen per minute during both applications. After the first application, subjects had applied a mean quantity of sunscreen of 0.71 mg/cm² on the BSA, and after the second application, a mean total quantity of 1.27 mg/cm² had been applied. We found that participants applied a constant amount of sunscreen per minute during both a first and a second application. Measurement of time spent on application of sunscreen on different body sites may be useful in investigating the distribution of sunscreen in real-life settings. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  7. The Workloads of Secondary School Teachers. Final Report.

    ERIC Educational Resources Information Center

    Campbell, R. J.; Neill, S. R. St. J.

    This study investigated the amount of time secondary school teachers spent working and the types of work activities, based on records and survey forms from 348 teachers in secondary schools in England and Wales. Findings include: (1) the weekly mean time spent on work was 54.4 hours; (2) teachers spent an average of 16.9 hours teaching, 12.9 hours…

  8. Predicting Time Spent in Treatment in a Sample of Danish Survivors of Child Sexual Abuse.

    PubMed

    Fletcher, Shelley; Elklit, Ask; Shevlin, Mark; Armour, Cherie

    2017-07-01

    The aim of this study was to identify significant predictors of length of time spent in treatment. In a convenience sample of 439 Danish survivors of child sexual abuse, predictors of time spent in treatment were examined. Assessments were conducted on a 6-month basis over a period of 18 months. A multinomial logistic regression analysis revealed that the experience of neglect in childhood and having experienced rape at any life stage were associated with less time in treatment. Higher educational attainment and being male were associated with staying in treatment for longer periods of time. These factors may be important for identifying those at risk of terminating treatment prematurely. It is hoped that a better understanding of the factors that predict time spent in treatment will help to improve treatment outcomes for individuals who are at risk of dropping out of treatment at an early stage.

  9. Using SimCPU in Cooperative Learning Laboratories.

    ERIC Educational Resources Information Center

    Lin, Janet Mei-Chuen; Wu, Cheng-Chih; Liu, Hsi-Jen

    1999-01-01

    Reports research findings of an experimental design in which cooperative-learning strategies were applied to closed-lab instruction of computing concepts. SimCPU, a software package specially designed for closed-lab usage was used by 171 high school students of four classes. Results showed that collaboration enhanced learning and that blending…

  10. Food preparation patterns in German family households. An econometric approach with time budget data.

    PubMed

    Möser, Anke

    2010-08-01

    In Germany, the rising importance of out-of-home consumption, increasing usage of convenience products and decreasing knowledge among younger individuals of how to prepare traditional dishes can be seen as obvious indicators for shifting patterns in food preparation. In this paper, econometric analyses are used to shed more light on the factors which may influence the time spent on food preparation in two-parent family households with children. Two time budget surveys, carried out in 1991/92 and 2001/02 through the German National Statistical Office, provide the necessary data. Time budget data analyses reveal that over the last ten years the time spent on food preparation in Germany has decreased. The results point out that time resources of a household, for example gainful employment of the parents, significantly affect the amount of time spent on food preparation. The analysis confirms further that there is a more equal allocation of time spent on cooking, baking or laying the table between women and men in the last ten years. Due to changing attitudes and conceivably adaptation of economic conditions, differences in time devoted to food preparation seem to have vanished between Eastern and Western Germany. Greater time spent on eating out in Germany as well as decreasing time spent on food preparation at home reveal that the food provisioning of families is no longer a primarily private task of the households themselves but needs more public attention and institutional offers and help. Among other points, the possibility of addressing mothers' lack of time as well as growing "food illiteracy" of children and young adults are discussed. 2010 Elsevier Ltd. All rights reserved.

  11. Deterministic Stress Modeling of Hot Gas Segregation in a Turbine

    NASA Technical Reports Server (NTRS)

    Busby, Judy; Sondak, Doug; Staubach, Brent; Davis, Roger

    1998-01-01

    Simulation of unsteady viscous turbomachinery flowfields is presently impractical as a design tool due to the long run times required. Designers rely predominantly on steady-state simulations, but these simulations do not account for some of the important unsteady flow physics. Unsteady flow effects can be modeled as source terms in the steady flow equations. These source terms, referred to as Lumped Deterministic Stresses (LDS), can be used to drive steady flow solution procedures to reproduce the time-average of an unsteady flow solution. The goal of this work is to investigate the feasibility of using inviscid lumped deterministic stresses to model unsteady combustion hot streak migration effects on the turbine blade tip and outer air seal heat loads using a steady computational approach. The LDS model is obtained from an unsteady inviscid calculation. The LDS model is then used with a steady viscous computation to simulate the time-averaged viscous solution. Both two-dimensional and three-dimensional applications are examined. The inviscid LDS model produces good results for the two-dimensional case and requires less than 10% of the CPU time of the unsteady viscous run. For the three-dimensional case, the LDS model does a good job of reproducing the time-averaged viscous temperature migration and separation as well as heat load on the outer air seal at a CPU cost that is 25% of that of an unsteady viscous computation.

  12. Advances in Mechanisms Supporting Data Collection on Future Force Networks: Product Manager C4ISR On-the-Move

    DTIC Science & Technology

    2008-12-01

    for Layer 3 data capture: NetPoll ncap tget Monitor session Radio System switch router User App interface box GPS This model applies to most fixed...developed a lightweight, custom implementation, termed ncap . As described in Section 3.1, the Ground Truth System provides a linkage between host...computer CPU time and GPS time, and ncap leverages this to perform highly precise (əmsec) time tagging of offered and received packets. Such

  13. 50 CFR 260.79 - Travel and other expenses.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... based on an hourly rate, an additional hourly charge may be made for travel time including time spent waiting for transportation as well as time spent traveling, but not to exceed 8 hours of travel time for... charge may be made for travel time outside the employee's official work hours. ...

  14. 50 CFR 260.79 - Travel and other expenses.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... based on an hourly rate, an additional hourly charge may be made for travel time including time spent waiting for transportation as well as time spent traveling, but not to exceed 8 hours of travel time for... charge may be made for travel time outside the employee's official work hours. ...

  15. 50 CFR 260.79 - Travel and other expenses.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... based on an hourly rate, an additional hourly charge may be made for travel time including time spent waiting for transportation as well as time spent traveling, but not to exceed 8 hours of travel time for... charge may be made for travel time outside the employee's official work hours. ...

  16. 50 CFR 260.79 - Travel and other expenses.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... based on an hourly rate, an additional hourly charge may be made for travel time including time spent waiting for transportation as well as time spent traveling, but not to exceed 8 hours of travel time for... charge may be made for travel time outside the employee's official work hours. ...

  17. 50 CFR 260.79 - Travel and other expenses.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... based on an hourly rate, an additional hourly charge may be made for travel time including time spent waiting for transportation as well as time spent traveling, but not to exceed 8 hours of travel time for... charge may be made for travel time outside the employee's official work hours. ...

  18. Unobtrusive in-home detection of time spent out-of-home with applications to loneliness and physical activity

    PubMed Central

    Austin, Daniel; Kaye, Jeffrey A.; Pavel, Misha; Hayes, Tamara L.

    2014-01-01

    Loneliness is a common condition in elderly associated with severe health consequences including increased mortality, decreased cognitive function, and poor quality of life. Identifying and assisting lonely individuals is therefore increasingly important—especially in the home setting—as the very nature of loneliness often makes it difficult to detect by traditional methods. One critical component in assessing loneliness unobtrusively is to measure time spent out-of-home, as loneliness often presents with decreased physical activity, decreased motor functioning, and a decline in activities of daily living, all of which may cause decreases in the amount of time spent outside the home. Using passive and unobtrusive in-home sensing technologies, we have developed a methodology for detecting time spent out-of-home based on logistic regression. Our approach was both sensitive (0.939) and specific (0.975) in detecting time out-of-home across over 41,000 epochs of data collected from 4 subjects monitored for at least 30 days each in their own homes. In addition to linking time spent out-of-home to loneliness (r=−0.44, p=0.011) as measured by the UCLA Loneliness Index, we demonstrate its usefulness in other applications such as uncovering general behavioral patterns of elderly and exploring the link between time spent out-of-home and physical activity (r=0.415, p=0.031), as measured by the Berkman Social Disengagement Index. PMID:25192570

  19. Age at Menarche and Time Spent in Education: A Mendelian Randomization Study.

    PubMed

    Gill, D; Del Greco M, F; Rawson, T M; Sivakumaran, P; Brown, A; Sheehan, N A; Minelli, C

    2017-09-01

    Menarche signifies the primary event in female puberty and is associated with changes in self-identity. It is not clear whether earlier puberty causes girls to spend less time in education. Observational studies on this topic are likely to be affected by confounding environmental factors. The Mendelian randomization (MR) approach addresses these issues by using genetic variants (such as single nucleotide polymorphisms, SNPs) as proxies for the risk factor of interest. We use this technique to explore whether there is a causal effect of age at menarche on time spent in education. Instruments and SNP-age at menarche estimates are identified from a Genome Wide Association Study (GWAS) meta-analysis of 182,416 women of European descent. The effects of instruments on time spent in education are estimated using a GWAS meta-analysis of 118,443 women performed by the Social Science Genetic Association Consortium (SSGAC). In our main analysis, we demonstrate a small but statistically significant causal effect of age at menarche on time spent in education: a 1 year increase in age at menarche is associated with 0.14 years (53 days) increase in time spent in education (95% CI 0.10-0.21 years, p = 3.5 × 10⁻⁸). The causal effect is confirmed in sensitivity analyses. In identifying this positive causal effect of age at menarche on time spent in education, we offer further insight into the social effects of puberty in girls.

  20. Physical activity, sedentary time and physical capability in early old age: British birth cohort study.

    PubMed

    Cooper, Andrew J M; Simmons, Rebecca K; Kuh, Diana; Brage, Soren; Cooper, Rachel

    2015-01-01

    To investigate the associations of time spent sedentary, in moderate-to-vigorous-intensity physical activity (MVPA) and physical activity energy expenditure (PAEE) with physical capability measures at age 60-64 years. Time spent sedentary and in MVPA and, PAEE were assessed using individually calibrated combined heart rate and movement sensing among 1727 participants from the MRC National Survey of Health and Development in England, Scotland and Wales as part of a detailed clinical assessment undertaken in 2006-2010. Multivariable linear regression models were used to examine the cross-sectional associations between standardised measures of each of these behavioural variables with grip strength, chair rise and timed up-&-go (TUG) speed and standing balance time. Greater time spent in MVPA was associated with higher levels of physical capability; adjusted mean differences in each capability measure per 1 standard deviation increase in MVPA time were: grip strength (0.477 kg, 95% confidence interval (CI): 0.015 to 0.939), chair rise speed (0.429 stands/min, 95% CI: 0.093 to 0.764), standing balance time (0.028 s, 95% CI: 0.003 to 0.053) and TUG speed (0.019 m/s, 95% CI: 0.011 to 0.026). In contrast, time spent sedentary was associated with lower grip strength (-0.540 kg, 95% CI: -1.013 to -0.066) and TUG speed (-0.011 m/s, 95% CI: -0.019 to -0.004). Associations for PAEE were similar to those for MVPA. Higher levels of MVPA and overall physical activity (PAEE) are associated with greater levels of physical capability whereas time spent sedentary is associated with lower levels of capability. Future intervention studies in older adults should focus on both the promotion of physical activity and reduction in time spent sedentary.

  1. Tactical Operations Analysis Support Facility.

    DTIC Science & Technology

    1981-05-01

    Punch/Reader 2 DMC-11AR DDCMP Micro Processor 2 DMC-11DA Network Link Line Unit 2 DL-11E Async Serial Line Interface 4 Intel IN-1670 448K Words MOS Memory...86 5.3 VIRTUAL PROCESSORS - VAX-11/750 ........................... 89 5.4 A RELATIONAL DATA MANAGEMENT SYSTEM - ORACLE...Central Processing Unit (CPU) is a 16 bit processor for high-speed, real time applications, and for large multi-user, multi- task, time shared

  2. Fast, large-scale hologram calculation in wavelet domain

    NASA Astrophysics Data System (ADS)

    Shimobaba, Tomoyoshi; Matsushima, Kyoji; Takahashi, Takayuki; Nagahama, Yuki; Hasegawa, Satoki; Sano, Marie; Hirayama, Ryuji; Kakue, Takashi; Ito, Tomoyoshi

    2018-04-01

    We propose a large-scale hologram calculation using WAvelet ShrinkAge-Based superpositIon (WASABI), a wavelet transform-based algorithm. An image-type hologram calculated using the WASABI method is printed on a glass substrate with the resolution of 65,536 × 65,536 pixels and a pixel pitch of 1 μm. The hologram calculation time amounts to approximately 354 s on a commercial CPU, which is approximately 30 times faster than conventional methods.
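
    As a rough check on the scale of these figures, the short Python sketch below derives the total pixel count, the implied throughput, and the implied run time of a conventional method from the numbers quoted in the abstract. The function and variable names are illustrative only, and the inputs are the abstract's approximate values, so the outputs are order-of-magnitude estimates rather than reported results.

```python
# Back-of-envelope figures derived from the abstract's approximate numbers.
SIDE_PIXELS = 65_536      # hologram resolution: 65,536 x 65,536 pixels
PIXEL_PITCH_UM = 1.0      # pixel pitch in micrometres
WASABI_SECONDS = 354.0    # approximate calculation time on a commercial CPU
SPEEDUP = 30.0            # approximate speedup over conventional methods


def hologram_stats(side=SIDE_PIXELS, pitch_um=PIXEL_PITCH_UM,
                   seconds=WASABI_SECONDS, speedup=SPEEDUP):
    """Return derived quantities; names are hypothetical, not from the paper."""
    total_pixels = side * side
    return {
        "total_pixels": total_pixels,
        "side_mm": side * pitch_um / 1000.0,          # physical edge length
        "pixels_per_second": total_pixels / seconds,  # implied throughput
        "implied_conventional_seconds": seconds * speedup,
    }


stats = hologram_stats()
print(f"total pixels:            {stats['total_pixels']:,}")
print(f"edge length:             {stats['side_mm']:.3f} mm")
print(f"throughput:              {stats['pixels_per_second'] / 1e6:.1f} Mpixel/s")
print(f"conventional (implied):  {stats['implied_conventional_seconds'] / 3600:.2f} h")
```

    Under these assumptions the hologram holds about 4.3 billion pixels, computed at roughly 12 megapixels per second, and a method 30 times slower would need close to three hours.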

  3. Quantifying faculty teaching time in a department of obstetrics and gynecology.

    PubMed

    Emmons, S

    1998-10-01

    The goal of this project was to develop a reproducible system that measures quantity and quality of teaching in unduplicated hours, such that comparisons of teaching activities could be drawn within and across departments. Such a system could be used for allocating teaching monies and for assessing teaching as part of the promotion and tenure process. Various teaching activities, including time spent in clinic, rounds, and doing procedures, were enumerated. The faculty were surveyed about their opinions on the proportion of clinical time spent in teaching. The literature also was reviewed. Based on analysis of the faculty survey and the literature, a series of calculations were developed to divide clinical time among resident teaching, medical student teaching, and patient care. The only input needed was total time spent in the various clinical activities, time spent in didactic activities, and the resident procedure database. This article describes a simple and fair database system to calculate time spent teaching from activities such as clinic, ward rounds, labor and delivery, and surgery. The teaching portfolio database calculates teaching as a proportion of the faculty member's total activities. The end product is a report that provides a reproducible yearly summary of faculty teaching time per activity and per type of learner.

  4. ELT-scale Adaptive Optics real-time control with the Intel Xeon Phi Many Integrated Core Architecture

    NASA Astrophysics Data System (ADS)

    Jenkins, David R.; Basden, Alastair; Myers, Richard M.

    2018-05-01

    We propose a solution to the increased computational demands of Extremely Large Telescope (ELT) scale adaptive optics (AO) real-time control with the Intel Xeon Phi Knights Landing (KNL) Many Integrated Core (MIC) Architecture. The computational demands of an AO real-time controller (RTC) scale with the fourth power of telescope diameter and so the next generation ELTs require orders of magnitude more processing power for the RTC pipeline than existing systems. The Xeon Phi contains a large number (≥64) of low power x86 CPU cores and high bandwidth memory integrated into a single socketed server CPU package. The increased parallelism and memory bandwidth are crucial to providing the performance for reconstructing wavefronts with the required precision for ELT scale AO. Here, we demonstrate that the Xeon Phi KNL is capable of performing ELT scale single conjugate AO real-time control computation at over 1.0kHz with less than 20μs RMS jitter. We have also shown that with a wavefront sensor camera attached the KNL can process the real-time control loop at up to 966Hz, the maximum frame-rate of the camera, with jitter remaining below 20μs RMS. Future studies will involve exploring the use of a cluster of Xeon Phis for the real-time control of the MCAO and MOAO regimes of AO. We find that the Xeon Phi is highly suitable for ELT AO real time control.
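
    The fourth-power scaling and the jitter budget mentioned above can be illustrated with a small Python sketch. The telescope diameters used here (8 m and 39 m) are assumptions chosen for illustration and do not come from the abstract; the function names are likewise hypothetical.

```python
def rtc_compute_ratio(d_new_m: float, d_old_m: float) -> float:
    """Relative RTC compute demand, assuming demand scales as diameter**4."""
    return (d_new_m / d_old_m) ** 4


def jitter_fraction(rate_hz: float, jitter_s: float) -> float:
    """Jitter expressed as a fraction of one control-loop period."""
    return jitter_s * rate_hz


# Assumed diameters (not from the abstract): an 8 m telescope vs a 39 m ELT.
ratio = rtc_compute_ratio(39.0, 8.0)
print(f"compute demand ratio: about {ratio:.0f}x")

# Figures quoted in the abstract: 1.0 kHz loop rate, under 20 microseconds RMS jitter.
frac = jitter_fraction(1000.0, 20e-6)
print(f"jitter bound as share of loop period: {frac:.1%}")

# Frame period at the camera-limited rate of 966 Hz quoted in the abstract.
print(f"frame period at 966 Hz: {1e3 / 966.0:.3f} ms")
```

    With these assumed diameters the compute demand grows by a factor of several hundred, which matches the abstract's "orders of magnitude" wording, and a 20 μs jitter bound corresponds to 2% of a 1 ms loop period.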

  5. Student preparation time for traditional lecture versus team-based learning in a pharmacotherapy course.

    PubMed

    DeJongh, Beth; Lemoine, Nicia; Buckley, Elizabeth; Traynor, Laura

    2018-03-01

    Determine how much time students spent preparing for traditional lecture versus team-based learning (TBL) for a pharmacotherapy course and determine if time spent in each pedagogy was within stated expectations for the course. Instructors used a combination of traditional lecture and TBL to deliver material. Before each lecture, instructors recorded the amount of time students spent preparing for each method using a one-question clicker-response survey. Instructors delivered 16 hours of TBL, 32 hours of traditional lecture, and eight hours of a mix of TBL and traditional lecture. The median of students completing the survey each week was 89. A large percentage of the class (40.9%) did not prepare for traditional lecture while only 3.4% did not prepare for TBL. About 61% of students spent between 30 min and two hours preparing for a two-hour TBL session and only 10% spent more than three hours preparing. Results of this project show students spend little time preparing for traditional lectures without in-class accountability, which may give students the perception that TBL requires too much preparation time. Copyright © 2017. Published by Elsevier Inc.

  6. Physical activity patterns in morbidly obese and normal-weight women.

    PubMed

    Kwon, Soyang; Mohammad, Jamal; Samuel, Isaac

    2011-01-01

    To compare physical activity patterns between morbidly obese and normal-weight women. Daily physical activity of 18 morbidly obese and 7 normal-weight women aged 30-58 years was measured for 2 days using the Intelligent Device for Energy Expenditure and Activity (IDEEA) device. The obese group spent about 2 hr/day less standing and 30 min/day less walking than did the normal-weight group. Time spent standing (standing time) was positively associated with time spent walking (walking time). Age- and walking time-adjusted standing time did not differ according to weight status. Promoting standing may be a strategy to increase walking.

  7. The Impact of External Employment on 12th Grade Student Participation in Extracurricular Activities as a Function of School Size

    ERIC Educational Resources Information Center

    Garcia, Miguel A.

    2012-01-01

    Data from the Educational Longitudinal Study of 2002 were used to compare 11,000 high school students on school size, time spent participating in extracurricular activities (ECA), and hours spent in employment. Findings indicated that students from small schools spent more time participating in ECA than students from larger schools for equivalent…

  8. Japanese Ubiquitous Network Project: Ubila

    NASA Astrophysics Data System (ADS)

    Ohashi, Masayoshi

    Recently, the advent of sophisticated technologies has stimulated ambient paradigms that may include high-performance CPU, compact real-time operating systems, a variety of devices/sensors, low power and high-speed radio communications, and in particular, third generation mobile phones. In addition, due to the spread of broadband access networks, various ubiquitous terminals and sensors can be connected closely.

  9. A new approach to flow simulation in highly heterogeneous porous media

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rame, M.; Killough, J.E.

    In this paper, applications are presented for a new numerical method - operator splittings on multiple grids (OSMG) - devised for simulations in heterogeneous porous media. A coarse-grid, finite-element pressure solver is interfaced with a fine-grid timestepping scheme. The CPU time for the pressure solver is greatly reduced and concentration fronts have minimal numerical dispersion.

  10. An Analysis of CONUS Based Deployment of Pseudolites for Positioning, Navigation and Timing (PNT) Systems

    DTIC Science & Technology

    2015-09-17

    [Diagram text residue; legible labels: Geostationary Satellite (Materiel): Receive Antennas, Fiber Optic Cable, CPU, Radio Frequency Signal, Transmitters.]

  11. Quality of Learners' Time and Learning Performance beyond Quantitative Time-on-Task

    ERIC Educational Resources Information Center

    Romero, Margarida; Barbera, Elena

    2011-01-01

    Along with the amount of time spent learning (or time-on-task), the quality of learning time has a real influence on learning performance. Quality of time in online learning depends on students' time availability and their willingness to devote quality cognitive time to learning activities. However, the quantity and quality of the time spent by…

  12. Differences between 9-11 year old British Pakistani and White British girls in physical activity and behavior during school recess.

    PubMed

    Pollard, Tessa M; Hornby-Turner, Yvonne C; Ghurbhurrun, Adarshini; Ridgers, Nicola D

    2012-12-18

    School recess provides an important opportunity for children to engage in physical activity. Previous studies indicate that children and adults of South Asian origin are less active than other ethnic groups in the United Kingdom, but have not investigated whether activity differs within the shared school environment. The aim of this study was to test the hypothesis that British Pakistani girls aged 9-11 years are less active during recess than White British girls. In Study One, the proportion of recess spent by 137 White British (N = 70) and British Pakistani (N = 67) girls in sedentary behavior, moderate-to-vigorous activity (MVPA) and vigorous activity (VPA) was determined using accelerometry. In Study Two, 86 White British (N = 48) and British Pakistani (N = 38) girls were observed on the playground using the System for Observing Children's Activity and Relationships during Play (SOCARP). Accelerometry data were collected during observations to allow identification of activities contributing to objectively measured physical activity. Accelerometry data indicated that British Pakistani girls spent 2.2% (95% CI: 0.2, 4.3) less of their total recess time in MVPA and 1.3% (95% CI: 0.2, 2.4) less in VPA than White British girls. Direct observation showed that British Pakistani girls spent 12.0% (95% CI: 2.9, 21.1) less playground time being very active, and 12.3% (95% CI: 1.7, 23.0) less time playing games. Time spent being very active according to direct observation data correlated significantly with accelerometer-assessed time spent in MVPA and VPA, and time spent playing games correlated significantly with accelerometer-assessed time spent in VPA, suggesting that differences in behavior observed in Study Two may have contributed to the differences in time spent in MVPA and VPA in Study One. British Pakistani girls were less active than White British girls during school recess. Recess has been identified as a potentially important target for the delivery of physical activity interventions; such interventions should consider ways in which the activity levels of British Pakistani girls could be increased.

  13. Differences between 9–11 year old British Pakistani and White British girls in physical activity and behavior during school recess

    PubMed Central

    2012-01-01

    Background. School recess provides an important opportunity for children to engage in physical activity. Previous studies indicate that children and adults of South Asian origin are less active than other ethnic groups in the United Kingdom, but have not investigated whether activity differs within the shared school environment. The aim of this study was to test the hypothesis that British Pakistani girls aged 9–11 years are less active during recess than White British girls. Methods. In Study One, the proportion of recess spent by 137 White British (N = 70) and British Pakistani (N = 67) girls in sedentary behavior, moderate-to-vigorous activity (MVPA) and vigorous activity (VPA) was determined using accelerometry. In Study Two, 86 White British (N = 48) and British Pakistani (N = 38) girls were observed on the playground using the System for Observing Children’s Activity and Relationships during Play (SOCARP). Accelerometry data were collected during observations to allow identification of activities contributing to objectively measured physical activity. Results. Accelerometry data indicated that British Pakistani girls spent 2.2% (95% CI: 0.2, 4.3) less of their total recess time in MVPA and 1.3% (95% CI: 0.2, 2.4) less in VPA than White British girls. Direct observation showed that British Pakistani girls spent 12.0% (95% CI: 2.9, 21.1) less playground time being very active, and 12.3% (95% CI: 1.7, 23.0) less time playing games. Time spent being very active according to direct observation data correlated significantly with accelerometer-assessed time spent in MVPA and VPA, and time spent playing games correlated significantly with accelerometer-assessed time spent in VPA, suggesting that differences in behavior observed in Study Two may have contributed to the differences in time spent in MVPA and VPA in Study One. Conclusions. British Pakistani girls were less active than White British girls during school recess. Recess has been identified as a potentially important target for the delivery of physical activity interventions; such interventions should consider ways in which the activity levels of British Pakistani girls could be increased. PMID:23249170

  14. Effects of guest feeding programs on captive giraffe behavior.

    PubMed

    Orban, David A; Siegford, Janice M; Snider, Richard J

    2016-01-01

    Zoological institutions develop human-animal interaction opportunities for visitors to advance missions of conservation, education, and recreation; however, the animal welfare implications largely have yet to be evaluated. This behavioral study was the first to quantify impacts of guest feeding programs on captive giraffe behavior and welfare, by documenting giraffe time budgets that included both normal and stereotypic behaviors. Thirty giraffes from nine zoos (six zoos with varying guest feeding programs and three without) were observed using both instantaneous scan sampling and continuous behavioral sampling techniques. All data were collected during summer 2012 and analyzed using linear mixed models. The degree of individual giraffe participation in guest feeding programs was positively associated with increased time spent idle and marginally associated with reduced time spent ruminating. Time spent participating in guest feeding programs had no effect on performance of stereotypic behaviors. When time spent eating routine diets was combined with time spent participating in guest feeding programs, individuals that spent more time engaged in total feeding behaviors tended to perform less oral stereotypic behavior such as object-licking and tongue-rolling. By extending foraging time and complexity, guest feeding programs have the potential to act as environmental enrichment and alleviate unfulfilled foraging motivations that may underlie oral stereotypic behaviors observed in many captive giraffes. However, management strategies may need to be adjusted to mitigate idleness and other program consequences. Further studies, especially pre-and-post-program implementation comparisons, are needed to better understand the influence of human-animal interactions on zoo animal behavior and welfare. © 2016 Wiley Periodicals, Inc.

  15. Sedentary behavior, physical activity, and concentrations of insulin among US adults.

    PubMed

    Ford, Earl S; Li, Chaoyang; Zhao, Guixiang; Pearson, William S; Tsai, James; Churilla, James R

    2010-09-01

    Time spent watching television has been linked to obesity, metabolic syndrome, and diabetes, all conditions characterized to some degree by hyperinsulinemia and insulin resistance. However, limited evidence relates screen time (watching television or using a computer) directly to concentrations of insulin. We examined the cross-sectional associations between time spent watching television or using a computer, physical activity, and serum concentrations of insulin using data from 2800 participants aged at least 20 years of the 2003-2006 National Health and Nutrition Examination Survey. The amount of time spent watching television and using a computer as well as physical activity was self-reported. The unadjusted geometric mean concentration of insulin increased from 6.2 microU/mL among participants who did not watch television to 10.0 microU/mL among those who watched television for 5 or more hours per day (P = .001). After adjustment for age, sex, race or ethnicity, educational status, concentration of cotinine, alcohol intake, physical activity, waist circumference, and body mass index using multiple linear regression analysis, the log-transformed concentrations of insulin were significantly and positively associated with time spent watching television (P = < .001). Reported time spent using a computer was significantly associated with log-transformed concentrations of insulin before but not after accounting for waist circumference and body mass index. Leisure-time physical activity but not transportation or household physical activity was significantly and inversely associated with log-transformed concentrations of insulin. Sedentary behavior, particularly the amount of time spent watching television, may be an important modifiable determinant of concentrations of insulin. Published by Elsevier Inc.

  16. Time budgets of Snow Geese Chen caerulescens and Ross's Geese Chen rossii in mixed flocks: Implications of body size, ambient temperature and family associations

    USGS Publications Warehouse

    Jonsson, J.E.; Afton, A.D.

    2009-01-01

    Body size affects foraging and forage intake rates directly via energetic processes and indirectly through interactions with social status and social behaviour. Ambient temperature has a relatively greater effect on the energetics of smaller species, which also generally are more vulnerable to predator attacks than are larger species. We examined variability in an index of intake rates and an index of alertness in Lesser Snow Geese Chen caerulescens caerulescens and Ross's Geese Chen rossii wintering in southwest Louisiana. Specifically we examined variation in these response variables that could be attributed to species, age, family size and ambient temperature. We hypothesized that the smaller Ross's Geese would spend relatively more time feeding, exhibit relatively higher peck rates, spend more time alert or raise their heads up from feeding more frequently, and would respond to declining temperatures by increasing their proportion of time spent feeding. As predicted, we found that Ross's Geese spent more time feeding than did Snow Geese and had slightly higher peck rates than Snow Geese in one of two winters. Ross's Geese spent more time alert than did Snow Geese in one winter, but alert rates differed by family size, independent of species, in contrast to our prediction. In one winter, time spent foraging and walking was inversely related to average daily temperature, but both varied independently of species. Effects of age and family size on time budgets were generally independent of species and in accordance with previous studies. We conclude that body size is a key variable influencing time spent feeding in Ross's Geese, which may require a high time spent feeding at the expense of other activities. © 2008 The Authors.

  17. Time Outdoors and Physical Activity as Predictors of Incident Myopia in Childhood: A Prospective Cohort Study

    PubMed Central

    Guggenheim, Jeremy A.; Northstone, Kate; McMahon, George; Ness, Andy R.; Deere, Kevin; Mattocks, Calum; Pourcain, Beate St; Williams, Cathy

    2012-01-01

    Purpose. Time spent in “sports/outdoor activity” has shown a negative association with incident myopia during childhood. We investigated the association of incident myopia with time spent outdoors and physical activity separately. Methods. Participants in the Avon Longitudinal Study of Parents and Children (ALSPAC) were assessed by noncycloplegic autorefraction at ages 7, 10, 11, 12, and 15 years, and classified as myopic (≤−1 diopters) or as emmetropic/hyperopic (≥−0.25 diopters) at each visit (N = 4,837–7,747). Physical activity at age 11 years was measured objectively using an accelerometer, worn for 1 week. Time spent outdoors was assessed via a parental questionnaire administered when children were aged 8–9 years. Variables associated with incident myopia were examined using Cox regression. Results. In analyses using all available data, both time spent outdoors and physical activity were associated with incident myopia, with time outdoors having the larger effect. The results were similar for analyses restricted to children classified as either nonmyopic or emmetropic/hyperopic at age 11 years. Thus, for children nonmyopic at age 11, the hazard ratio (95% confidence interval, CI) for incident myopia was 0.66 (0.47–0.93) for a high versus low amount of time spent outdoors, and 0.87 (0.76–0.99) per unit standard deviation above average increase in moderate/vigorous physical activity. Conclusion. Time spent outdoors was predictive of incident myopia independently of physical activity level. The greater association observed for time outdoors suggests that the previously reported link between “sports/outdoor activity” and incident myopia is due mainly to its capture of information relating to time outdoors rather than physical activity. PMID:22491403

  18. Measured sedentary time and physical activity during the school day of European 10- to 12-year-old children: the ENERGY project.

    PubMed

    van Stralen, Maartje M; Yıldırım, Mine; Wulp, Anouk; te Velde, Saskia J; Verloigne, Maïté; Doessegger, Alain; Androutsos, Odysseas; Kovács, Éva; Brug, Johannes; Chinapaw, Mai J M

    2014-03-01

    This study aims to describe the time devoted to sedentary and physical activities at school in five European countries and to examine differences according to country, sex, ethnicity, parental education and weight status. Cross-European cross-sectional survey. Primary schoolchildren (n=1025) aged 10-12 years in Belgium, Greece, Hungary, the Netherlands, and Switzerland wore accelerometers for at least six consecutive days. Only weekdays were used for this study to calculate the percentages of school-time spent in sedentary activities and moderate to vigorous intensity activity. Trained research assistants measured height and weight. Sex and date of birth were self-reported by the child and parental education and ethnicity were parent-reported. European schoolchildren spent on average 65% of their time at school in sedentary activities and 5% on moderate to vigorous intensity activities, with small differences between countries. Girls spent a significantly larger amount of school-time in sedentary activities (67%) than boys (63%; p<0.0001), and spent less time in moderate to vigorous intensity activities (4% versus 5%; p<0.001). Overweight children spent significantly less time in moderate to vigorous intensity activities (4%) than normal weight children (5%; p<0.01) [corrected]. Parental education or ethnicity were not associated with time spent in sedentary or physical activities. European schoolchildren spend a small amount of their school-time in moderate to vigorous intensity activities and a large amount in sedentary activities, with small but significant differences across countries. Future interventions should target more physical activities and less sedentary time at school particularly in girls. Copyright © 2013 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.

  19. How Much Is Too Much to Pay for Internet Access? A Behavioral Economic Analysis of Internet Use.

    PubMed

    Broadbent, Julie; Dakki, Michelle A

    2015-08-01

    The popularity of online recreational activities, such as social networking, has dramatically increased the amount of time spent on the Internet. Excessive or inappropriate use of the Internet can result in serious adverse consequences. The current study used a behavioral economic task to determine if the amount of time spent online by problematic and nonproblematic users can be modified by price. The Internet Purchase Task was used to determine how much time undergraduate students (N=233) would spend online at 13 different prices. Despite high demand for Internet access when access was free, time spent online by both problematic and nonproblematic users decreased dramatically, even at low prices. These results suggest that the amount of time spent online may be modified by having a tangible cost associated with use, whereas having free access to the Internet may encourage excessive, problematic use.

  20. CPU SIM: A Computer Simulator for Use in an Introductory Computer Organization-Architecture Class.

    ERIC Educational Resources Information Center

    Skrein, Dale

    1994-01-01

    CPU SIM, an interactive low-level computer simulation package that runs on the Macintosh computer, is described. The program is designed for instructional use in the first or second year of undergraduate computer science, to teach various features of typical computer organization through hands-on exercises. (MSE)

  1. Combustion Power Unit--400: CPU-400.

    ERIC Educational Resources Information Center

    Combustion Power Co., Palo Alto, CA.

    Aerospace technology may have led to a unique basic unit for processing solid wastes and controlling pollution. The Combustion Power Unit--400 (CPU-400) is designed as a turboelectric generator plant that will use municipal solid wastes as fuel. The baseline configuration is a modular unit that is designed to utilize 400 tons of refuse per day…

  2. An efficient and robust algorithm for two dimensional time dependent incompressible Navier-Stokes equations: High Reynolds number flows

    NASA Technical Reports Server (NTRS)

    Goodrich, John W.

    1991-01-01

    An algorithm is presented for unsteady two-dimensional incompressible Navier-Stokes calculations. This algorithm is based on the fourth order partial differential equation for incompressible fluid flow which uses the streamfunction as the only dependent variable. The algorithm is second order accurate in both time and space. It uses a multigrid solver at each time step. It is extremely efficient with respect to the use of both CPU time and physical memory. It is extremely robust with respect to Reynolds number.

  3. Particle-in-Cell laser-plasma simulation on Xeon Phi coprocessors

    NASA Astrophysics Data System (ADS)

    Surmin, I. A.; Bastrakov, S. I.; Efimenko, E. S.; Gonoskov, A. A.; Korzhimanov, A. V.; Meyerov, I. B.

    2016-05-01

    This paper concerns the development of a high-performance implementation of the Particle-in-Cell method for plasma simulation on Intel Xeon Phi coprocessors. We discuss the suitability of the method for Xeon Phi architecture and present our experience in the porting and optimization of the existing parallel Particle-in-Cell code PICADOR. Direct porting without code modification gives performance on Xeon Phi close to that of an 8-core CPU on a benchmark problem with 50 particles per cell. We demonstrate step-by-step optimization techniques, such as improving data locality, enhancing parallelization efficiency and vectorization leading to an overall 4.2 × speedup on CPU and 7.5 × on Xeon Phi compared to the baseline version. The optimized version achieves 16.9 ns per particle update on an Intel Xeon E5-2660 CPU and 9.3 ns per particle update on an Intel Xeon Phi 5110P. For a real problem of laser ion acceleration in targets with surface grating, where a large number of macroparticles per cell is required, the speedup of Xeon Phi compared to CPU is 1.6 ×.

  4. Software Defined Radio with Parallelized Software Architecture

    NASA Technical Reports Server (NTRS)

    Heckler, Greg

    2013-01-01

    This software implements software-defined radio processing over multicore, multi-CPU systems in a way that maximizes the use of CPU resources in the system. The software treats each processing step in either a communications or navigation modulator or demodulator system as an independent, threaded block. Each threaded block is defined with a programmable number of input or output buffers; these buffers are implemented using POSIX pipes. In addition, each threaded block is assigned a unique thread upon block installation. A modulator or demodulator system is built by assembly of the threaded blocks into a flow graph, which assembles the processing blocks to accomplish the desired signal processing. This software architecture allows the software to scale effortlessly between single CPU/single-core computers or multi-CPU/multi-core computers without recompilation. NASA spaceflight and ground communications systems currently rely exclusively on ASICs or FPGAs. This software allows low- and medium-bandwidth (100 bps to approx. 50 Mbps) software defined radios to be designed and implemented solely in C/C++ software, while lowering development costs and facilitating reuse and extensibility.
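The threaded-block design described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the C/C++ of the NASA software, and the names `run_block` and `run_flow_graph` are hypothetical, not taken from that code. It assumes a purely linear flow graph with one input and one output pipe per block and byte-wise transforms that do not depend on chunk boundaries.

```python
import os
import threading


def run_block(transform, read_fd, write_fd):
    """One 'threaded block': read from its input pipe, transform, write to its output pipe."""
    # The read side is unbuffered so a block forwards data as soon as it arrives.
    with os.fdopen(read_fd, "rb", buffering=0) as src, os.fdopen(write_fd, "wb") as dst:
        for chunk in iter(lambda: src.read(4096), b""):
            dst.write(transform(chunk))
    # Leaving the with-block closes both ends, which signals end-of-stream downstream.


def feed(write_fd, payload):
    """Source of the flow graph: write the payload, then close to signal end-of-stream."""
    with os.fdopen(write_fd, "wb") as dst:
        dst.write(payload)


def run_flow_graph(transforms, payload):
    """Chain the transforms into a linear flow graph, one thread per block."""
    # One pipe in front of every block, plus one after the last block.
    pipes = [os.pipe() for _ in range(len(transforms) + 1)]
    threads = [threading.Thread(target=feed, args=(pipes[0][1], payload))]
    for i, transform in enumerate(transforms):
        threads.append(
            threading.Thread(target=run_block, args=(transform, pipes[i][0], pipes[i + 1][1]))
        )
    for t in threads:
        t.start()
    # The calling thread acts as the sink and drains the last pipe until it is closed.
    with os.fdopen(pipes[-1][0], "rb") as sink:
        result = sink.read()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    stages = [bytes.upper, lambda chunk: chunk.replace(b" ", b"_")]
    print(run_flow_graph(stages, b"hello sdr"))
```

Feeding the graph from its own thread keeps the sketch from stalling when the payload is larger than a pipe's buffer, since the sink drains the last pipe while data is still being written into the first.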

  5. Software Defined Radio with Parallelized Software Architecture

    NASA Technical Reports Server (NTRS)

    Heckler, Greg

    2013-01-01

    This software implements software-defined radio processing over multi-core, multi-CPU systems in a way that maximizes the use of CPU resources in the system. The software treats each processing step in either a communications or navigation modulator or demodulator system as an independent, threaded block. Each threaded block is defined with a programmable number of input or output buffers; these buffers are implemented using POSIX pipes. In addition, each threaded block is assigned a unique thread upon block installation. A modulator or demodulator system is built by assembly of the threaded blocks into a flow graph, which assembles the processing blocks to accomplish the desired signal processing. This software architecture allows the software to scale effortlessly between single CPU/single-core computers or multi-CPU/multi-core computers without recompilation. NASA spaceflight and ground communications systems currently rely exclusively on ASICs or FPGAs. This software allows low- and medium-bandwidth (100 bps to approx. 50 Mbps) software defined radios to be designed and implemented solely in C/C++ software, while lowering development costs and facilitating reuse and extensibility.

  6. Work stealing for GPU-accelerated parallel programs in a global address space framework: WORK STEALING ON GPU-ACCELERATED SYSTEMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

    Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.

  7. Work stealing for GPU-accelerated parallel programs in a global address space framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

    Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
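As a rough illustration of the basic work-stealing idea mentioned in this abstract, the sketch below balances an initially lopsided task load across CPU threads. It is a shared-memory, CPU-only toy, not the distributed CPU-GPU algorithm the paper studies. The names `Worker` and `run_work_stealing` are hypothetical, and the sketch assumes that tasks never spawn further tasks.

```python
import collections
import random
import threading


class Worker:
    """A worker that owns a double-ended queue of tasks (plain callables)."""

    def __init__(self):
        self.tasks = collections.deque()
        self.lock = threading.Lock()

    def push(self, task):
        with self.lock:
            self.tasks.append(task)

    def pop_local(self):
        # The owner takes the most recently pushed task (LIFO end).
        with self.lock:
            return self.tasks.pop() if self.tasks else None

    def steal(self):
        # A thief takes the oldest task from the opposite (FIFO) end.
        with self.lock:
            return self.tasks.popleft() if self.tasks else None


def run_work_stealing(workers):
    """Run every queued task once and return the results in completion order."""
    results = []
    results_lock = threading.Lock()

    def loop(me):
        while True:
            task = me.pop_local()
            if task is None:
                # Local queue is empty: try the other workers in random order.
                victims = [w for w in workers if w is not me]
                random.shuffle(victims)
                for victim in victims:
                    task = victim.steal()
                    if task is not None:
                        break
            if task is None:
                # Every queue looked empty. Exiting here is safe only because
                # tasks in this sketch never create new tasks.
                return
            value = task()
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=loop, args=(w,)) for w in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    pool = [Worker() for _ in range(4)]
    for i in range(100):
        pool[0].push(lambda i=i: i * i)  # all work starts on one worker
    print(len(run_work_stealing(pool)))
```

Owners and thieves work from opposite ends of the queue, which is the usual way to keep them from contending for the same task. A real distributed version would also have to account for the data movement and task distribution costs that the abstract highlights.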

  8. Optimizing legacy molecular dynamics software with directive-based offload

    NASA Astrophysics Data System (ADS)

    Michael Brown, W.; Carrillo, Jan-Michael Y.; Gavhane, Nitin; Thakkar, Foram M.; Plimpton, Steven J.

    2015-10-01

    Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel® Xeon Phi™ coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS.

  9. Development of Neutron Energy Spectral Signatures for Passive Monitoring of Spent Nuclear Fuels in Dry Cask Storage

    NASA Astrophysics Data System (ADS)

    Harkness, Ira; Zhu, Ting; Liang, Yinong; Rauch, Eric; Enqvist, Andreas; Jordan, Kelly A.

    2018-01-01

    Demand for spent nuclear fuel dry casks as an interim storage solution has increased globally and the IAEA has expressed a need for robust safeguards and verification technologies for ensuring the continuity of knowledge and the integrity of radioactive materials inside spent fuel casks. Existing research has been focusing on "fingerprinting" casks based on count rate statistics to represent radiation emission signatures. The current research aims to expand to include neutron energy spectral information as part of the fuel characteristics. First, spent fuel composition data are taken from the Next Generation Safeguards Initiative Spent Fuel Libraries, representative for Westinghouse 17×17 PWR assemblies. The ORIGEN-S code then calculates the spontaneous fission and (α,n) emissions for individual fuel rods, followed by detailed MCNP simulations of neutrons transported through the fuel assemblies. A comprehensive database of neutron energy spectral profiles is to be constructed, with different enrichment, burn-up, and cooling time conditions. The end goal is to utilize the computational spent fuel library, predictive algorithm, and a pressurized ⁴He scintillator to verify the spent fuel assemblies inside a cask. This work identifies neutron spectral signatures that correlate with the cooling time of spent fuel. Both the total and relative contributions from spontaneous fission and (α,n) change noticeably with respect to cooling time, due to the relatively short half-life (18 years) of the major neutron source ²⁴⁴Cm. Identification of this and other neutron spectral signatures allows the characterization of spent nuclear fuels in dry cask storage.

  10. What Do Teachers Perceive as the Most Important Use of Reading Time?

    ERIC Educational Resources Information Center

    McNinch, George H.; Schaffer, Gary L.; Cambell, Patricia; Rakes, Sondra

    1999-01-01

    Considers what teachers perceive as the appropriate time allocation among the distinct instructional areas during a typical reading session. Evaluates 58 teachers using a questionnaire that investigates four questions concerning ideal use of instruction time. Suggests that time spent reading must exceed time spent talking and writing about reading…

  11. Do Physicians Spend More Time with Non-English-Speaking Patients?

    PubMed Central

    Tocher, Thomas M; Larson, Eric B

    1999-01-01

    OBJECTIVE To determine whether physicians at a general internal medicine clinic spend more time with non-English-speaking patients. DESIGN A time-motion study comparing physician time spent with non-English-speaking patients and time spent with English-speaking patients during 5 months of observation. We also tested physicians’ perceptions of their time use with a questionnaire. SETTING Primary care internal medicine clinic at a county hospital. PATIENTS/PARTICIPANTS One hundred sixty-six established clinic patients, of whom 57 were non-English speaking and 109 were English speaking, and 15 attending physicians and 8 third-year resident physicians. MEASUREMENTS AND MAIN RESULTS Outcome measures included total patient time in clinic, wait for first nurse or physician contact, time in contact with the nurse or physician, physician time spent on the visit, and physician perceptions of time use with non-English-speaking patients. After adjustment for demographic and comorbidity variables, non-English-speaking and English-speaking patients did not differ on any time-motion variables, including physician time spent on the visit (26.0 vs 25.8 minutes). A significant number of clinic physicians believed that they spent more time during a visit with non-English-speaking patients (85.7%) and needed more time to address important issues during a visit (90.4%), (both p < .01). Physicians did not perceive differences in the amount they accomplished during a visit with non-English-speaking patients. CONCLUSIONS There were no differences in the time these physicians spent providing care to non-English-speaking patients and English-speaking patients. An important limitation of this study is that we were unable to measure quality of care provided or patients’ satisfaction with their care. Physicians may believe that they are spending more time with non-English-speaking patients because of the challenges of language and cultural barriers. PMID:10337040

  12. Medical students' perceptions of their housestaffs' ability to teach physical examination skills.

    PubMed

    Smith, Miriam A; Gertler, Tracy; Freeman, Katherine

    2003-01-01

    To evaluate the amount of time housestaff spent at the bedside on physical examination skills with third-year medical students and whether housestaff enhanced physical examination skills. All Albert Einstein College of Medicine students who completed the third-year medicine inpatient clerkship at one of five participating sites evaluated housestaff (interns and residents) with whom they spent at least ten days. The students quantified the amount of time housestaff spent with them at the bedside and used a modified five-point Likert scale to evaluate housestaff's enhancement of students' physical examination skills. Data were analyzed separately for interns, but pooled for residents (years two and three). Differences between groups were tested using Wilcoxon rank-sum and by Mantel-Haenszel chi-square tests. Totals of 191 responses for interns and 166 responses for residents were collected from October 1999 to October 2000. Fifteen (8%) of the intern group and 59 (36%) of the resident group spent no time at the bedside (p <.0001). Students were most satisfied with enhancement of pulmonary, cardiovascular, and gastrointestinal skills and least satisfied with enhancement of ENT, eye, and genitourinary skills (p <.0001). Interns spent more time with students than did residents. Almost one third of the residents spent no time on physical examination skills with students. Training programs should re-emphasize the importance of housestaff's teaching at the bedside and address areas of deficiency.

  13. Fathers' and Mothers' Involvement with Their Adolescents

    ERIC Educational Resources Information Center

    Phares, Vicky; Fields, Sherecce; Kamboukos, Dimitra

    2009-01-01

    We explored mothers' and fathers' time spent with their adolescents and found that mothers reported spending more time with their adolescents than did fathers. Developmental patterns were found for some aspects of time involvement, with both mothers and fathers reporting higher involvement with younger adolescents. Ratings of time-spent were not…

  14. Validation of GPU based TomoTherapy dose calculation engine.

    PubMed

    Chen, Quan; Lu, Weiguo; Chen, Yu; Chen, Mingli; Henderson, Douglas; Sterpin, Edmond

    2012-04-01

    The graphic processing unit (GPU) based TomoTherapy convolution/superposition (C/S) dose engine (GPU dose engine) achieves a dramatic performance improvement over the traditional CPU-cluster based TomoTherapy dose engine (CPU dose engine). Besides the architecture difference between the GPU and CPU, there are several algorithm changes from the CPU dose engine to the GPU dose engine. These changes made the GPU dose slightly different from the CPU-cluster dose. Before the GPU dose engine can be released commercially, its accuracy has to be validated. Thirty-eight TomoTherapy phantom plans and 19 patient plans were calculated with both dose engines to evaluate the equivalency between the two dose engines. Gamma indices (Γ) were used for the equivalency evaluation. The GPU dose was further verified with the absolute point dose measurement with ion chamber and film measurements for phantom plans. Monte Carlo calculation was used as a reference for both dose engines in the accuracy evaluation in heterogeneous phantom and actual patients. The GPU dose engine showed excellent agreement with the current CPU dose engine. The majority of cases had over 99.99% of voxels with Γ(1%, 1 mm) < 1. The worst case observed in the phantom had 0.22% voxels violating the criterion. In patient cases, the worst percentage of voxels violating the criterion was 0.57%. For absolute point dose verification, all cases agreed with measurement to within ±3% with average error magnitude within 1%. All cases passed the acceptance criterion that more than 95% of the pixels have Γ(3%, 3 mm) < 1 in film measurement, and the average passing pixel percentage is 98.5%-99%. The GPU dose engine also showed similar degree of accuracy in heterogeneous media as the current TomoTherapy dose engine. It is verified and validated that the ultrafast TomoTherapy GPU dose engine can safely replace the existing TomoTherapy cluster based dose engine without degradation in dose accuracy.
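The gamma index used as the equivalency metric in this abstract combines a dose-difference tolerance with a distance-to-agreement tolerance. The sketch below shows the standard definition in one dimension, under simplifying assumptions that are not taken from the paper: a brute-force search over sample points with no interpolation, and a dose tolerance supplied as an absolute value (for example 1% of the maximum dose). The function names are hypothetical.

```python
import math


def gamma_index(ref_pos, ref_dose, eval_pos, eval_dose, dose_tol, dist_tol):
    """Gamma value at each reference point of a 1D dose profile.

    For a reference point (r, D_ref), gamma is the minimum over all evaluated
    points (x, D_eval) of sqrt(((x - r) / dist_tol)^2 + ((D_eval - D_ref) / dose_tol)^2).
    Positions and dist_tol share one unit (e.g. mm); doses and dose_tol share another.
    """
    gammas = []
    for r, d_ref in zip(ref_pos, ref_dose):
        gammas.append(
            min(
                math.hypot((x - r) / dist_tol, (d_eval - d_ref) / dose_tol)
                for x, d_eval in zip(eval_pos, eval_dose)
            )
        )
    return gammas


def pass_rate(gammas):
    """Fraction of points meeting the gamma < 1 criterion quoted in the abstract."""
    return sum(g < 1.0 for g in gammas) / len(gammas)


if __name__ == "__main__":
    positions_mm = [0.0, 1.0, 2.0]
    reference = [1.0, 2.0, 3.0]
    evaluated = [1.0, 2.0, 3.06]   # last point is off by 2% of the maximum dose
    tolerance = 0.01 * max(reference)  # "1%" interpreted as 1% of the maximum dose
    g = gamma_index(positions_mm, reference, positions_mm, evaluated, tolerance, 1.0)
    print(g, pass_rate(g))
```

In this toy profile the third point fails because no evaluated sample within about 1 mm comes close enough in dose, so two of the three points pass. Clinical implementations work on 2D or 3D grids and usually interpolate the evaluated dose between samples.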

  15. FPT- FORTRAN PROGRAMMING TOOLS FOR THE DEC VAX

    NASA Technical Reports Server (NTRS)

    Ragosta, A. E.

    1994-01-01

    The FORTRAN Programming Tools (FPT) are a series of tools used to support the development and maintenance of FORTRAN 77 source codes. Included are a debugging aid, a CPU time monitoring program, source code maintenance aids, print utilities, and a library of useful, well-documented programs. These tools assist in reducing development time and encouraging high quality programming. Although intended primarily for FORTRAN programmers, some of the tools can be used on data files and other programming languages. BUGOUT is a series of FPT programs that have proven very useful in debugging a particular kind of error and in optimizing CPU-intensive codes. The particular type of error is the illegal addressing of data or code as a result of subtle FORTRAN errors that are not caught by the compiler or at run time. A TRACE option also allows the programmer to verify the execution path of a program. The TIME option assists the programmer in identifying the CPU-intensive routines in a program to aid in optimization studies. Program coding, maintenance, and print aids available in FPT include: routines for building standard format subprogram stubs; cleaning up common blocks and NAMELISTs; removing all characters after column 72; displaying two files side by side on a VT-100 terminal; creating a neat listing of a FORTRAN source code including a Table of Contents, an Index, and Page Headings; converting files between VMS internal format and standard carriage control format; changing text strings in a file without using EDT; and replacing tab characters with spaces. 
    The library of useful, documented programs includes the following: time and date routines; a string categorization routine; routines for converting between decimal, hex, and octal; routines to delay process execution for a specified time; a Gaussian elimination routine for solving a set of simultaneous linear equations; a curve fitting routine for least squares fit to polynomial, exponential, and sinusoidal forms (with a screen-oriented editor); a cubic spline fit routine; a screen-oriented array editor; routines to support parsing; and various terminal support routines. These FORTRAN programming tools are written in FORTRAN 77 and ASSEMBLER for interactive and batch execution. FPT is intended for implementation on DEC VAX series computers operating under VMS. This collection of tools was developed in 1985.

  16. Grey Literature Searching for Health Sciences Systematic Reviews: A Prospective Study of Time Spent and Resources Utilized.

    PubMed

    Saleh, Ahlam A; Ratajeski, Melissa A; Bertolet, Marnie

    To identify estimates of time taken to search grey literature in support of health sciences systematic reviews and to identify searcher or systematic review characteristics that may impact resource selection or time spent searching. A survey was electronically distributed to searchers embarking on a new systematic review. Characteristics of the searcher and systematic review were collected along with time spent searching and what resources were searched. Time and resources were tabulated and resources were categorized as grey or non-grey. Data was analyzed using Kruskal-Wallis tests. Out of 81 original respondents, 21% followed through with completion of the surveys in their entirety. The median time spent searching all resources was 471 minutes, and of those a median of 85 minutes were spent searching grey literature. The median number of resources used in a systematic review search was four and the median number of grey literature sources searched was two. The amount of time spent searching was influenced by whether the systematic review was grant funded. Additionally, the number of resources searched was impacted by institution type and whether systematic review training was received. This study characterized the amount of time for conducting systematic review searches including searching the grey literature, in addition to the number and types of resources used. This may aid searchers in planning their time, along with providing benchmark information for future studies. This paper contributes by quantifying current grey literature search patterns and associating them with searcher and review characteristics. Further discussion and research into the search approach for grey literature in support of systematic reviews is encouraged.

  17. Grey Literature Searching for Health Sciences Systematic Reviews: A Prospective Study of Time Spent and Resources Utilized

    PubMed Central

    Saleh, Ahlam A.; Ratajeski, Melissa A.; Bertolet, Marnie

    2015-01-01

    Objective To identify estimates of time taken to search grey literature in support of health sciences systematic reviews and to identify searcher or systematic review characteristics that may impact resource selection or time spent searching. Methods A survey was electronically distributed to searchers embarking on a new systematic review. Characteristics of the searcher and systematic review were collected along with time spent searching and what resources were searched. Time and resources were tabulated and resources were categorized as grey or non-grey. Data was analyzed using Kruskal-Wallis tests. Results Out of 81 original respondents, 21% followed through with completion of the surveys in their entirety. The median time spent searching all resources was 471 minutes, and of those a median of 85 minutes were spent searching grey literature. The median number of resources used in a systematic review search was four and the median number of grey literature sources searched was two. The amount of time spent searching was influenced by whether the systematic review was grant funded. Additionally, the number of resources searched was impacted by institution type and whether systematic review training was received. Conclusions This study characterized the amount of time for conducting systematic review searches including searching the grey literature, in addition to the number and types of resources used. This may aid searchers in planning their time, along with providing benchmark information for future studies. This paper contributes by quantifying current grey literature search patterns and associating them with searcher and review characteristics. Further discussion and research into the search approach for grey literature in support of systematic reviews is encouraged. PMID:25914722

  18. GPU-based cone beam computed tomography.

    PubMed

    Noël, Peter B; Walczak, Alan M; Xu, Jinhui; Corso, Jason J; Hoffmann, Kenneth R; Schafer, Sebastian

    2010-06-01

    The use of cone beam computed tomography (CBCT) is growing in the clinical arena due to its ability to provide 3D information during interventions, its high diagnostic quality (sub-millimeter resolution), and its short scanning times (60 s). In many situations, the short scanning time of CBCT is followed by a time-consuming 3D reconstruction. The standard reconstruction algorithm for CBCT data is the filtered backprojection, which for a volume of size 256³ takes up to 25 min on a standard system. Recent developments in the area of Graphic Processing Units (GPUs) make it possible to have access to high-performance computing solutions at a low cost, allowing their use in many scientific problems. We have implemented an algorithm for 3D reconstruction of CBCT data using the Compute Unified Device Architecture (CUDA) provided by NVIDIA (NVIDIA Corporation, Santa Clara, California), which was executed on an NVIDIA GeForce GTX 280. Our implementation results in improved reconstruction times from minutes, and perhaps hours, to a matter of seconds, while also giving the clinician the ability to view 3D volumetric data at higher resolutions. We evaluated our implementation on ten clinical data sets and one phantom data set to observe if differences occur between CPU and GPU-based reconstructions. By using our approach, the computation time for 256³ is reduced from 25 min on the CPU to 3.2 s on the GPU. The GPU reconstruction time for 512³ volumes is 8.5 s. Copyright 2009 Elsevier Ireland Ltd. All rights reserved.
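The timings quoted in this abstract can be turned into a speedup figure with a few lines of arithmetic. The sketch below uses only the numbers stated above; the helper name `speedup` is hypothetical, and since the abstract gives the CPU time as "up to 25 min", the resulting factor is best read as an upper bound.

```python
def speedup(baseline_seconds, optimized_seconds):
    """Ratio of baseline run time to optimized run time."""
    return baseline_seconds / optimized_seconds


cpu_256_s = 25 * 60   # CPU reconstruction of a 256^3 volume: 25 min
gpu_256_s = 3.2       # GPU reconstruction of a 256^3 volume
gpu_512_s = 8.5       # GPU reconstruction of a 512^3 volume

gpu_vs_cpu = speedup(cpu_256_s, gpu_256_s)   # roughly 469-fold
voxel_ratio = (512 / 256) ** 3               # 8 times as many voxels
time_ratio = gpu_512_s / gpu_256_s           # but well under 3 times the GPU time

if __name__ == "__main__":
    print(f"GPU vs CPU at 256^3: {gpu_vs_cpu:.1f}x")
    print(f"512^3 vs 256^3 on GPU: {voxel_ratio:.0f}x voxels, {time_ratio:.2f}x time")
```

The second comparison suggests that, for these two volume sizes, GPU reconstruction time grows more slowly than the voxel count. The abstract does not explain why, so no cause is asserted here.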

  19. Using a pruned, nondirect product basis in conjunction with the multi-configuration time-dependent Hartree (MCTDH) method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wodraszka, Robert, E-mail: Robert.Wodraszka@chem.queensu.ca; Carrington, Tucker, E-mail: Tucker.Carrington@queensu.ca

    In this paper, we propose a pruned, nondirect product multi-configuration time dependent Hartree (MCTDH) method for solving the Schrödinger equation. MCTDH uses optimized 1D basis functions, called single particle functions, but the size of the standard direct product MCTDH basis scales exponentially with D, the number of coordinates. We compare the pruned approach to standard MCTDH calculations for basis sizes small enough that the latter are possible and demonstrate that pruning the basis reduces the CPU cost of computing vibrational energy levels of acetonitrile (D = 12) by more than two orders of magnitude. Using the pruned method, it is possible to do calculations with larger bases, for which the cost of standard MCTDH calculations is prohibitive. Pruning the basis complicates the evaluation of matrix-vector products. In this paper, they are done term by term for a sum-of-products Hamiltonian. When no attempt is made to exploit the fact that matrices representing some of the factors of a term are identity matrices, one needs only to carefully constrain indices. In this paper, we develop new ideas that make it possible to further reduce the CPU time by exploiting identity matrices.

  20. Performance Analysis of the NAS Y-MP Workload

    NASA Technical Reports Server (NTRS)

    Bergeron, Robert J.; Kutler, Paul (Technical Monitor)

    1997-01-01

    This paper describes the performance characteristics of the computational workloads on the NAS Cray Y-MP machines, a Y-MP 832 and later a Y-MP 8128. Hardware measurements indicated that the Y-MP workload performance matured over time, ultimately sustaining an average throughput of 0.8 GFLOPS and a vector operation fraction of 87%. The measurements also revealed an operation rate exceeding 1 per clock period, a well-balanced architecture featuring a strong utilization of vector functional units, and an efficient memory organization. Introduction of the larger memory 8128 increased throughput by allowing a more efficient utilization of CPUs. Throughput also depended on the metering of the batch queues; low-idle Saturday workloads required a buffer of small jobs to prevent memory starvation of the CPU. UNICOS required about 7% of total CPU time to service the 832 workloads; this overhead decreased to 5% for the 8128 workloads. While most of the system time went to service I/O requests, efficient scheduling prevented excessive idle due to I/O wait. System measurements disclosed no obvious bottlenecks in the response of the machine and UNICOS to the workloads. In most cases, Cray-provided software tools were quite sufficient for measuring the performance of both the machine and operating system.

  1. A fast - Monte Carlo toolkit on GPU for treatment plan dose recalculation in proton therapy

    NASA Astrophysics Data System (ADS)

    Senzacqua, M.; Schiavi, A.; Patera, V.; Pioli, S.; Battistoni, G.; Ciocca, M.; Mairani, A.; Magro, G.; Molinelli, S.

    2017-10-01

    In the context of particle therapy, a crucial role is played by Treatment Planning Systems (TPSs), tools aimed at computing and optimizing the treatment plan. Nowadays one of the major issues related to the TPS in particle therapy is the large CPU time needed. We developed a software toolkit (FRED) for reducing dose recalculation time by exploiting Graphics Processing Units (GPU) hardware. Thanks to their high parallelization capability, GPUs significantly reduce the computation time, by up to a factor of 100 with respect to software running on a standard CPU. The transport of proton beams in the patient is accurately described through Monte Carlo methods. Physical processes reproduced are: Multiple Coulomb Scattering, energy straggling and nuclear interactions of protons with the main nuclei composing the biological tissues. The FRED toolkit does not rely on the water equivalent translation of tissues, but exploits the Computed Tomography anatomical information by reconstructing and simulating the atomic composition of each crossed tissue. FRED can be used as an efficient tool for dose recalculation on the day of the treatment. In fact, it can provide, in about one minute on standard hardware, the dose map obtained by combining the treatment plan, computed earlier by the TPS, and the current patient anatomic arrangement.

  2. Numerical study of the effects of icing on viscous flow over wings

    NASA Technical Reports Server (NTRS)

    Sankar, L. N.

    1994-01-01

    An improved hybrid method for computing unsteady compressible viscous flows is presented. This method divides the computational domain into two zones. In the outer zone, the unsteady full-potential equation (FPE) is solved. In the inner zone, the Navier-Stokes equations are solved using a diagonal form of an alternating-direction implicit (ADI) approximate factorization procedure. The two zones are tightly coupled so that steady and unsteady flows may be efficiently solved. Characteristic-based viscous/inviscid interface boundary conditions are employed to avoid spurious reflections at that interface. The resulting CPU times are less than 60 percent of that required for a full-blown Navier-Stokes analysis for steady flow applications and about 60 percent of the Navier-Stokes CPU times for unsteady flows in non-vector processing machines. Applications of the method are presented for a rectangular NACA 0012 wing in low subsonic steady flow at moderate and high angles of attack, and for an F-5 wing in steady and unsteady subsonic and transonic flows. Steady surface pressures are in very good agreement with experimental data and are essentially identical to Navier-Stokes predictions. Density contours show that shocks cross the viscous/inviscid interface smoothly, so that the accuracy of full Navier-Stokes equations can be retained with a significant savings in computational time.

  3. Planning for distributed workflows: constraint-based coscheduling of computational jobs and data placement in distributed environments

    NASA Astrophysics Data System (ADS)

    Makatun, Dzmitry; Lauret, Jérôme; Rudová, Hana; Šumbera, Michal

    2015-05-01

    When running data intensive applications on distributed computational resources, long I/O overheads may be observed as access to remotely stored data is performed. Latencies and bandwidth can become the major limiting factor for the overall computation performance and can reduce the CPU/WallTime ratio due to excessive I/O wait. Reusing the knowledge of our previous research, we propose a constraint programming based planner that schedules computational jobs and data placements (transfers) in a distributed environment in order to optimize resource utilization and reduce the overall processing completion time. The optimization is achieved by ensuring that none of the resources (network links, data storages and CPUs) are oversaturated at any moment of time and either (a) that the data is pre-placed at the site where the job runs or (b) that the jobs are scheduled where the data is already present. Such an approach eliminates the idle CPU cycles occurring when the job is waiting for the I/O from a remote site and would have wide application in the community. Our planner was evaluated and simulated based on data extracted from log files of batch and data management systems of the STAR experiment. The results of evaluation and estimation of performance improvements are discussed in this paper.
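
    The CPU/WallTime ratio mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `cpu_wall_ratio` is hypothetical, and a `time.sleep` call stands in for waiting on remote I/O. It only shows why time spent blocked lowers the ratio of CPU time to elapsed time.

```python
import time


def cpu_wall_ratio(work, io_wait_seconds):
    """Run `work`, then idle for `io_wait_seconds`, and return CPU time / wall time.

    The sleep is an assumed stand-in for a job blocked on remote I/O:
    it adds wall-clock time but (almost) no CPU time.
    """
    wall_start = time.perf_counter()   # elapsed (wall-clock) time
    cpu_start = time.process_time()    # CPU time charged to this process
    work()
    time.sleep(io_wait_seconds)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


if __name__ == "__main__":
    ratio = cpu_wall_ratio(lambda: sum(range(200_000)), io_wait_seconds=0.2)
    print(f"CPU/wall ratio with simulated I/O wait: {ratio:.3f}")
```

    A ratio close to 1 means the process was computing for nearly all of its elapsed time; the longer the simulated wait, the closer the ratio moves to 0.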

  4. A GPU-Accelerated Approach for Feature Tracking in Time-Varying Imagery Datasets.

    PubMed

    Peng, Chao; Sahani, Sandip; Rushing, John

    2017-10-01

    We propose a novel parallel connected component labeling (CCL) algorithm along with efficient out-of-core data management to detect and track feature regions of large time-varying imagery datasets. Our approach contributes to the big data field with parallel algorithms tailored for GPU architectures. We remove the data dependency between frames and achieve pixel-level parallelism. Due to the large size, the entire dataset cannot fit into cached memory. Frames have to be streamed through the memory hierarchy (disk to CPU main memory and then to GPU memory), partitioned, and processed as batches, where each batch is small enough to fit into the GPU. To reconnect the feature regions that are separated due to data partitioning, we present a novel batch merging algorithm to extract the region connection information across multiple batches in a parallel fashion. The information is organized in a memory-efficient structure and supports fast indexing on the GPU. Our experiment uses a commodity workstation equipped with a single GPU. The results show that our approach can efficiently process a weather dataset composed of terabytes of time-varying radar images. The advantages of our approach are demonstrated by comparing to the performance of an efficient CPU cluster implementation which is being used by the weather scientists.

  5. Automatic optimization of well locations in a North Sea fractured chalk reservoir using a front tracking reservoir simulator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rian, D.T.; Hage, A.

    1994-12-31

    A numerical simulator is often used as a reservoir management tool. One of its main purposes is to aid in the evaluation of number of wells, well locations and start time for wells. Traditionally, the optimization of a field development is done by a manual trial and error process. In this paper, an example of an automated technique is given. The core in the automization process is the reservoir simulator Frontline. Frontline is based on front tracking techniques, which makes it fast and accurate compared to traditional finite difference simulators. Due to its CPU-efficiency the simulator has been coupled with an optimization module, which enables automatic optimization of location of wells, number of wells and start-up times. The simulator was used as an alternative method in the evaluation of waterflooding in a North Sea fractured chalk reservoir. Since Frontline, in principle, is 2D, Buckley-Leverett pseudo functions were used to represent the 3rd dimension. The area full field simulation model was run with up to 25 wells for 20 years in less than one minute of Vax 9000 CPU-time. The automatic Frontline evaluation indicated that a peripheral waterflood could double incremental recovery compared to a central pattern drive.

  6. Acceleration of discrete stochastic biochemical simulation using GPGPU.

    PubMed

    Sumiyoshi, Kei; Hirata, Kazuki; Hiroi, Noriko; Funahashi, Akira

    2015-01-01

    For systems made up of a small number of molecules, such as a biochemical network in a single cell, a simulation requires a stochastic approach, instead of a deterministic approach. The stochastic simulation algorithm (SSA) simulates the stochastic behavior of a spatially homogeneous system. Since stochastic approaches produce different results each time they are used, multiple runs are required in order to obtain statistical results; this results in a large computational cost. We have implemented a parallel method for using SSA to simulate a stochastic model; the method uses a graphics processing unit (GPU), which enables multiple realizations at the same time, and thus reduces the computational time and cost. During the simulation, for the purpose of analysis, each time course is recorded at each time step. A straightforward implementation of this method on a GPU is about 16 times faster than a sequential simulation on a CPU with hybrid parallelization; each of the multiple simulations is run simultaneously, and the computational tasks within each simulation are parallelized. We also implemented an improvement to the memory access and reduced the memory footprint, in order to optimize the computations on the GPU. We also implemented an asynchronous data transfer scheme to accelerate the time course recording function. To analyze the acceleration of our implementation on various sizes of model, we performed SSA simulations on different model sizes and compared these computation times to those for sequential simulations with a CPU. When used with the improved time course recording function, our method was shown to accelerate the SSA simulation by a factor of up to 130.
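
    To make the idea of repeated stochastic runs concrete, here is a minimal CPU-only sketch of the direct-method SSA. It is not the authors' GPU implementation: the single decay reaction A -> (nothing), the function names and all parameter values are assumptions chosen for illustration. It shows why many independent realizations are needed to obtain a statistical result.

```python
import random


def ssa_decay(n0, rate, t_end, seed=0):
    """Direct-method SSA for the single (hypothetical) reaction A -> nothing.

    Returns the event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while n > 0:
        propensity = rate * n               # total reaction propensity
        t += rng.expovariate(propensity)    # waiting time to the next event
        if t > t_end:
            break
        n -= 1                              # fire the decay reaction
        times.append(t)
        counts.append(n)
    return times, counts


def mean_final_count(n0, rate, t_end, runs):
    """Average the final molecule count over `runs` independent realizations."""
    finals = (ssa_decay(n0, rate, t_end, seed=s)[1][-1] for s in range(runs))
    return sum(finals) / runs


if __name__ == "__main__":
    # For this reaction the expected count at time t is n0 * exp(-rate * t).
    print(mean_final_count(n0=100, rate=1.0, t_end=1.0, runs=200))
```

    Each realization is independent of the others, which is the property that lets implementations such as the one described in the abstract run many of them at the same time.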

  7. Acceleration of discrete stochastic biochemical simulation using GPGPU

    PubMed Central

    Sumiyoshi, Kei; Hirata, Kazuki; Hiroi, Noriko; Funahashi, Akira

    2015-01-01

    For systems made up of a small number of molecules, such as a biochemical network in a single cell, a simulation requires a stochastic approach, instead of a deterministic approach. The stochastic simulation algorithm (SSA) simulates the stochastic behavior of a spatially homogeneous system. Since stochastic approaches produce different results each time they are used, multiple runs are required in order to obtain statistical results; this results in a large computational cost. We have implemented a parallel method for using SSA to simulate a stochastic model; the method uses a graphics processing unit (GPU), which enables multiple realizations at the same time, and thus reduces the computational time and cost. During the simulation, for the purpose of analysis, each time course is recorded at each time step. A straightforward implementation of this method on a GPU is about 16 times faster than a sequential simulation on a CPU with hybrid parallelization; each of the multiple simulations is run simultaneously, and the computational tasks within each simulation are parallelized. We also implemented an improvement to the memory access and reduced the memory footprint, in order to optimize the computations on the GPU. We also implemented an asynchronous data transfer scheme to accelerate the time course recording function. To analyze the acceleration of our implementation on various sizes of model, we performed SSA simulations on different model sizes and compared these computation times to those for sequential simulations with a CPU. When used with the improved time course recording function, our method was shown to accelerate the SSA simulation by a factor of up to 130. PMID:25762936

  8. Parental care in Tundra Swans during the pre-fledgling period

    USGS Publications Warehouse

    Earnst, Susan L.

    2002-01-01

    Among studies that have quantified the care of precocial young, few have investigated forms of parental care other than vigilance. During the pre-fledging period, Tundra Swan (Cygnus columbianus columbianus) parents provided simultaneous biparental care by foraging near each other and their cygnets, and cygnets spent more time foraging during bouts in which both parents were foraging nearby than when only one parent was foraging nearby. Parents spent nearly twice as much foraging time on land than did non-parents, a habitat in which cygnets foraged more intensely than parents (i.e., spent more time foraging during foraging bouts) and could graze on protein-rich sedges rather than use more difficult below-water foraging methods. Parents also spent more than twice as much time being vigilant and more than three times as much time defending their territory than non-parents, behaviors that presumably benefited cygnets by decreasing predation risk and indirect foraging competition, respectively. Parents therefore incurred the costs of foraging less intensely during foraging bouts, spending more time interacting, more time in vigilance, and less time sleeping/preening than non-parents.

  9. Changes in time-use and drug use by young adults in poor neighbourhoods of Greater Buenos Aires, Argentina, after the political transitions of 2001-2002: Results of a survey

    PubMed Central

    2011-01-01

    Background In some countries, "Big Events" like crises and transitions have been followed by large increases in drug use, drug injection and HIV/AIDS. Argentina experienced an economic crisis and political transition in 2001/2002 that affected how people use their time. This paper studies how time use changes between years 2001 and 2004, subsequent to these events, were associated with drug consumption in poor neighbourhoods of Greater Buenos Aires. Methods In 2003-2004, 68 current injecting drug users (IDUs) and 235 young non-IDUs, aged 21-35, who lived in impoverished drug-impacted neighbourhoods in Greater Buenos Aires, were asked about time use then and in 2001. Data on weekly hours spent working or looking for work, doing housework/childcare, consuming drugs, being with friends, and hanging out in the neighbourhood, were studied in relation to time spent using drugs. Field observations and focus groups were also conducted. Results After 2001, among both IDUs and non-IDUs, mean weekly time spent working declined significantly (especially among IDUs); time spent looking for work increased, and time spent with friends and hanging out in the neighbourhood decreased. We found no increase in injecting or non-injecting drug consumption after 2001. Subjects most affected by the way the crises led to decreased work time and/or to increased time looking for work--and by the associated increase in time spent in one's neighbourhood--were most likely to increase their time using drugs. Conclusions Time use methods are useful to study changes in drug use and their relationships to everyday life activities. In these previously-drug-impacted neighbourhoods, the Argentinean crisis did not lead to an increase in drug use, which somewhat contradicts our initial expectations. Nevertheless, those for whom the crises led to decreased work time, increased time looking for work, and increased time spent in indoor or outdoor neighbourhood environments, were likely to spend more time using drugs. These data suggest that young adults in traditionally less-impoverished neighbourhoods may be more vulnerable to Big Events than those in previously drug-impacted impoverished neighbourhoods. Since Big Events will continue to occur, research on the pathways that determine their sequelae is needed. PMID:21251290

  10. Changes in time-use and drug use by young adults in poor neighbourhoods of Greater Buenos Aires, Argentina, after the political transitions of 2001-2002: Results of a survey.

    PubMed

    Rossi, Diana; Zunino Singh, Dhan; Pawlowicz, María Pía; Touzé, Graciela; Bolyard, Melissa; Mateu-Gelabert, Pedro; Sandoval, Milagros; Friedman, Samuel R

    2011-01-20

    In some countries, "Big Events" like crises and transitions have been followed by large increases in drug use, drug injection and HIV/AIDS. Argentina experienced an economic crisis and political transition in 2001/2002 that affected how people use their time. This paper studies how time use changes between years 2001 and 2004, subsequent to these events, were associated with drug consumption in poor neighbourhoods of Greater Buenos Aires. In 2003-2004, 68 current injecting drug users (IDUs) and 235 young non-IDUs, aged 21-35, who lived in impoverished drug-impacted neighbourhoods in Greater Buenos Aires, were asked about time use then and in 2001. Data on weekly hours spent working or looking for work, doing housework/childcare, consuming drugs, being with friends, and hanging out in the neighbourhood, were studied in relation to time spent using drugs. Field observations and focus groups were also conducted. After 2001, among both IDUs and non-IDUs, mean weekly time spent working declined significantly (especially among IDUs); time spent looking for work increased, and time spent with friends and hanging out in the neighbourhood decreased. We found no increase in injecting or non-injecting drug consumption after 2001. Subjects most affected by the way the crises led to decreased work time and/or to increased time looking for work--and by the associated increase in time spent in one's neighbourhood--were most likely to increase their time using drugs. Time use methods are useful to study changes in drug use and their relationships to everyday life activities. In these previously-drug-impacted neighbourhoods, the Argentinean crisis did not lead to an increase in drug use, which somewhat contradicts our initial expectations. Nevertheless, those for whom the crises led to decreased work time, increased time looking for work, and increased time spent in indoor or outdoor neighbourhood environments, were likely to spend more time using drugs. These data suggest that young adults in traditionally less-impoverished neighbourhoods may be more vulnerable to Big Events than those in previously drug-impacted impoverished neighbourhoods. Since Big Events will continue to occur, research on the pathways that determine their sequelae is needed.

  11. Implementation of ADI: Schemes on MIMD parallel computers

    NASA Technical Reports Server (NTRS)

    Vanderwijngaart, Rob F.

    1993-01-01

    In order to simulate the effects of the impingement of hot exhaust jets of High Performance Aircraft on landing surfaces a multi-disciplinary computation coupling flow dynamics to heat conduction in the runway needs to be carried out. Such simulations, which are essentially unsteady, require very large computational power in order to be completed within a reasonable time frame of the order of an hour. Such power can be furnished by the latest generation of massively parallel computers. These remove the bottleneck of ever more congested data paths to one or a few highly specialized central processing units (CPU's) by having many off-the-shelf CPU's work independently on their own data, and exchange information only when needed. During the past year the first phase of this project was completed, in which the optimal strategy for mapping an ADI-algorithm for the three dimensional unsteady heat equation to a MIMD parallel computer was identified. This was done by implementing and comparing three different domain decomposition techniques that define the tasks for the CPU's in the parallel machine. These implementations were done for a Cartesian grid and Dirichlet boundary conditions. The most promising technique was then used to implement the heat equation solver on a general curvilinear grid with a suite of nontrivial boundary conditions. Finally, this technique was also used to implement the Scalar Penta-diagonal (SP) benchmark, which was taken from the NAS Parallel Benchmarks report. All implementations were done in the programming language C on the Intel iPSC/860 computer.

  12. Invasive treatment of NSTEMI patients in German Chest Pain Units - Evidence for a treatment paradox.

    PubMed

    Schmidt, Frank P; Schmitt, Claus; Hochadel, Matthias; Giannitsis, Evangelos; Darius, Harald; Maier, Lars S; Schmitt, Claus; Heusch, Gerd; Voigtländer, Thomas; Mudra, Harald; Gori, Tommaso; Senges, Jochen; Münzel, Thomas

    2018-03-15

    Patients with non ST-segment elevation myocardial infarction (NSTEMI) represent the largest fraction of patients with acute coronary syndrome in German Chest Pain Units. Recent evidence on early vs. selective percutaneous coronary intervention (PCI) is ambiguous with respect to effects on mortality, myocardial infarction (MI) and recurrent angina. With the present study we sought to investigate the prognostic impact of PCI and its timing in German Chest Pain Unit (CPU) NSTEMI patients. Data from 1549 patients whose leading diagnosis was NSTEMI were retrieved from the German CPU registry for the interval between 3/2010 and 3/2014. Follow-up was available at a median of 167 days after discharge. The patients were grouped into a higher (Group A) and lower risk group (Group B) according to GRACE score and additional criteria on admission. Group A had higher Killip classes, higher BNP levels, reduced EF and significantly more triple vessel disease (p<0.001). Surprisingly, patients in Group A less frequently received early diagnostic catheterization and PCI. While conservative management did not affect prognosis in Group B, higher-risk CPU-NSTEMI patients without PCI had a significantly worse survival. The present results reveal a substantial treatment gap in higher-risk NSTEMI patients in German Chest Pain Units. This treatment paradox may worsen prognosis in patients who could derive the largest benefit from early revascularization. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  13. A survey of time management and particular tasks undertaken by consultant microbiologists in the UK.

    PubMed

    Riordan, Terry; Cartwright, Keith; Cunningham, Richard; Logan, Margaret; Wright, Paul

    2007-05-01

    Medical microbiology practice encompasses a diverse range of activities. Consultant medical microbiologists (CMMs) attribute widely differing priorities to, and spend differing proportions of time on various components of the job. To obtain a professional consensus on what are high-priority and low-priority activities, and to identify the time spent on low-priority activities. National survey. Many respondents felt that time spent on report authorisation and telephoning of results was excessive, whereas time spent on ward-based work was inadequate. Timesaving could also be achieved through better prioritisation of infection-control activities. CMMs should apportion their time at work focusing on high-priority activities identified through professional consensus.

  14. Indoor and Outdoor Context-Specific Contributions to Early Adolescent Moderate to Vigorous Physical Activity as Measured by Combined Diary, Accelerometer, and GPS.

    PubMed

    Pearce, Matthew; Saunders, David H; Allison, Peter; Turner, Anthony P

    2018-01-01

    The distribution of adolescent moderate to vigorous physical activity (MVPA) across multiple contexts is unclear. This study examined indoor and outdoor leisure time in terms of being structured or unstructured and explored relationships with total daily MVPA. Between September 2012 and January 2014, 70 participants (aged 11-13 y) from 4 schools in Edinburgh wore an accelerometer and global positioning system receiver over 7 days, reporting structured physical activity using a diary. Time spent and MVPA were summarized according to indoor/outdoor location and whether activity was structured/unstructured. Independent associations between context-specific time spent and total daily MVPA were examined using a multivariate linear regression model. Very little time or MVPA was recorded in structured contexts. Unstructured outdoor leisure time was associated with an increase in total daily MVPA almost twice that of unstructured indoor leisure time [b value (95% confidence interval), 8.45 (1.71 to 14.48) vs 4.38 (0.20 to 8.22) minute increase per hour spent]. The association was stronger for time spent in structured outdoor leisure time [35.81 (20.60 to 52.27)]. Research and interventions should focus on strategies to facilitate time outdoors during unstructured leisure time and maximize MVPA once youth are outdoors. Increasing the proportion of youth engaging in structured activity may be beneficial given that, although time spent was limited, association with MVPA was strongest.

  15. High Capacity Single Table Performance Design Using Partitioning in Oracle or PostgreSQL

    DTIC Science & Technology

    2012-03-01

    Indicators ( KPIs ) 13  5.  Conclusion 14  List of Symbols, Abbreviations, and Acronyms 15  Distribution List 16 iv List of Figures Figure 1. Oracle...Figure 7. Time to seek and return one record. 4. Additional Key Performance Indicators ( KPIs ) In addition to pure response time, there are other...Laboratory ASM Automatic Storage Management CPU central processing unit I/O input/output KPIs key performance indicators OS operating system

  16. A mass, momentum, and energy conserving, fully implicit, scalable algorithm for the multi-dimensional, multi-species Rosenbluth-Fokker-Planck equation

    NASA Astrophysics Data System (ADS)

    Taitano, W. T.; Chacón, L.; Simakov, A. N.; Molvig, K.

    2015-09-01

    In this study, we demonstrate a fully implicit algorithm for the multi-species, multidimensional Rosenbluth-Fokker-Planck equation which is exactly mass-, momentum-, and energy-conserving, and which preserves positivity. Unlike most earlier studies, we base our development on the Rosenbluth (rather than Landau) form of the Fokker-Planck collision operator, which reduces complexity while allowing for an optimal fully implicit treatment. Our discrete conservation strategy employs nonlinear constraints that force the continuum symmetries of the collision operator to be satisfied upon discretization. We converge the resulting nonlinear system iteratively using Jacobian-free Newton-Krylov methods, effectively preconditioned with multigrid methods for efficiency. Single- and multi-species numerical examples demonstrate the advertised accuracy properties of the scheme, and the superior algorithmic performance of our approach. In particular, the discretization approach is numerically shown to be second-order accurate in time and velocity space and to exhibit manifestly positive entropy production. That is, H-theorem behavior is indicated for all the examples we have tested. The solution approach is demonstrated to scale optimally with respect to grid refinement (with CPU time growing linearly with the number of mesh points), and timestep (showing very weak dependence of CPU time with time-step size). As a result, the proposed algorithm delivers several orders-of-magnitude speedup vs. explicit algorithms.

  17. GPU-based prompt gamma ray imaging from boron neutron capture therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoon, Do-Kun; Jung, Joo-Young; Suk Suh, Tae, E-mail: suhsanta@catholic.ac.kr

    Purpose: The purpose of this research is to perform the fast reconstruction of a prompt gamma ray image using a graphics processing unit (GPU) computation from boron neutron capture therapy (BNCT) simulations. Methods: To evaluate the accuracy of the reconstructed image, a phantom including four boron uptake regions (BURs) was used in the simulation. After the Monte Carlo simulation of the BNCT, the modified ordered subset expectation maximization reconstruction algorithm using the GPU computation was used to reconstruct the images with fewer projections. The computation times for image reconstruction were compared between the GPU and the central processing unit (CPU). Also, the accuracy of the reconstructed image was evaluated by a receiver operating characteristic (ROC) curve analysis. Results: The image reconstruction time using the GPU was 196 times faster than the conventional reconstruction time using the CPU. For the four BURs, the area under curve values from the ROC curve were 0.6726 (A-region), 0.6890 (B-region), 0.7384 (C-region), and 0.8009 (D-region). Conclusions: The tomographic image using the prompt gamma ray event from the BNCT simulation was acquired using the GPU computation in order to perform a fast reconstruction during treatment. The authors verified the feasibility of the prompt gamma ray image reconstruction using the GPU computation for BNCT simulations.

  18. TU-FG-BRB-07: GPU-Based Prompt Gamma Ray Imaging From Boron Neutron Capture Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, S; Suh, T; Yoon, D

    Purpose: The purpose of this research is to perform the fast reconstruction of a prompt gamma ray image using a graphics processing unit (GPU) computation from boron neutron capture therapy (BNCT) simulations. Methods: To evaluate the accuracy of the reconstructed image, a phantom including four boron uptake regions (BURs) was used in the simulation. After the Monte Carlo simulation of the BNCT, the modified ordered subset expectation maximization reconstruction algorithm using the GPU computation was used to reconstruct the images with fewer projections. The computation times for image reconstruction were compared between the GPU and the central processing unit (CPU). Also, the accuracy of the reconstructed image was evaluated by a receiver operating characteristic (ROC) curve analysis. Results: The image reconstruction time using the GPU was 196 times faster than the conventional reconstruction time using the CPU. For the four BURs, the area under curve values from the ROC curve were 0.6726 (A-region), 0.6890 (B-region), 0.7384 (C-region), and 0.8009 (D-region). Conclusion: The tomographic image using the prompt gamma ray event from the BNCT simulation was acquired using the GPU computation in order to perform a fast reconstruction during treatment. The authors verified the feasibility of the prompt gamma ray reconstruction using the GPU computation for BNCT simulations.

  19. Evaluating Academic Journals Using Impact Factor and Local Citation Score

    ERIC Educational Resources Information Center

    Chung, Hye-Kyung

    2007-01-01

    This study presents a method for journal collection evaluation using citation analysis. Cost-per-use (CPU) for each title is used to measure cost-effectiveness with higher CPU scores indicating cost-effective titles. Use data are based on the impact factor and locally collected citation score of each title and is compared to the cost of managing…

  20. Can methadone maintenance for heroin-dependent patients retained in general practice reduce criminal conviction rates and time spent in prison?

    PubMed Central

    Keen, J; Rowse, G; Mathers, N; Campbell, M; Seivewright, N

    2000-01-01

    A retrospective analysis was made of the criminal records of 57 patients successfully retained in methadone maintenance at two general practices in Sheffield. Their criminal conviction rates and time spent in prison per year were compared for the periods before and after the start of their methadone programme. Overall, patients retained on methadone programmes in the general practices studied had significantly fewer convictions and cautions, and spent significantly less time in prison than they had before the start of treatment. PMID:10695069

  1. 7 CFR 52.50 - Travel and other expenses.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... and other expenses. Charges may be made to cover the cost of travel time incurred in connection with... hour. This includes time spent waiting for transportation as well as time spent traveling, but not to exceed eight hours of travel time for any one person for any one day: And provided further, that if...

  2. 7 CFR 52.50 - Travel and other expenses.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... and other expenses. Charges may be made to cover the cost of travel time incurred in connection with... hour. This includes time spent waiting for transportation as well as time spent traveling, but not to exceed eight hours of travel time for any one person for any one day: And provided further, that if...

  3. 7 CFR 52.50 - Travel and other expenses.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... and other expenses. Charges may be made to cover the cost of travel time incurred in connection with... hour. This includes time spent waiting for transportation as well as time spent traveling, but not to exceed eight hours of travel time for any one person for any one day: And provided further, that if...

  4. Effect of Quality and Quantity of Study on Student Grades.

    ERIC Educational Resources Information Center

    Dickinson, Donald J.; O'Connell, Debra Q.

    1990-01-01

    Findings from a study which examined the relationship between study time and test scores indicate that time spent organizing had a stronger relationship with course test scores than did total study time or time spent reading and reviewing. Subjects were 113 undergraduates who kept daily self-monitoring logs of study activities. (IAH)

  5. Using Time-on-Task Measurements to Understand Student Performance in a Physics Class: A Ten-Year Study

    NASA Astrophysics Data System (ADS)

    Stewart, John

    2015-04-01

    The amount of time spent on out-of-class activities such as working homework, reading, and studying for examinations is presented for 10 years of an introductory, calculus-based physics class at a large public university. While the class underwent significant change in the 10 years studied, the amount of time invested by students in weeks not containing an in-semester examination was constant and did not vary with the length of the reading or homework assignments. The amount of time spent preparing for examinations did change as the course was modified. The time spent on class assignments, both reading and homework, did not scale linearly with the length of the assignment. The time invested in both reading and homework per length of the assignment decreased as the assignments became longer. The class average time invested in examination preparation did change with the average performance on previous examinations in the same class, with more time spent in preparation for lower previous examination scores (R² = 0.70).

  6. Physical inactivity post-stroke: a 3-year longitudinal study.

    PubMed

    Kunkel, Dorit; Fitton, Carolyn; Burnett, Malcolm; Ashburn, Ann

    2015-01-01

    To explore change in activity levels post-stroke. We measured activity levels using the activPAL™ in hospital and at 1, 2 and 3 years' post-stroke onset. Of the 74 participants (mean age 76 (SD 11), 39 men), 61 were assessed in hospital: 94% of time was spent in sitting/lying, 4% standing and 2% walking. Activity levels improved over time (complete cases n = 15); time spent sitting/lying decreased (p = 0.001); time spent standing, walking and number of steps increased (p = 0.001, p = 0.028 and p = 0.03, respectively). At year 3, 18% of time was spent in standing and 9% walking. Time spent upright correlated significantly with Barthel (r = 0.69 on admission, r = 0.68 on discharge, both p < 0.01) and functional ambulation category scores (r = 0.55 on admission, 0.63 on discharge, both p < 0.05); correlations remained significant at all assessment points. Depression (in hospital), left hemisphere infarction (Years 1-2), visual neglect (Year 2), poor mobility and balance (Years 1-3) correlated with poorer activity levels. People with stroke were inactive for the majority of time. Time spent upright improved significantly by 1 year post-stroke; improvements slowed down thereafter. Poor activity levels correlated with physical and psychological measures. Larger studies are indicated to identify predictors of activity levels. Implications for Rehabilitation Activity levels (measured using activPAL™ activity monitor), increased significantly by 1 year post-stroke but improvements slowed down at 2 and 3 years. People with stroke were inactive for the majority of their day in hospital and in the community. Poor activity levels correlated with physical and psychological measures. Larger studies are indicated to identify the most important predictors of activity levels.

  7. Objectively measured sedentary time and physical activity in women with fibromyalgia: a cross-sectional study.

    PubMed

    Ruiz, Jonatan R; Segura-Jiménez, Víctor; Ortega, Francisco B; Alvarez-Gallardo, Inmaculada C; Camiletti-Moirón, Daniel; Aparicio, Virginia A; Carbonell-Baeza, Ana; Femia, Pedro; Munguía-Izquierdo, Diego; Delgado-Fernández, Manuel

    2013-06-20

    To characterise levels of objectively measured sedentary time and physical activity in women with fibromyalgia. Cross-sectional study. Local Association of Fibromyalgia (Granada, Spain). The study comprised 94 women with diagnosed fibromyalgia who did not have other severe somatic or psychiatric disorders, or other diseases that prevent physical loading, able to ambulate and to communicate and capable and willing to provide informed consent. Sedentary time and physical activity were measured by accelerometry and expressed as time spent in sedentary behaviours, average physical activity intensity (counts/minute) and amount of time (minutes/day) spent in moderate intensity and in moderate-to-vigorous-intensity physical activity (MVPA). The proportion of women meeting the physical activity recommendations of 30 min/day of MVPA on 5 or more days a week was 60.6%. Women spent, on average, 71% of their waking time (approximately 10 h/day) in sedentary behaviours. Both sedentary behaviour and physical activity levels were similar across age groups, waist circumference and percentage body fat categories, years since clinical diagnosis, marital status, educational level and occupational status, regardless of the severity of the disease (all p > 0.1). Time spent on moderate-intensity physical activity and MVPA was, however, lower in those with greater body mass index (BMI) (-6.6 min and -7 min, respectively, per BMI category increase, <25, 25-30, >30 kg/m²; p values for trend were 0.056 and 0.051, respectively). Women spent, on average, 10 min less on MVPA (p < 0.001) and 22 min less on sedentary behaviours during weekends compared with weekdays (p = 0.051). These data provide an objective measure of the amount of time spent on sedentary activities and on physical activity in women with fibromyalgia.

  8. Trends in marriage and time spent single in sub-Saharan Africa: a comparative analysis of six population-based cohort studies and nine Demographic and Health Surveys.

    PubMed

    Marston, M; Slaymaker, E; Cremin, I; Floyd, S; McGrath, N; Kasamba, I; Lutalo, T; Nyirenda, M; Ndyanabo, A; Mupambireyi, Z; Zaba, B

    2009-04-01

    To describe trends in age at first sex (AFS), age at first marriage (AFM) and time spent single between events and to compare age-specific trends in marital status in six cohort studies. Cohort data from Uganda, Tanzania, South Africa, Zimbabwe and Malawi and Demographic and Health Survey (DHS) data from Uganda, Tanzania and Zimbabwe were analysed. Life table methods were used to calculate median AFS, AFM and time spent single. In each study, two surveys were chosen to compare marital status by age and identify changes over time. Median AFM was much higher in South Africa than in the other sites. Between the other populations there were considerable differences in median AFS and AFM (AFS 17-19 years for men and 16-19 years for women, AFM 21-24 years and 18-19 years, respectively, for the 1970-9 birth cohort). In all surveys, men reported a longer time spent single than women (median 4-7 years for men and 0-2 years for women). Median years spent single for women has increased, apart from in Manicaland. For men in Rakai it has decreased slightly over time but increased in Kisesa and Masaka. The DHS data showed similar trends to those in the cohort data. The age-specific proportion of married individuals has changed little over time. Median AFS, AFM and time spent single vary considerably among these populations. These three measures are underlying determinants of sexual risk and HIV infection, and they may partially explain the variation in HIV prevalence levels between these populations.

  9. Trends in marriage and time spent single in sub-Saharan Africa: a comparative analysis of six population-based cohort studies and nine Demographic and Health Surveys

    PubMed Central

    Marston, M; Slaymaker, E; Cremin, I; Floyd, S; McGrath, N; Kasamba, I; Lutalo, T; Nyirenda, M; Ndyanabo, A; Mupambireyi, Z; Żaba, B

    2009-01-01

    Objectives: To describe trends in age at first sex (AFS), age at first marriage (AFM) and time spent single between events and to compare age-specific trends in marital status in six cohort studies. Methods: Cohort data from Uganda, Tanzania, South Africa, Zimbabwe and Malawi and Demographic and Health Survey (DHS) data from Uganda, Tanzania and Zimbabwe were analysed. Life table methods were used to calculate median AFS, AFM and time spent single. In each study, two surveys were chosen to compare marital status by age and identify changes over time. Results: Median AFM was much higher in South Africa than in the other sites. Between the other populations there were considerable differences in median AFS and AFM (AFS 17–19 years for men and 16–19 years for women, AFM 21–24 years and 18–19 years, respectively, for the 1970–9 birth cohort). In all surveys, men reported a longer time spent single than women (median 4–7 years for men and 0–2 years for women). Median years spent single for women has increased, apart from in Manicaland. For men in Rakai it has decreased slightly over time but increased in Kisesa and Masaka. The DHS data showed similar trends to those in the cohort data. The age-specific proportion of married individuals has changed little over time. Conclusions: Median AFS, AFM and time spent single vary considerably among these populations. These three measures are underlying determinants of sexual risk and HIV infection, and they may partially explain the variation in HIV prevalence levels between these populations. PMID:19307343

  10. Associations between active commuting and physical activity in working adults: Cross-sectional results from the Commuting and Health in Cambridge study

    PubMed Central

    Yang, Lin; Panter, Jenna; Griffin, Simon J.; Ogilvie, David

    2012-01-01

    Objective: To quantify the association between time spent in active commuting and in moderate to vigorous physical activity (MVPA) in a sample of working adults living in both urban and rural locations. Methods: In 2009, participants in the Commuting and Health in Cambridge study were sent questionnaires enquiring about sociodemographic characteristics and weekly time spent in active commuting. They were also invited to wear an accelerometer for seven days. Accelerometer data were used to compute the time spent in MVPA. Multiple regression models were used to examine the association between time spent in active commuting and MVPA. Results: 475 participants (70% female) provided valid data. On average, participants recorded 55 (SD: 23.02) minutes of MVPA per day. For women, reporting 150 or more minutes of active commuting per week was associated with an estimated 8.50 (95% CI: 1.75 to 51.26, p = 0.01) additional minutes of daily MVPA compared to those who reported no time in active commuting. No overall associations were found in men. Conclusions: Promoting active commuting might be an important way of increasing levels of physical activity, particularly in women. Further research should assess whether increases in time spent in active commuting are associated with increases in physical activity. PMID:22964003

  11. One-day quantitative cross-sectional study of family information time in 90 intensive care units in France.

    PubMed

    Fassier, Thomas; Darmon, Michel; Laplace, Christian; Chevret, Sylvie; Schlemmer, Benoit; Pochard, Frédéric; Azoulay, Elie

    2007-01-01

    Providing family members with clear, honest, and timely information is a major task for intensive care unit physicians. Time spent informing families has been associated with effectiveness of information but has not been measured in specifically designed studies. To measure time spent informing families of intensive care unit patients. One-day cross-sectional study in 90 intensive care units in France. Clocked time spent by physicians informing the families of each of 951 patients hospitalized in the intensive care unit during a 24-hr period. Median family information time was 16 (interquartile range, 8-30) mins per patient, with 20% of the time spent explaining the diagnosis, 20% on explaining treatments, and 60% on explaining the prognosis. One third of the time was spent listening to family members. Multivariable analysis identified one factor associated with less information time (room with more than one bed) and seven factors associated with more information time, including five patient-related factors (surgery on the study day, higher Logistic Organ Dysfunction score, coma, mechanical ventilation, and worsening clinical status) and two family-related factors (first contact with family and interview with the spouse). Median information time was 20 (interquartile range, 10-39) mins when three factors were present and 106.5 (interquartile range, 103-110) mins when five were present. This study identifies factors associated with information time provided by critical care physicians to family members of critically ill patients. Whether information time correlates with communication difficulties or communication skills needs to be evaluated. Information time provided by residents and nurses should be studied.

  12. The effect of urban and rural habitats and resource type on activity budgets of commensal rhesus macaques (Macaca mulatta) in Bangladesh.

    PubMed

    Jaman, M Firoj; Huffman, Michael A

    2013-01-01

    Macaques are characterized by their wide distribution and ability to adapt to a variety of habitats. Activity budgets are affected by habitat type, season, and food availability in relation to differing age-sex class and individual requirements. We conducted a comparative study on two commensal rhesus groups, one living in a rural village and the other in the center of urban Dhaka, Bangladesh. The study was conducted in three different seasons between 2007 and 2009 in order to evaluate how habitat type and season affect their behavioral activities. Differences in food type and its availability between these two habitats were mainly responsible for the variations in activity budgets between groups. Feeding time in the rural group was significantly longer than that in the urban group. In contrast, grooming and object manipulation/play were significantly greater in the urban than the rural group. Seasonal variations in all major behaviors were significantly affected by group, with more time spent feeding in summer than in winter/dry season, and more time spent grooming and moving in winter/dry season than summer in the rural group. In contrast, time spent resting was greater in the monsoon and summer seasons than the winter/dry season in the urban group. Grooming time was greater in the winter/dry season than the monsoon and summer seasons. In both groups, immatures of both sexes spent significantly more time on feeding and object manipulation/playing and less time resting than adults. Adult females spent more time grooming than males and immatures, of both sexes, in both groups. Moreover, the rural group spent most of their time feeding on garden/crop produce and wild plant food resources, while the urban group spent more time feeding on provisioned foods. These results showed that differences in the activity budgets of rural and urban dwelling macaques were due largely to the differences in available food resources. Commensal rhesus macaques show a high degree of behavioral flexibility in response to habitat and resource variability, and knowledge of these differences is important for the conservation and management of highly commensal primates.

  13. Why "Working Smarter" Isn't Working: White-Collar Productivity Improvement.

    ERIC Educational Resources Information Center

    Shaw, Edward

    2001-01-01

    Discusses the productivity and work days of white collar workers. Topics include productivity improvement; task analysis; the amount of time spent reading, and how to reduce it by improving writing skills; time spent in meetings; empowered time management; and sustaining a climate for change. (LRW)

  14. Emissivity of Rocket Plume Particulates

    DTIC Science & Technology

    1992-09-01

    V. EXPERIMENTAL RESULTS ... VI. CONCLUSIONS AND RECOMMENDATIONS ... APPENDIX A. CATS-E SOFTWARE ... interfaced through the CATS-E Thermal Analysis software, which is MS-DOS based, and can be run on any 286 or higher CPU. This system allows real-time ... body source to establish the parameters required by the CATS program for proper microscope/scanner interface. A complete description of microscope

  15. Vulnerability Model. A Simulation System for Assessing Damage Resulting from Marine Spills

    DTIC Science & Technology

    1975-06-01

    used and the scenario simulated. The test runs were made on an IBM 360/65 computer. Running times were generally between 15 and 35 CPU seconds ... fect further north. A petroleum tank-truck operation was located within 600 feet of the stock pond on which the crude oil had dammed up. At 5 A.M.

  16. Parental Involvement, Child Temperament, and Parents’ Work Hours: Differential Relations for Mothers and Fathers

    PubMed Central

    Brown, Geoffrey L.; McBride, Brent A.; Bost, Kelly K.; Shin, Nana

    2014-01-01

    This study examined how child temperament was related to parents’ time spent accessible to and interacting with their 2-year-olds. Bivariate analyses indicated that both fathers and mothers spent more time with temperamentally challenging children than easier children on workdays, but fathers spent less time with challenging children than easier children on non-workdays. After accounting for work hours, some associations between temperament and fathers’ workday involvement dropped to non-significance. For fathers, work hours also moderated the relation between irregular temperament and workday play. For mothers, work hours moderated the relation between both difficult and irregular temperament and workday interaction. Mothers also spent more time with girls (but not boys) who were temperamentally irregular. Results speak to the influence of child temperament on parenting behavior, and the differential construction of parenting roles as a function of child characteristics and patterns of work. PMID:25960588

  17. Parental Involvement, Child Temperament, and Parents' Work Hours: Differential Relations for Mothers and Fathers.

    PubMed

    Brown, Geoffrey L; McBride, Brent A; Bost, Kelly K; Shin, Nana

    2011-01-01

    This study examined how child temperament was related to parents' time spent accessible to and interacting with their 2-year-olds. Bivariate analyses indicated that both fathers and mothers spent more time with temperamentally challenging children than easier children on workdays, but fathers spent less time with challenging children than easier children on non-workdays. After accounting for work hours, some associations between temperament and fathers' workday involvement dropped to non-significance. For fathers, work hours also moderated the relation between irregular temperament and workday play. For mothers, work hours moderated the relation between both difficult and irregular temperament and workday interaction. Mothers also spent more time with girls (but not boys) who were temperamentally irregular. Results speak to the influence of child temperament on parenting behavior, and the differential construction of parenting roles as a function of child characteristics and patterns of work.

  18. A Reliability-Based Particle Filter for Humanoid Robot Self-Localization in RoboCup Standard Platform League

    PubMed Central

    Sánchez, Eduardo Munera; Alcobendas, Manuel Muñoz; Noguera, Juan Fco. Blanes; Gilabert, Ginés Benet; Simó Ten, José E.

    2013-01-01

    This paper deals with the problem of humanoid robot localization and proposes a new method for position estimation that has been developed for the RoboCup Standard Platform League environment. Firstly, a complete vision system has been implemented in the Nao robot platform that enables the detection of relevant field markers. The detection of field markers provides some estimation of distances for the current robot position. To reduce errors in these distance measurements, extrinsic and intrinsic camera calibration procedures have been developed and described. To validate the localization algorithm, experiments covering many of the typical situations that arise during RoboCup games have been developed: ranging from degradation in position estimation to total loss of position (due to falls, ‘kidnapped robot’, or penalization). The self-localization method developed is based on the classical particle filter algorithm. The main contribution of this work is a new particle selection strategy. Our approach reduces the CPU computing time required for each iteration and so eases the limited resource availability problem that is common in robot platforms such as Nao. The experimental results show the quality of the new algorithm in terms of localization and CPU time consumption. PMID:24193098

  19. Algorithms and Application of Sparse Matrix Assembly and Equation Solvers for Aeroacoustics

    NASA Technical Reports Server (NTRS)

    Watson, W. R.; Nguyen, D. T.; Reddy, C. J.; Vatsa, V. N.; Tang, W. H.

    2001-01-01

    An algorithm for symmetric sparse equation solutions on an unstructured grid is described. Efficient, sequential sparse algorithms for degree-of-freedom reordering, supernodes, symbolic/numerical factorization, and forward backward solution phases are reviewed. Three sparse algorithms for the generation and assembly of symmetric systems of matrix equations are presented. The accuracy and numerical performance of the sequential version of the sparse algorithms are evaluated over the frequency range of interest in a three-dimensional aeroacoustics application. Results show that the solver solutions are accurate using a discretization of 12 points per wavelength. Results also show that the first assembly algorithm is impractical for high-frequency noise calculations. The second and third assembly algorithms have nearly equal performance at low values of source frequencies, but at higher values of source frequencies the third algorithm saves CPU time and RAM. The CPU time and the RAM required by the second and third assembly algorithms are two orders of magnitude smaller than that required by the sparse equation solver. A sequential version of these sparse algorithms can, therefore, be conveniently incorporated into a substructuring for domain decomposition formulation to achieve parallel computation, where different substructures are handled by different parallel processors.

  20. High-Speed Particle-in-Cell Simulation Parallelized with Graphic Processing Units for Low Temperature Plasmas for Material Processing

    NASA Astrophysics Data System (ADS)

    Hur, Min Young; Verboncoeur, John; Lee, Hae June

    2014-10-01

    Compared with fluid simulations, particle-in-cell (PIC) simulations offer high fidelity for plasma devices that require transient kinetic modeling. They use fewer approximations of the plasma kinetics but require many particles and grids to observe the semantic results. As a result, simulation time grows in proportion to the number of particles, so PIC simulation needs high performance computing. In this research, a graphic processing unit (GPU) is adopted for high performance computing of PIC simulation for low temperature discharge plasmas. GPUs have many-core processors and high memory bandwidth compared with a central processing unit (CPU). NVIDIA GeForce GPUs with hundreds of cores, which show cost-effective performance, were used for the test. The PIC code algorithm is divided into two modules, a field solver and a particle mover. The particle mover module is divided into four routines named move, boundary, Monte Carlo collision (MCC), and deposit. Overall, the GPU code solves particle motions as well as electrostatic potential in two-dimensional geometry almost 30 times faster than a single CPU code. This work was supported by the Korea Institute of Science Technology Information.
