Sample records for cache performance modeling

  1. An area model for on-chip memories and its application

    NASA Technical Reports Server (NTRS)

    Mulder, Johannes M.; Quach, Nhon T.; Flynn, Michael J.

    1991-01-01

    An area model suitable for comparing data buffers of different organizations and arbitrary sizes is described. The area model considers the supplied bandwidth of a memory cell and includes such buffer overhead as control logic, driver logic, and tag storage. The model gave less than 10 percent error when verified against real caches and register files. It is shown that, comparing caches and register files in terms of area for the same storage capacity, caches generally occupy more area per bit than register files for small caches because the overhead dominates the cache area at these sizes. For larger caches, the smaller storage cells in the cache provide a smaller total cache area per bit than the register set. Studying cache performance (traffic ratio) as a function of area, it is shown that, for small caches, direct-mapped caches perform significantly better than four-way set-associative caches and, for caches of medium areas, both direct-mapped and set-associative caches perform better than fully associative caches.

  2. A Measurement and Simulation Based Methodology for Cache Performance Modeling and Tuning

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    We present a cache performance modeling methodology that facilitates the tuning of uniprocessor cache performance for applications executing on shared memory multiprocessors by accurately predicting the effects of source code level modifications. Measurements on a single processor are initially used for identifying parts of code where cache utilization improvements may significantly impact the overall performance. Cache simulation based on trace-driven techniques can be carried out without gathering detailed address traces. Minimal runtime information for modeling cache performance of a selected code block includes: base virtual addresses of arrays, virtual addresses of variables, and loop bounds for that code block. Rest of the information is obtained from the source code. We show that the cache performance predictions are as reliable as those obtained through trace-driven simulations. This technique is particularly helpful to the exploration of various "what-if' scenarios regarding the cache performance impact for alternative code structures. We explain and validate this methodology using a simple matrix-matrix multiplication program. We then apply this methodology to predict and tune the cache performance of two realistic scientific applications taken from the Computational Fluid Dynamics (CFD) domain.

  3. Smart caching based on mobile agent of power WebGIS platform.

    PubMed

    Wang, Xiaohui; Wu, Kehe; Chen, Fei

    2013-01-01

    Power information construction is developing towards intensive, platform, distributed direction with the expansion of power grid and improvement of information technology. In order to meet the trend, power WebGIS was designed and developed. In this paper, we first discuss the architecture and functionality of power WebGIS, and then we study caching technology in detail, which contains dynamic display cache model, caching structure based on mobile agent, and cache data model. We have designed experiments of different data capacity to contrast performance between WebGIS with the proposed caching model and traditional WebGIS. The experimental results showed that, with the same hardware environment, the response time of WebGIS with and without caching model increased as data capacity growing, while the larger the data was, the higher the performance of WebGIS with proposed caching model improved.

  4. Smart Caching Based on Mobile Agent of Power WebGIS Platform

    PubMed Central

    Wang, Xiaohui; Wu, Kehe; Chen, Fei

    2013-01-01

    Power information construction is developing towards intensive, platform, distributed direction with the expansion of power grid and improvement of information technology. In order to meet the trend, power WebGIS was designed and developed. In this paper, we first discuss the architecture and functionality of power WebGIS, and then we study caching technology in detail, which contains dynamic display cache model, caching structure based on mobile agent, and cache data model. We have designed experiments of different data capacity to contrast performance between WebGIS with the proposed caching model and traditional WebGIS. The experimental results showed that, with the same hardware environment, the response time of WebGIS with and without caching model increased as data capacity growing, while the larger the data was, the higher the performance of WebGIS with proposed caching model improved. PMID:24288504

  5. Single-pass memory system evaluation for multiprogramming workloads

    NASA Technical Reports Server (NTRS)

    Conte, Thomas M.; Hwu, Wen-Mei W.

    1990-01-01

    Modern memory systems are composed of levels of cache memories, a virtual memory system, and a backing store. Varying more than a few design parameters and measuring the performance of such systems has traditionally be constrained by the high cost of simulation. Models of cache performance recently introduced reduce the cost simulation but at the expense of accuracy of performance prediction. Stack-based methods predict performance accurately using one pass over the trace for all cache sizes, but these techniques have been limited to fully-associative organizations. This paper presents a stack-based method of evaluating the performance of cache memories using a recurrence/conflict model for the miss ratio. Unlike previous work, the performance of realistic cache designs, such as direct-mapped caches, are predicted by the method. The method also includes a new approach to the problem of the effects of multiprogramming. This new technique separates the characteristics of the individual program from that of the workload. The recurrence/conflict method is shown to be practical, general, and powerful by comparing its performance to that of a popular traditional cache simulator. The authors expect that the availability of such a tool will have a large impact on future architectural studies of memory systems.

  6. Integrating Cache Performance Modeling and Tuning Support in Parallelization Tools

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    With the resurgence of distributed shared memory (DSM) systems based on cache-coherent Non Uniform Memory Access (ccNUMA) architectures and increasing disparity between memory and processors speeds, data locality overheads are becoming the greatest bottlenecks in the way of realizing potential high performance of these systems. While parallelization tools and compilers facilitate the users in porting their sequential applications to a DSM system, a lot of time and effort is needed to tune the memory performance of these applications to achieve reasonable speedup. In this paper, we show that integrating cache performance modeling and tuning support within a parallelization environment can alleviate this problem. The Cache Performance Modeling and Prediction Tool (CPMP), employs trace-driven simulation techniques without the overhead of generating and managing detailed address traces. CPMP predicts the cache performance impact of source code level "what-if" modifications in a program to assist a user in the tuning process. CPMP is built on top of a customized version of the Computer Aided Parallelization Tools (CAPTools) environment. Finally, we demonstrate how CPMP can be applied to tune a real Computational Fluid Dynamics (CFD) application.

  7. Cache-based error recovery for shared memory multiprocessor systems

    NASA Technical Reports Server (NTRS)

    Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.

    1989-01-01

    A multiprocessor cache-based checkpointing and recovery scheme for of recovering from transient processor errors in a shared-memory multiprocessor with private caches is presented. New implementation techniques that use checkpoint identifiers and recovery stacks to reduce performance degradation in processor utilization during normal execution are examined. This cache-based checkpointing technique prevents rollback propagation, provides for rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions that take error latency into account are presented.

  8. Predictive Caching Using the TDAG Algorithm

    NASA Technical Reports Server (NTRS)

    Laird, Philip; Saul, Ronald

    1992-01-01

    We describe how the TDAG algorithm for learning to predict symbol sequences can be used to design a predictive cache store. A model of a two-level mass storage system is developed and used to calculate the performance of the cache under various conditions. Experimental simulations provide good confirmation of the model.

  9. High Performance Analytics with the R3-Cache

    NASA Astrophysics Data System (ADS)

    Eavis, Todd; Sayeed, Ruhan

    Contemporary data warehouses now represent some of the world’s largest databases. As these systems grow in size and complexity, however, it becomes increasingly difficult for brute force query processing approaches to meet the performance demands of end users. Certainly, improved indexing and more selective view materialization are helpful in this regard. Nevertheless, with warehouses moving into the multi-terabyte range, it is clear that the minimization of external memory accesses must be a primary performance objective. In this paper, we describe the R 3-cache, a natively multi-dimensional caching framework designed specifically to support sophisticated warehouse/OLAP environments. R 3-cache is based upon an in-memory version of the R-tree that has been extended to support buffer pages rather than disk blocks. A key strength of the R 3-cache is that it is able to utilize multi-dimensional fragments of previous query results so as to significantly minimize the frequency and scale of disk accesses. Moreover, the new caching model directly accommodates the standard relational storage model and provides mechanisms for pro-active updates that exploit the existence of query “hot spots”. The current prototype has been evaluated as a component of the Sidera DBMS, a “shared nothing” parallel OLAP server designed for multi-terabyte analytics. Experimental results demonstrate significant performance improvements relative to simpler alternatives.

  10. A performance model for GPUs with caches

    DOE PAGES

    Dao, Thanh Tuan; Kim, Jungwon; Seo, Sangmin; ...

    2014-06-24

    To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, accurate performance estimation models for modern GPUs do not exist. This paper presents two accurate models for modern GPUs: a sampling-based linear model, and a model based on machine-learning (ML) techniques which improves the accuracy of the linear model and is applicable to modern GPUs with and without caches. We first construct the sampling-based linear model to predict the runtime of an arbitrary OpenCL kernel. Based on anmore » analysis of NVIDIA GPUs' scheduling policies we determine the earliest sampling points that allow an accurate estimation. The linear model cannot capture well the significant effects that memory coalescing or caching as implemented in modern GPUs have on performance. We therefore propose a model based on ML techniques that takes several compiler-generated statistics about the kernel as well as the GPU's hardware performance counters as additional inputs to obtain a more accurate runtime performance estimation for modern GPUs. We demonstrate the effectiveness and broad applicability of the model by applying it to three different NVIDIA GPU architectures and one AMD GPU architecture. On an extensive set of OpenCL benchmarks, on average, the proposed model estimates the runtime performance with less than 7 percent error for a second-generation GTX 280 with no on-chip caches and less than 5 percent for the Fermi-based GTX 580 with hardware caches. On the Kepler-based GTX 680, the linear model has an error of less than 10 percent. On an AMD GPU architecture, Radeon HD 6970, the model estimates with 8 percent of error rates. As a result, the proposed technique outperforms existing models by a factor of 5 to 6 in terms of accuracy.« less

  11. A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC

    DOE PAGES

    Siddique, Nafiul A.; Grubel, Patricia A.; Badawy, Abdel-Hameed A.; ...

    2017-09-20

    Cache has long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance. Therefore, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is significantly important. We investigate the application’s locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses providing detailed insights into the dynamic applicationmore » behavior on parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction. Also, on average a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization, as well as, conventional performance metrics that illustrate a holistic understanding of cache behavior. To facilitate this research, we build a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that variable cache line size can result in better performance and can also conserve power.« less

  12. A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Siddique, Nafiul A.; Grubel, Patricia A.; Badawy, Abdel-Hameed A.

    Cache has long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance. Therefore, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is significantly important. We investigate the application’s locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses providing detailed insights into the dynamic applicationmore » behavior on parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction. Also, on average a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization, as well as, conventional performance metrics that illustrate a holistic understanding of cache behavior. To facilitate this research, we build a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that variable cache line size can result in better performance and can also conserve power.« less

  13. Error recovery in shared memory multiprocessors using private caches

    NASA Technical Reports Server (NTRS)

    Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.

    1990-01-01

    The problem of recovering from processor transient faults in shared memory multiprocesses systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented.

  14. Combining instruction prefetching with partial cache locking to improve WCET in real-time systems.

    PubMed

    Ni, Fan; Long, Xiang; Wan, Han; Gao, Xiaopeng

    2013-01-01

    Caches play an important role in embedded systems to bridge the performance gap between fast processor and slow memory. And prefetching mechanisms are proposed to further improve the cache performance. While in real-time systems, the application of caches complicates the Worst-Case Execution Time (WCET) analysis due to its unpredictable behavior. Modern embedded processors often equip locking mechanism to improve timing predictability of the instruction cache. However, locking the whole cache may degrade the cache performance and increase the WCET of the real-time application. In this paper, we proposed an instruction-prefetching combined partial cache locking mechanism, which combines an instruction prefetching mechanism (termed as BBIP) with partial cache locking to improve the WCET estimates of real-time applications. BBIP is an instruction prefetching mechanism we have already proposed to improve the worst-case cache performance and in turn the worst-case execution time. The estimations on typical real-time applications show that the partial cache locking mechanism shows remarkable WCET improvement over static analysis and full cache locking.

  15. Combining Instruction Prefetching with Partial Cache Locking to Improve WCET in Real-Time Systems

    PubMed Central

    Ni, Fan; Long, Xiang; Wan, Han; Gao, Xiaopeng

    2013-01-01

    Caches play an important role in embedded systems to bridge the performance gap between fast processor and slow memory. And prefetching mechanisms are proposed to further improve the cache performance. While in real-time systems, the application of caches complicates the Worst-Case Execution Time (WCET) analysis due to its unpredictable behavior. Modern embedded processors often equip locking mechanism to improve timing predictability of the instruction cache. However, locking the whole cache may degrade the cache performance and increase the WCET of the real-time application. In this paper, we proposed an instruction-prefetching combined partial cache locking mechanism, which combines an instruction prefetching mechanism (termed as BBIP) with partial cache locking to improve the WCET estimates of real-time applications. BBIP is an instruction prefetching mechanism we have already proposed to improve the worst-case cache performance and in turn the worst-case execution time. The estimations on typical real-time applications show that the partial cache locking mechanism shows remarkable WCET improvement over static analysis and full cache locking. PMID:24386133

  16. A measurement-based study of concurrency in a multiprocessor

    NASA Technical Reports Server (NTRS)

    Mcguire, Patrick John

    1987-01-01

    A systematic measurement-based methodology for characterizing the amount of concurrency present in a workload, and the effect of concurrency on system performance indices such as cache miss rate and bus activity are developed. Hardware and software instrumentation of an Alliant FX/8 was used to obtain data from a real workload environment. Results show that 35% of the workload is concurrent, with the concurrent periods typically using all available processors. Measurements of periods of change in concurrency show uneven usage of processors during these times. Other system measures, including cache miss rate and processor bus activity, are analyzed with respect to the concurrency measures. Probability of a cache miss is seen to increase with concurrency. The change in cache miss rate is much more sensitive to the fraction of concurrent code in the worklaod than the number of processors active during concurrency. Regression models are developed to quantify the relationships between cache miss rate, bus activity, and the concurrency measures. The model for cache miss rate predicts an increase in the median miss rate value as much as 300% for a 100% increase in concurrency in the workload.

  17. The effect of code expanding optimizations on instruction cache design

    NASA Technical Reports Server (NTRS)

    Chen, William Y.; Chang, Pohua P.; Conte, Thomas M.; Hwu, Wen-Mei W.

    1991-01-01

    It is shown that code expanding optimizations have strong and non-intuitive implications on instruction cache design. Three types of code expanding optimizations are studied: instruction placement, function inline expansion, and superscalar optimizations. Overall, instruction placement reduces the miss ratio of small caches. Function inline expansion improves the performance for small cache sizes, but degrades the performance of medium caches. Superscalar optimizations increases the cache size required for a given miss ratio. On the other hand, they also increase the sequentiality of instruction access so that a simple load-forward scheme effectively cancels the negative effects. Overall, it is shown that with load forwarding, the three types of code expanding optimizations jointly improve the performance of small caches and have little effect on large caches.

  18. Re-caching by Western scrub-jays (Aphelocoma californica) cannot be attributed to stress.

    PubMed

    Thom, James M; Clayton, Nicola S

    2013-01-01

    Western scrub-jays (Aphelocoma californica) live double lives, storing food for the future while raiding the stores of other birds. One tactic scrub-jays employ to protect stores is "re-caching"-relocating caches out of sight of would-be thieves. Recent computational modelling work suggests that re-caching might be mediated not by complex cognition, but by a combination of memory failure and stress. The "Stress Model" asserts that re-caching is a manifestation of a general drive to cache, rather than a desire to protect existing stores. Here, we present evidence strongly contradicting the central assumption of these models: that stress drives caching, irrespective of social context. In Experiment (i), we replicate the finding that scrub-jays preferentially relocate food they were watched hiding. In Experiment (ii) we find no evidence that stress increases caching. In light of our results, we argue that the Stress Model cannot account for scrub-jay re-caching.

  19. An Effective Cache Algorithm for Heterogeneous Storage Systems

    PubMed Central

    Li, Yong; Feng, Dan

    2013-01-01

    Modern storage environment is commonly composed of heterogeneous storage devices. However, traditional cache algorithms exhibit performance degradation in heterogeneous storage systems because they were not designed to work with the diverse performance characteristics. In this paper, we present a new cache algorithm called HCM for heterogeneous storage systems. The HCM algorithm partitions the cache among the disks and adopts an effective scheme to balance the work across the disks. Furthermore, it applies benefit-cost analysis to choose the best allocation of cache block to improve the performance. Conducting simulations with a variety of traces and a wide range of cache size, our experiments show that HCM significantly outperforms the existing state-of-the-art storage-aware cache algorithms. PMID:24453890

  20. Engineering the CernVM-Filesystem as a High Bandwidth Distributed Filesystem for Auxiliary Physics Data

    NASA Astrophysics Data System (ADS)

    Dykstra, D.; Bockelman, B.; Blomer, J.; Herner, K.; Levshina, T.; Slyz, M.

    2015-12-01

    A common use pattern in the computing models of particle physics experiments is running many distributed applications that read from a shared set of data files. We refer to this data is auxiliary data, to distinguish it from (a) event data from the detector (which tends to be different for every job), and (b) conditions data about the detector (which tends to be the same for each job in a batch of jobs). Relatively speaking, conditions data also tends to be relatively small per job where both event data and auxiliary data are larger per job. Unlike event data, auxiliary data comes from a limited working set of shared files. Since there is spatial locality of the auxiliary data access, the use case appears to be identical to that of the CernVM- Filesystem (CVMFS). However, we show that distributing auxiliary data through CVMFS causes the existing CVMFS infrastructure to perform poorly. We utilize a CVMFS client feature called "alien cache" to cache data on existing local high-bandwidth data servers that were engineered for storing event data. This cache is shared between the worker nodes at a site and replaces caching CVMFS files on both the worker node local disks and on the site's local squids. We have tested this alien cache with the dCache NFSv4.1 interface, Lustre, and the Hadoop Distributed File System (HDFS) FUSE interface, and measured performance. In addition, we use high-bandwidth data servers at central sites to perform the CVMFS Stratum 1 function instead of the low-bandwidth web servers deployed for the CVMFS software distribution function. We have tested this using the dCache HTTP interface. As a result, we have a design for an end-to-end high-bandwidth distributed caching read-only filesystem, using existing client software already widely deployed to grid worker nodes and existing file servers already widely installed at grid sites. Files are published in a central place and are soon available on demand throughout the grid and cached locally on the site with a convenient POSIX interface. This paper discusses the details of the architecture and reports performance measurements.

  1. Engineering the CernVM-Filesystem as a High Bandwidth Distributed Filesystem for Auxiliary Physics Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dykstra, D.; Bockelman, B.; Blomer, J.

    A common use pattern in the computing models of particle physics experiments is running many distributed applications that read from a shared set of data files. We refer to this data is auxiliary data, to distinguish it from (a) event data from the detector (which tends to be different for every job), and (b) conditions data about the detector (which tends to be the same for each job in a batch of jobs). Relatively speaking, conditions data also tends to be relatively small per job where both event data and auxiliary data are larger per job. Unlike event data, auxiliarymore » data comes from a limited working set of shared files. Since there is spatial locality of the auxiliary data access, the use case appears to be identical to that of the CernVM- Filesystem (CVMFS). However, we show that distributing auxiliary data through CVMFS causes the existing CVMFS infrastructure to perform poorly. We utilize a CVMFS client feature called 'alien cache' to cache data on existing local high-bandwidth data servers that were engineered for storing event data. This cache is shared between the worker nodes at a site and replaces caching CVMFS files on both the worker node local disks and on the site's local squids. We have tested this alien cache with the dCache NFSv4.1 interface, Lustre, and the Hadoop Distributed File System (HDFS) FUSE interface, and measured performance. In addition, we use high-bandwidth data servers at central sites to perform the CVMFS Stratum 1 function instead of the low-bandwidth web servers deployed for the CVMFS software distribution function. We have tested this using the dCache HTTP interface. As a result, we have a design for an end-to-end high-bandwidth distributed caching read-only filesystem, using existing client software already widely deployed to grid worker nodes and existing file servers already widely installed at grid sites. Files are published in a central place and are soon available on demand throughout the grid and cached locally on the site with a convenient POSIX interface. This paper discusses the details of the architecture and reports performance measurements.« less

  2. The Effects of Cache Modification on Food Caching and Retrieval Behavior by Rats

    ERIC Educational Resources Information Center

    McKenzie, T.L.B.; Bird, L.R.; Roberts, W.A.

    2005-01-01

    Rats cached pieces of cheese on four different arms of an eight-arm radial maze. On a retrieval test given 45min later, rats learned to return to arms where food was cached before arms where food had not been cached. Tests were then performed in which cache sites on one side of the maze were always modified (pilfered or degraded), but cache sites…

  3. Don’t make cache too complex: A simple probability-based cache management scheme for SSDs

    PubMed Central

    Cho, Sangyeun; Choi, Jongmoo

    2017-01-01

    Solid-state drives (SSDs) have recently become a common storage component in computer systems, and they are fueled by continued bit cost reductions achieved with smaller feature sizes and multiple-level cell technologies. However, as the flash memory stores more bits per cell, the performance and reliability of the flash memory degrade substantially. To solve this problem, a fast non-volatile memory (NVM-)based cache has been employed within SSDs to reduce the long latency required to write data. Absorbing small writes in a fast NVM cache can also reduce the number of flash memory erase operations. To maximize the benefits of an NVM cache, it is important to increase the NVM cache utilization. In this paper, we propose and study ProCache, a simple NVM cache management scheme, that makes cache-entrance decisions based on random probability testing. Our scheme is motivated by the observation that frequently written hot data will eventually enter the cache with a high probability, and that infrequently accessed cold data will not enter the cache easily. Owing to its simplicity, ProCache is easy to implement at a substantially smaller cost than similar previously studied techniques. We evaluate ProCache and conclude that it achieves comparable performance compared to a more complex reference counter-based cache-management scheme. PMID:28358897

  4. Don't make cache too complex: A simple probability-based cache management scheme for SSDs.

    PubMed

    Baek, Seungjae; Cho, Sangyeun; Choi, Jongmoo

    2017-01-01

    Solid-state drives (SSDs) have recently become a common storage component in computer systems, and they are fueled by continued bit cost reductions achieved with smaller feature sizes and multiple-level cell technologies. However, as the flash memory stores more bits per cell, the performance and reliability of the flash memory degrade substantially. To solve this problem, a fast non-volatile memory (NVM-)based cache has been employed within SSDs to reduce the long latency required to write data. Absorbing small writes in a fast NVM cache can also reduce the number of flash memory erase operations. To maximize the benefits of an NVM cache, it is important to increase the NVM cache utilization. In this paper, we propose and study ProCache, a simple NVM cache management scheme, that makes cache-entrance decisions based on random probability testing. Our scheme is motivated by the observation that frequently written hot data will eventually enter the cache with a high probability, and that infrequently accessed cold data will not enter the cache easily. Owing to its simplicity, ProCache is easy to implement at a substantially smaller cost than similar previously studied techniques. We evaluate ProCache and conclude that it achieves comparable performance compared to a more complex reference counter-based cache-management scheme.

  5. Analysis of DNS Cache Effects on Query Distribution

    PubMed Central

    2013-01-01

    This paper studies the DNS cache effects that occur on query distribution at the CN top-level domain (TLD) server. We first filter out the malformed DNS queries to purify the log data pollution according to six categories. A model for DNS resolution, more specifically DNS caching, is presented. We demonstrate the presence and magnitude of DNS cache effects and the cache sharing effects on the request distribution through analytic model and simulation. CN TLD log data results are provided and analyzed based on the cache model. The approximate TTL distribution for domain name is inferred quantificationally. PMID:24396313

  6. Analysis of DNS cache effects on query distribution.

    PubMed

    Wang, Zheng

    2013-01-01

    This paper studies the DNS cache effects that occur on query distribution at the CN top-level domain (TLD) server. We first filter out the malformed DNS queries to purify the log data pollution according to six categories. A model for DNS resolution, more specifically DNS caching, is presented. We demonstrate the presence and magnitude of DNS cache effects and the cache sharing effects on the request distribution through analytic model and simulation. CN TLD log data results are provided and analyzed based on the cache model. The approximate TTL distribution for domain name is inferred quantificationally.

  7. Behavior-aware cache hierarchy optimization for low-power multi-core embedded systems

    NASA Astrophysics Data System (ADS)

    Zhao, Huatao; Luo, Xiao; Zhu, Chen; Watanabe, Takahiro; Zhu, Tianbo

    2017-07-01

    In modern embedded systems, the increasing number of cores requires efficient cache hierarchies to ensure data throughput, but such cache hierarchies are restricted by their tumid size and interference accesses which leads to both performance degradation and wasted energy. In this paper, we firstly propose a behavior-aware cache hierarchy (BACH) which can optimally allocate the multi-level cache resources to many cores and highly improved the efficiency of cache hierarchy, resulting in low energy consumption. The BACH takes full advantage of the explored application behaviors and runtime cache resource demands as the cache allocation bases, so that we can optimally configure the cache hierarchy to meet the runtime demand. The BACH was implemented on the GEM5 simulator. The experimental results show that energy consumption of a three-level cache hierarchy can be saved from 5.29% up to 27.94% compared with other key approaches while the performance of the multi-core system even has a slight improvement counting in hardware overhead.

  8. WATCHMAN: A Data Warehouse Intelligent Cache Manager

    NASA Technical Reports Server (NTRS)

    Scheuermann, Peter; Shim, Junho; Vingralek, Radek

    1996-01-01

    Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response time. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of an intelligent cache manager for sets retrieved by queries called WATCHMAN, which is particularly well suited for data warehousing environment. Our cache manager employs two novel, complementary algorithms for cache replacement and for cache admission. WATCHMAN aims at minimizing query response time and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.

  9. A Global User-Driven Model for Tile Prefetching in Web Geographical Information Systems.

    PubMed

    Pan, Shaoming; Chong, Yanwen; Zhang, Hang; Tan, Xicheng

    2017-01-01

    A web geographical information system is a typical service-intensive application. Tile prefetching and cache replacement can improve cache hit ratios by proactively fetching tiles from storage and replacing the appropriate tiles from the high-speed cache buffer without waiting for a client's requests, which reduces disk latency and improves system access performance. Most popular prefetching strategies consider only the relative tile popularities to predict which tile should be prefetched or consider only a single individual user's access behavior to determine which neighbor tiles need to be prefetched. Some studies show that comprehensively considering all users' access behaviors and all tiles' relationships in the prediction process can achieve more significant improvements. Thus, this work proposes a new global user-driven model for tile prefetching and cache replacement. First, based on all users' access behaviors, a type of expression method for tile correlation is designed and implemented. Then, a conditional prefetching probability can be computed based on the proposed correlation expression mode. Thus, some tiles to be prefetched can be found by computing and comparing the conditional prefetching probability from the uncached tiles set and, similarly, some replacement tiles can be found in the cache buffer according to multi-step prefetching. Finally, some experiments are provided comparing the proposed model with other global user-driven models, other single user-driven models, and other client-side prefetching strategies. The results show that the proposed model can achieve a prefetching hit rate in approximately 10.6% ~ 110.5% higher than the compared methods.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Lingda; Hayes, Ari; Song, Shuaiwen

    Modern GPUs employ cache to improve memory system efficiency. However, large amount of cache space is underutilized due to irregular memory accesses and poor spatial locality which exhibited commonly in GPU applications. Our experiments show that using smaller cache lines could improve cache space utilization, but it also frequently suffers from significant performance loss by introducing large amount of extra cache requests. In this work, we propose a novel cache design named tag-split cache (TSC) that enables fine-grained cache storage to address the problem of cache space underutilization while keeping memory request number unchanged. TSC divides tag into two partsmore » to reduce storage overhead, and it supports multiple cache line replacement in one cycle.« less

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh; Zhang, Zhao; Vetter, Jeffrey S

    Recent trends of CMOS scaling and use of large last level caches (LLCs) have led to significant increase in the leakage energy consumption of LLCs and hence, managing their energy consumption has become extremely important in modern processor design. The conventional cache energy saving techniques require offline profiling or provide only coarse granularity of cache allocation. We present FlexiWay, a cache energy saving technique which uses dynamic cache reconfiguration. FlexiWay logically divides the cache sets into multiple (e.g. 16) modules and dynamically turns off suitable and possibly different number of cache ways in each module. FlexiWay has very small implementationmore » overhead and it provides fine-grain cache allocation even with caches of typical associativity, e.g. an 8-way cache. Microarchitectural simulations have been performed using an x86-64 simulator and workloads from SPEC2006 suite. Also, FlexiWay has been compared with two conventional energy saving techniques. The results show that FlexiWay provides largest energy saving and incurs only small loss in performance. For single, dual and quad core systems, the average energy saving using FlexiWay are 26.2%, 25.7% and 22.4%, respectively.« less

  12. Cache-enabled small cell networks: modeling and tradeoffs.

    PubMed

    Baştuǧ, Ejder; Bennis, Mehdi; Kountouris, Marios; Debbah, Mérouane

    We consider a network model where small base stations (SBSs) have caching capabilities as a means to alleviate the backhaul load and satisfy users' demand. The SBSs are stochastically distributed over the plane according to a Poisson point process (PPP) and serve their users either (i) by bringing the content from the Internet through a finite rate backhaul or (ii) by serving them from the local caches. We derive closed-form expressions for the outage probability and the average delivery rate as a function of the signal-to-interference-plus-noise ratio (SINR), SBS density, target file bitrate, storage size, file length, and file popularity. We then analyze the impact of key operating parameters on the system performance. It is shown that a certain outage probability can be achieved either by increasing the number of base stations or the total storage size. Our results and analysis provide key insights into the deployment of cache-enabled small cell networks (SCNs), which are seen as a promising solution for future heterogeneous cellular networks.

  13. A two-level cache for distributed information retrieval in search engines.

    PubMed

    Zhang, Weizhe; He, Hui; Ye, Jianwei

    2013-01-01

    To improve the performance of distributed information retrieval in search engines, we propose a two-level cache structure based on the queries of the users' logs. We extract the highest rank queries of users from the static cache, in which the queries are the most popular. We adopt the dynamic cache as an auxiliary to optimize the distribution of the cache data. We propose a distribution strategy of the cache data. The experiments prove that the hit rate, the efficiency, and the time consumption of the two-level cache have advantages compared with other structures of cache.

  14. A Two-Level Cache for Distributed Information Retrieval in Search Engines

    PubMed Central

    Zhang, Weizhe; He, Hui; Ye, Jianwei

    2013-01-01

    To improve the performance of distributed information retrieval in search engines, we propose a two-level cache structure based on the queries of the users' logs. We extract the highest rank queries of users from the static cache, in which the queries are the most popular. We adopt the dynamic cache as an auxiliary to optimize the distribution of the cache data. We propose a distribution strategy of the cache data. The experiments prove that the hit rate, the efficiency, and the time consumption of the two-level cache have advantages compared with other structures of cache. PMID:24363621

  15. Performance of defect-tolerant set-associative cache memories

    NASA Technical Reports Server (NTRS)

    Frenzel, J. F.

    1991-01-01

    The increased use of on-chip cache memories has led researchers to investigate their performance in the presence of manufacturing defects. Several techniques for yield improvement are discussed and results are presented which indicate that set-associativity may be used to provide defect tolerance as well as improve the cache performance. Tradeoffs between several cache organizations and replacement strategies are investigated and it is shown that token-based replacement may be a suitable alternative to the widely-used LRU strategy.

  16. Changes in spatial memory mediated by experimental variation in food supply do not affect hippocampal anatomy in mountain chickadees (Poecile gambeli).

    PubMed

    Pravosudov, V V; Lavenex, P; Clayton, N S

    2002-05-01

    Earlier reports suggested that seasonal variation in food-caching behavior (caching intensity and cache retrieval accuracy) might correlate with morphological changes in the hippocampal formation, a brain structure thought to play a role in remembering cache locations. We demonstrated that changes in cache retrieval accuracy can also be triggered by experimental variation in food supply: captive mountain chickadees (Poecile gambeli) maintained on limited and unpredictable food supply were more accurate at recovering their caches and performed better on spatial memory tests than birds maintained on ad libitum food. In this study, we investigated whether these two treatment groups also differed in the volume and neuron number of the hippocampal formation. If variation in memory for food caches correlates with hippocampal size, then our birds with enhanced cache recovery and spatial memory performance should have larger hippocampal volumes and total neuron numbers. Contrary to this prediction we found no significant differences in volume or total neuron number of the hippocampal formation between the two treatment groups. Our results therefore indicate that changes in food-caching behavior and spatial memory performance, as mediated by experimental variations in food supply, are not necessarily accompanied by morphological changes in volume or neuron number of the hippocampal formation in fully developed, experienced food-caching birds. Copyright 2002 Wiley Periodicals, Inc.

  17. Effective Padding of Multi-Dimensional Arrays to Avoid Cache Conflict Misses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hong, Changwan; Bao, Wenlei; Cohen, Albert

    Caches are used to significantly improve performance. Even with high degrees of set-associativity, the number of accessed data elements mapping to the same set in a cache can easily exceed the degree of associativity, causing conflict misses and lowered performance, even if the working set is much smaller than cache capacity. Array padding (increasing the size of array dimensions) is a well known optimization technique that can reduce conflict misses. In this paper, we develop the first algorithms for optimal padding of arrays for a set associative cache for arbitrary tile sizes, In addition, we develop the first solution tomore » padding for nested tiles and multi-level caches. The techniques are in implemented in PAdvisor tool. Experimental results with multiple benchmarks demonstrate significant performance improvement from use of PAdvisor for padding.« less

  18. Corvid caching: Insights from a cognitive model.

    PubMed

    van der Vaart, Elske; Verbrugge, Rineke; Hemelrijk, Charlotte K

    2011-07-01

    Caching and recovery of food by corvids is well-studied, but some ambiguous results remain. To help clarify these, we built a computational cognitive model. It is inspired by similar models built for humans, and it assumes that memory strength depends on frequency and recency of use. We compared our model's behavior to that of real birds in previously published experiments. Our model successfully replicated the outcomes of two experiments on recovery behavior and two experiments on cache site choice. Our "virtual birds" reproduced declines in recovery accuracy across sessions, revisits to previously emptied cache sites, a lack of correlation between caching and recovery order, and a preference for caching in safe locations. The model also produced two new explanations. First, that Clark's nutcrackers may become less accurate as recovery progresses not because of differential memory for different cache sites, as was once assumed, but because of chance effects. And second, that Western scrub jays may choose their cache sites not on the basis of negative recovery experiences only, as was previously thought, but on the basis of positive recovery experiences instead. Alternatively, both "punishment" and "reward" may be playing a role. We conclude with a set of new insights, a testable prediction, and directions for future work. PsycINFO Database Record (c) 2011 APA, all rights reserved

  19. A Global User-Driven Model for Tile Prefetching in Web Geographical Information Systems

    PubMed Central

    Pan, Shaoming; Chong, Yanwen; Zhang, Hang; Tan, Xicheng

    2017-01-01

    A web geographical information system is a typical service-intensive application. Tile prefetching and cache replacement can improve cache hit ratios by proactively fetching tiles from storage and replacing the appropriate tiles from the high-speed cache buffer without waiting for a client’s requests, which reduces disk latency and improves system access performance. Most popular prefetching strategies consider only the relative tile popularities to predict which tile should be prefetched or consider only a single individual user's access behavior to determine which neighbor tiles need to be prefetched. Some studies show that comprehensively considering all users’ access behaviors and all tiles’ relationships in the prediction process can achieve more significant improvements. Thus, this work proposes a new global user-driven model for tile prefetching and cache replacement. First, based on all users’ access behaviors, a type of expression method for tile correlation is designed and implemented. Then, a conditional prefetching probability can be computed based on the proposed correlation expression mode. Thus, some tiles to be prefetched can be found by computing and comparing the conditional prefetching probability from the uncached tiles set and, similarly, some replacement tiles can be found in the cache buffer according to multi-step prefetching. Finally, some experiments are provided comparing the proposed model with other global user-driven models, other single user-driven models, and other client-side prefetching strategies. The results show that the proposed model can achieve a prefetching hit rate in approximately 10.6% ~ 110.5% higher than the compared methods. PMID:28085937

  20. Storageless and caching Tier-2 models in the UK context

    NASA Astrophysics Data System (ADS)

    Cadellin Skipsey, Samuel; Dewhurst, Alastair; Crooks, David; MacMahon, Ewan; Roy, Gareth; Smith, Oliver; Mohammed, Kashif; Brew, Chris; Britton, David

    2017-10-01

    Operational and other pressures have lead to WLCG experiments moving increasingly to a stratified model for Tier-2 resources, where “fat” Tier-2s (“T2Ds”) and “thin” Tier-2s (“T2Cs”) provide different levels of service. In the UK, this distinction is also encouraged by the terms of the current GridPP5 funding model. In anticipation of this, testing has been performed on the implications, and potential implementation, of such a distinction in our resources. In particular, this presentation presents the results of testing of storage T2Cs, where the “thin” nature is expressed by the site having either no local data storage, or only a thin caching layer; data is streamed or copied from a “nearby” T2D when needed by jobs. In OSG, this model has been adopted successfully for CMS AAA sites; but the network topology and capacity in the USA is significantly different to that in the UK (and much of Europe). We present the result of several operational tests: the in-production University College London (UCL) site, which runs ATLAS workloads using storage at the Queen Mary University of London (QMUL) site; the Oxford site, which has had scaling tests performed against T2Ds in various locations in the UK (to test network effects); and the Durham site, which has been testing the specific ATLAS caching solution of “Rucio Cache” integration with ARC’s caching layer.

  1. Binary mesh partitioning for cache-efficient visualization.

    PubMed

    Tchiboukdjian, Marc; Danjean, Vincent; Raffin, Bruno

    2010-01-01

    One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take into consideration the memory hierarchy to design cache efficient algorithms. CO approaches have the advantage to adapt to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve performance of previous approaches, but they lack of theoretical performance guarantees. We present in this paper a {\\schmi O}(N\\log N) algorithm to compute a CO layout for unstructured but well shaped meshes. We prove that a coherent traversal of a N-size mesh in dimension d induces less than N/B+{\\schmi O}(N/M;{1/d}) cache-misses where B and M are the block size and the cache size, respectively. Experiments show that our layout computation is faster and significantly less memory consuming than the best known CO algorithm. Performance is comparable to this algorithm for classical visualization algorithm access patterns, or better when the BSP tree produced while computing the layout is used as an acceleration data structure adjusted to the layout. We also show that cache oblivious approaches lead to significant performance increases on recent GPU architectures.

  2. Improving the performance of heterogeneous multi-core processors by modifying the cache coherence protocol

    NASA Astrophysics Data System (ADS)

    Fang, Juan; Hao, Xiaoting; Fan, Qingwen; Chang, Zeqing; Song, Shuying

    2017-05-01

    In the Heterogeneous multi-core architecture, CPU and GPU processor are integrated on the same chip, which poses a new challenge to the last-level cache management. In this architecture, the CPU application and the GPU application execute concurrently, accessing the last-level cache. CPU and GPU have different memory access characteristics, so that they have differences in the sensitivity of last-level cache (LLC) capacity. For many CPU applications, a reduced share of the LLC could lead to significant performance degradation. On the contrary, GPU applications can tolerate increase in memory access latency when there is sufficient thread-level parallelism. Taking into account the GPU program memory latency tolerance characteristics, this paper presents a method that let GPU applications can access to memory directly, leaving lots of LLC space for CPU applications, in improving the performance of CPU applications and does not affect the performance of GPU applications. When the CPU application is cache sensitive, and the GPU application is insensitive to the cache, the overall performance of the system is improved significantly.

  3. Dynamic Allocation of SPM Based on Time-Slotted Cache Conflict Graph for System Optimization

    NASA Astrophysics Data System (ADS)

    Wu, Jianping; Ling, Ming; Zhang, Yang; Mei, Chen; Wang, Huan

    This paper proposes a novel dynamic Scratch-pad Memory allocation strategy to optimize the energy consumption of the memory sub-system. Firstly, the whole program execution process is sliced into several time slots according to the temporal dimension; thereafter, a Time-Slotted Cache Conflict Graph (TSCCG) is introduced to model the behavior of Data Cache (D-Cache) conflicts within each time slot. Then, Integer Nonlinear Programming (INP) is implemented, which can avoid time-consuming linearization process, to select the most profitable data pages. Virtual Memory System (VMS) is adopted to remap those data pages, which will cause severe Cache conflicts within a time slot, to SPM. In order to minimize the swapping overhead of dynamic SPM allocation, a novel SPM controller with a tightly coupled DMA is introduced to issue the swapping operations without CPU's intervention. Last but not the least, this paper discusses the fluctuation of system energy profit based on different MMU page size as well as the Time Slot duration quantitatively. According to our design space exploration, the proposed method can optimize all of the data segments, including global data, heap and stack data in general, and reduce the total energy consumption by 27.28% on average, up to 55.22% with a marginal performance promotion. And comparing to the conventional static CCG (Cache Conflicts Graph), our approach can obtain 24.7% energy profit on average, up to 30.5% with a sight boost in performance.

  4. The Optimization of In-Memory Space Partitioning Trees for Cache Utilization

    NASA Astrophysics Data System (ADS)

    Yeo, Myung Ho; Min, Young Soo; Bok, Kyoung Soo; Yoo, Jae Soo

    In this paper, a novel cache conscious indexing technique based on space partitioning trees is proposed. Many researchers investigated efficient cache conscious indexing techniques which improve retrieval performance of in-memory database management system recently. However, most studies considered data partitioning and targeted fast information retrieval. Existing data partitioning-based index structures significantly degrade performance due to the redundant accesses of overlapped spaces. Specially, R-tree-based index structures suffer from the propagation of MBR (Minimum Bounding Rectangle) information by updating data frequently. In this paper, we propose an in-memory space partitioning index structure for optimal cache utilization. The proposed index structure is compared with the existing index structures in terms of update performance, insertion performance and cache-utilization rate in a variety of environments. The results demonstrate that the proposed index structure offers better performance than existing index structures.

  5. Cost aware cache replacement policy in shared last-level cache for hybrid memory based fog computing

    NASA Astrophysics Data System (ADS)

    Jia, Gangyong; Han, Guangjie; Wang, Hao; Wang, Feng

    2018-04-01

    Fog computing requires a large main memory capacity to decrease latency and increase the Quality of Service (QoS). However, dynamic random access memory (DRAM), the commonly used random access memory, cannot be included into a fog computing system due to its high consumption of power. In recent years, non-volatile memories (NVM) such as Phase-Change Memory (PCM) and Spin-transfer torque RAM (STT-RAM) with their low power consumption have emerged to replace DRAM. Moreover, the currently proposed hybrid main memory, consisting of both DRAM and NVM, have shown promising advantages in terms of scalability and power consumption. However, the drawbacks of NVM, such as long read/write latency give rise to potential problems leading to asymmetric cache misses in the hybrid main memory. Current last level cache (LLC) policies are based on the unified miss cost, and result in poor performance in LLC and add to the cost of using NVM. In order to minimize the cache miss cost in the hybrid main memory, we propose a cost aware cache replacement policy (CACRP) that reduces the number of cache misses from NVM and improves the cache performance for a hybrid memory system. Experimental results show that our CACRP behaves better in LLC performance, improving performance up to 43.6% (15.5% on average) compared to LRU.

  6. Experimental evaluation of multiprocessor cache-based error recovery

    NASA Technical Reports Server (NTRS)

    Janssens, Bob; Fuchs, W. K.

    1991-01-01

    Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have been recently developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, different in the manner in which they avoid rollback propagation, are evaluated. By simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, the performance effect of integrating the recovery schemes in the cache coherence protocol are evaluated. The results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead but uncontrollable high variability in the checkpoint interval.

  7. OS friendly microprocessor architecture: Hardware level computer security

    NASA Astrophysics Data System (ADS)

    Jungwirth, Patrick; La Fratta, Patrick

    2016-05-01

    We present an introduction to the patented OS Friendly Microprocessor Architecture (OSFA) and hardware level computer security. Conventional microprocessors have not tried to balance hardware performance and OS performance at the same time. Conventional microprocessors have depended on the Operating System for computer security and information assurance. The goal of the OS Friendly Architecture is to provide a high performance and secure microprocessor and OS system. We are interested in cyber security, information technology (IT), and SCADA control professionals reviewing the hardware level security features. The OS Friendly Architecture is a switched set of cache memory banks in a pipeline configuration. For light-weight threads, the memory pipeline configuration provides near instantaneous context switching times. The pipelining and parallelism provided by the cache memory pipeline provides for background cache read and write operations while the microprocessor's execution pipeline is running instructions. The cache bank selection controllers provide arbitration to prevent the memory pipeline and microprocessor's execution pipeline from accessing the same cache bank at the same time. This separation allows the cache memory pages to transfer to and from level 1 (L1) caching while the microprocessor pipeline is executing instructions. Computer security operations are implemented in hardware. By extending Unix file permissions bits to each cache memory bank and memory address, the OSFA provides hardware level computer security.

  8. Temperature and leakage aware techniques to improve cache reliability

    NASA Astrophysics Data System (ADS)

    Akaaboune, Adil

    Decreasing power consumption in small devices such as handhelds, cell phones and high-performance processors is now one of the most critical design concerns. On-chip cache memories dominate the chip area in microprocessors and thus arises the need for power efficient cache memories. Cache is the simplest cost effective method to attain high speed memory hierarchy and, its performance is extremely critical for high speed computers. Cache is used by the microprocessor for channeling the performance gap between processor and main memory (RAM) hence the memory bandwidth is frequently a bottleneck which can affect the peak throughput significantly. In the design of any cache system, the tradeoffs of area/cost, performance, power consumption, and thermal management must be taken into consideration. Previous work has mainly concentrated on performance and area/cost constraints. More recent works have focused on low power design especially for portable devices and media-processing systems, however fewer research has been done on the relationship between heat management, Leakage power and cost per die. Lately, the focus of power dissipation in the new generations of microprocessors has shifted from dynamic power to idle power, a previously underestimated form of power loss that causes battery charge to drain and shutdown too early due the waste of energy. The problem has been aggravated by the aggressive scaling of process; device level method used originally by designers to enhance performance, conserve dissipation and reduces the sizes of digital circuits that are increasingly condensed. This dissertation studies the impact of hotspots, in the cache memory, on leakage consumption and microprocessor reliability and durability. The work will first prove that by eliminating hotspots in the cache memory, leakage power will be reduced and therefore, the reliability will be improved. The second technique studied is data quality management that improves the quality of the data stored in the cache to reduce power consumption. The initial work done on this subject focuses on the type of data that increases leakage consumption and ways to manage without impacting the performance of the microprocessor. The second phase of the project focuses on managing the data storage in different blocks of the cache to smooth the leakage power as well as dynamic power consumption. The last technique is a voltage controlled cache to reduce the leakage consumption of the cache while in execution and even in idle state. Two blocks of the 4-way set associative cache go through a voltage regulator before getting to the voltage well, and the other two are directly connected to the voltage well. The idea behind this technique is to use the replacement algorithm information to increase or decrease voltage of the two blocks depending on the need of the information stored on them.

  9. Evaluating the effect of online data compression on the disk cache of a mass storage system

    NASA Technical Reports Server (NTRS)

    Pentakalos, Odysseas I.; Yesha, Yelena

    1994-01-01

    A trace driven simulation of the disk cache of a mass storage system was used to evaluate the effect of an online compression algorithm on various performance measures. Traces from the system at NASA's Center for Computational Sciences were used to run the simulation and disk cache hit ratios, number of files and bytes migrating to tertiary storage were measured. The measurements were performed for both an LRU and a size based migration algorithm. In addition to seeing the effect of online data compression on the disk cache performance measure, the simulation provided insight into the characteristics of the interactive references, suggesting that hint based prefetching algorithms are the only alternative for any future improvements to the disk cache hit ratio.

  10. Explicit Content Caching at Mobile Edge Networks with Cross-Layer Sensing

    PubMed Central

    Chen, Lingyu; Su, Youxing; Luo, Wenbin; Hong, Xuemin; Shi, Jianghong

    2018-01-01

    The deployment density and computational power of small base stations (BSs) are expected to increase significantly in the next generation mobile communication networks. These BSs form the mobile edge network, which is a pervasive and distributed infrastructure that can empower a variety of edge/fog computing applications. This paper proposes a novel edge-computing application called explicit caching, which stores selective contents at BSs and exposes such contents to local users for interactive browsing and download. We formulate the explicit caching problem as a joint content recommendation, caching, and delivery problem, which aims to maximize the expected user quality-of-experience (QoE) with varying degrees of cross-layer sensing capability. Optimal and effective heuristic algorithms are presented to solve the problem. The theoretical performance bounds of the explicit caching system are derived in simplified scenarios. The impacts of cache storage space, BS backhaul capacity, cross-layer information, and user mobility on the system performance are simulated and discussed in realistic scenarios. Results suggest that, compared with conventional implicit caching schemes, explicit caching can better exploit the mobile edge network infrastructure for personalized content dissemination. PMID:29565313

  11. Explicit Content Caching at Mobile Edge Networks with Cross-Layer Sensing.

    PubMed

    Chen, Lingyu; Su, Youxing; Luo, Wenbin; Hong, Xuemin; Shi, Jianghong

    2018-03-22

    The deployment density and computational power of small base stations (BSs) are expected to increase significantly in the next generation mobile communication networks. These BSs form the mobile edge network, which is a pervasive and distributed infrastructure that can empower a variety of edge/fog computing applications. This paper proposes a novel edge-computing application called explicit caching, which stores selective contents at BSs and exposes such contents to local users for interactive browsing and download. We formulate the explicit caching problem as a joint content recommendation, caching, and delivery problem, which aims to maximize the expected user quality-of-experience (QoE) with varying degrees of cross-layer sensing capability. Optimal and effective heuristic algorithms are presented to solve the problem. The theoretical performance bounds of the explicit caching system are derived in simplified scenarios. The impacts of cache storage space, BS backhaul capacity, cross-layer information, and user mobility on the system performance are simulated and discussed in realistic scenarios. Results suggest that, compared with conventional implicit caching schemes, explicit caching can better exploit the mobile edge network infrastructure for personalized content dissemination.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh; Zhang, Zhao

    With each CMOS technology generation, leakage energy consumption has been dramatically increasing and hence, managing leakage power consumption of large last-level caches (LLCs) has become a critical issue in modern processor design. In this paper, we present EnCache, a novel software-based technique which uses dynamic profiling-based cache reconfiguration for saving cache leakage energy. EnCache uses a simple hardware component called profiling cache, which dynamically predicts energy efficiency of an application for 32 possible cache configurations. Using these estimates, system software reconfigures the cache to the most energy efficient configuration. EnCache uses dynamic cache reconfiguration and hence, it does not requiremore » offline profiling or tuning the parameter for each application. Furthermore, EnCache optimizes directly for the overall memory subsystem (LLC and main memory) energy efficiency instead of the LLC energy efficiency alone. The experiments performed with an x86-64 simulator and workloads from SPEC2006 suite confirm that EnCache provides larger energy saving than a conventional energy saving scheme. For single core and dual-core system configurations, the average savings in memory subsystem energy over a shared baseline configuration are 30.0% and 27.3%, respectively.« less

  13. EqualChance: Addressing Intra-set Write Variation to Increase Lifetime of Non-volatile Caches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh; Vetter, Jeffrey S

    To address the limitations of SRAM such as high-leakage and low-density, researchers have explored use of non-volatile memory (NVM) devices, such as ReRAM (resistive RAM) and STT-RAM (spin transfer torque RAM) for designing on-chip caches. A crucial limitation of NVMs, however, is that their write endurance is low and the large intra-set write variation introduced by existing cache management policies may further exacerbate this problem, thereby reducing the cache lifetime significantly. We present EqualChance, a technique to increase cache lifetime by reducing intra-set write variation. EqualChance works by periodically changing the physical cache-block location of a write-intensive data item withinmore » a set to achieve wear-leveling. Simulations using workloads from SPEC CPU2006 suite and HPC (high-performance computing) field show that EqualChance improves the cache lifetime by 4.29X. Also, its implementation overhead is small, and it incurs very small performance and energy loss.« less

  14. Effects of cacheing on multitasking efficiency and programming strategy on an ELXSI 6400

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Montry, G.R.; Benner, R.E.

    1985-12-01

    The impact of a cache/shared memory architecture, and, in particular, the cache coherency problem, upon concurrent algorithm and program development is discussed. In this context, a simple set of programming strategies are proposed which streamline code development and improve code performance when multitasking in a cache/shared memory or distributed memory environment.

  15. List based prefetch

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boyle, Peter; Christ, Norman; Gara, Alan

    A list prefetch engine improves a performance of a parallel computing system. The list prefetch engine receives a current cache miss address. The list prefetch engine evaluates whether the current cache miss address is valid. If the current cache miss address is valid, the list prefetch engine compares the current cache miss address and a list address. A list address represents an address in a list. A list describes an arbitrary sequence of prior cache miss addresses. The prefetch engine prefetches data according to the list, if there is a match between the current cache miss address and the listmore » address.« less

  16. List based prefetch

    DOEpatents

    Boyle, Peter [Edinburgh, GB; Christ, Norman [Irvington, NY; Gara, Alan [Yorktown Heights, NY; Kim,; Changhoan, [San Jose, CA; Mawhinney, Robert [New York, NY; Ohmacht, Martin [Yorktown Heights, NY; Sugavanam, Krishnan [Yorktown Heights, NY

    2012-08-28

    A list prefetch engine improves a performance of a parallel computing system. The list prefetch engine receives a current cache miss address. The list prefetch engine evaluates whether the current cache miss address is valid. If the current cache miss address is valid, the list prefetch engine compares the current cache miss address and a list address. A list address represents an address in a list. A list describes an arbitrary sequence of prior cache miss addresses. The prefetch engine prefetches data according to the list, if there is a match between the current cache miss address and the list address.

  17. Version pressure feedback mechanisms for speculative versioning caches

    DOEpatents

    Eichenberger, Alexandre E.; Gara, Alan; O& #x27; Brien, Kathryn M.; Ohmacht, Martin; Zhuang, Xiaotong

    2013-03-12

    Mechanisms are provided for controlling version pressure on a speculative versioning cache. Raw version pressure data is collected based on one or more threads accessing cache lines of the speculative versioning cache. One or more statistical measures of version pressure are generated based on the collected raw version pressure data. A determination is made as to whether one or more modifications to an operation of a data processing system are to be performed based on the one or more statistical measures of version pressure, the one or more modifications affecting version pressure exerted on the speculative versioning cache. An operation of the data processing system is modified based on the one or more determined modifications, in response to a determination that one or more modifications to the operation of the data processing system are to be performed, to affect the version pressure exerted on the speculative versioning cache.

  18. Mobility-Aware Caching and Computation Offloading in 5G Ultra-Dense Cellular Networks

    PubMed Central

    Chen, Min; Hao, Yixue; Qiu, Meikang; Song, Jeungeun; Wu, Di; Humar, Iztok

    2016-01-01

    Recent trends show that Internet traffic is increasingly dominated by content, which is accompanied by the exponential growth of traffic. To cope with this phenomena, network caching is introduced to utilize the storage capacity of diverse network devices. In this paper, we first summarize four basic caching placement strategies, i.e., local caching, Device-to-Device (D2D) caching, Small cell Base Station (SBS) caching and Macrocell Base Station (MBS) caching. However, studies show that so far, much of the research has ignored the impact of user mobility. Therefore, taking the effect of the user mobility into consideration, we proposes a joint mobility-aware caching and SBS density placement scheme (MS caching). In addition, differences and relationships between caching and computation offloading are discussed. We present a design of a hybrid computation offloading and support it with experimental results, which demonstrate improved performance in terms of energy cost. Finally, we discuss the design of an incentive mechanism by considering network dynamics, differentiated user’s quality of experience (QoE) and the heterogeneity of mobile terminals in terms of caching and computing capabilities. PMID:27347975

  19. Mobility-Aware Caching and Computation Offloading in 5G Ultra-Dense Cellular Networks.

    PubMed

    Chen, Min; Hao, Yixue; Qiu, Meikang; Song, Jeungeun; Wu, Di; Humar, Iztok

    2016-06-25

    Recent trends show that Internet traffic is increasingly dominated by content, which is accompanied by the exponential growth of traffic. To cope with this phenomena, network caching is introduced to utilize the storage capacity of diverse network devices. In this paper, we first summarize four basic caching placement strategies, i.e., local caching, Device-to-Device (D2D) caching, Small cell Base Station (SBS) caching and Macrocell Base Station (MBS) caching. However, studies show that so far, much of the research has ignored the impact of user mobility. Therefore, taking the effect of the user mobility into consideration, we proposes a joint mobility-aware caching and SBS density placement scheme (MS caching). In addition, differences and relationships between caching and computation offloading are discussed. We present a design of a hybrid computation offloading and support it with experimental results, which demonstrate improved performance in terms of energy cost. Finally, we discuss the design of an incentive mechanism by considering network dynamics, differentiated user's quality of experience (QoE) and the heterogeneity of mobile terminals in terms of caching and computing capabilities.

  20. Short-term observational spatial memory in Jackdaws (Corvus monedula) and Ravens (Corvus corax).

    PubMed

    Scheid, Christelle; Bugnyar, Thomas

    2008-10-01

    Observational spatial memory (OSM) refers to the ability of remembering food caches made by other individuals, enabling observers to find and pilfer the others' caches. Within birds, OSM has only been demonstrated in corvids, with more social species such as Mexican jays (Aphelocoma ultramarine) showing a higher accuracy of finding conspecific' caches than less social species such as Clark's nutcrackers (Nucifraga columbiana). However, socially dynamic corvids such as ravens (Corvus corax) are capable of sophisticated pilfering manoeuvres based on OSM. We here compared the performance of ravens and jackdaws (Corvus monedula) in a short-term OSM task. In contrast to ravens, jackdaws are socially cohesive but hardly cache and compete over food caches. Birds had to recover food pieces after watching a human experimenter hiding them in 2, 4 or 6 out of 10 possible locations. Results showed that for tests with two, four and six caches, ravens performed more accurately than expected by chance whereas jackdaws did not. Moreover, ravens made fewer re-visits to already inspected cache sites than jackdaws. These findings suggest that the development of observational spatial memory skills is linked with the species' reliance on food caches rather than with a social life style per se.

  1. A trace-driven analysis of name and attribute caching in a distributed system

    NASA Technical Reports Server (NTRS)

    Shirriff, Ken W.; Ousterhout, John K.

    1992-01-01

    This paper presents the results of simulating file name and attribute caching on client machines in a distributed file system. The simulation used trace data gathered on a network of about 40 workstations. Caching was found to be advantageous: a cache on each client containing just 10 directories had a 91 percent hit rate on name look ups. Entry-based name caches (holding individual directory entries) had poorer performance for several reasons, resulting in a maximum hit rate of about 83 percent. File attribute caching obtained a 90 percent hit rate with a cache on each machine of the attributes for 30 files. The simulations show that maintaining cache consistency between machines is not a significant problem; only 1 in 400 name component look ups required invalidation of a remotely cached entry. Process migration to remote machines had little effect on caching. Caching was less successful in heavily shared and modified directories such as /tmp, but there weren't enough references to /tmp overall to affect the results significantly. We estimate that adding name and attribute caching to the Sprite operating system could reduce server load by 36 percent and the number of network packets by 30 percent.

  2. Enabling MPEG-2 video playback in embedded systems through improved data cache efficiency

    NASA Astrophysics Data System (ADS)

    Soderquist, Peter; Leeser, Miriam E.

    1999-01-01

    Digital video decoding, enabled by the MPEG-2 Video standard, is an important future application for embedded systems, particularly PDAs and other information appliances. Many such system require portability and wireless communication capabilities, and thus face severe limitations in size and power consumption. This places a premium on integration and efficiency, and favors software solutions for video functionality over specialized hardware. The processors in most embedded system currently lack the computational power needed to perform video decoding, but a related and equally important problem is the required data bandwidth, and the need to cost-effectively insure adequate data supply. MPEG data sets are very large, and generate significant amounts of excess memory traffic for standard data caches, up to 100 times the amount required for decoding. Meanwhile, cost and power limitations restrict cache sizes in embedded systems. Some systems, including many media processors, eliminate caches in favor of memories under direct, painstaking software control in the manner of digital signal processors. Yet MPEG data has locality which caches can exploit if properly optimized, providing fast, flexible, and automatic data supply. We propose a set of enhancements which target the specific needs of the heterogeneous types within the MPEG decoder working set. These optimizations significantly improve the efficiency of small caches, reducing cache-memory traffic by almost 70 percent, and can make an enhanced 4 KB cache perform better than a standard 1 MB cache. This performance improvement can enable high-resolution, full frame rate video playback in cheaper, smaller system than woudl otherwise be possible.

  3. Determinants of seed removal distance by scatter-hoarding rodents in deciduous forests.

    PubMed

    Moore, Jeffrey E; McEuen, Amy B; Swihart, Robert K; Contreras, Thomas A; Steele, Michael A

    2007-10-01

    Scatter-hoarding rodents should space food caches to maximize cache recovery rate (to minimize loss to pilferers) relative to the energetic cost of carrying food items greater distances. Optimization models of cache spacing make two predictions. First, spacing of caches should be greater for food items with greater energy content. Second, the mean distance between caches should increase with food abundance. However, the latter prediction fails to account for the effect of food abundance on the behavior of potential pilferers or on the ability of caching individuals to acquire food by means other than recovering their own caches. When considering these factors, shorter cache distances may be predicted in conditions of higher food abundance. We predicted that seed caching distances would be greater for food items of higher energy content and during lower ambient food abundance and that the effect of seed type on cache distance variation would be lower during higher food abundance. We recorded distances moved for 8636 seeds of five seed types at 15 locations in three forested sites in Pennsylvania, USA, and 29 forest fragments in Indiana, U.S.A., across five different years. Seed production was poor in three years and high in two years. Consistent with previous studies, seeds with greater energy content were moved farther than less profitable food items. Seeds were dispersed less far in seed-rich years than in seed-poor years, contrary to predictions of conventional models. Interactions were important, with seed type effects more evident in seed-poor years. These results suggest that, when food is superabundant, optimal cache distances are more strongly determined by minimizing energy cost of caching than by minimizing pilfering rates and that cache loss rates may be more strongly density-dependent in times of low seed abundance.

  4. An effective write policy for software coherence schemes

    NASA Technical Reports Server (NTRS)

    Chen, Yung-Chin; Veidenbaum, Alexander V.

    1992-01-01

    The authors study the write behavior and evaluate the performance of various write strategies and buffering techniques for a MIN-based multiprocessor system using the simple software coherence scheme. Hit ratios, memory latencies, total execution time, and total write traffic are used as the performance indices. The write-through write-allocate no-fetch cache using a write-back write buffer is shown to have a better performance than both write-through and write-back caches. This type of write buffer is effective in reducing the volume as well as bursts of write traffic. On average, the use of a write-back cache reduces by 60 percent the total write traffic generated by a write-through cache.

  5. Considering User's Access Pattern in Multimedia File Systems

    NASA Astrophysics Data System (ADS)

    Cho, KyoungWoon; Ryu, YeonSeung; Won, Youjip; Koh, Kern

    2002-12-01

    Legacy buffer cache management schemes for multimedia server are grounded at the assumption that the application sequentially accesses the multimedia file. However, user access pattern may not be sequential in some circumstances, for example, in distance learning application, where the user may exploit the VCR-like function(rewind and play) of the system and accesses the particular segments of video repeatedly in the middle of sequential playback. Such a looping reference can cause a significant performance degradation of interval-based caching algorithms. And thus an appropriate buffer cache management scheme is required in order to deliver desirable performance even under the workload that exhibits looping reference behavior. We propose Adaptive Buffer cache Management(ABM) scheme which intelligently adapts to the file access characteristics. For each opened file, ABM applies either the LRU replacement or the interval-based caching depending on the Looping Reference Indicator, which indicates that how strong temporally localized access pattern is. According to our experiment, ABM exhibits better buffer cache miss ratio than interval-based caching or LRU, especially when the workload exhibits not only sequential but also looping reference property.

  6. Efficacy of Code Optimization on Cache-based Processors

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    The current common wisdom in the U.S. is that the powerful, cost-effective supercomputers of tomorrow will be based on commodity (RISC) micro-processors with cache memories. Already, most distributed systems in the world use such hardware as building blocks. This shift away from vector supercomputers and towards cache-based systems has brought about a change in programming paradigm, even when ignoring issues of parallelism. Vector machines require inner-loop independence and regular, non-pathological memory strides (usually this means: non-power-of-two strides) to allow efficient vectorization of array operations. Cache-based systems require spatial and temporal locality of data, so that data once read from main memory and stored in high-speed cache memory is used optimally before being written back to main memory. This means that the most cache-friendly array operations are those that feature zero or unit stride, so that each unit of data read from main memory (a cache line) contains information for the next iteration in the loop. Moreover, loops ought to be 'fat', meaning that as many operations as possible are performed on cache data-provided instruction caches do not overflow and enough registers are available. If unit stride is not possible, for example because of some data dependency, then care must be taken to avoid pathological strides, just ads on vector computers. For cache-based systems the issues are more complex, due to the effects of associativity and of non-unit block (cache line) size. But there is more to the story. Most modern micro-processors are superscalar, which means that they can issue several (arithmetic) instructions per clock cycle, provided that there are enough independent instructions in the loop body. This is another argument for providing fat loop bodies. With these restrictions, it appears fairly straightforward to produce code that will run efficiently on any cache-based system. It can be argued that although some of the important computational algorithms employed at NASA Ames require different programming styles on vector machines and cache-based machines, respectively, neither architecture class appeared to be favored by particular algorithms in principle. Practice tells us that the situation is more complicated. This report presents observations and some analysis of performance tuning for cache-based systems. We point out several counterintuitive results that serve as a cautionary reminder that memory accesses are not the only factors that determine performance, and that within the class of cache-based systems, significant differences exist.

  7. Achieving cost/performance balance ratio using tiered storage caching techniques: A case study with CephFS

    NASA Astrophysics Data System (ADS)

    Poat, M. D.; Lauret, J.

    2017-10-01

    As demand for widely accessible storage capacity increases and usage is on the rise, steady IO performance is desired but tends to suffer within multi-user environments. Typical deployments use standard hard drives as the cost per/GB is quite low. On the other hand, HDD based solutions for storage is not known to scale well with process concurrency and soon enough, high rate of IOPs create a “random access” pattern killing performance. Though not all SSDs are alike, SSDs are an established technology often used to address this exact “random access” problem. In this contribution, we will first discuss the IO performance of many different SSD drives (tested in a comparable and standalone manner). We will then be discussing the performance and integrity of at least three low-level disk caching techniques (Flashcache, dm-cache, and bcache) including individual policies, procedures, and IO performance. Furthermore, the STAR online computing infrastructure currently hosts a POSIX-compliant Ceph distributed storage cluster - while caching is not a native feature of CephFS (only exists in the Ceph Object store), we will show how one can implement a caching mechanism profiting from an implementation at a lower level. As our illustration, we will present our CephFS setup, IO performance tests, and overall experience from such configuration. We hope this work will service the community’s interest for using disk-caching mechanisms with applicable uses such as distributed storage systems and seeking an overall IO performance gain.

  8. Cache-Aware Asymptotically-Optimal Sampling-Based Motion Planning

    PubMed Central

    Ichnowski, Jeffrey; Prins, Jan F.; Alterovitz, Ron

    2014-01-01

    We present CARRT* (Cache-Aware Rapidly Exploring Random Tree*), an asymptotically optimal sampling-based motion planner that significantly reduces motion planning computation time by effectively utilizing the cache memory hierarchy of modern central processing units (CPUs). CARRT* can account for the CPU’s cache size in a manner that keeps its working dataset in the cache. The motion planner progressively subdivides the robot’s configuration space into smaller regions as the number of configuration samples rises. By focusing configuration exploration in a region for periods of time, nearest neighbor searching is accelerated since the working dataset is small enough to fit in the cache. CARRT* also rewires the motion planning graph in a manner that complements the cache-aware subdivision strategy to more quickly refine the motion planning graph toward optimality. We demonstrate the performance benefit of our cache-aware motion planning approach for scenarios involving a point robot as well as the Rethink Robotics Baxter robot. PMID:25419474

  9. Cache-Aware Asymptotically-Optimal Sampling-Based Motion Planning.

    PubMed

    Ichnowski, Jeffrey; Prins, Jan F; Alterovitz, Ron

    2014-05-01

    We present CARRT* (Cache-Aware Rapidly Exploring Random Tree*), an asymptotically optimal sampling-based motion planner that significantly reduces motion planning computation time by effectively utilizing the cache memory hierarchy of modern central processing units (CPUs). CARRT* can account for the CPU's cache size in a manner that keeps its working dataset in the cache. The motion planner progressively subdivides the robot's configuration space into smaller regions as the number of configuration samples rises. By focusing configuration exploration in a region for periods of time, nearest neighbor searching is accelerated since the working dataset is small enough to fit in the cache. CARRT* also rewires the motion planning graph in a manner that complements the cache-aware subdivision strategy to more quickly refine the motion planning graph toward optimality. We demonstrate the performance benefit of our cache-aware motion planning approach for scenarios involving a point robot as well as the Rethink Robotics Baxter robot.

  10. Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The increasing gap between processor and memory performance has lead to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and the ratio of computation to memory operation.

  11. The relationship between dominance, corticosterone, memory, and food caching in mountain chickadees (Poecile gambeli).

    PubMed

    Pravosudov, Vladimir V; Mendoza, Sally P; Clayton, Nicola S

    2003-08-01

    It has been hypothesized that in avian social groups subordinate individuals should maintain more energy reserves than dominants, as an insurance against increased perceived risk of starvation. Subordinates might also have elevated baseline corticosterone levels because corticosterone is known to facilitate fattening in birds. Recent experiments showed that moderately elevated corticosterone levels resulting from unpredictable food supply are correlated with enhanced cache retrieval efficiency and more accurate performance on a spatial memory task. Given the correlation between corticosterone and memory, a further prediction is that subordinates might be more efficient at cache retrieval and show more accurate performance on spatial memory tasks. We tested these predictions in dominant-subordinate pairs of mountain chickadees (Poecile gambeli). Each pair was housed in the same cage but caching behavior was tested individually in an adjacent aviary to avoid the confounding effects of small spaces in which birds could unnaturally and directly influence each other's behavior. In sharp contrast to our hypothesis, we found that subordinate chickadees cached less food, showed less efficient cache retrieval, and performed significantly worse on the spatial memory task than dominants. Although the behavioral differences could have resulted from social stress of subordination, and dominant birds reached significantly higher levels of corticosterone during their response to acute stress compared to subordinates, there were no significant differences between dominants and subordinates in baseline levels or in the pattern of adrenocortical stress response. We find no evidence, therefore, to support the hypothesis that subordinate mountain chickadees maintain elevated baseline corticosterone levels whereas lower caching rates and inferior cache retrieval efficiency might contribute to reduced survival of subordinates commonly found in food-caching parids.

  12. Tuning the cache memory usage in tomographic reconstruction on standard computers with Advanced Vector eXtensions (AVX)

    PubMed Central

    Agulleiro, Jose-Ignacio; Fernandez, Jose-Jesus

    2015-01-01

    Cache blocking is a technique widely used in scientific computing to minimize the exchange of information with main memory by reusing the data kept in cache memory. In tomographic reconstruction on standard computers using vector instructions, cache blocking turns out to be central to optimize performance. To this end, sinograms of the tilt-series and slices of the volumes to be reconstructed have to be divided into small blocks that fit into the different levels of cache memory. The code is then reorganized so as to operate with a block as much as possible before proceeding with another one. This data article is related to the research article titled Tomo3D 2.0 – Exploitation of Advanced Vector eXtensions (AVX) for 3D reconstruction (Agulleiro and Fernandez, 2015) [1]. Here we present data of a thorough study of the performance of tomographic reconstruction by varying cache block sizes, which allows derivation of expressions for their automatic quasi-optimal tuning. PMID:26217710

  13. Tuning the cache memory usage in tomographic reconstruction on standard computers with Advanced Vector eXtensions (AVX).

    PubMed

    Agulleiro, Jose-Ignacio; Fernandez, Jose-Jesus

    2015-06-01

    Cache blocking is a technique widely used in scientific computing to minimize the exchange of information with main memory by reusing the data kept in cache memory. In tomographic reconstruction on standard computers using vector instructions, cache blocking turns out to be central to optimize performance. To this end, sinograms of the tilt-series and slices of the volumes to be reconstructed have to be divided into small blocks that fit into the different levels of cache memory. The code is then reorganized so as to operate with a block as much as possible before proceeding with another one. This data article is related to the research article titled Tomo3D 2.0 - Exploitation of Advanced Vector eXtensions (AVX) for 3D reconstruction (Agulleiro and Fernandez, 2015) [1]. Here we present data of a thorough study of the performance of tomographic reconstruction by varying cache block sizes, which allows derivation of expressions for their automatic quasi-optimal tuning.

  14. Initial Performance Results on IBM POWER6

    NASA Technical Reports Server (NTRS)

    Saini, Subbash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piysuh

    2008-01-01

    The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the Power6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB - double that of the Power5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in POWER5+. In this paper, we evaluate the performance of a dual-core Power6 based IBM p6-570 system, and we compare its performance with that of a dual-core Power5+ based IBM p575+ system. In this evaluation, we have used the High- Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications--three from computational fluid dynamics and one from climate modeling.

  15. Performance Prediction Toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chennupati, Gopinath; Santhi, Nanadakishore; Eidenbenz, Stephen

    The Performance Prediction Toolkit (PPT), is a scalable co-design tool that contains the hardware and middle-ware models, which accept proxy applications as input in runtime prediction. PPT relies on Simian, a parallel discrete event simulation engine in Python or Lua, that uses the process concept, where each computing unit (host, node, core) is a Simian entity. Processes perform their task through message exchanges to remain active, sleep, wake-up, begin and end. The PPT hardware model of a compute core (such as a Haswell core) consists of a set of parameters, such as clock speed, memory hierarchy levels, their respective sizes,more » cache-lines, access times for different cache levels, average cycle counts of ALU operations, etc. These parameters are ideally read off a spec sheet or are learned using regression models learned from hardware counters (PAPI) data. The compute core model offers an API to the software model, a function called time_compute(), which takes as input a tasklist. A tasklist is an unordered set of ALU, and other CPU-type operations (in particular virtual memory loads and stores). The PPT application model mimics the loop structure of the application and replaces the computational kernels with a call to the hardware model's time_compute() function giving tasklists as input that model the compute kernel. A PPT application model thus consists of tasklists representing kernels and the high-er level loop structure that we like to think of as pseudo code. The key challenge for the hardware model's time_compute-function is to translate virtual memory accesses into actual cache hierarchy level hits and misses.PPT also contains another CPU core level hardware model, Analytical Memory Model (AMM). The AMM solves this challenge soundly, where our previous alternatives explicitly include the L1,L2,L3 hit-rates as inputs to the tasklists. Explicit hit-rates inevitably only reflect the application modeler's best guess, perhaps informed by a few small test problems using hardware counters; also, hard-coded hit-rates make the hardware model insensitive to changes in cache sizes. Alternatively, we use reuse distance distributions in the tasklists. In general, reuse profiles require the application modeler to run a very expensive trace analysis on the real code that realistically can be done at best for small examples.« less

  16. Store operations to maintain cache coherence

    DOEpatents

    Evangelinos, Constantinos; Nair, Ravi; Ohmacht, Martin

    2017-08-01

    In one embodiment, a computer-implemented method includes encountering a store operation during a compile-time of a program, where the store operation is applicable to a memory line. It is determined, by a computer processor, that no cache coherence action is necessary for the store operation. A store-without-coherence-action instruction is generated for the store operation, responsive to determining that no cache coherence action is necessary. The store-without-coherence-action instruction specifies that the store operation is to be performed without a cache coherence action, and cache coherence is maintained upon execution of the store-without-coherence-action instruction.

  17. Long-term moderate elevation of corticosterone facilitates avian food-caching behaviour and enhances spatial memory.

    PubMed

    Pravosudov, Vladimir V

    2003-12-22

    It is widely assumed that chronic stress and corresponding chronic elevations of glucocorticoid levels have deleterious effects on animals' brain functions such as learning and memory. Some animals, however, appear to maintain moderately elevated levels of glucocorticoids over long periods of time under natural energetically demanding conditions, and it is not clear whether such chronic but moderate elevations may be adaptive. I implanted wild-caught food-caching mountain chickadees (Poecile gambeli), which rely at least in part on spatial memory to find their caches, with 90-day continuous time-release corticosterone pellets designed to approximately double the baseline corticosterone levels. Corticosterone-implanted birds cached and consumed significantly more food and showed more efficient cache recovery and superior spatial memory performance compared with placebo-implanted birds. Thus, contrary to prevailing assumptions, long-term moderate elevations of corticosterone appear to enhance spatial memory in food-caching mountain chickadees. These results suggest that moderate chronic elevation of corticosterone may serve as an adaptation to unpredictable environments by facilitating feeding and food-caching behaviour and by improving cache-retrieval efficiency in food-caching birds.

  18. Controlled replication: reduce the capacity occupied by redundant replicas in tiled chip multiprocessors

    NASA Astrophysics Data System (ADS)

    Li, Hao; Xie, Lunguo

    2013-03-01

    The design of cache system for Chip Multiprocessor (CMP) face many challenges because future CMPs will have more cores and greater on-chip cache capacity. There are two base design schemes about L2 cache: private scheme in which each L2 slice is treated as a private L2 cache and shared scheme in which all L2 slices are treated as a large L2 cache shared by all cores. Private caches provide the lowest hit latency but reduce the total effective cache capacity. A shared L2 cache increases the effective cache capacity but has long hit latencies when data is on a remote tile. This paper present a new Controlled Replication (CR) policy to reduce the capacities occupied by redundant shared replicas. the new CR policy increases the effective capacity than victim replication scheme and has lower hit latency than shared scheme. We evaluate the various schemes using full-system simulation of parallel applications. Results show that CR reduces the average memory access latency of shared scheme by an average of 13%, providing better overall performance than victim replication and shared schemes.

  19. Optoelectronic-cache memory system architecture.

    PubMed

    Chiarulli, D M; Levitan, S P

    1996-05-10

    We present an investigation of the architecture of an optoelectronic cache that can integrate terabit optical memories with the electronic caches associated with high-performance uniprocessors and multiprocessors. The use of optoelectronic-cache memories enables these terabit technologies to provide transparently low-latency secondary memory with frame sizes comparable with disk pages but with latencies that approach those of electronic secondary-cache memories. This enables the implementation of terabit memories with effective access times comparable with the cycle times of current microprocessors. The cache design is based on the use of a smart-pixel array and combines parallel free-space optical input-output to-and-from optical memory with conventional electronic communication to the processor caches. This cache and the optical memory system to which it will interface provide a large random-access memory space that has a lower overall latency than that of magnetic disks and disk arrays. In addition, as a consequence of the high-bandwidth parallel input-output capabilities of optical memories, fault service times for the optoelectronic cache are substantially less than those currently achievable with any rotational media.

  20. A Survey Of Techniques for Managing and Leveraging Caches in GPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh

    2014-09-01

    Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as unique architecture of GPU, rise of CPU–GPU heterogeneous computing, etc., demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges ofmore » cache management in GPUs. The aim of this paper is to provide the readers insights into cache management techniques for GPUs and motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.« less

  1. Toward Millions of File System IOPS on Low-Cost, Commodity Hardware

    PubMed Central

    Zheng, Da; Burns, Randal; Szalay, Alexander S.

    2013-01-01

    We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads. PMID:24402052

  2. Toward Millions of File System IOPS on Low-Cost, Commodity Hardware.

    PubMed

    Zheng, Da; Burns, Randal; Szalay, Alexander S

    2013-01-01

    We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads.

  3. Study of cache performance in distributed environment for data processing

    NASA Astrophysics Data System (ADS)

    Makatun, Dzmitry; Lauret, Jérôme; Šumbera, Michal

    2014-06-01

    Processing data in distributed environment has found its application in many fields of science (Nuclear and Particle Physics (NPP), astronomy, biology to name only those). Efficiently transferring data between sites is an essential part of such processing. The implementation of caching strategies in data transfer software and tools, such as the Reasoner for Intelligent File Transfer (RIFT) being developed in the STAR collaboration, can significantly decrease network load and waiting time by reusing the knowledge of data provenance as well as data placed in transfer cache to further expand on the availability of sources for files and data-sets. Though, a great variety of caching algorithms is known, a study is needed to evaluate which one can deliver the best performance in data access considering the realistic demand patterns. Records of access to the complete data-sets of NPP experiments were analyzed and used as input for computer simulations. Series of simulations were done in order to estimate the possible cache hits and cache hits per byte for known caching algorithms. The simulations were done for cache of different sizes within interval 0.001 - 90% of complete data-set and low-watermark within 0-90%. Records of data access were taken from several experiments and within different time intervals in order to validate the results. In this paper, we will discuss the different data caching strategies from canonical algorithms to hybrid cache strategies, present the results of our simulations for the diverse algorithms, debate and identify the choice for the best algorithm in the context of Physics Data analysis in NPP. While the results of those studies have been implemented in RIFT, they can also be used when setting up cache in any other computational work-flow (Cloud processing for example) or managing data storages with partial replicas of the entire data-set.

  4. Xrootd in dCache - design and experiences

    NASA Astrophysics Data System (ADS)

    Behrmann, Gerd; Ozerov, Dmitry; Zangerl, Thomas

    2011-12-01

    dCache is a well established distributed storage solution used in both high energy physics computing and other disciplines. An overview of the implementation of the xrootd data access protocol within dCache is presented. The performance of various access mechanisms is studied and compared and it is concluded that our implementation is as perfomant as other protocols. This makes dCache a compelling alternative to the Scalla software suite implementation of xrootd, with added value from broad protocol support, including the IETF approved NFS 4.1 protocol.

  5. The Effects of Block Size on the Performance of Coherent Caches in Shared-Memory Multiprocessors

    DTIC Science & Technology

    1993-05-01

    increase with the bandwidth and latency. For those applications with poor spatial locality, the best choice of cache line size is determined by the...observation was used in the design of two schemes: LimitLESS di- rectories and Tag caches. LimitLESS directories [15] were designed for the ALEWIFE...small packets may be used to avoid network congestion. The most important factor influencing the choice of cache line size for a multipro- cessor is the

  6. Store-operate-coherence-on-value

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Dong; Heidelberger, Philip; Kumar, Sameer

    A system, method and computer program product for performing various store-operate instructions in a parallel computing environment that includes a plurality of processors and at least one cache memory device. A queue in the system receives, from a processor, a store-operate instruction that specifies under which condition a cache coherence operation is to be invoked. A hardware unit in the system runs the received store-operate instruction. The hardware unit evaluates whether a result of the running the received store-operate instruction satisfies the condition. The hardware unit invokes a cache coherence operation on a cache memory address associated with the receivedmore » store-operate instruction if the result satisfies the condition. Otherwise, the hardware unit does not invoke the cache coherence operation on the cache memory device.« less

  7. Corvid re-caching without 'theory of mind': a model.

    PubMed

    van der Vaart, Elske; Verbrugge, Rineke; Hemelrijk, Charlotte K

    2012-01-01

    Scrub jays are thought to use many tactics to protect their caches. For instance, they predominantly bury food far away from conspecifics, and if they must cache while being watched, they often re-cache their worms later, once they are in private. Two explanations have been offered for such observations, and they are intensely debated. First, the birds may reason about their competitors' mental states, with a 'theory of mind'; alternatively, they may apply behavioral rules learned in daily life. Although this second hypothesis is cognitively simpler, it does seem to require a different, ad-hoc behavioral rule for every caching and re-caching pattern exhibited by the birds. Our new theory avoids this drawback by explaining a large variety of patterns as side-effects of stress and the resulting memory errors. Inspired by experimental data, we assume that re-caching is not motivated by a deliberate effort to safeguard specific caches from theft, but by a general desire to cache more. This desire is brought on by stress, which is determined by the presence and dominance of onlookers, and by unsuccessful recovery attempts. We study this theory in two experiments similar to those done with real birds with a kind of 'virtual bird', whose behavior depends on a set of basic assumptions about corvid cognition, and a well-established model of human memory. Our results show that the 'virtual bird' acts as the real birds did; its re-caching reflects whether it has been watched, how dominant its onlooker was, and how close to that onlooker it has cached. This happens even though it cannot attribute mental states, and it has only a single behavioral rule assumed to be previously learned. Thus, our simulations indicate that corvid re-caching can be explained without sophisticated social cognition. Given our specific predictions, our theory can easily be tested empirically.

  8. Corvid Re-Caching without ‘Theory of Mind’: A Model

    PubMed Central

    van der Vaart, Elske; Verbrugge, Rineke; Hemelrijk, Charlotte K.

    2012-01-01

    Scrub jays are thought to use many tactics to protect their caches. For instance, they predominantly bury food far away from conspecifics, and if they must cache while being watched, they often re-cache their worms later, once they are in private. Two explanations have been offered for such observations, and they are intensely debated. First, the birds may reason about their competitors' mental states, with a ‘theory of mind’; alternatively, they may apply behavioral rules learned in daily life. Although this second hypothesis is cognitively simpler, it does seem to require a different, ad-hoc behavioral rule for every caching and re-caching pattern exhibited by the birds. Our new theory avoids this drawback by explaining a large variety of patterns as side-effects of stress and the resulting memory errors. Inspired by experimental data, we assume that re-caching is not motivated by a deliberate effort to safeguard specific caches from theft, but by a general desire to cache more. This desire is brought on by stress, which is determined by the presence and dominance of onlookers, and by unsuccessful recovery attempts. We study this theory in two experiments similar to those done with real birds with a kind of ‘virtual bird’, whose behavior depends on a set of basic assumptions about corvid cognition, and a well-established model of human memory. Our results show that the ‘virtual bird’ acts as the real birds did; its re-caching reflects whether it has been watched, how dominant its onlooker was, and how close to that onlooker it has cached. This happens even though it cannot attribute mental states, and it has only a single behavioral rule assumed to be previously learned. Thus, our simulations indicate that corvid re-caching can be explained without sophisticated social cognition. Given our specific predictions, our theory can easily be tested empirically. PMID:22396799

  9. EqualWrites: Reducing Intra-set Write Variations for Enhancing Lifetime of Non-volatile Caches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh; Vetter, Jeffrey S.

    Driven by the trends of increasing core-count and bandwidth-wall problem, the size of last level caches (LLCs) has greatly increased and hence, the researchers have explored non-volatile memories (NVMs) which provide high density and consume low-leakage power. Since NVMs have low write-endurance and the existing cache management policies are write variation-unaware, effective wear-leveling techniques are required for achieving reasonable cache lifetimes using NVMs. We present EqualWrites, a technique for mitigating intra-set write variation. In this paper, our technique works by recording the number of writes on a block and changing the cache-block location of a hot data-item to redirect themore » future writes to a cold block to achieve wear-leveling. Simulation experiments have been performed using an x86-64 simulator and benchmarks from SPEC06 and HPC (high-performance computing) field. The results show that for single, dual and quad-core system configurations, EqualWrites improves cache lifetime by 6.31X, 8.74X and 10.54X, respectively. In addition, its implementation overhead is very small and it provides larger improvement in lifetime than three other intra-set wear-leveling techniques and a cache replacement policy.« less

  10. EqualWrites: Reducing Intra-set Write Variations for Enhancing Lifetime of Non-volatile Caches

    DOE PAGES

    Mittal, Sparsh; Vetter, Jeffrey S.

    2015-01-29

    Driven by the trends of increasing core-count and bandwidth-wall problem, the size of last level caches (LLCs) has greatly increased and hence, the researchers have explored non-volatile memories (NVMs) which provide high density and consume low-leakage power. Since NVMs have low write-endurance and the existing cache management policies are write variation-unaware, effective wear-leveling techniques are required for achieving reasonable cache lifetimes using NVMs. We present EqualWrites, a technique for mitigating intra-set write variation. In this paper, our technique works by recording the number of writes on a block and changing the cache-block location of a hot data-item to redirect themore » future writes to a cold block to achieve wear-leveling. Simulation experiments have been performed using an x86-64 simulator and benchmarks from SPEC06 and HPC (high-performance computing) field. The results show that for single, dual and quad-core system configurations, EqualWrites improves cache lifetime by 6.31X, 8.74X and 10.54X, respectively. In addition, its implementation overhead is very small and it provides larger improvement in lifetime than three other intra-set wear-leveling techniques and a cache replacement policy.« less

  11. An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches

    NASA Astrophysics Data System (ADS)

    Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur

    2018-03-01

    Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also exist due to some drawbacks of the commonly used Least Recently Used (LRU) policy employed in multiprocessor systems, which are because of the cache lines residing in the cache longer than required. In image processing analysis of for example extra pulmonary tuberculosis (TB), an accurate diagnosis for tissue specimen is required. Therefore, a fast and reliable shared memory management system to execute algorithms for processing vast amount of specimen image is needed. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near distance promotion, and the concept of ownership in the eviction policy to effectively improve cache thrashing and to avoid resource stealing among the processors.

  12. Efficacy of Code Optimization on Cache-Based Processors

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Saphir, William C.; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    In this paper a number of techniques for improving the cache performance of a representative piece of numerical software is presented. Target machines are popular processors from several vendors: MIPS R5000 (SGI Indy), MIPS R8000 (SGI PowerChallenge), MIPS R10000 (SGI Origin), DEC Alpha EV4 + EV5 (Cray T3D & T3E), IBM RS6000 (SP Wide-node), Intel PentiumPro (Ames' Whitney), Sun UltraSparc (NERSC's NOW). The optimizations all attempt to increase the locality of memory accesses. But they meet with rather varied and often counterintuitive success on the different computing platforms. We conclude that it may be genuinely impossible to obtain portable performance on the current generation of cache-based machines. At the least, it appears that the performance of modern commodity processors cannot be described with parameters defining the cache alone.

  13. A search game model of the scatter hoarder's problem

    PubMed Central

    Alpern, Steve; Fokkink, Robbert; Lidbetter, Thomas; Clayton, Nicola S.

    2012-01-01

    Scatter hoarders are animals (e.g. squirrels) who cache food (nuts) over a number of sites for later collection. A certain minimum amount of food must be recovered, possibly after pilfering by another animal, in order to survive the winter. An optimal caching strategy is one that maximizes the survival probability, given worst case behaviour of the pilferer. We modify certain ‘accumulation games’ studied by Kikuta & Ruckle (2000 J. Optim. Theory Appl.) and Kikuta & Ruckle (2001 Naval Res. Logist.), which modelled the problem of optimal diversification of resources against catastrophic loss, to include the depth at which the food is hidden at each caching site. Optimal caching strategies can then be determined as equilibria in a new ‘caching game’. We show how the distribution of food over sites and the site-depths of the optimal caching varies with the animal's survival requirements and the amount of pilfering. We show that in some cases, ‘decoy nuts’ are required to be placed above other nuts that are buried further down at the same site. Methods from the field of search games are used. Some empirically observed behaviour can be shown to be optimal in our model. PMID:22012971

  14. Security Enhancement Using Cache Based Reauthentication in WiMAX Based E-Learning System

    PubMed Central

    Rajagopal, Chithra; Bhuvaneshwaran, Kalaavathi

    2015-01-01

    WiMAX networks are the most suitable for E-Learning through their Broadcast and Multicast Services at rural areas. Authentication of users is carried out by AAA server in WiMAX. In E-Learning systems the users must be forced to perform reauthentication to overcome the session hijacking problem. The reauthentication of users introduces frequent delay in the data access which is crucial in delaying sensitive applications such as E-Learning. In order to perform fast reauthentication caching mechanism known as Key Caching Based Authentication scheme is introduced in this paper. Even though the cache mechanism requires extra storage to keep the user credentials, this type of mechanism reduces the 50% of the delay occurring during reauthentication. PMID:26351658

  15. Security Enhancement Using Cache Based Reauthentication in WiMAX Based E-Learning System.

    PubMed

    Rajagopal, Chithra; Bhuvaneshwaran, Kalaavathi

    2015-01-01

    WiMAX networks are the most suitable for E-Learning through their Broadcast and Multicast Services at rural areas. Authentication of users is carried out by AAA server in WiMAX. In E-Learning systems the users must be forced to perform reauthentication to overcome the session hijacking problem. The reauthentication of users introduces frequent delay in the data access which is crucial in delaying sensitive applications such as E-Learning. In order to perform fast reauthentication caching mechanism known as Key Caching Based Authentication scheme is introduced in this paper. Even though the cache mechanism requires extra storage to keep the user credentials, this type of mechanism reduces the 50% of the delay occurring during reauthentication.

  16. Effects of demanding foraging conditions on cache retrival accuracy in food-caching mountain chickadees (Poecile gambeli).

    PubMed

    Pravosudov, V V; Clayton, N S

    2001-02-22

    Birds rely, at least in part, on spatial memory for recovering previously hidden caches but accurate cache recovery may be more critical for birds that forage in harsh conditions where the food supply is limited and unpredictable. Failure to find caches in these conditions may potentially result in death from starvation. In order to test this hypothesis we compared the cache recovery behaviour of 24 wild-caught mountain chickadees (Poecile gambeli), half of which were maintained on a limited and unpredictable food supply while the rest were maintained on an ad libitum food supply for 60 days. We then tested their cache retrieval accuracy by allowing birds from both groups to cache seeds in the experimental room and recover them 5 hours later. Our results showed that birds maintained on a limited and unpredictable food supply made significantly fewer visits to non-cache sites when recovering their caches compared to birds maintained on ad libitum food. We found the same difference in performance in two versions of a one-trial associative learning task in which the birds had to rely on memory to find previously encountered hidden food. In a non-spatial memory version of the task, in which the baited feeder was clearly marked, there were no significant differences between the two groups. We therefore concluded that the two groups differed in their efficiency at cache retrieval. We suggest that this difference is more likely to be attributable to a difference in memory (encoding or recall) than to a difference in their motivation to search for hidden food, although the possibility of some motivational differences still exists. Overall, our results suggest that demanding foraging conditions favour more accurate cache retrieval in food-caching birds.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Evangelinos, Constantinos; Nair, Ravi; Ohmacht, Martin

    In one embodiment, a computer-implemented method includes encountering a store operation during a compile-time of a program, where the store operation is applicable to a memory line. It is determined, by a computer processor, that no cache coherence action is necessary for the store operation. A store-without-coherence-action instruction is generated for the store operation, responsive to determining that no cache coherence action is necessary. The store-without-coherence-action instruction specifies that the store operation is to be performed without a cache coherence action, and cache coherence is maintained upon execution of the store-without-coherence-action instruction.

  18. Analysis of the Intel 386 and i486 microprocessors for the Space Station Freedom Data Management System

    NASA Technical Reports Server (NTRS)

    Liu, Yuan-Kwei

    1991-01-01

    The feasibility is analyzed of upgrading the Intel 386 microprocessor, which has been proposed as the baseline processor for the Space Station Freedom (SSF) Data Management System (DMS), to the more advanced i486 microprocessors. The items compared between the two processors include the instruction set architecture, power consumption, the MIL-STD-883C Class S (Space) qualification schedule, and performance. The advantages of the i486 over the 386 are (1) lower power consumption; and (2) higher floating point performance. The i486 on-chip cache does not have parity check or error detection and correction circuitry. The i486 with on-chip cache disabled, however, has lower integer performance than the 386 without cache, which is the current DMS design choice. Adding cache to the 386/386 DX memory hierachy appears to be the most beneficial change to the current DMS design at this time.

  19. Analysis of the Intel 386 and i486 microprocessors for the Space Station Freedom Data Management System

    NASA Technical Reports Server (NTRS)

    Liu, Yuan-Kwei

    1991-01-01

    The feasibility is analyzed of upgrading the Intel 386 microprocessor, which has been proposed as the baseline processor for the Space Station Freedom (SSF) Data Management System (DMS), to the more advanced i486 microprocessors. The items compared between the two processors include the instruction set architecture, power consumption, the MIL-STD-883C Class S (Space) qualification schedule, and performance. The advantages of the i486 over the 386 are (1) lower power consumption; and (2) higher floating point performance. The i486 on-chip cache does not have parity check or error detection and correction circuitry. The i486 with on-chip cache disabled, however, has lower integer performance than the 386 without cache, which is the current DMS design choice. Adding cache to the 386/387 DX memory hierarchy appears to be the most beneficial change to the current DMS design at this time.

  20. Using Solid State Disk Array as a Cache for LHC ATLAS Data Analysis

    NASA Astrophysics Data System (ADS)

    Yang, W.; Hanushevsky, A. B.; Mount, R. P.; Atlas Collaboration

    2014-06-01

    User data analysis in high energy physics presents a challenge to spinning-disk based storage systems. The analysis is data intense, yet reads are small, sparse and cover a large volume of data files. It is also unpredictable due to users' response to storage performance. We describe here a system with an array of Solid State Disk as a non-conventional, standalone file level cache in front of the spinning disk storage to help improve the performance of LHC ATLAS user analysis at SLAC. The system uses several days of data access records to make caching decisions. It can also use information from other sources such as a work-flow management system. We evaluate the performance of the system both in terms of caching and its impact on user analysis jobs. The system currently uses Xrootd technology, but the technique can be applied to any storage system.

  1. Formal verification of an MMU and MMU cache

    NASA Technical Reports Server (NTRS)

    Schubert, E. T.

    1991-01-01

    We describe the formal verification of a hardware subsystem consisting of a memory management unit and a cache. These devices are verified independently and then shown to interact correctly when composed. The MMU authorizes memory requests and translates virtual addresses to real addresses. The cache improves performance by maintaining a LRU (least recently used) list from the memory resident segment table.

  2. Cache Locality Optimization for Recursive Programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lifflander, Jonathan; Krishnamoorthy, Sriram

    We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. Wemore » present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain- specific optimizer for stencil programs.« less

  3. PCM-Based Durable Write Cache for Fast Disk I/O

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Zhuo; Wang, Bin; Carpenter, Patrick

    2012-01-01

    Flash based solid-state devices (FSSDs) have been adopted within the memory hierarchy to improve the performance of hard disk drive (HDD) based storage system. However, with the fast development of storage-class memories, new storage technologies with better performance and higher write endurance than FSSDs are emerging, e.g., phase-change memory (PCM). Understanding how to leverage these state-of-the-art storage technologies for modern computing systems is important to solve challenging data intensive computing problems. In this paper, we propose to leverage PCM for a hybrid PCM-HDD storage architecture. We identify the limitations of traditional LRU caching algorithms for PCM-based caches, and develop amore » novel hash-based write caching scheme called HALO to improve random write performance of hard disks. To address the limited durability of PCM devices and solve the degraded spatial locality in traditional wear-leveling techniques, we further propose novel PCM management algorithms that provide effective wear-leveling while maximizing access parallelism. We have evaluated this PCM-based hybrid storage architecture using applications with a diverse set of I/O access patterns. Our experimental results demonstrate that the HALO caching scheme leads to an average reduction of 36.8% in execution time compared to the LRU caching scheme, and that the SFC wear leveling extends the lifetime of PCM by a factor of 21.6.« less

  4. Data Resilience in the dCache Storage System

    DOE PAGES

    Rossi, A. L.; Adeyemi, F.; Ashish, A.; ...

    2017-11-23

    In this study we discuss design, implementation considerations, and performance of a new Resilience Service in the dCache storage system responsible for file availability and durability functionality.

  5. AYUSH: A Technique for Extending Lifetime of SRAM-NVM Hybrid Caches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh; Vetter, Jeffrey S

    2014-01-01

    Recently, researchers have explored way-based hybrid SRAM-NVM (non-volatile memory) last level caches (LLCs) to bring the best of SRAM and NVM together. However, the limited write endurance of NVMs restricts the lifetime of these hybrid caches. We present AYUSH, a technique to enhance the lifetime of hybrid caches, which works by using data-migration to preferentially use SRAM for storing frequently-reused data. Microarchitectural simulations confirm that AYUSH achieves larger improvement in lifetime than a previous technique and also maintains performance and energy efficiency. For single, dual and quad-core workloads, the average increase in cache lifetime with AYUSH is 6.90X, 24.06X andmore » 47.62X, respectively.« less

  6. Novel dynamic caching for hierarchically distributed video-on-demand systems

    NASA Astrophysics Data System (ADS)

    Ogo, Kenta; Matsuda, Chikashi; Nishimura, Kazutoshi

    1998-02-01

    It is difficult to simultaneously serve the millions of video streams that will be needed in the age of 'Mega-Media' networks by using only one high-performance server. To distribute the service load, caching servers should be location near users. However, in previously proposed caching mechanisms, the grade of service depends on whether the data is already cached at a caching server. To make the caching servers transparent to the users, the ability to randomly access the large volume of data stored in the central server should be supported, and the operational functions of the provided service should not be narrowly restricted. We propose a mechanism for constructing a video-stream-caching server that is transparent to the users and that will always support all special playback functions for all available programs to all the contents with a latency of only 1 or 2 seconds. This mechanism uses Variable-sized-quantum-segment- caching technique derived from an analysis of the historical usage log data generated by a line-on-demand-type service experiment and based on the basic techniques used by a time- slot-based multiple-stream video-on-demand server.

  7. Evolution of magnetic disk subsystems

    NASA Astrophysics Data System (ADS)

    Kaneko, Satoru

    1994-06-01

    The higher recording density of magnetic disk realized today has brought larger storage capacity per unit and smaller form factors. If the required access performance per MB is constant, the performance of large subsystems has to be several times better. This article describes mainly the technology for improving the performance of the magnetic disk subsystems and the prospects of their future evolution. Also considered are 'crosscall pathing' which makes the data transfer channel more effective, 'disk cache' which improves performance coupling with solid state memory technology, and 'RAID' which improves the availability and integrity of disk subsystems by organizing multiple disk drives in a subsystem. As a result, it is concluded that since the performance of the subsystem is dominated by that of the disk cache, maximation of the performance of the disk cache subsystems is very important.

  8. Using shadow page cache to improve isolated drivers performance.

    PubMed

    Zheng, Hao; Dong, Xiaoshe; Wang, Endong; Chen, Baoke; Zhu, Zhengdong; Liu, Chengzhe

    2015-01-01

    With the advantage of the reusability property of the virtualization technology, users can reuse various types and versions of existing operating systems and drivers in a virtual machine, so as to customize their application environment. In order to prevent users' virtualization environments being impacted by driver faults in virtual machine, Chariot examines the correctness of driver's write operations by the method of combining a driver's write operation capture and a driver's private access control table. However, this method needs to keep the write permission of shadow page table as read-only, so as to capture isolated driver's write operations through page faults, which adversely affect the performance of the driver. Based on delaying setting frequently used shadow pages' write permissions to read-only, this paper proposes an algorithm using shadow page cache to improve the performance of isolated drivers and carefully study the relationship between the performance of drivers and the size of shadow page cache. Experimental results show that, through the shadow page cache, the performance of isolated drivers can be greatly improved without impacting Chariot's reliability too much.

  9. Using Shadow Page Cache to Improve Isolated Drivers Performance

    PubMed Central

    Dong, Xiaoshe; Wang, Endong; Chen, Baoke; Zhu, Zhengdong; Liu, Chengzhe

    2015-01-01

    With the advantage of the reusability property of the virtualization technology, users can reuse various types and versions of existing operating systems and drivers in a virtual machine, so as to customize their application environment. In order to prevent users' virtualization environments being impacted by driver faults in virtual machine, Chariot examines the correctness of driver's write operations by the method of combining a driver's write operation capture and a driver's private access control table. However, this method needs to keep the write permission of shadow page table as read-only, so as to capture isolated driver's write operations through page faults, which adversely affect the performance of the driver. Based on delaying setting frequently used shadow pages' write permissions to read-only, this paper proposes an algorithm using shadow page cache to improve the performance of isolated drivers and carefully study the relationship between the performance of drivers and the size of shadow page cache. Experimental results show that, through the shadow page cache, the performance of isolated drivers can be greatly improved without impacting Chariot's reliability too much. PMID:25815373

  10. No evidence for memory interference across sessions in food hoarding marsh tits Poecile palustris under laboratory conditions.

    PubMed

    Urhan, A Utku; Brodin, Anders

    2015-05-01

    Scatter hoarding birds are known for their accurate spatial memory. In a previous experiment, we tested the retrieval accuracy in marsh tits in a typical laboratory set-up for this species. We also tested the performance of humans in this experimental set-up. Somewhat unexpectedly, humans performed much better than marsh tits. In the first five attempts, humans relocated almost 90 % of the caches they had hidden 5 h earlier. Marsh tits only relocated 25 % in the first five attempts and just above 40 % in the first ten attempts. Typically, in this type of experiment, the birds will be caching and retrieving many times in the same sites in the same experimental room. This is very different from the conditions in nature where hoarding parids only cache once in a caching site. Hence, it is possible that memories from previous sessions will disturb the formation of new memories. If there is such proactive interference, the prediction is that success should decay over sessions. Here, we have designed an experiment to investigate whether there is such memory interference in this type of experiment. We allowed marsh tits and humans to cache and retrieve in three repeated sessions without prior experience of the arena. The performance did not change over sessions, and on average, marsh tits correctly visited around 25 % of the caches in the first five attempts. The corresponding success in humans was constant across sessions, and it was around 90 % on average. We conclude that the somewhat poor performance of the marsh tits did not depend on proactive memory interference. We also discuss other possible reasons for why marsh tits in general do not perform better in laboratory experiments.

  11. Accurate low-cost methods for performance evaluation of cache memory systems

    NASA Technical Reports Server (NTRS)

    Laha, Subhasis; Patel, Janak H.; Iyer, Ravishankar K.

    1988-01-01

    Methods of simulation based on statistical techniques are proposed to decrease the need for large trace measurements and for predicting true program behavior. Sampling techniques are applied while the address trace is collected from a workload. This drastically reduces the space and time needed to collect the trace. Simulation techniques are developed to use the sampled data not only to predict the mean miss rate of the cache, but also to provide an empirical estimate of its actual distribution. Finally, a concept of primed cache is introduced to simulate large caches by the sampling-based method.

  12. Value-Based Caching in Information-Centric Wireless Body Area Networks

    PubMed Central

    Al-Turjman, Fadi M.; Imran, Muhammad; Vasilakos, Athanasios V.

    2017-01-01

    We propose a resilient cache replacement approach based on a Value of sensed Information (VoI) policy. To resolve and fetch content when the origin is not available due to isolated in-network nodes (fragmentation) and harsh operational conditions, we exploit a content caching approach. Our approach depends on four functional parameters in sensory Wireless Body Area Networks (WBANs). These four parameters are: age of data based on periodic request, popularity of on-demand requests, communication interference cost, and the duration for which the sensor node is required to operate in active mode to capture the sensed readings. These parameters are considered together to assign a value to the cached data to retain the most valuable information in the cache for prolonged time periods. The higher the value, the longer the duration for which the data will be retained in the cache. This caching strategy provides significant availability for most valuable and difficult to retrieve data in the WBANs. Extensive simulations are performed to compare the proposed scheme against other significant caching schemes in the literature while varying critical aspects in WBANs (e.g., data popularity, cache size, publisher load, connectivity-degree, and severe probabilities of node failures). These simulation results indicate that the proposed VoI-based approach is a valid tool for the retrieval of cached content in disruptive and challenging scenarios, such as the one experienced in WBANs, since it allows the retrieval of content for a long period even while experiencing severe in-network node failures. PMID:28106817

  13. Tier 3 batch system data locality via managed caches

    NASA Astrophysics Data System (ADS)

    Fischer, Max; Giffels, Manuel; Jung, Christopher; Kühn, Eileen; Quast, Günter

    2015-05-01

    Modern data processing increasingly relies on data locality for performance and scalability, whereas the common HEP approaches aim for uniform resource pools with minimal locality, recently even across site boundaries. To combine advantages of both, the High- Performance Data Analysis (HPDA) Tier 3 concept opportunistically establishes data locality via coordinated caches. In accordance with HEP Tier 3 activities, the design incorporates two major assumptions: First, only a fraction of data is accessed regularly and thus the deciding factor for overall throughput. Second, data access may fallback to non-local, making permanent local data availability an inefficient resource usage strategy. Based on this, the HPDA design generically extends available storage hierarchies into the batch system. Using the batch system itself for scheduling file locality, an array of independent caches on the worker nodes is dynamically populated with high-profile data. Cache state information is exposed to the batch system both for managing caches and scheduling jobs. As a result, users directly work with a regular, adequately sized storage system. However, their automated batch processes are presented with local replications of data whenever possible.

  14. Sparse Partial Equilibrium Tables in Chemically Resolved Reactive Flow

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vitello, P; Fried, L E; Pudliner, B

    2003-07-14

    The detonation of an energetic material is the result of a complex interaction between kinetic chemical reactions and hydrodynamics. Unfortunately, little is known concerning the detailed chemical kinetics of detonations in energetic materials. CHEETAH uses rate laws to treat species with the slowest chemical reactions, while assuming other chemical species are in equilibrium. CHEETAH supports a wide range of elements and condensed detonation products and can also be applied to gas detonations. A sparse hash table of equation of state values, called the ''cache'' is used in CHEETAH to enhance the efficiency of kinetic reaction calculations. For large-scale parallel hydrodynamicmore » calculations, CHEETAH uses MPI communication to updates to the cache. We present here details of the sparse caching model used in the CHEETAH. To demonstrate the efficiency of modeling using a sparse cache model we consider detonations in energetic materials.« less

  15. Sex, estradiol, and spatial memory in a food-caching corvid.

    PubMed

    Rensel, Michelle A; Ellis, Jesse M S; Harvey, Brigit; Schlinger, Barney A

    2015-09-01

    Estrogens significantly impact spatial memory function in mammalian species. Songbirds express the estrogen synthetic enzyme aromatase at relatively high levels in the hippocampus and there is evidence from zebra finches that estrogens facilitate performance on spatial learning and/or memory tasks. It is unknown, however, whether estrogens influence hippocampal function in songbirds that naturally exhibit memory-intensive behaviors, such as cache recovery observed in many corvid species. To address this question, we examined the impact of estradiol on spatial memory in non-breeding Western scrub-jays, a species that routinely participates in food caching and retrieval in nature and in captivity. We also asked if there were sex differences in performance or responses to estradiol. Utilizing a combination of an aromatase inhibitor, fadrozole, with estradiol implants, we found that while overall cache recovery rates were unaffected by estradiol, several other indices of spatial memory, including searching efficiency and efficiency to retrieve the first item, were impaired in the presence of estradiol. In addition, males and females differed in some performance measures, although these differences appeared to be a consequence of the nature of the task as neither sex consistently out-performed the other. Overall, our data suggest that a sustained estradiol elevation in a food-caching bird impairs some, but not all, aspects of spatial memory on an innate behavioral task, at times in a sex-specific manner. Copyright © 2015 Elsevier Inc. All rights reserved.

  16. SEX, ESTRADIOL, AND SPATIAL MEMORY IN A FOOD-CACHING CORVID

    PubMed Central

    Rensel, Michelle A.; Ellis, Jesse M.S.; Harvey, Brigit; Schlinger, Barney A.

    2015-01-01

    Estrogens significantly impact spatial memory function in mammalian species. Songbirds express the estrogen synthetic enzyme aromatase at relatively high levels in the hippocampus and there is evidence from zebra finches that estrogens facilitate performance on spatial learning and/or memory tasks. It is unknown, however, whether estrogens influence hippocampal function in songbirds that naturally exhibit memory-intensive behaviors, such as cache recovery observed in many corvid species. To address this question, we examined the impact of estradiol on spatial memory in non-breeding Western scrub-jays, a species that routinely participates in food caching and retrieval in nature and in captivity. We also asked if there were sex differences in performance or responses to estradiol. Utilizing a combination of an aromatase inhibitor, fadrozole, with estradiol implants, we found that while overall cache recovery rates were unaffected by estradiol, several other indices of spatial memory, including searching efficiency and efficiency to retrieve the first item, were impaired in the presence of estradiol. In addition, males and females differed in some performance measures, although these differences appeared to be a consequence of the nature of the task as neither sex consistently out-performed the other. Overall, our data suggest that a sustained estradiol elevation in a food-caching bird impairs some, but not all, aspects of spatial memory on an innate behavioral task, at times in a sex-specific manner. PMID:26232613

  17. Performance of hashed cache data migration schemes on multicomputers

    NASA Technical Reports Server (NTRS)

    Hiranandani, Seema; Saltz, Joel; Mehrotra, Piyush; Berryman, Harry

    1991-01-01

    After conducting an examination of several data-migration mechanisms which permit an explicit and controlled mapping of data to memory, a set of schemes for storage and retrieval of off-processor array elements is experimentally evaluated and modeled. All schemes considered have their basis in the use of hash tables for efficient access of nonlocal data. The techniques in question are those of hashed cache, partial enumeration, and full enumeration; in these, nonlocal data are stored in hash tables, so that the operative difference lies in the amount of memory used by each scheme and in the retrieval mechanism used for nonlocal data.

  18. Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

    NASA Astrophysics Data System (ADS)

    Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

    2014-03-01

    The processor-memory performance gap, commonly referred to as "Memory Wall" problem, owes to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss on e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.

  19. A test of the adaptive specialization hypothesis: population differences in caching, memory, and the hippocampus in black-capped chickadees (Poecile atricapilla).

    PubMed

    Pravosudov, Vladimir V; Clayton, Nicola S

    2002-08-01

    To test the hypothesis that accurate cache recovery is more critical for birds that live in harsh conditions where the food supply is limited and unpredictable, the authors compared food caching, memory, and the hippocampus of black-capped chickadees (Poecile atricapilla) from Alaska and Colorado. Under identical laboratory conditions, Alaska chickadees (a) cached significantly more food; (b) were more efficient at cache recovery: (c) performed more accurately on one-trial associative learning tasks in which birds had to rely on spatial memory, but did not differ when tested on a nonspatial version of this task; and (d) had significantly larger hippocampal volumes containing more neurons compared with Colorado chickadees. The results support the hypothesis that these population differences may reflect adaptations to a harsh environment.

  20. Modified stretched exponential model of computer system resources management limitations-The case of cache memory

    NASA Astrophysics Data System (ADS)

    Strzałka, Dominik; Dymora, Paweł; Mazurek, Mirosław

    2018-02-01

    In this paper we present some preliminary results in the field of computer systems management with relation to Tsallis thermostatistics and the ubiquitous problem of hardware limited resources. In the case of systems with non-deterministic behaviour, management of their resources is a key point that guarantees theirs acceptable performance and proper working. This is very wide problem that stands for many challenges in financial, transport, water and food, health, etc. areas. We focus on computer systems with attention paid to cache memory and propose to use an analytical model that is able to connect non-extensive entropy formalism, long-range dependencies, management of system resources and queuing theory. Obtained analytical results are related to the practical experiment showing interesting and valuable results.

  1. Cache Energy Optimization Techniques For Modern Processors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh

    2013-01-01

    Modern multicore processors are employing large last-level caches, for example Intel's E7-8800 processor uses 24MB L3 cache. Further, with each CMOS technology generation, leakage energy has been dramatically increasing and hence, leakage energy is expected to become a major source of energy dissipation, especially in last-level caches (LLCs). The conventional schemes of cache energy saving either aim at saving dynamic energy or are based on properties specific to first-level caches, and thus these schemes have limited utility for last-level caches. Further, several other techniques require offline profiling or per-application tuning and hence are not suitable for product systems. In thismore » book, we present novel cache leakage energy saving schemes for single-core and multicore systems; desktop, QoS, real-time and server systems. Also, we present cache energy saving techniques for caches designed with both conventional SRAM devices and emerging non-volatile devices such as STT-RAM (spin-torque transfer RAM). We present software-controlled, hardware-assisted techniques which use dynamic cache reconfiguration to configure the cache to the most energy efficient configuration while keeping the performance loss bounded. To profile and test a large number of potential configurations, we utilize low-overhead, micro-architecture components, which can be easily integrated into modern processor chips. We adopt a system-wide approach to save energy to ensure that cache reconfiguration does not increase energy consumption of other components of the processor. We have compared our techniques with state-of-the-art techniques and have found that our techniques outperform them in terms of energy efficiency and other relevant metrics. The techniques presented in this book have important applications in improving energy-efficiency of higher-end embedded, desktop, QoS, real-time, server processors and multitasking systems. This book is intended to be a valuable guide for both newcomers and veterans in the field of cache power management. It will help graduate students, CAD tool developers and designers in understanding the need of energy efficiency in modern computing systems. Further, it will be useful for researchers in gaining insights into algorithms and techniques for micro-architectural and system-level energy optimization using dynamic cache reconfiguration. We sincerely believe that the ``food for thought'' presented in this book will inspire the readers to develop even better ideas for designing ``green'' processors of tomorrow.« less

  2. Constant time worker thread allocation via configuration caching

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eichenberger, Alexandre E; O'Brien, John K. P.

    Mechanisms are provided for allocating threads for execution of a parallel region of code. A request for allocation of worker threads to execute the parallel region of code is received from a master thread. Cached thread allocation information identifying prior thread allocations that have been performed for the master thread are accessed. Worker threads are allocated to the master thread based on the cached thread allocation information. The parallel region of code is executed using the allocated worker threads.

  3. Replication Strategy for Spatiotemporal Data Based on Distributed Caching System

    PubMed Central

    Xiong, Lian; Tao, Yang; Xu, Juan; Zhao, Lun

    2018-01-01

    The replica strategy in distributed cache can effectively reduce user access delay and improve system performance. However, developing a replica strategy suitable for varied application scenarios is still quite challenging, owing to differences in user access behavior and preferences. In this paper, a replication strategy for spatiotemporal data (RSSD) based on a distributed caching system is proposed. By taking advantage of the spatiotemporal locality and correlation of user access, RSSD mines high popularity and associated files from historical user access information, and then generates replicas and selects appropriate cache node for placement. Experimental results show that the RSSD algorithm is simple and efficient, and succeeds in significantly reducing user access delay. PMID:29342897

  4. Horizontally scaling dChache SRM with the Terracotta platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perelmutov, T.; Crawford, M.; Moibenko, A.

    2011-01-01

    The dCache disk caching file system has been chosen by a majority of LHC experiments Tier 1 centers for their data storage needs. It is also deployed at many Tier 2 centers. The Storage Resource Manager (SRM) is a standardized grid storage interface and a single point of remote entry into dCache, and hence is a critical component. SRM must scale to increasing transaction rates and remain resilient against changing usage patterns. The initial implementation of the SRM service in dCache suffered from an inability to support clustered deployment, and its performance was limited by the hardware of a singlemore » node. Using the Terracotta platform, we added the ability to horizontally scale the dCache SRM service to run on multiple nodes in a cluster configuration, coupled with network load balancing. This gives site administrators the ability to increase the performance and reliability of SRM service to face the ever-increasing requirements of LHC data handling. In this paper we will describe the previous limitations of the architecture SRM server and how the Terracotta platform allowed us to readily convert single node service into a highly scalable clustered application.« less

  5. Performance assessment of EMR systems based on post-relational database.

    PubMed

    Yu, Hai-Yan; Li, Jing-Song; Zhang, Xiao-Guang; Tian, Yu; Suzuki, Muneou; Araki, Kenji

    2012-08-01

    Post-relational databases provide high performance and are currently widely used in American hospitals. As few hospital information systems (HIS) in either China or Japan are based on post-relational databases, here we introduce a new-generation electronic medical records (EMR) system called Hygeia, which was developed with the post-relational database Caché and the latest platform Ensemble. Utilizing the benefits of a post-relational database, Hygeia is equipped with an "integration" feature that allows all the system users to access data-with a fast response time-anywhere and at anytime. Performance tests of databases in EMR systems were implemented in both China and Japan. First, a comparison test was conducted between a post-relational database, Caché, and a relational database, Oracle, embedded in the EMR systems of a medium-sized first-class hospital in China. Second, a user terminal test was done on the EMR system Izanami, which is based on the identical database Caché and operates efficiently at the Miyazaki University Hospital in Japan. The results proved that the post-relational database Caché works faster than the relational database Oracle and showed perfect performance in the real-time EMR system.

  6. Cache domains that are homologous to, but different from PAS domains comprise the largest superfamily of extracellular sensors in prokaryotes

    DOE PAGES

    Upadhyay, Amit A.; Fleetwood, Aaron D.; Adebali, Ogun; ...

    2016-04-06

    Cellular receptors usually contain a designated sensory domain that recognizes the signal. Per/Arnt/Sim (PAS) domains are ubiquitous sensors in thousands of species ranging from bacteria to humans. Although PAS domains were described as intracellular sensors, recent structural studies revealed PAS-like domains in extracytoplasmic regions in several transmembrane receptors. However, these structurally defined extracellular PAS-like domains do not match sequence-derived PAS domain models, and thus their distribution across the genomic landscape remains largely unknown. Here we show that structurally defined extracellular PAS-like domains belong to the Cache superfamily, which is homologous to, but distinct from the PAS superfamily. Our newly builtmore » computational models enabled identification of Cache domains in tens of thousands of signal transduction proteins including those from important pathogens and model organisms.Moreover, we show that Cache domains comprise the dominant mode of extracellular sensing in prokaryotes.« less

  7. Cache domains that are homologous to, but different from PAS domains comprise the largest superfamily of extracellular sensors in prokaryotes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Upadhyay, Amit A.; Fleetwood, Aaron D.; Adebali, Ogun

    Cellular receptors usually contain a designated sensory domain that recognizes the signal. Per/Arnt/Sim (PAS) domains are ubiquitous sensors in thousands of species ranging from bacteria to humans. Although PAS domains were described as intracellular sensors, recent structural studies revealed PAS-like domains in extracytoplasmic regions in several transmembrane receptors. However, these structurally defined extracellular PAS-like domains do not match sequence-derived PAS domain models, and thus their distribution across the genomic landscape remains largely unknown. Here we show that structurally defined extracellular PAS-like domains belong to the Cache superfamily, which is homologous to, but distinct from the PAS superfamily. Our newly builtmore » computational models enabled identification of Cache domains in tens of thousands of signal transduction proteins including those from important pathogens and model organisms.Moreover, we show that Cache domains comprise the dominant mode of extracellular sensing in prokaryotes.« less

  8. Research on computer systems benchmarking

    NASA Technical Reports Server (NTRS)

    Smith, Alan Jay (Principal Investigator)

    1996-01-01

    This grant addresses the topic of research on computer systems benchmarking and is more generally concerned with performance issues in computer systems. This report reviews work in those areas during the period of NASA support under this grant. The bulk of the work performed concerned benchmarking and analysis of CPUs, compilers, caches, and benchmark programs. The first part of this work concerned the issue of benchmark performance prediction. A new approach to benchmarking and machine characterization was reported, using a machine characterizer that measures the performance of a given system in terms of a Fortran abstract machine. Another report focused on analyzing compiler performance. The performance impact of optimization in the context of our methodology for CPU performance characterization was based on the abstract machine model. Benchmark programs are analyzed in another paper. A machine-independent model of program execution was developed to characterize both machine performance and program execution. By merging these machine and program characterizations, execution time can be estimated for arbitrary machine/program combinations. The work was continued into the domain of parallel and vector machines, including the issue of caches in vector processors and multiprocessors. All of the afore-mentioned accomplishments are more specifically summarized in this report, as well as those smaller in magnitude supported by this grant.

  9. Improving energy efficiency of Embedded DRAM Caches for High-end Computing Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh; Vetter, Jeffrey S; Li, Dong

    2014-01-01

    With increasing system core-count, the size of last level cache (LLC) has increased and since SRAM consumes high leakage power, power consumption of LLCs is becoming a significant fraction of processor power consumption. To address this, researchers have used embedded DRAM (eDRAM) LLCs which consume low-leakage power. However, eDRAM caches consume a significant amount of energy in the form of refresh energy. In this paper, we propose ESTEEM, an energy saving technique for embedded DRAM caches. ESTEEM uses dynamic cache reconfiguration to turn-off a portion of the cache to save both leakage and refresh energy. It logically divides the cachemore » sets into multiple modules and turns-off possibly different number of ways in each module. Microarchitectural simulations confirm that ESTEEM is effective in improving performance and energy efficiency and provides better results compared to a recently-proposed eDRAM cache energy saving technique, namely Refrint. For single and dual-core simulations, the average saving in memory subsystem (LLC+main memory) on using ESTEEM is 25.8% and 32.6%, respectively and average weighted speedup are 1.09X and 1.22X, respectively. Additional experiments confirm that ESTEEM works well for a wide-range of system parameters.« less

  10. GPU-Accelerated Forward and Back-Projections with Spatially Varying Kernels for 3D DIRECT TOF PET Reconstruction.

    PubMed

    Ha, S; Matej, S; Ispiryan, M; Mueller, K

    2013-02-01

    We describe a GPU-accelerated framework that efficiently models spatially (shift) variant system response kernels and performs forward- and back-projection operations with these kernels for the DIRECT (Direct Image Reconstruction for TOF) iterative reconstruction approach. Inherent challenges arise from the poor memory cache performance at non-axis aligned TOF directions. Focusing on the GPU memory access patterns, we utilize different kinds of GPU memory according to these patterns in order to maximize the memory cache performance. We also exploit the GPU instruction-level parallelism to efficiently hide long latencies from the memory operations. Our experiments indicate that our GPU implementation of the projection operators has slightly faster or approximately comparable time performance than FFT-based approaches using state-of-the-art FFTW routines. However, most importantly, our GPU framework can also efficiently handle any generic system response kernels, such as spatially symmetric and shift-variant as well as spatially asymmetric and shift-variant, both of which an FFT-based approach cannot cope with.

  11. GPU-Accelerated Forward and Back-Projections With Spatially Varying Kernels for 3D DIRECT TOF PET Reconstruction

    NASA Astrophysics Data System (ADS)

    Ha, S.; Matej, S.; Ispiryan, M.; Mueller, K.

    2013-02-01

    We describe a GPU-accelerated framework that efficiently models spatially (shift) variant system response kernels and performs forward- and back-projection operations with these kernels for the DIRECT (Direct Image Reconstruction for TOF) iterative reconstruction approach. Inherent challenges arise from the poor memory cache performance at non-axis aligned TOF directions. Focusing on the GPU memory access patterns, we utilize different kinds of GPU memory according to these patterns in order to maximize the memory cache performance. We also exploit the GPU instruction-level parallelism to efficiently hide long latencies from the memory operations. Our experiments indicate that our GPU implementation of the projection operators has slightly faster or approximately comparable time performance than FFT-based approaches using state-of-the-art FFTW routines. However, most importantly, our GPU framework can also efficiently handle any generic system response kernels, such as spatially symmetric and shift-variant as well as spatially asymmetric and shift-variant, both of which an FFT-based approach cannot cope with.

  12. Prolonging the arctic pulse: long-term exploitation of cached eggs by arctic foxes when lemmings are scarce.

    PubMed

    Samelius, Gustaf; Alisauskas, Ray T; Hobson, Keith A; Larivière, Serge

    2007-09-01

    1. Many ecosystems are characterized by pulses of dramatically higher than normal levels of foods (pulsed resources) to which animals often respond by caching foods for future use. However, the extent to which animals use cached foods and how this varies in relation to fluctuations in other foods is poorly understood in most animals. 2. Arctic foxes Alopex lagopus (L.) cache thousands of eggs annually at large goose colonies where eggs are often superabundant during the nesting period by geese. We estimated the contribution of cached eggs to arctic fox diets in spring and autumn, when geese were not present in the study area, by comparing stable isotope ratios (delta(13)C and delta(15)N) of fox tissues with those of their foods using a multisource mixing model in Program IsoSource. 3. The contribution of cached eggs to arctic fox diets was inversely related to collared lemming Dicrostonyx groenlandicus (Traill) abundance; the contribution of cached eggs to overall fox diets increased from < 28% in years when collared lemmings were abundant to 30-74% in years when collared lemmings were scarce. 4. Further, arctic foxes used cached eggs well into the following spring (almost 1 year after eggs were acquired) - a pattern that differs from that of carnivores generally storing foods for only a few days before consumption. 5. This study showed that long-term use of eggs that were cached when geese were superabundant at the colony in summer varied with fluctuations in collared lemming abundance (a key component in arctic fox diets throughout most of their range) and suggests that cached eggs functioned as a buffer when collared lemmings were scarce.

  13. Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-Time Popularities

    NASA Astrophysics Data System (ADS)

    Sadeghi, Alireza; Sheikholeslami, Fatemeh; Giannakis, Georgios B.

    2018-02-01

    Small basestations (SBs) equipped with caching units have potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate, backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours, and service them to the edge at peak periods. To intelligently prefetch, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this work, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allow for a simple, yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.

  14. Minimizing End-to-End Interference in I/O Stacks Spanning Shared Multi-Level Buffer Caches

    ERIC Educational Resources Information Center

    Patrick, Christina M.

    2011-01-01

    This thesis presents an end-to-end interference minimizing uniquely designed high performance I/O stack that spans multi-level shared buffer cache hierarchies accessing shared I/O servers to deliver a seamless high performance I/O stack. In this thesis, I show that I can build a superior I/O stack which minimizes the inter-application interference…

  15. The mixed instrumental controller: using value of information to combine habitual choice and mental simulation.

    PubMed

    Pezzulo, Giovanni; Rigoli, Francesco; Chersi, Fabian

    2013-01-01

    Instrumental behavior depends on both goal-directed and habitual mechanisms of choice. Normative views cast these mechanisms in terms of model-free and model-based methods of reinforcement learning, respectively. An influential proposal hypothesizes that model-free and model-based mechanisms coexist and compete in the brain according to their relative uncertainty. In this paper we propose a novel view in which a single Mixed Instrumental Controller produces both goal-directed and habitual behavior by flexibly balancing and combining model-based and model-free computations. The Mixed Instrumental Controller performs a cost-benefits analysis to decide whether to chose an action immediately based on the available "cached" value of actions (linked to model-free mechanisms) or to improve value estimation by mentally simulating the expected outcome values (linked to model-based mechanisms). Since mental simulation entails cognitive effort and increases the reward delay, it is activated only when the associated "Value of Information" exceeds its costs. The model proposes a method to compute the Value of Information, based on the uncertainty of action values and on the distance of alternative cached action values. Overall, the model by default chooses on the basis of lighter model-free estimates, and integrates them with costly model-based predictions only when useful. Mental simulation uses a sampling method to produce reward expectancies, which are used to update the cached value of one or more actions; in turn, this updated value is used for the choice. The key predictions of the model are tested in different settings of a double T-maze scenario. Results are discussed in relation with neurobiological evidence on the hippocampus - ventral striatum circuit in rodents, which has been linked to goal-directed spatial navigation.

  16. Cache and energy efficient algorithms for Nussinov's RNA Folding.

    PubMed

    Zhao, Chunchun; Sahni, Sartaj

    2017-12-06

    An RNA folding/RNA secondary structure prediction algorithm determines the non-nested/pseudoknot-free structure by maximizing the number of complementary base pairs and minimizing the energy. Several implementations of Nussinov's classical RNA folding algorithm have been proposed. Our focus is to obtain run time and energy efficiency by reducing the number of cache misses. Three cache-efficient algorithms, ByRow, ByRowSegment and ByBox, for Nussinov's RNA folding are developed. Using a simple LRU cache model, we show that the Classical algorithm of Nussinov has the highest number of cache misses followed by the algorithms Transpose (Li et al.), ByRow, ByRowSegment, and ByBox (in this order). Extensive experiments conducted on four computational platforms-Xeon E5, AMD Athlon 64 X2, Intel I7 and PowerPC A2-using two programming languages-C and Java-show that our cache efficient algorithms are also efficient in terms of run time and energy. Our benchmarking shows that, depending on the computational platform and programming language, either ByRow or ByBox give best run time and energy performance. The C version of these algorithms reduce run time by as much as 97.2% and energy consumption by as much as 88.8% relative to Classical and by as much as 56.3% and 57.8% relative to Transpose. The Java versions reduce run time by as much as 98.3% relative to Classical and by as much as 75.2% relative to Transpose. Transpose achieves run time and energy efficiency at the expense of memory as it takes twice the memory required by Classical. The memory required by ByRow, ByRowSegment, and ByBox is the same as that of Classical. As a result, using the same amount of memory, the algorithms proposed by us can solve problems up to 40% larger than those solvable by Transpose.

  17. DSP code optimization based on cache

    NASA Astrophysics Data System (ADS)

    Xu, Chengfa; Li, Chengcheng; Tang, Bin

    2013-03-01

    DSP program's running efficiency on board is often lower than which via the software simulation during the program development, which is mainly resulted from the user's improper use and incomplete understanding of the cache-based memory. This paper took the TI TMS320C6455 DSP as an example, analyzed its two-level internal cache, and summarized the methods of code optimization. Processor can achieve its best performance when using these code optimization methods. At last, a specific algorithm application in radar signal processing is proposed. Experiment result shows that these optimization are efficient.

  18. Using Minimum-Surface Bodies for Iteration Space Partitioning

    NASA Technical Reports Server (NTRS)

    Frumlin, Michael; VanderWijngaart, Rob F.; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A number of known techniques for improving cache performance in scientific computations involve the reordering of the iteration space. Some of these reorderings can be considered as coverings of the iteration space with the sets having good surface-to-volume ratio. Use of such sets reduces the number of cache misses in computations of local operators having the iteration space as a domain. We study coverings of iteration spaces represented by structured and unstructured grids. For structured grids we introduce a covering based on successive minima tiles of the interference lattice of the grid. We show that the covering has good surface-to-volume ratio and present a computer experiment showing actual reduction of the cache misses achieved by using these tiles. For unstructured grids no cache efficient covering can be guaranteed. We present a triangulation of a 3-dimensional cube such that any local operator on the corresponding grid has significantly larger number of cache misses than a similar operator on a structured grid.

  19. Minimizing Cache Misses Using Minimum-Surface Bodies

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; VanderWijngaart, Rob; Biegel, Bryan (Technical Monitor)

    2002-01-01

    A number of known techniques for improving cache performance in scientific computations involve the reordering of the iteration space. Some of these reorderings can be considered as coverings of the iteration space with the sets having good surface-to-volume ratio. Use of such sets reduces the number of cache misses in computations of local operators having the iteration space as a domain. First, we derive lower bounds which any algorithm must suffer while computing a local operator on a grid. Then we explore coverings of iteration spaces represented by structured and unstructured grids which allow us to approach these lower bounds. For structured grids we introduce a covering by successive minima tiles of the interference lattice of the grid. We show that the covering has low surface-to-volume ratio and present a computer experiment showing actual reduction of the cache misses achieved by using these tiles. For planar unstructured grids we show existence of a covering which reduces the number of cache misses to the level of structured grids. On the other hand, we present a triangulation of a 3-dimensional cube such that any local operator on the corresponding grid has significantly larger number of cache misses than a similar operator on a structured grid.

  20. Random Fill Cache Architecture (Preprint)

    DTIC Science & Technology

    2014-10-01

    a concrete example, we show how the cache collision attack works to extract the AES encryption keys (e.g., in the OpenSSL implementation of AES). AES...each round are implemented as table lookups for performance reasons. OpenSSL uses ten 1-KB lookup tables, five for encryption and five for decryption

  1. Parallel Semi-Implicit Spectral Element Atmospheric Model

    NASA Astrophysics Data System (ADS)

    Fournier, A.; Thomas, S.; Loft, R.

    2001-05-01

    The shallow-water equations (SWE) have long been used to test atmospheric-modeling numerical methods. The SWE contain essential wave-propagation and nonlinear effects of more complete models. We present a semi-implicit (SI) improvement of the Spectral Element Atmospheric Model to solve the SWE (SEAM, Taylor et al. 1997, Fournier et al. 2000, Thomas & Loft 2000). SE methods are h-p finite element methods combining the geometric flexibility of size-h finite elements with the accuracy of degree-p spectral methods. Our work suggests that exceptional parallel-computation performance is achievable by a General-Circulation-Model (GCM) dynamical core, even at modest climate-simulation resolutions (>1o). The code derivation involves weak variational formulation of the SWE, Gauss(-Lobatto) quadrature over the collocation points, and Legendre cardinal interpolators. Appropriate weak variation yields a symmetric positive-definite Helmholtz operator. To meet the Ladyzhenskaya-Babuska-Brezzi inf-sup condition and avoid spurious modes, we use a staggered grid. The SI scheme combines leapfrog and Crank-Nicholson schemes for the nonlinear and linear terms respectively. The localization of operations to elements ideally fits the method to cache-based microprocessor computer architectures --derivatives are computed as collections of small (8x8), naturally cache-blocked matrix-vector products. SEAM also has desirable boundary-exchange communication, like finite-difference models. Timings on on the IBM SP and Compaq ES40 supercomputers indicate that the SI code (20-min timestep) requires 1/3 the CPU time of the explicit code (2-min timestep) for T42 resolutions. Both codes scale nearly linearly out to 400 processors. We achieved single-processor performance up to 30% of peak for both codes on the 375-MHz IBM Power-3 processors. Fast computation and linear scaling lead to a useful climate-simulation dycore only if enough model time is computed per unit wall-clock time. An efficient SI solver is essential to substantially increase this rate. Parallel preconditioning for an iterative conjugate-gradient elliptic solver is described. We are building a GCM dycore capable of 200 GF% lOPS sustained performance on clustered RISC/cache architectures using hybrid MPI/OpenMP programming.

  2. Towards Transparent Throughput Elasticity for IaaS Cloud Storage: Exploring the Benefits of Adaptive Block-Level Caching

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nicolae, Bogdan; Riteau, Pierre; Keahey, Kate

    Storage elasticity on IaaS clouds is a crucial feature in the age of data-intensive computing, especially when considering fluctuations of I/O throughput. This paper provides a transparent solution that automatically boosts I/O bandwidth during peaks for underlying virtual disks, effectively avoiding over-provisioning without performance loss. The authors' proposal relies on the idea of leveraging short-lived virtual disks of better performance characteristics (and thus more expensive) to act during peaks as a caching layer for the persistent virtual disks where the application data is stored. Furthermore, they introduce a performance and cost prediction methodology that can be used both independently tomore » estimate in advance what trade-off between performance and cost is possible, as well as an optimization technique that enables better cache size selection to meet the desired performance level with minimal cost. The authors demonstrate the benefits of their proposal both for microbenchmarks and for two real-life applications using large-scale experiments.« less

  3. On the Efficacy of Source Code Optimizations for Cache-Based Systems

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Saphir, William C.

    1998-01-01

    Obtaining high performance without machine-specific tuning is an important goal of scientific application programmers. Since most scientific processing is done on commodity microprocessors with hierarchical memory systems, this goal of "portable performance" can be achieved if a common set of optimization principles is effective for all such systems. It is widely believed, or at least hoped, that portable performance can be realized. The rule of thumb for optimization on hierarchical memory systems is to maximize temporal and spatial locality of memory references by reusing data and minimizing memory access stride. We investigate the effects of a number of optimizations on the performance of three related kernels taken from a computational fluid dynamics application. Timing the kernels on a range of processors, we observe an inconsistent and often counterintuitive impact of the optimizations on performance. In particular, code variations that have a positive impact on one architecture can have a negative impact on another, and variations expected to be unimportant can produce large effects. Moreover, we find that cache miss rates - as reported by a cache simulation tool, and confirmed by hardware counters - only partially explain the results. By contrast, the compiler-generated assembly code provides more insight by revealing the importance of processor-specific instructions and of compiler maturity, both of which strongly, and sometimes unexpectedly, influence performance. We conclude that it is difficult to obtain performance portability on modern cache-based computers, and comment on the implications of this result.

  4. On the Efficacy of Source Code Optimizations for Cache-Based Systems

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Saphir, William C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Obtaining high performance without machine-specific tuning is an important goal of scientific application programmers. Since most scientific processing is done on commodity microprocessors with hierarchical memory systems, this goal of "portable performance" can be achieved if a common set of optimization principles is effective for all such systems. It is widely believed, or at least hoped, that portable performance can be realized. The rule of thumb for optimization on hierarchical memory systems is to maximize temporal and spatial locality of memory references by reusing data and minimizing memory access stride. We investigate the effects of a number of optimizations on the performance of three related kernels taken from a computational fluid dynamics application. Timing the kernels on a range of processors, we observe an inconsistent and often counterintuitive impact of the optimizations on performance. In particular, code variations that have a positive impact on one architecture can have a negative impact on another, and variations expected to be unimportant can produce large effects. Moreover, we find that cache miss rates-as reported by a cache simulation tool, and confirmed by hardware counters-only partially explain the results. By contrast, the compiler-generated assembly code provides more insight by revealing the importance of processor-specific instructions and of compiler maturity, both of which strongly, and sometimes unexpectedly, influence performance. We conclude that it is difficult to obtain performance portability on modern cache-based computers, and comment on the implications of this result.

  5. Federated or cached searches: Providing expected performance from multiple invasive species databases

    NASA Astrophysics Data System (ADS)

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-06-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search "deep" web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required from users and a central cache of the datum are required to improve performance.

  6. Federated or cached searches: providing expected performance from multiple invasive species databases

    USGS Publications Warehouse

    Graham, Jim; Jarnevich, Catherine S.; Simpson, Annie; Newman, Gregory J.; Stohlgren, Thomas J.

    2011-01-01

    Invasive species are a universal global problem, but the information to identify them, manage them, and prevent invasions is stored around the globe in a variety of formats. The Global Invasive Species Information Network is a consortium of organizations working toward providing seamless access to these disparate databases via the Internet. A distributed network of databases can be created using the Internet and a standard web service protocol. There are two options to provide this integration. First, federated searches are being proposed to allow users to search “deep” web documents such as databases for invasive species. A second method is to create a cache of data from the databases for searching. We compare these two methods, and show that federated searches will not provide the performance and flexibility required from users and a central cache of the datum are required to improve performance.

  7. Achieving High Performance on the i860 Microprocessor

    NASA Technical Reports Server (NTRS)

    Lee, King; Kutler, Paul (Technical Monitor)

    1998-01-01

    The i860 is a high performance microprocessor used in the Intel Touchstone project. This paper proposes a paradigm for programming the i860 that is modelled on the vector instructions of the Cray computers. Fortran callable assembler subroutines were written that mimic the concurrent vector instructions of the Cray. Cache takes the place of vector registers. Using this paradigm we have achieved twice the performance of compiled code on a traditional solve.

  8. Resource-Efficient Data-Intensive System Designs for High Performance and Capacity

    DTIC Science & Technology

    2015-09-01

    76, 79, 80, and 81.] [9] Anirudh Badam, KyoungSoo Park, Vivek S. Pai, and Larry L. Peterson. HashCache: cache storage for the next billion. In Proc...Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows , Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A

  9. Population-based analysis of Alzheimer's disease risk alleles implicates genetic interactions.

    PubMed

    Ebbert, Mark T W; Ridge, Perry G; Wilson, Andrew R; Sharp, Aaron R; Bailey, Matthew; Norton, Maria C; Tschanz, JoAnn T; Munger, Ronald G; Corcoran, Christopher D; Kauwe, John S K

    2014-05-01

    Reported odds ratios and population attributable fractions (PAF) for late-onset Alzheimer's disease (LOAD) risk loci (BIN1, ABCA7, CR1, MS4A4E, CD2AP, PICALM, MS4A6A, CD33, and CLU) come from clinically ascertained samples. Little is known about the combined PAF for these LOAD risk alleles and the utility of these combined markers for case-control prediction. Here we evaluate these loci in a large population-based sample to estimate PAF and explore the effects of additive and nonadditive interactions on LOAD status prediction performance. 2419 samples from the Cache County Memory Study were genotyped for APOE and nine LOAD risk loci from AlzGene.org. We used logistic regression and receiver operator characteristic analysis to assess the LOAD status prediction performance of these loci using additive and nonadditive models and compared odds ratios and PAFs between AlzGene.org and Cache County. Odds ratios were comparable between Cache County and AlzGene.org when identical single nucleotide polymorphisms were genotyped. PAFs from AlzGene.org ranged from 2.25% to 37%; those from Cache County ranged from .05% to 20%. Including non-APOE alleles significantly improved LOAD status prediction performance (area under the curve = .80) over APOE alone (area under the curve = .78) when not constrained to an additive relationship (p < .03). We identified potential allelic interactions (p values uncorrected): CD33-MS4A4E (synergy factor = 5.31; p < .003) and CLU-MS4A4E (synergy factor = 3.81; p < .016). Although nonadditive interactions between loci significantly improve diagnostic ability, the improvement does not reach the desired sensitivity or specificity for clinical use. Nevertheless, these results suggest that understanding gene-gene interactions may be important in resolving Alzheimer's disease etiology. Copyright © 2014 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  10. Is random access memory random?

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1986-01-01

    Most software is contructed on the assumption that the programs and data are stored in random access memory (RAM). Physical limitations on the relative speeds of processor and memory elements lead to a variety of memory organizations that match processor addressing rate with memory service rate. These include interleaved and cached memory. A very high fraction of a processor's address requests can be satified from the cache without reference to the main memory. The cache requests information from main memory in blocks that can be transferred at the full memory speed. Programmers who organize algorithms for locality can realize the highest performance from these computers.

  11. Comparing NetCDF and SciDB on managing and querying 5D hydrologic dataset

    NASA Astrophysics Data System (ADS)

    Liu, Haicheng; Xiao, Xiao

    2016-11-01

    Efficiently extracting information from high dimensional hydro-meteorological modelling datasets requires smart solutions. Traditional methods are mostly based on files, which can be edited and accessed handily. But they have problems of efficiency due to contiguous storage structure. Others propose databases as an alternative for advantages such as native functionalities for manipulating multidimensional (MD) arrays, smart caching strategy and scalability. In this research, NetCDF file based solutions and the multidimensional array database management system (DBMS) SciDB applying chunked storage structure are benchmarked to determine the best solution for storing and querying 5D large hydrologic modelling dataset. The effect of data storage configurations including chunk size, dimension order and compression on query performance is explored. Results indicate that dimension order to organize storage of 5D data has significant influence on query performance if chunk size is very large. But the effect becomes insignificant when chunk size is properly set. Compression of SciDB mostly has negative influence on query performance. Caching is an advantage but may be influenced by execution of different query processes. On the whole, NetCDF solution without compression is in general more efficient than the SciDB DBMS.

  12. The ontogeny of food-caching behaviour in New Zealand robins (Petroica longipes).

    PubMed

    Clark, Lisabertha L; Shaw, Rachael C

    2018-06-01

    Hoarding or caching behaviour is a widely-used paradigm for examining a range of cognitive processes in birds, such as social cognition and spatial memory. However, much is still unknown about how caching develops in young birds, especially in the wild. Studying the ontogeny of caching in the wild will help researchers to identify the mechanisms that shape this advantageous foraging strategy. We examined the ontogeny of food caching behaviour in a wild New Zealand passerine, the North Island robin (Petroica longipes). For 12-weeks following fledging, we observed 34 juveniles to examine the development of caching and cache retrieval. Additionally, we compared the caching behaviour of juveniles at 12 weeks post-fledging to 35 adult robins to determine whether juveniles had developed adult-like caching behaviour by this age. Juveniles began caching mealworms shortly after achieving foraging independency. Multivariate analyses revealed that caching rate increased and handling time decreased with increasing age. Juveniles spontaneously began retrieving caches as soon as they had begun to cache and their retrieval rates then remained constant throughout their ensuing development. Likewise, the number of sites used by juveniles did not change with age. Juvenile sex, caregiver sex and the duration of post-fledging parental care did not influence the development of caching, cache retrieval, the number of cache sites used and the time juveniles spent handling mealworms. At 12 weeks post-fledging, juveniles demonstrated levels of caching, cache retrieval and cache site usage that were comparable to adults. However, juvenile prey handling time was still longer than adults. The spontaneous emergence of cache retrieval and the consistency in the number of cache sites used throughout development suggests that these aspects of caching in North Island robins are likely to be innate, but that age and experience have an important role in the development of adult caching behaviours. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. Clark’s Nutcrackers (Nucifraga columbiana) Flexibly Adapt Caching Behavior to a Cooperative Context

    PubMed Central

    Clary, Dawson; Kelly, Debbie M.

    2016-01-01

    Corvids recognize when their caches are at risk of being stolen by others and have developed strategies to protect these caches from pilferage. For instance, Clark’s nutcrackers will suppress the number of caches they make if being observed by a potential thief. However, cache protection has most often been studied using competitive contexts, so it is unclear whether corvids can adjust their caching in beneficial ways to accommodate non-competitive situations. Therefore, we examined whether Clark’s nutcrackers, a non-social corvid, would flexibly adapt their caching behaviors to a cooperative context. To do so, birds were given a caching task during which caches made by one individual were reciprocally exchanged for the caches of a partner bird over repeated trials. In this scenario, if caching behaviors can be flexibly deployed, then the birds should recognize the cooperative nature of the task and maintain or increase caching levels over time. However, if cache protection strategies are applied independent of social context and simply in response to cache theft, then cache suppression should occur. In the current experiment, we found that the birds maintained caching throughout the experiment. We report that males increased caching in response to a manipulation in which caches were artificially added, suggesting the birds could adapt to the cooperative nature of the task. Additionally, we show that caching decisions were not solely due to motivational factors, instead showing an additional influence attributed to the behavior of the partner bird. PMID:27826273

  14. The Use of Proxy Caches for File Access in a Multi-Tier Grid Environment

    NASA Astrophysics Data System (ADS)

    Brun, R.; Duellmann, D.; Ganis, G.; Hanushevsky, A.; Janyst, L.; Peters, A. J.; Rademakers, F.; Sindrilaru, E.

    2011-12-01

    The use of proxy caches has been extensively studied in the HEP environment for efficient access of database data and showed significant performance with only very moderate operational effort at higher grid tiers (T2, T3). In this contribution we propose to apply the same concept to the area of file access and analyse the possible performance gains, operational impact on site services and applicability to different HEP use cases. Base on a proof-of-concept studies with a modified XROOT proxy server we review the cache efficiency and overheads for access patterns of typical ROOT based analysis programs. We conclude with a discussion of the potential role of this new component at the different tiers of a distributed computing grid.

  15. The Use of Proxy Caches for File Access in a Multi-Tier Grid Environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brun, R.; Dullmann, D.; Ganis, G.

    2012-04-19

    The use of proxy caches has been extensively studied in the HEP environment for efficient access of database data and showed significant performance with only very moderate operational effort at higher grid tiers (T2, T3). In this contribution we propose to apply the same concept to the area of file access and analyze the possible performance gains, operational impact on site services and applicability to different HEP use cases. Base on a proof-of-concept studies with a modified XROOT proxy server we review the cache efficiency and overheads for access patterns of typical ROOT based analysis programs. We conclude with amore » discussion of the potential role of this new component at the different tiers of a distributed computing grid.« less

  16. Compiler-Directed File Layout Optimization for Hierarchical Storage Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ding, Wei; Zhang, Yuanrui; Kandemir, Mahmut

    File layout of array data is a critical factor that effects the behavior of storage caches, and has so far taken not much attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O intensive application programs and compared its performancemore » against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.« less

  17. Compiler-Directed File Layout Optimization for Hierarchical Storage Systems

    DOE PAGES

    Ding, Wei; Zhang, Yuanrui; Kandemir, Mahmut; ...

    2013-01-01

    File layout of array data is a critical factor that effects the behavior of storage caches, and has so far taken not much attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O intensive application programs and compared its performancemore » against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.« less

  18. Soft-core processor study for node-based architectures.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Van Houten, Jonathan Roger; Jarosz, Jason P.; Welch, Benjamin James

    2008-09-01

    Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs, size, weight, and power and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable Field Programmable Gate Array (FPGA) based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary with varying degrees of mission-specific performance requirements on these nodes. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hardcore processor built into the FPGA or as a soft-core processor builtmore » out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA based processors for use in future NBA systems--two soft cores (MicroBlaze and non-fault-tolerant LEON) and one hard core (PowerPC 405). Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration. Cache configurations impacted the results greatly; for optimal processor efficiency it is necessary to enable caches on the processors. Processor caches carry a penalty; cache error mitigation is necessary when operating in a radiation environment.« less

  19. Cooperation and information replication in wireless networks.

    PubMed

    Poularakis, Konstantinos; Tassiulas, Leandros

    2016-03-06

    A significant portion of today's network traffic is due to recurring downloads of a few popular contents. It has been observed that replicating the latter in caches installed at network edges-close to users-can drastically reduce network bandwidth usage and improve content access delay. Such caching architectures are gaining increasing interest in recent years as a way of dealing with the explosive traffic growth, fuelled further by the downward slope in storage space price. In this work, we provide an overview of caching with a particular emphasis on emerging network architectures that enable caching at the radio access network. In this context, novel challenges arise due to the broadcast nature of the wireless medium, which allows simultaneously serving multiple users tuned into a multicast stream, and the mobility of the users who may be frequently handed off from one cell tower to another. Existing results indicate that caching at the wireless edge has a great potential in removing bottlenecks on the wired backbone networks. Taking into consideration the schedule of multicast service and mobility profiles is crucial to extract maximum benefit in network performance. © 2016 The Author(s).

  20. Clark's nutcracker spatial memory: the importance of large, structural cues.

    PubMed

    Bednekoff, Peter A; Balda, Russell P

    2014-02-01

    Clark's nutcrackers, Nucifraga columbiana, cache and recover stored seeds in high alpine areas including areas where snowfall, wind, and rockslides may frequently obscure or alter cues near the cache site. Previous work in the laboratory has established that Clark's nutcrackers use spatial memory to relocate cached food. Following from aspects of this work, we performed experiments to test the importance of large, structural cues for Clark's nutcracker spatial memory. Birds were no more accurate in recovering caches when more objects were on the floor of a large experimental room nor when this room was subdivided with a set of panels. However, nutcrackers were consistently less accurate in this large room than in a small experimental room. Clark's nutcrackers probably use structural features of experimental rooms as important landmarks during recovery of cached food. This use of large, extremely stable cues may reflect the imperfect reliability of smaller, closer cues in the natural habitat of Clark's nutcrackers. This article is part of a Special Issue entitled: CO3 2013. Copyright © 2013 Elsevier B.V. All rights reserved.

  1. Enhancement web proxy cache performance using Wrapper Feature Selection methods with NB and J48

    NASA Astrophysics Data System (ADS)

    Mahmoud Al-Qudah, Dua'a.; Funke Olanrewaju, Rashidah; Wong Azman, Amelia

    2017-11-01

    Web proxy cache technique reduces response time by storing a copy of pages between client and server sides. If requested pages are cached in the proxy, there is no need to access the server. Due to the limited size and excessive cost of cache compared to the other storages, cache replacement algorithm is used to determine evict page when the cache is full. On the other hand, the conventional algorithms for replacement such as Least Recently Use (LRU), First in First Out (FIFO), Least Frequently Use (LFU), Randomized Policy etc. may discard important pages just before use. Furthermore, using conventional algorithm cannot be well optimized since it requires some decision to intelligently evict a page before replacement. Hence, most researchers propose an integration among intelligent classifiers and replacement algorithm to improves replacement algorithms performance. This research proposes using automated wrapper feature selection methods to choose the best subset of features that are relevant and influence classifiers prediction accuracy. The result present that using wrapper feature selection methods namely: Best First (BFS), Incremental Wrapper subset selection(IWSS)embedded NB and particle swarm optimization(PSO)reduce number of features and have a good impact on reducing computation time. Using PSO enhance NB classifier accuracy by 1.1%, 0.43% and 0.22% over using NB with all features, using BFS and using IWSS embedded NB respectively. PSO rises J48 accuracy by 0.03%, 1.91 and 0.04% over using J48 classifier with all features, using IWSS-embedded NB and using BFS respectively. While using IWSS embedded NB fastest NB and J48 classifiers much more than BFS and PSO. However, it reduces computation time of NB by 0.1383 and reduce computation time of J48 by 2.998.

  2. High Performance Programming Using Explicit Shared Memory Model on Cray T3D1

    NASA Technical Reports Server (NTRS)

    Simon, Horst D.; Saini, Subhash; Grassi, Charles

    1994-01-01

    The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under Cray Research adaptive Fortran (CRAFT) model four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI s PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in explicit shared memory model on Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than obtained by using explicit shared memory model. This degradation in performance is also seen on CM-5 where the performance of applications using native message-passing library CMMD on CM-5 is also about 4 to 5 times less than using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in explicit shared memory model are discussed. Comparative performance of NPB using explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.

  3. DESTINY

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-03-10

    DESTINY is a comprehensive tool for modeling 3D and 2D cache designs using SRAM,embedded DRAM (eDRAM), spin transfer torque RAM (STT-RAM), resistive RAM (ReRAM), and phase change RAM (PCN). In its purpose, it is similar to CACTI, CACTI-3DD or NVSim. DESTINY is very useful for performing design-space exploration across several dimensions, such as optimizing for a target (e.g. latency, area or energy-delay product) for agiven memory technology, choosing the suitable memory technology or fabrication method (i.e. 2D v/s 3D) for a given optimization target, etc. DESTINY has been validated against several cache prototypes. DESTINY is expected to boost studies ofmore » next-generation memory architectures used in systems ranging from mobile devices to extreme-scale supercomputers.« less

  4. Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER.

    PubMed

    Ferreira, Miguel; Roma, Nuno; Russo, Luis M S

    2014-05-30

    HMMER is a commonly used bioinformatics tool based on Hidden Markov Models (HMMs) to analyze and process biological sequences. One of its main homology engines is based on the Viterbi decoding algorithm, which was already highly parallelized and optimized using Farrar's striped processing pattern with Intel SSE2 instruction set extension. A new SIMD vectorization of the Viterbi decoding algorithm is proposed, based on an SSE2 inter-task parallelization approach similar to the DNA alignment algorithm proposed by Rognes. Besides this alternative vectorization scheme, the proposed implementation also introduces a new partitioning of the Markov model that allows a significantly more efficient exploitation of the cache locality. Such optimization, together with an improved loading of the emission scores, allows the achievement of a constant processing throughput, regardless of the innermost-cache size and of the dimension of the considered model. The proposed optimized vectorization of the Viterbi decoding algorithm was extensively evaluated and compared with the HMMER3 decoder to process DNA and protein datasets, proving to be a rather competitive alternative implementation. Being always faster than the already highly optimized ViterbiFilter implementation of HMMER3, the proposed Cache-Oblivious Parallel SIMD Viterbi (COPS) implementation provides a constant throughput and offers a processing speedup as high as two times faster, depending on the model's size.

  5. Chaining for Flexible and High-Performance Key-Value Systems

    DTIC Science & Technology

    2012-09-01

    store that is fault tolerant achieves high performance and availability, and offers strong data consistency? We present a new replication protocol...effective high performance data access and analytics, many sites use simpler data model “ NoSQL ” systems. ese systems store and retrieve data only by...DRAM, Flash, and disk-based storage; can act as an unreliable cache or a durable store ; and can offer strong or weak data consistency. e value of

  6. Distributed performance counters

    DOEpatents

    Davis, Kristan D; Evans, Kahn C; Gara, Alan; Satterfield, David L

    2013-11-26

    A plurality of first performance counter modules is coupled to a plurality of processing cores. The plurality of first performance counter modules is operable to collect performance data associated with the plurality of processing cores respectively. A plurality of second performance counter modules are coupled to a plurality of L2 cache units, and the plurality of second performance counter modules are operable to collect performance data associated with the plurality of L2 cache units respectively. A central performance counter module may be operable to coordinate counter data from the plurality of first performance counter modules and the plurality of second performance modules, the a central performance counter module, the plurality of first performance counter modules, and the plurality of second performance counter modules connected by a daisy chain connection.

  7. ROPEC - ROtary PErcussive Coring Drill for Mars Sample Return

    NASA Technical Reports Server (NTRS)

    Chu, Philip; Spring, Justin; Zacny, Kris

    2014-01-01

    The ROtary Percussive Coring Drill is a light weight, flight-like, five-actuator drilling system prototype designed to acquire core material from rock targets for the purposes of Mars Sample Return. In addition to producing rock cores for sample caching, the ROPEC drill can be integrated with a number of end effectors to perform functions such as rock surface abrasion, dust and debris removal, powder and regolith acquisition, and viewing of potential cores prior to caching. The ROPEC drill and its suite of end effectors have been demonstrated with a five degree of freedom Robotic Arm mounted to a mobility system with a prototype sample cache and bit storage station.

  8. Effects of experience and social context on prospective caching strategies by scrub jays.

    PubMed

    Emery, N J; Clayton, N S

    2001-11-22

    Social life has costs associated with competition for resources such as food. Food storing may reduce this competition as the food can be collected quickly and hidden elsewhere; however, it is a risky strategy because caches can be pilfered by others. Scrub jays (Aphelocoma coerulescens) remember 'what', 'where' and 'when' they cached. Like other corvids, they remember where conspecifics have cached, pilfering them when given the opportunity, but may also adjust their own caching strategies to minimize potential pilfering. To test this, jays were allowed to cache either in private (when the other bird's view was obscured) or while a conspecific was watching, and then recover their caches in private. Here we show that jays with prior experience of pilfering another bird's caches subsequently re-cached food in new cache sites during recovery trials, but only when they had been observed caching. Jays without pilfering experience did not, even though they had observed other jays caching. Our results suggest that jays relate information about their previous experience as a pilferer to the possibility of future stealing by another bird, and modify their caching strategy accordingly.

  9. Cache-Cache Comparison for Supporting Meaningful Learning

    ERIC Educational Resources Information Center

    Wang, Jingyun; Fujino, Seiji

    2015-01-01

    The paper presents a meaningful discovery learning environment called "cache-cache comparison" for a personalized learning support system. The processing of seeking hidden relations or concepts in "cache-cache comparison" is intended to encourage learners to actively locate new knowledge in their knowledge framework and check…

  10. a Cache Design Method for Spatial Information Visualization in 3d Real-Time Rendering Engine

    NASA Astrophysics Data System (ADS)

    Dai, X.; Xiong, H.; Zheng, X.

    2012-07-01

    A well-designed cache system has positive impacts on the 3D real-time rendering engine. As the amount of visualization data getting larger, the effects become more obvious. They are the base of the 3D real-time rendering engine to smoothly browsing through the data, which is out of the core memory, or from the internet. In this article, a new kind of caches which are based on multi threads and large file are introduced. The memory cache consists of three parts, the rendering cache, the pre-rendering cache and the elimination cache. The rendering cache stores the data that is rendering in the engine; the data that is dispatched according to the position of the view point in the horizontal and vertical directions is stored in the pre-rendering cache; the data that is eliminated from the previous cache is stored in the eliminate cache and is going to write to the disk cache. Multi large files are used in the disk cache. When a disk cache file size reaches the limit length(128M is the top in the experiment), no item will be eliminated from the file, but a new large cache file will be created. If the large file number is greater than the maximum number that is pre-set, the earliest file will be deleted from the disk. In this way, only one file is opened for writing and reading, and the rest are read-only so the disk cache can be used in a high asynchronous way. The size of the large file is limited in order to map to the core memory to save loading time. Multi-thread is used to update the cache data. The threads are used to load data to the rendering cache as soon as possible for rendering, to load data to the pre-rendering cache for rendering next few frames, and to load data to the elimination cache which is not necessary for the moment. In our experiment, two threads are designed. The first thread is to organize the memory cache according to the view point, and created two threads: the adding list and the deleting list, the adding list index the data that should be loaded to the pre-rendering cache immediately, the deleting list index the data that is no longer visible in the rendering scene and should be moved to the eliminate cache; the other thread is to move the data in the memory and disk cache according to the adding and the deleting list, and create the download requests when the data is indexed in the adding but cannot be found either in memory cache or disk cache, eliminate cache data is moved to the disk cache when the adding list and deleting are empty. The cache designed as described above in our experiment shows reliable and efficient, and the data loading time and files I/O time decreased sharply, especially when the rendering data getting larger.

  11. Optimizing Maintenance of Constraint-Based Database Caches

    NASA Astrophysics Data System (ADS)

    Klein, Joachim; Braun, Susanne

    Caching data reduces user-perceived latency and often enhances availability in case of server crashes or network failures. DB caching aims at local processing of declarative queries in a DBMS-managed cache close to the application. Query evaluation must produce the same results as if done at the remote database backend, which implies that all data records needed to process such a query must be present and controlled by the cache, i. e., to achieve “predicate-specific” loading and unloading of such record sets. Hence, cache maintenance must be based on cache constraints such that “predicate completeness” of the caching units currently present can be guaranteed at any point in time. We explore how cache groups can be maintained to provide the data currently needed. Moreover, we design and optimize loading and unloading algorithms for sets of records keeping the caching units complete, before we empirically identify the costs involved in cache maintenance.

  12. Current desires of conspecific observers affect cache-protection strategies in California scrub-jays and Eurasian jays.

    PubMed

    Ostojić, Ljerka; Legg, Edward W; Brecht, Katharina F; Lange, Florian; Deininger, Chantal; Mendl, Michael; Clayton, Nicola S

    2017-01-23

    Many corvid species accurately remember the locations where they have seen others cache food, allowing them to pilfer these caches efficiently once the cachers have left the scene [1]. To protect their caches, corvids employ a suite of different cache-protection strategies that limit the observers' visual or acoustic access to the cache site [2,3]. In cases where an observer's sensory access cannot be reduced it has been suggested that cachers might be able to minimise the risk of pilfering if they avoid caching food the observer is most motivated to pilfer [4]. In the wild, corvids have been reported to pilfer others' caches as soon as possible after the caching event [5], such that the cacher might benefit from adjusting its caching behaviour according to the observer's current desire. In the current study, observers pilfered according to their current desire: they preferentially pilfered food that they were not sated on. Cachers adjusted their caching behaviour accordingly: they protected their caches by selectively caching food that observers were not motivated to pilfer. The same cache-protection behaviour was found when cachers could not see on which food the observers were sated. Thus, the cachers' ability to respond to the observer's desire might have been driven by the observer's behaviour at the time of caching. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.

  13. Pilfering Eurasian jays use visual and acoustic information to locate caches.

    PubMed

    Shaw, Rachael C; Clayton, Nicola S

    2014-11-01

    Pilfering corvids use observational spatial memory to accurately locate caches that they have seen another individual make. Accordingly, many corvid cache-protection strategies limit the transfer of visual information to potential thieves. Eurasian jays (Garrulus glandarius) employ strategies that reduce the amount of visual and auditory information that is available to competitors. Here, we test whether or not the jays recall and use both visual and auditory information when pilfering other birds' caches. When jays had no visual or acoustic information about cache locations, the proportion of available caches that they found did not differ from the proportion expected if jays were searching at random. By contrast, after observing and listening to a conspecific caching in gravel or sand, jays located a greater proportion of caches, searched more frequently in the correct substrate type and searched in fewer empty locations to find the first cache than expected. After only listening to caching in gravel and sand, jays also found a larger proportion of caches and searched in the substrate type where they had heard caching take place more frequently than expected. These experiments demonstrate that Eurasian jays possess observational spatial memory and indicate that pilfering jays may gain information about cache location merely by listening to caching. This is the first evidence that a corvid may use recalled acoustic information to locate and pilfer caches.

  14. Sparse Partial Equilibrium Tables in Chemically Resolved Reactive Flow

    NASA Astrophysics Data System (ADS)

    Vitello, Peter; Fried, Laurence E.; Pudliner, Brian; McAbee, Tom

    2004-07-01

    The detonation of an energetic material is the result of a complex interaction between kinetic chemical reactions and hydrodynamics. Unfortunately, little is known concerning the detailed chemical kinetics of detonations in energetic materials. CHEETAH uses rate laws to treat species with the slowest chemical reactions, while assuming other chemical species are in equilibrium. CHEETAH supports a wide range of elements and condensed detonation products and can also be applied to gas detonations. A sparse hash table of equation of state values is used in CHEETAH to enhance the efficiency of kinetic reaction calculations. For large-scale parallel hydrodynamic calculations, CHEETAH uses parallel communication to updates to the cache. We present here details of the sparse caching model used in the CHEETAH coupled to an ALE hydrocode. To demonstrate the efficiency of modeling using a sparse cache model we consider detonations in energetic materials.

  15. Checkpointing in speculative versioning caches

    DOEpatents

    Eichenberger, Alexandre E; Gara, Alan; Gschwind, Michael K; Ohmacht, Martin

    2013-08-27

    Mechanisms for generating checkpoints in a speculative versioning cache of a data processing system are provided. The mechanisms execute code within the data processing system, wherein the code accesses cache lines in the speculative versioning cache. The mechanisms further determine whether a first condition occurs indicating a need to generate a checkpoint in the speculative versioning cache. The checkpoint is a speculative cache line which is made non-speculative in response to a second condition occurring that requires a roll-back of changes to a cache line corresponding to the speculative cache line. The mechanisms also generate the checkpoint in the speculative versioning cache in response to a determination that the first condition has occurred.

  16. Rapid effects of corticosterone on cache recovery in mountain chickadees (Parus gambeli).

    PubMed

    Saldanha, C J; Schlinger, B A; Clayton, N S

    2000-03-01

    Environmental perturbations increase adrenal activity in several vertebrates. Increases in corticosterone may serve as a proximate trigger whereby organisms can rapidly adapt their behavior to survive environmental fluctuations. In food-caching songbirds, inclement weather may present the need to alter caching and/or retrieval behaviors to ensure food supplies. We hypothesized that corticosterone may increase the rate of caching and/or retrieval behaviors in the mountain chickadee, a food-storing songbird, and tested if these potential effects were mediated by alterations in appetite, activity, or memory for cache sites. Corticosterone or vehicle was administered to subjects 5 min prior to either caching or recovery in a naturalistic laboratory paradigm during which we recorded the number of caching events, sites visited, and seeds eaten (caching) or caches recovered, total sites visited, cache-related visits, and non-cache-related visits (recovery). Data were analyzed using nested ANOVA for treatment within sequential trial. There was no effect on any caching behaviors following treatment. However, birds treated with corticosterone during retrieval recovered more seeds and tended to visit more cache-related sites than did controls. Since groups did not differ in the number of seeds eaten or the total number of sites visited, it seems unlikely that corticosterone affected appetite or activity. Rapid surges in corticosterone may increase the efficacy of an underlying memory process for cache sites which is reflected in higher cache recovery in corticosterone-treated birds than in controls. Thus, rapid alterations in plasma corticosterone following environmental change may alter memory-reliant behaviors which promote survival in the food-caching mountain chickadee. Copyright 2000 Academic Press.

  17. Design and Analysis of Optimization Algorithms to Minimize Cryptographic Processing in BGP Security Protocols.

    PubMed

    Sriram, Vinay K; Montgomery, Doug

    2017-07-01

    The Internet is subject to attacks due to vulnerabilities in its routing protocols. One proposed approach to attain greater security is to cryptographically protect network reachability announcements exchanged between Border Gateway Protocol (BGP) routers. This study proposes and evaluates the performance and efficiency of various optimization algorithms for validation of digitally signed BGP updates. In particular, this investigation focuses on the BGPSEC (BGP with SECurity extensions) protocol, currently under consideration for standardization in the Internet Engineering Task Force. We analyze three basic BGPSEC update processing algorithms: Unoptimized, Cache Common Segments (CCS) optimization, and Best Path Only (BPO) optimization. We further propose and study cache management schemes to be used in conjunction with the CCS and BPO algorithms. The performance metrics used in the analyses are: (1) routing table convergence time after BGPSEC peering reset or router reboot events and (2) peak-second signature verification workload. Both analytical modeling and detailed trace-driven simulation were performed. Results show that the BPO algorithm is 330% to 628% faster than the unoptimized algorithm for routing table convergence in a typical Internet core-facing provider edge router.

  18. Massively parallel algorithms for trace-driven cache simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.

    1991-01-01

    Trace driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t(exp th) instant, reference x sub t is hashed into a set of cache locations, the contents of which are then compared with x sub t. If at the t sup th instant x sub t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x sub t present for the (t+1) sup st instant. The problem of parallel simulation of a subtrace of N references directed to a C line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regradless of the set size C runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference based line replacement policies are considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C line set runs in the O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.

  19. Cache write generate for parallel image processing on shared memory architectures.

    PubMed

    Wittenbrink, C M; Somani, A K; Chen, C H

    1996-01-01

    We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.

  20. Efficient implementation of parallel three-dimensional FFT on clusters of PCs

    NASA Astrophysics Data System (ADS)

    Takahashi, Daisuke

    2003-05-01

    In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of PCs. The three-dimensional FFT algorithm can be altered into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by utilizing the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We succeeded in obtaining performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.

  1. Analysis of cache for streaming tape drive

    NASA Technical Reports Server (NTRS)

    Chinnaswamy, V.

    1993-01-01

    A tape subsystem consists of a controller and a tape drive. Tapes are used for backup, data interchange, and software distribution. The backup operation is addressed. During a backup operation, data is read from disk, processed in CPU, and then sent to tape. The processing speeds of a disk subsystem, CPU, and a tape subsystem are likely to be different. A powerful CPU can read data from a fast disk, process it, and supply the data to the tape subsystem at a faster rate than the tape subsystem can handle. On the other hand, a slow disk drive and a slow CPU may not be able to supply data fast enough to keep a tape drive busy all the time. The backup process may supply data to tape drive in bursts. Each burst may be followed by an idle period. Depending on the nature of the file distribution in the disk, the input stream to the tape subsystem may vary significantly during backup. To compensate for these differences and optimize the utilization of a tape subsystem, a cache or buffer is introduced in the tape controller. Most of the tape drives today are streaming tape drives. A streaming tape drive goes into reposition when there is no data from the controller. Once the drive goes into reposition, the controller can receive data, but it cannot supply data to the tape drive until the drive completes its reposition. A controller can also receive data from the host and send data to the tape drive at the same time. The relationship of cache size, host transfer rate, drive transfer rate, reposition, and ramp up times for optimal performance of the tape subsystem are investigated. Formulas developed will also show the advantages of cache watermarks to increase the streaming time of the tape drive, maximum loss due to insufficient cache, tradeoffs between cache and reposition times and the effectiveness of cache on a streaming tape drive due to idle times or interruptions due in host transfers. Several mathematical formulas are developed to predict the performance of the tape drive. Some examples are given illustrating the usefulness of these formulas. Finally, a summary and some conclusions are provided.

  2. Way-Scaling to Reduce Power of Cache with Delay Variation

    NASA Astrophysics Data System (ADS)

    Goudarzi, Maziar; Matsumura, Tadayuki; Ishihara, Tohru

    The share of leakage in cache power consumption increases with technology scaling. Choosing a higher threshold voltage (Vth) and/or gate-oxide thickness (Tox) for cache transistors improves leakage, but impacts cell delay. We show that due to uncorrelated random within-die delay variation, only some (not all) of cells actually violate the cache delay after the above change. We propose to add a spare cache way to replace delay-violating cache-lines separately in each cache-set. By SPICE and gate-level simulations in a commercial 90nm process, we show that choosing higher Vth, Tox and adding one spare way to a 4-way 16KB cache reduces leakage power by 42%, which depending on the share of leakage in total cache power, gives up to 22.59% and 41.37% reduction of total energy respectively in L1 instruction- and L2 unified-cache with a negligible delay penalty, but without sacrificing cache capacity or timing-yield.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Yao; Balaprakash, Prasanna; Meng, Jiayuan

    We present Raexplore, a performance modeling framework for architecture exploration. Raexplore enables rapid, automated, and systematic search of architecture design space by combining hardware counter-based performance characterization and analytical performance modeling. We demonstrate Raexplore for two recent manycore processors IBM Blue- Gene/Q compute chip and Intel Xeon Phi, targeting a set of scientific applications. Our framework is able to capture complex interactions between architectural components including instruction pipeline, cache, and memory, and to achieve a 3–22% error for same-architecture and cross-architecture performance predictions. Furthermore, we apply our framework to assess the two processors, and discover and evaluate a list ofmore » architectural scaling options for future processor designs.« less

  4. Predictive Cache Modeling and Analysis

    DTIC Science & Technology

    2011-11-01

    metaheuristic /bin-packing algorithm to optimize task placement based on task communication characterization. Our previous work on task allocation showed...Cache Miss Minimization Technology To efficiently explore combinations and discover nearly-optimal task-assignment algorithms , we extended to our...it was possible to use our algorithmic techniques to decrease network bandwidth consumption by ~25%. In this effort, we adapted these existing

  5. Domain Wall Fermion Inverter on Pentium 4

    NASA Astrophysics Data System (ADS)

    Pochinsky, Andrew

    2005-03-01

    A highly optimized domain wall fermion inverter has been developed as part of the SciDAC lattice initiative. By designing the code to minimize memory bus traffic, it achieves high cache reuse and performance in excess of 2 GFlops for out of L2 cache problem sizes on a GigE cluster with 2.66 GHz Xeon processors. The code uses the SciDAC QMP communication library.

  6. Security in the CernVM File System and the Frontier Distributed Database Caching System

    NASA Astrophysics Data System (ADS)

    Dykstra, D.; Blomer, J.

    2014-06-01

    Both the CernVM File System (CVMFS) and the Frontier Distributed Database Caching System (Frontier) distribute centrally updated data worldwide for LHC experiments using http proxy caches. Neither system provides privacy or access control on reading the data, but both control access to updates of the data and can guarantee the authenticity and integrity of the data transferred to clients over the internet. CVMFS has since its early days required digital signatures and secure hashes on all distributed data, and recently Frontier has added X.509-based authenticity and integrity checking. In this paper we detail and compare the security models of CVMFS and Frontier.

  7. Seedling Establishment of Coast Live Oak in Relation to Seed Caching by Jays

    Treesearch

    Joe R. McBride; Ed Norberg; Sheauchi Cheng; Ahmad Mossadegh

    1991-01-01

    The purpose of this study was to simulate the caching of acorns by jays and rodents to see if less costly procedures could be developed for the establishment of coast live oak (Quercus agrifolia). Four treatments [(1) random - single acorn cache, (2) regular - single acorn cache, (3) regular - 5 acorn cache, (4) regular - 10 acorn cache] were planted...

  8. Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Probst, David K.

    1993-01-01

    A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from reducing high performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.

  9. A random rule model of surface growth

    NASA Astrophysics Data System (ADS)

    Mello, Bernardo A.

    2015-02-01

    Stochastic models of surface growth are usually based on randomly choosing a substrate site to perform iterative steps, as in the etching model, Mello et al. (2001) [5]. In this paper I modify the etching model to perform sequential, instead of random, substrate scan. The randomicity is introduced not in the site selection but in the choice of the rule to be followed in each site. The change positively affects the study of dynamic and asymptotic properties, by reducing the finite size effect and the short-time anomaly and by increasing the saturation time. It also has computational benefits: better use of the cache memory and the possibility of parallel implementation.

  10. Polymorphous Computing Architecture (PCA) Kernel Benchmark Measurements on the MIT Raw Microprocessor

    DTIC Science & Technology

    2006-06-14

    Robert Graybill . A Raw hoard for the use of this project was provided by the Computer Architecture Croup at the Massachusetts Institute of Technology...simulator is presented by MIT as being an accurate model of the Raw chip, we have found that it does not accurately model the board. Our comparison...G4 processor, model 7410. with a 32 kbyte level-1 cache on-chip and a 2 Mbyte L2 cache connected through a 250 MH/ bus [12]. Each node has 256 Mbyte

  11. Caching strategies for improving performance of web-based Geographic applications

    NASA Astrophysics Data System (ADS)

    Liu, M.; Brodzik, M.; Collins, J. A.; Lewis, S.; Oldenburg, J.

    2012-12-01

    The NASA Operation IceBridge mission collects airborne remote sensing measurements to bridge the gap between NASA's Ice, Cloud and Land Elevation Satellite (ICESat) mission and the upcoming ICESat-2 mission. The IceBridge Data Portal from the National Snow and Ice Data Center provides an intuitive web interface for accessing IceBridge mission observations and measurements. Scientists and users usually do not have knowledge about the individual campaigns but are interested in data collected in a specific place. We have developed a high-performance map interface to allow users to quickly zoom to an area of interest and see any Operation IceBridge overflights. The map interface consists of two layers: the user can pan and zoom on the base map layer; the flight line layer that overlays the base layer provides all the campaign missions that intersect with the current map view. The user can click on the flight campaigns and download the data as needed. The OpenGIS® Web Map Service Interface Standard (WMS) provides a simple HTTP interface for requesting geo-registered map images from one or more distributed geospatial databases. Web Feature Service (WFS) provides an interface allowing requests for geographical features across the web using platform-independent calls. OpenLayers provides vector support (points, polylines and polygons) to build a WMS/WFS client for displaying both layers on the screen. Map Server, an open source development environment for building spatially enabled internet applications, is serving the WMS and WFS spatial data to OpenLayers. Early releases of the portal displayed unacceptably poor load time performance for flight lines and the base map tiles. This issue was caused by long response times from the map server in generating all map tiles and flight line vectors. We resolved the issue by implementing various caching strategies on top of the WMS and WFS services, including the use of Squid (www.squid-cache.org) to cache frequently-used content. Our presentation includes the architectural design of the application, and how we use OpenLayers, WMS and WFS with Squid to build a responsive web application capable of efficiently displaying geospatial data to allow the user to quickly interact with the displayed information. We describe the design, implementation and performance improvement of our caching strategies, and the tools and techniques developed to assist our data caching strategies.

  12. California scrub-jays reduce visual cues available to potential pilferers by matching food colour to caching substrate.

    PubMed

    Kelley, Laura A; Clayton, Nicola S

    2017-07-01

    Some animals hide food to consume later; however, these caches are susceptible to theft by conspecifics and heterospecifics. Caching animals can use protective strategies to minimize sensory cues available to potential pilferers, such as caching in shaded areas and in quiet substrate. Background matching (where object patterning matches the visual background) is commonly seen in prey animals to reduce conspicuousness, and caching animals may also use this tactic to hide caches, for example, by hiding coloured food in a similar coloured substrate. We tested whether California scrub-jays ( Aphelocoma californica ) camouflage their food in this way by offering them caching substrates that either matched or did not match the colour of food available for caching. We also determined whether this caching behaviour was sensitive to social context by allowing the birds to cache when a conspecific potential pilferer could be both heard and seen (acoustic and visual cues present), or unseen (acoustic cues only). When caching events could be both heard and seen by a potential pilferer, birds cached randomly in matching and non-matching substrates. However, they preferentially hid food in the substrate that matched the food colour when only acoustic cues were present. This is a novel cache protection strategy that also appears to be sensitive to social context. We conclude that studies of cache protection strategies should consider the perceptual capabilities of the cacher and potential pilferers. © 2017 The Author(s).

  13. Consolidation and reconsolidation of memory in black-capped chickadees (Poecile atricapillus).

    PubMed

    Barrett, Matthew C; Sherry, David F

    2012-12-01

    Multiple phases of protein synthesis are necessary for the synaptic modifications that consolidate long-term memory. The reconsolidation hypothesis supposes that information in long-term memory becomes labile and subject to change when retrieved and must be reconsolidated into long-term memory. The current study used the protein synthesis inhibitor anisomycin to examine memory consolidation in birds and to test the reconsolidation hypothesis. Black-capped chickadees store food and usually remember which of their caches they have emptied and which they have left full. In Experiment 1, anisomycin was injected either immediately and 2 hr after food caching, or 4 and 6 hr after food caching. Inhibition of protein synthesis impaired memory for cache sites 24 and 48 hr later. In Experiment 2, it was hypothesized that long-term memory for food caches becomes labile as predicted by the reconsolidation hypothesis when birds search for caches. Anisomycin was administered immediately after chickadees had searched for their caches. Inhibition of protein synthesis should disrupt memory for caches left full if these sites are retrieved from long-term memory and require reconsolidation. Control birds were later more likely to revisit full caches than caches they had emptied. Birds given anisomycin revisited both kinds of caches and did not distinguish between them. This result shows that reconsolidation of full caches into long-term memory is not necessary following search for cache sites, but also shows that protein synthesis-dependent consolidation is required for updating the status of emptied caches.

  14. Proceedings: Sisal `93

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feo, J.T.

    1993-10-01

    This report contain papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernal of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architechtures; Compiling technique based on dataflow analysis for funtional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal Vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor;more » A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architechture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.« less

  15. Memory for Multiple Cache Locations and Prey Quantities in a Food-Hoarding Songbird

    PubMed Central

    Armstrong, Nicola; Garland, Alexis; Burns, K. C.

    2012-01-01

    Most animals can discriminate between pairs of numbers that are each less than four without training. However, North Island robins (Petroica longipes), a food-hoarding songbird endemic to New Zealand, can discriminate between quantities of items as high as eight without training. Here we investigate whether robins are capable of other complex quantity discrimination tasks. We test whether their ability to discriminate between small quantities declines with (1) the number of cache sites containing prey rewards and (2) the length of time separating cache creation and retrieval (retention interval). Results showed that subjects generally performed above-chance expectations. They were equally able to discriminate between different combinations of prey quantities that were hidden from view in 2, 3, and 4 cache sites from between 1, 10, and 60 s. Overall results indicate that North Island robins can process complex quantity information involving more than two discrete quantities of items for up to 1 min long retention intervals without training. PMID:23293622

  16. Memory for multiple cache locations and prey quantities in a food-hoarding songbird.

    PubMed

    Armstrong, Nicola; Garland, Alexis; Burns, K C

    2012-01-01

    Most animals can discriminate between pairs of numbers that are each less than four without training. However, North Island robins (Petroica longipes), a food-hoarding songbird endemic to New Zealand, can discriminate between quantities of items as high as eight without training. Here we investigate whether robins are capable of other complex quantity discrimination tasks. We test whether their ability to discriminate between small quantities declines with (1) the number of cache sites containing prey rewards and (2) the length of time separating cache creation and retrieval (retention interval). Results showed that subjects generally performed above-chance expectations. They were equally able to discriminate between different combinations of prey quantities that were hidden from view in 2, 3, and 4 cache sites from between 1, 10, and 60 s. Overall results indicate that North Island robins can process complex quantity information involving more than two discrete quantities of items for up to 1 min long retention intervals without training.

  17. SIDECACHE: Information access, management and dissemination framework for web services.

    PubMed

    Doderer, Mark S; Burkhardt, Cory; Robbins, Kay A

    2011-06-14

    Many bioinformatics algorithms and data sets are deployed using web services so that the results can be explored via the Internet and easily integrated into other tools and services. These services often include data from other sites that is accessed either dynamically or through file downloads. Developers of these services face several problems because of the dynamic nature of the information from the upstream services. Many publicly available repositories of bioinformatics data frequently update their information. When such an update occurs, the developers of the downstream service may also need to update. For file downloads, this process is typically performed manually followed by web service restart. Requests for information obtained by dynamic access of upstream sources is sometimes subject to rate restrictions. SideCache provides a framework for deploying web services that integrate information extracted from other databases and from web sources that are periodically updated. This situation occurs frequently in biotechnology where new information is being continuously generated and the latest information is important. SideCache provides several types of services including proxy access and rate control, local caching, and automatic web service updating. We have used the SideCache framework to automate the deployment and updating of a number of bioinformatics web services and tools that extract information from remote primary sources such as NCBI, NCIBI, and Ensembl. The SideCache framework also has been used to share research results through the use of a SideCache derived web service.

  18. A Scalable proxy cache for Grid Data Access

    NASA Astrophysics Data System (ADS)

    Cristian Cirstea, Traian; Just Keijser, Jan; Koeroo, Oscar Arthur; Starink, Ronald; Templon, Jeffrey Alan

    2012-12-01

    We describe a prototype grid proxy cache system developed at Nikhef, motivated by a desire to construct the first building block of a future https-based Content Delivery Network for grid infrastructures. Two goals drove the project: firstly to provide a “native view” of the grid for desktop-type users, and secondly to improve performance for physics-analysis type use cases, where multiple passes are made over the same set of data (residing on the grid). We further constrained the design by requiring that the system should be made of standard components wherever possible. The prototype that emerged from this exercise is a horizontally-scalable, cooperating system of web server / cache nodes, fronted by a customized webDAV server. The webDAV server is custom only in the sense that it supports http redirects (providing horizontal scaling) and that the authentication module has, as back end, a proxy delegation chain that can be used by the cache nodes to retrieve files from the grid. The prototype was deployed at Nikhef and tested at a scale of several terabytes of data and approximately one hundred fast cores of computing. Both small and large files were tested, in a number of scenarios, and with various numbers of cache nodes, in order to understand the scaling properties of the system. For properly-dimensioned cache-node hardware, the system showed speedup of several integer factors for the analysis-type use cases. These results and others are presented and discussed.

  19. Scidac-Data: Enabling Data Driven Modeling of Exascale Computing

    DOE PAGES

    Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo; ...

    2017-11-23

    Here, the SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulationsmore » are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.« less

  20. Scidac-Data: Enabling Data Driven Modeling of Exascale Computing

    NASA Astrophysics Data System (ADS)

    Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo; Tsaris, Aristeidis; Norman, Andrew; Lyon, Adam; Ross, Robert

    2017-10-01

    The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.

  1. Scidac-Data: Enabling Data Driven Modeling of Exascale Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mubarak, Misbah; Ding, Pengfei; Aliaga, Leo

    Here, the SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project analyzes the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulationsmore » are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.« less

  2. Calculating Reuse Distance from Source Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Narayanan, Sri Hari Krishna; Hovland, Paul

    The efficient use of a system is of paramount importance in high-performance computing. Applications need to be engineered for future systems even before the architecture of such a system is clearly known. Static performance analysis that generates performance bounds is one way to approach the task of understanding application behavior. Performance bounds provide an upper limit on the performance of an application on a given architecture. Predicting cache hierarchy behavior and accesses to main memory is a requirement for accurate performance bounds. This work presents our static reuse distance algorithm to generate reuse distance histograms. We then use these histogramsmore » to predict cache miss rates. Experimental results for kernels studied show that the approach is accurate.« less

  3. A cache-aided multiprocessor rollback recovery scheme

    NASA Technical Reports Server (NTRS)

    Wu, Kun-Lung; Fuchs, W. Kent

    1989-01-01

    This paper demonstrates how previous uniprocessor cache-aided recovery schemes can be applied to multiprocessor architectures, for recovering from transient processor failures, utilizing private caches and a global shared memory. As with cache-aided uniprocessor recovery, the multiprocessor cache-aided recovery scheme of this paper can be easily integrated into standard bus-based snoopy cache coherence protocols. A consistent shared memory state is maintained without the necessity of global check-pointing.

  4. Algorithms for Data Intensive Applications on Intelligent and Smart Memories

    DTIC Science & Technology

    2003-03-01

    editors). Parallel Algorithms and Architectures. North Holland, 1986. [8] P. Diniz . USC ISI, Personal Communication, March, 2001. [9] M. Frigo, C. E ...hierarchy as well as the Translation Lookaside Buer TLB aect the e ectiveness of cache friendly optimizations These penalties vary among...processors and cause large variations in the e ectiveness of cache performance optimizations The area of graph problems is fundamental in a wide variety of

  5. Effect of Spatial Locality Prefetching on Structural Locality

    DTIC Science & Technology

    1991-12-01

    Pollution module calculates the SLC and CAM cache pollution percentages. And finally, the Generate Reference Frequency List module produces the output...3.2.5 Generate Reference Frequency List 3.2.6 Each program module in the structure chart is mapped into an Ada package. By performing this encapsulation...call routine to generate reference -- frequency list -- end if -- end loop -- close input, output, and reference files end Cache Simulator Figure 3.5

  6. Mobile Thread Task Manager

    NASA Technical Reports Server (NTRS)

    Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin J.

    2013-01-01

    The Mobile Thread Task Manager (MTTM) is being applied to parallelizing existing flight software to understand the benefits and to develop new techniques and architectural concepts for adapting software to multicore architectures. It allocates and load-balances tasks for a group of threads that migrate across processors to improve cache performance. In order to balance-load across threads, the MTTM augments a basic map-reduce strategy to draw jobs from a global queue. In a multicore processor, memory may be "homed" to the cache of a specific processor and must be accessed from that processor. The MTTB architecture wraps access to data with thread management to move threads to the home processor for that data so that the computation follows the data in an attempt to avoid L2 cache misses. Cache homing is also handled by a memory manager that translates identifiers to processor IDs where the data will be homed (according to rules defined by the user). The user can also specify the number of threads and processors separately, which is important for tuning performance for different patterns of computation and memory access. MTTM efficiently processes tasks in parallel on a multiprocessor computer. It also provides an interface to make it easier to adapt existing software to a multiprocessor environment.

  7. Caching Servers for ATLAS

    NASA Astrophysics Data System (ADS)

    Gardner, R. W.; Hanushevsky, A.; Vukotic, I.; Yang, W.

    2017-10-01

    As many LHC Tier-3 and some Tier-2 centers look toward streamlining operations, they are considering autonomously managed storage elements as part of the solution. These storage elements are essentially file caching servers. They can operate as whole file or data block level caches. Several implementations exist. In this paper we explore using XRootD caching servers that can operate in either mode. They can also operate autonomously (i.e. demand driven), be centrally managed (i.e. a Rucio managed cache), or operate in both modes. We explore the pros and cons of various configurations as well as practical requirements for caching to be effective. While we focus on XRootD caches, the analysis should apply to other kinds of caches as well.

  8. Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

    DOEpatents

    Gara, Alan; Ohmacht, Martin

    2014-09-16

    In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write though, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.

  9. An evaluation of memory accuracy in food hoarding marsh tits Poecile palustris--how accurate are they compared to humans?

    PubMed

    Brodin, Anders; Urhan, A Utku

    2013-07-01

    Laboratory studies of scatter hoarding birds have become a model system for spatial memory studies. Considering that such birds are known to have a good spatial memory, recovery success in lab studies seems low. In parids (titmice and chickadees) typically ranging between 25 and 60% if five seeds are cached in 50-128 available caching sites. Since these birds store many thousands of food items in nature in one autumn one might expect that they should easily retrieve five seeds in a laboratory where they know the environment with its caching sites in detail. We designed a laboratory set up to be as similar as possible with previous studies and trained wild caught marsh tits Poecile palustris to store and retrieve in this set up. Our results agree closely with earlier studies, of the first ten looks around 40% were correct when the birds had stored five seeds in 100 available sites both 5 and 24h after storing. The cumulative success curve suggests high success during the first 15 looks where after it declines. Humans performed much better, in the first five looks most subjects were 100% correct. We discuss possible reasons for why the birds were not doing better. Copyright © 2013 Elsevier B.V. All rights reserved.

  10. Prefetching in file systems for MIMD multiprocessors

    NASA Technical Reports Server (NTRS)

    Kotz, David F.; Ellis, Carla Schlatter

    1990-01-01

    The question of whether prefetching blocks on the file into the block cache can effectively reduce overall execution time of a parallel computation, even under favorable assumptions, is considered. Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that (1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, (2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O (input/output) operation, and (3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study). The authors explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance and identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in the environment.

  11. Experience with HEP analysis on mounted filesystems.

    NASA Astrophysics Data System (ADS)

    Fuhrmann, Patrick; Gasthuber, Martin; Kemp, Yves; Ozerov, Dmitry

    2012-12-01

    We present results on different approaches on mounted filesystems in use or under investigation at DESY. dCache, established since long as a storage system for physics data has implemented the NFS v4.1/pNFS protocol. New performance results will be shown with the most current version of the dCache server. In addition to the native usage of the mounted filesystem in a LAN environment, the results are given for the performance of the dCache NFS v4.1/pNFS in WAN case. Several commercial vendors are currently in alpha or beta phase of adding the NFS v4.1/pNFS protocol to their storage appliances. We will test some of these vendor solutions for their readiness for HEP analysis. DESY has recently purchased an IBM Sonas system. We will present the result of a thorough performance evaluation using the native protocols NFS (v3 or v4) and GPFS. As the emphasis is on the usability for end user analysis, we will use latest ROOT versions and current end user analysis code for benchmark scenarios.

  12. Differentiated strategies for improving streaming service quality

    NASA Astrophysics Data System (ADS)

    An, Hui; Chen, Xin-Meng

    2005-02-01

    With the explosive growth of streaming services, users are becoming more and more sensitive to its quality of service. To handle these problems, the research community focuses of the application of caching and replication techniques. But most approaches try to find specific strategies of caching of replication that suit for streaming service characteristics and to design some kind of universal policy to deal with all streaming objects. This paper explores the combination of caching and replication for improving streaming service quality and demonstrates that it makes sense to incorporate two technologies. It provides a system model and discusses some related issues of how to determining a refreshable streaming object and which refreshment policies a refreshable object should use.

  13. Organizing the pantry: cache management improves quality of overwinter food stores in a montane mammal

    USGS Publications Warehouse

    Jakopak, Rhiannon P.; Hall, L. Embere; Chalfoun, Anna D.

    2017-01-01

    Many mammals create food stores to enhance overwinter survival in seasonal environments. Strategic arrangement of food within caches may facilitate the physical integrity of the cache or improve access to high-quality food to ensure that cached resources meet future nutritional demands. We used the American pika (Ochotona princeps), a food-caching lagomorph, to evaluate variation in haypile (cache) structure (i.e., horizontal layering by plant functional group) in Wyoming, United States. Fifty-five percent of 62 haypiles contained at least 2 discrete layers of vegetation. Adults and juveniles layered haypiles in similar proportions. The probability of layering increased with haypile volume, but not haypile number per individual or nearby forage diversity. Vegetation cached in layered haypiles was also higher in nitrogen compared to vegetation in unlayered piles. We found that American pikas frequently structured their food caches, structured caches were larger, and the cached vegetation in structured piles was of higher nutritional quality. Improving access to stable, high-quality vegetation in haypiles, a critical overwinter food resource, may allow individuals to better persist amidst harsh conditions.

  14. Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER

    PubMed Central

    2014-01-01

    Background HMMER is a commonly used bioinformatics tool based on Hidden Markov Models (HMMs) to analyze and process biological sequences. One of its main homology engines is based on the Viterbi decoding algorithm, which was already highly parallelized and optimized using Farrar’s striped processing pattern with Intel SSE2 instruction set extension. Results A new SIMD vectorization of the Viterbi decoding algorithm is proposed, based on an SSE2 inter-task parallelization approach similar to the DNA alignment algorithm proposed by Rognes. Besides this alternative vectorization scheme, the proposed implementation also introduces a new partitioning of the Markov model that allows a significantly more efficient exploitation of the cache locality. Such optimization, together with an improved loading of the emission scores, allows the achievement of a constant processing throughput, regardless of the innermost-cache size and of the dimension of the considered model. Conclusions The proposed optimized vectorization of the Viterbi decoding algorithm was extensively evaluated and compared with the HMMER3 decoder to process DNA and protein datasets, proving to be a rather competitive alternative implementation. Being always faster than the already highly optimized ViterbiFilter implementation of HMMER3, the proposed Cache-Oblivious Parallel SIMD Viterbi (COPS) implementation provides a constant throughput and offers a processing speedup as high as two times faster, depending on the model’s size. PMID:24884826

  15. Center for Technology for Advanced Scientific Componet Software (TASCS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Govindaraju, Madhusudhan

    Advanced Scientific Computing Research Computer Science FY 2010Report Center for Technology for Advanced Scientific Component Software: Distributed CCA State University of New York, Binghamton, NY, 13902 Summary The overall objective of Binghamton's involvement is to work on enhancements of the CCA environment, motivated by the applications and research initiatives discussed in the proposal. This year we are working on re-focusing our design and development efforts to develop proof-of-concept implementations that have the potential to significantly impact scientific components. We worked on developing parallel implementations for non-hydrostatic code and worked on a model coupling interface for biogeochemical computations coded in MATLAB.more » We also worked on the design and implementation modules that will be required for the emerging MapReduce model to be effective for scientific applications. Finally, we focused on optimizing the processing of scientific datasets on multi-core processors. Research Details We worked on the following research projects that we are working on applying to CCA-based scientific applications. 1. Non-Hydrostatic Hydrodynamics: Non-static hydrodynamics are significantly more accurate at modeling internal waves that may be important in lake ecosystems. Non-hydrostatic codes, however, are significantly more computationally expensive, often prohibitively so. We have worked with Chin Wu at the University of Wisconsin to parallelize non-hydrostatic code. We have obtained a speed up of about 26 times maximum. Although this is significant progress, we hope to improve the performance further, such that it becomes a practical alternative to hydrostatic codes. 2. Model-coupling for water-based ecosystems: To answer pressing questions about water resources requires that physical models (hydrodynamics) be coupled with biological and chemical models. Most hydrodynamics codes are written in Fortran, however, while most ecologists work in MATLAB. This disconnect creates a great barrier. To address this, we are working on a model coupling interface that will allow biogeochemical computations written in MATLAB to couple with Fortran codes. This will greatly improve the productivity of ecosystem scientists. 2. Low overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications: Since its inception, MapReduce has frequently been associated with Hadoop and large-scale datasets. Its deployment at Amazon in the cloud, and its applications at Yahoo! for large-scale distributed document indexing and database building, among other tasks, have thrust MapReduce to the forefront of the data processing application domain. The applicability of the paradigm however extends far beyond its use with data intensive applications and diskbased systems, and can also be brought to bear in processing small but CPU intensive distributed applications. MapReduce however carries its own burdens. Through experiments using Hadoop in the context of diverse applications, we uncovered latencies and delay conditions potentially inhibiting the expected performance of a parallel execution in CPU-intensive applications. Furthermore, as it currently stands, MapReduce is favored for data-centric applications, and as such tends to be solely applied to disk-based applications. The paradigm, falls short in bringing its novelty to diskless systems dedicated to in-memory applications, and compute intensive programs processing much smaller data, but requiring intensive computations. In this project, we focused both on the performance of processing large-scale hierarchical data in distributed scientific applications, as well as the processing of smaller but demanding input sizes primarily used in diskless, and memory resident I/O systems. We designed LEMO-MR [1], a Low overhead, elastic, configurable for in- memory applications, and on-demand fault tolerance, an optimized implementation of MapReduce, for both on disk and in memory applications. We conducted experiments to identify not only the necessary components of this model, but also trade offs and factors to be considered. We have initial results to show the efficacy of our implementation in terms of potential speedup that can be achieved for representative data sets used by cloud applications. We have quantified the performance gains exhibited by our MapReduce implementation over Apache Hadoop in a compute intensive environment. 3. Cache Performance Optimization for Processing XML and HDF-based Application Data on Multi-core Processors: It is important to design and develop scientific middleware libraries to harness the opportunities presented by emerging multi-core processors. Implementations of scientific middleware and applications that do not adapt to the programming paradigm when executing on emerging processors can severely impact the overall performance. In this project, we focused on the utilization of the L2 cache, which is a critical shared resource on chip multiprocessors (CMP). The access pattern of the shared L2 cache, which is dependent on how the application schedules and assigns processing work to each thread, can either enhance or hurt the ability to hide memory latency on a multi-core processor. Therefore, while processing scientific datasets such as HDF5, it is essential to conduct fine-grained analysis of cache utilization, to inform scheduling decisions in multi-threaded programming. In this project, using the TAU toolkit for performance feedback from dual- and quad-core machines, we conducted performance analysis and recommendations on how processing threads can be scheduled on multi-core nodes to enhance the performance of a class of scientific applications that requires processing of HDF5 data. In particular, we quantified the gains associated with the use of the adaptations we have made to the Cache-Affinity and Balanced-Set scheduling algorithms to improve L2 cache performance, and hence the overall application execution time [2]. References: 1. Zacharia Fadika, Madhusudhan Govindaraju, ``MapReduce Implementation for Memory-Based and Processing Intensive Applications'', accepted in 2nd IEEE International Conference on Cloud Computing Technology and Science, Indianapolis, USA, Nov 30 - Dec 3, 2010. 2. Rajdeep Bhowmik, Madhusudhan Govindaraju, ``Cache Performance Optimization for Processing XML-based Application Data on Multi-core Processors'', in proceedings of The 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 17-20, 2010, Melbourne, Victoria, Australia. Contact Information: Madhusudhan Govindaraju Binghamton University State University of New York (SUNY) mgovinda@cs.binghamton.edu Phone: 607-777-4904« less

  16. Understanding Consistency Maintenance in Service Discovery Architectures during Communication Failure

    DTIC Science & Technology

    2002-07-01

    our general model include: (1) service user (SU), (2) service manager (SM), and (3) service cache manager ( SCM ), where the SCM is an optional...maintained by SMs that satisfy specific requirements. Where employed, the SCM operates as an intermediary, matching advertised SDs of SMs to...Directory Service Agent (optional) not applicableLookup ServiceService Cache Manager ( SCM ) Service URL Service Type Service Attributes Template URL

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Millar, A. P.; Baranova, T.; Behrmann, G.

    For over a decade, dCache has been synonymous with large-capacity, fault-tolerant storage using commodity hardware that supports seamless data migration to and from tape. In this paper we provide some recent news of changes within dCache and the community surrounding it. We describe the flexible nature of dCache that allows both externally developed enhancements to dCache facilities and the adoption of new technologies. Finally, we present information about avenues the dCache team is exploring for possible future improvements in dCache.

  18. DARPA Status Report - November 1988

    DTIC Science & Technology

    1988-11-01

    style used in the applic4#ons reference to that block was by processor j. where j It. We was influenced by it. MACH is a multiprocessor operating S call...it can be order they occurred. However. the exact time at which the treated specially in memory management , and so most of the reference wa, made is...on cache consistency performance, sophisti- peak can be explained as clinging references that occur when cated cache management schemes that take

  19. Caching Joint Shortcut Routing to Improve Quality of Service for Information-Centric Networking.

    PubMed

    Huang, Baixiang; Liu, Anfeng; Zhang, Chengyuan; Xiong, Naixue; Zeng, Zhiwen; Cai, Zhiping

    2018-05-29

    Hundreds of thousands of ubiquitous sensing (US) devices have provided an enormous number of data for Information-Centric Networking (ICN), which is an emerging network architecture that has the potential to solve a great variety of issues faced by the traditional network. A Caching Joint Shortcut Routing (CJSR) scheme is proposed in this paper to improve the Quality of service (QoS) for ICN. The CJSR scheme mainly has two innovations which are different from other in-network caching schemes: (1) Two routing shortcuts are set up to reduce the length of routing paths. Because of some inconvenient transmission processes, the routing paths of previous schemes are prolonged, and users can only request data from Data Centers (DCs) until the data have been uploaded from Data Producers (DPs) to DCs. Hence, the first kind of shortcut is built from DPs to users directly. This shortcut could release the burden of whole network and reduce delay. Moreover, in the second shortcut routing method, a Content Router (CR) which could yield shorter length of uploading routing path from DPs to DCs is chosen, and then data packets are uploaded through this chosen CR. In this method, the uploading path shares some segments with the pre-caching path, thus the overall length of routing paths is reduced. (2) The second innovation of the CJSR scheme is that a cooperative pre-caching mechanism is proposed so that QoS could have a further increase. Besides being used in downloading routing, the pre-caching mechanism can also be used when data packets are uploaded towards DCs. Combining uploading and downloading pre-caching, the cooperative pre-caching mechanism exhibits high performance in different situations. Furthermore, to address the scarcity of storage size, an algorithm that could make use of storage from idle CRs is proposed. After comparing the proposed scheme with five existing schemes via simulations, experiments results reveal that the CJSR scheme could reduce the total number of processed interest packets by 54.8%, enhance the cache hits of each CR and reduce the number of total hop counts by 51.6% and cut down the length of routing path for users to obtain their interested data by 28.6⁻85.7% compared with the traditional NDN scheme. Moreover, the length of uploading routing path could be decreased by 8.3⁻33.3%.

  20. Introduction of Virtualization Technology to Multi-Process Model Checking

    NASA Technical Reports Server (NTRS)

    Leungwattanakit, Watcharin; Artho, Cyrille; Hagiya, Masami; Tanabe, Yoshinori; Yamamoto, Mitsuharu

    2009-01-01

    Model checkers find failures in software by exploring every possible execution schedule. Java PathFinder (JPF), a Java model checker, has been extended recently to cover networked applications by caching data transferred in a communication channel. A target process is executed by JPF, whereas its peer process runs on a regular virtual machine outside. However, non-deterministic target programs may produce different output data in each schedule, causing the cache to restart the peer process to handle the different set of data. Virtualization tools could help us restore previous states of peers, eliminating peer restart. This paper proposes the application of virtualization technology to networked model checking, concentrating on JPF.

  1. A Distributed Cache Update Deployment Strategy in CDN

    NASA Astrophysics Data System (ADS)

    E, Xinhua; Zhu, Binjie

    2018-04-01

    The CDN management system distributes content objects to the edge of the internet to achieve the user's near access. Cache strategy is an important problem in network content distribution. A cache strategy was designed in which the content effective diffusion in the cache group, so more content was storage in the cache, and it improved the group hit rate.

  2. Instruction-Level Characterization of Scientific Computing Applications Using Hardware Performance Counters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Luo, Y.; Cameron, K.W.

    1998-11-24

    Workload characterization has been proven an essential tool to architecture design and performance evaluation in both scientific and commercial computing areas. Traditional workload characterization techniques include FLOPS rate, cache miss ratios, CPI (cycles per instruction or IPC, instructions per cycle) etc. With the complexity of sophisticated modern superscalar microprocessors, these traditional characterization techniques are not powerful enough to pinpoint the performance bottleneck of an application on a specific microprocessor. They are also incapable of immediately demonstrating the potential performance benefit of any architectural or functional improvement in a new processor design. To solve these problems, many people rely on simulators,more » which have substantial constraints especially on large-scale scientific computing applications. This paper presents a new technique of characterizing applications at the instruction level using hardware performance counters. It has the advantage of collecting instruction-level characteristics in a few runs virtually without overhead or slowdown. A variety of instruction counts can be utilized to calculate some average abstract workload parameters corresponding to microprocessor pipelines or functional units. Based on the microprocessor architectural constraints and these calculated abstract parameters, the architectural performance bottleneck for a specific application can be estimated. In particular, the analysis results can provide some insight to the problem that only a small percentage of processor peak performance can be achieved even for many very cache-friendly codes. Meanwhile, the bottleneck estimation can provide suggestions about viable architectural/functional improvement for certain workloads. Eventually, these abstract parameters can lead to the creation of an analytical microprocessor pipeline model and memory hierarchy model.« less

  3. PROSPECTIVE STUDY OF READY-TO-EAT BREAKFAST CEREAL CONSUMPTION AND COGNITIVE DECLINE AMONG ELDERLY MEN AND WOMEN IN CACHE COUNTY, UTAH, STUDY ON MEMORY, HEALTH, AND AGING

    PubMed Central

    WENGREEN, H.; NELSON, C.; MUNGER, R.G.; CORCORAN, C.

    2013-01-01

    Objective To examine associations between frequency of ready-to-eat-cereal (RTEC) consumption and cognitive function among elderly men and women of the Cache County Study on Memory and Aging in Utah. Design A population-based prospective cohort study established in Cache County, Utah in 1995. Setting and Participants 3831 men and women > 65 years of age who were living in Cache County, Utah in 1995. Measurement Diet was assessed using a 142-item food frequency questionnaire at baseline. Cognitive function was assessed using an adapted version of the Modified Mini-Mental State examination (3MS) at baseline and three subsequent interviews over 11 years. RTEC consumption was defined as daily, weekly, or infrequent use. Results In multivariable models, more frequent RTEC consumption was not associated with a cognitive benefit. Those consuming RTEC weekly but less than daily scored higher on their baseline 3MS than did those consuming RTEC more or less frequently (91.7, 90.6, 90.6, respectively; p-value <0.001). This association was maintained across 11 years of observation such that those consuming RTEC weekly but less than daily declined on average 3.96 points compared to an average 5.13 and 4.57 point decline for those consuming cereal more or less frequently (p-value = 0.0009). Conclusion Those consuming RTEC at least daily had poorer cognitive performance at baseline and over 11 years of follow-up compared to those who consumed cereal more or less frequently. RTEC is a nutrient dense food, but should not replace the consumption of other healthy foods in the diets’ of elderly people. Associations between RTEC consumption, dietary patterns, and cognitive function deserve further study. PMID:21369668

  4. Forest rodents provide directed dispersal of Jeffrey pine seeds

    USGS Publications Warehouse

    Briggs, J.S.; Wall, S.B.V.; Jenkins, S.H.

    2009-01-01

    Some species of animals provide directed dispersal of plant seeds by transporting them nonrandomly to microsites where their chances of producing healthy seedlings are enhanced. We investigated whether this mutualistic interaction occurs between granivorous rodents and Jeffrey pine (Pinus jeffreyi) in the eastern Sierra Nevada by comparing the effectiveness of random abiotic seed dispersal with the dispersal performed by four species of rodents: deer mice (Peromyscus maniculatus), yellow-pine and long-eared chipmunks (Tamias amoenus and T. quadrimaculatus), and golden-mantled ground squirrels (Spermophilus lateralis). We conducted two caching studies using radio-labeled seeds, the first with individual animals in field enclosures and the second with a community of rodents in open forest. We used artificial caches to compare the fates of seeds placed at the range of microsites and depths used by animals with the fates of seeds dispersed abiotically. Finally, we examined the distribution and survival of naturally establishing seedlings over an eight-year period.Several lines of evidence suggested that this community of rodents provided directed dispersal. Animals preferred to cache seeds in microsites that were favorable for emergence or survival of seedlings and avoided caching in microsites in which seedlings fared worst. Seeds buried at depths typical of animal caches (5–25 mm) produced at least five times more seedlings than did seeds on the forest floor. The four species of rodents differed in the quality of dispersal they provided. Small, shallow caches made by deer mice most resembled seeds dispersed by abiotic processes, whereas many of the large caches made by ground squirrels were buried too deeply for successful emergence of seedlings. Chipmunks made the greatest number of caches within the range of depths and microsites favorable for establishment of pine seedlings. Directed dispersal is an important element of the population dynamics of Jeffrey pine, a dominant tree species in the eastern Sierra Nevada. Quantifying the occurrence and dynamics of directed dispersal in this and other cases will contribute to better understanding of mutualistic coevolution of plants and animals and to more effective management of ecosystems in which directed dispersal is a keystone process.

  5. Elements of episodic-like memory in animals.

    PubMed

    Clayton, N S; Griffiths, D P; Emery, N J; Dickinson, A

    2001-09-29

    A number of psychologists have suggested that episodic memory is a uniquely human phenomenon and, until recently, there was little evidence that animals could recall a unique past experience and respond appropriately. Experiments on food-caching memory in scrub jays question this assumption. On the basis of a single caching episode, scrub jays can remember when and where they cached a variety of foods that differ in the rate at which they degrade, in a way that is inexplicable by relative familiarity. They can update their memory of the contents of a cache depending on whether or not they have emptied the cache site, and can also remember where another bird has hidden caches, suggesting that they encode rich representations of the caching event. They make temporal generalizations about when perishable items should degrade and also remember the relative time since caching when the same food is cached in distinct sites at different times. These results show that jays form integrated memories for the location, content and time of caching. This memory capability fulfils Tulving's behavioural criteria for episodic memory and is thus termed 'episodic-like'. We suggest that several features of episodic memory may not be unique to humans.

  6. Addressing Inter-set Write-Variation for Improving Lifetime of Non-Volatile Caches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh; Vetter, Jeffrey S

    We propose a technique which minimizes inter-set write variation in NVM caches for improving its lifetime. Our technique uses cache coloring scheme to add a software-controlled mapping layer between groups of physical pages (called memory regions) and cache sets. Periodically, the number of writes to different colors of the cache is computed and based on this result, the mapping of a few colors is changed to channel the write traffic to least utilized cache colors. This change helps to achieve wear-leveling.

  7. Cache placement, pilfering, and a recovery advantage in a seed-dispersing rodent: Could predation of scatter hoarders contribute to seedling establishment?

    NASA Astrophysics Data System (ADS)

    Steele, Michael A.; Bugdal, Melissa; Yuan, Amy; Bartlow, Andrew; Buzalewski, Jarrod; Lichti, Nathan; Swihart, Robert

    2011-11-01

    Scatter-hoarding mammals are thought to rely on spatial memory to relocate food caches. Yet, we know little about how long these granivores (primarily rodents) recall specific cache locations or whether individual hoarders have an advantage when recovering their own caches. Indeed, a few recent studies suggest that high rates of pilferage are common and that individual hoarders may not have a retriever's advantage. We tested this hypothesis in a high-density (>7 animals/ha) population of eastern gray squirrels ( Sciurus carolinensis) by presenting individually marked animals (>20) with tagged acorns, mapping cache sites, and following the fate of seed caches. PIT tags allowed us to monitor individual seeds without disturbing cache sites. Acorns only remained in the caches for 12-119 h (0.5-5 d). However, when we live-trapped and removed some animals from the site immediately after they stored seeds (thus simulating predation), their seed caches remained intact for significantly longer periods (16-27 d). Cache duration corresponded roughly to the time at which squirrels were returned to the study area. These results suggest that squirrels have a retriever's advantage and may remember specific cache sites longer than previously thought. We further suggest that predation of scatter hoarders who store seeds for long periods and also possess a recovery advantage may be one important mechanism by which seed establishment is achieved.

  8. A Survey of Architectural Techniques For Improving Cache Power Efficiency

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh

    Modern processors are using increasingly larger sized on-chip caches. Also, with each CMOS technology generation, there has been a significant increase in their leakage energy consumption. For this reason, cache power management has become a crucial research issue in modern processor design. To address this challenge and also meet the goals of sustainable computing, researchers have proposed several techniques for improving energy efficiency of cache architectures. This paper surveys recent architectural techniques for improving cache power efficiency and also presents a classification of these techniques based on their characteristics. For providing an application perspective, this paper also reviews several real-worldmore » processor chips that employ cache energy saving techniques. The aim of this survey is to enable engineers and researchers to get insights into the techniques for improving cache power efficiency and motivate them to invent novel solutions for enabling low-power operation of caches.« less

  9. Automated Cache Performance Analysis And Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mohror, Kathryn

    While there is no lack of performance counter tools for coarse-grained measurement of cache activity, there is a critical lack of tools for relating data layout to cache behavior to application performance. Generally, any nontrivial optimizations are either not done at all, or are done ”by hand” requiring significant time and expertise. To the best of our knowledge no tool available to users measures the latency of memory reference instructions for partic- ular addresses and makes this information available to users in an easy-to-use and intuitive way. In this project, we worked to enable the Open|SpeedShop performance analysis tool tomore » gather memory reference latency information for specific instructions and memory ad- dresses, and to gather and display this information in an easy-to-use and intuitive way to aid performance analysts in identifying problematic data structures in their codes. This tool was primarily designed for use in the supercomputer domain as well as grid, cluster, cloud-based parallel e-commerce, and engineering systems and middleware. Ultimately, we envision a tool to automate optimization of application cache layout and utilization in the Open|SpeedShop performance analysis tool. To commercialize this soft- ware, we worked to develop core capabilities for gathering enhanced memory usage per- formance data from applications and create and apply novel methods for automatic data structure layout optimizations, tailoring the overall approach to support existing supercom- puter and cluster programming models and constraints. In this Phase I project, we focused on infrastructure necessary to gather performance data and present it in an intuitive way to users. With the advent of enhanced Precise Event-Based Sampling (PEBS) counters on recent Intel processor architectures and equivalent technology on AMD processors, we are now in a position to access memory reference information for particular addresses. Prior to the introduction of PEBS counters, cache behavior could only be measured reliably in the ag- gregate across tens or hundreds of thousands of instructions. With the newest iteration of PEBS technology, cache events can be tied to a tuple of instruction pointer, target address (for both loads and stores), memory hierarchy, and observed latency. With this information we can now begin asking questions regarding the efficiency of not only regions of code, but how these regions interact with particular data structures and how these interactions evolve over time. In the short term, this information will be vital for performance analysts understanding and optimizing the behavior of their codes for the memory hierarchy. In the future, we can begin to ask how data layouts might be changed to improve performance and, for a particular application, what the theoretical optimal performance might be. The overall benefit to be produced by this effort was a commercial quality easy-to- use and scalable performance tool that will allow both beginner and experienced parallel programmers to automatically tune their applications for optimal cache usage. Effective use of such a tool can literally save weeks of performance tuning effort. Easy to use. With the proposed innovations, finding and fixing memory performance issues would be more automated and hide most to all of the performance engineer exper- tise ”under the hood” of the Open|SpeedShop performance tool. One of the biggest public benefits from the proposed innovations is that it makes performance analysis more usable to a larger group of application developers. Intuitive reporting of results. The Open|SpeedShop performance analysis tool has a rich set of intuitive, yet detailed reports for presenting performance results to application developers. Our goal was to leverage this existing technology to present the results from our memory performance addition to Open|SpeedShop. Suitable for experts as well as novices. Application performance is getting more difficult to measure as the hardware platforms they run on become more complicated. This makes life difficult for the application developer, in that they need to know more about the hardware platform, including the memory system hierarchy, in order to understand the performance of their application. Some application developers are comfortable in that sce- nario, while others want to do their scientific research and not have to understand all the nuances in the hardware platform they are running their application on. Our proposed innovations were aimed to support both experts and novice performance analysts. Useful in many markets. The enhancement to Open|SpeedShop would appeal to a broader market space, as it will be useful in scientific, commercial, and cloud computing environments. Our goal was to use technology developed initially at the and Lawrence Livermore Na- tional Laboratory combined with the development and commercial software experience of the Argo Navis Technologies, LLC (ANT) to form a powerful combination to delivery these objectives.« less

  10. Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent

    PubMed Central

    De Sa, Christopher; Feldman, Matthew; Ré, Christopher; Olukotun, Kunle

    2018-01-01

    Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in machine learning and other domains. Since this is likely to continue for the foreseeable future, it is important to study techniques that can make it run fast on parallel hardware. In this paper, we provide the first analysis of a technique called Buckwild! that uses both asynchronous execution and low-precision computation. We introduce the DMGC model, the first conceptualization of the parameter space that exists when implementing low-precision SGD, and show that it provides a way to both classify these algorithms and model their performance. We leverage this insight to propose and analyze techniques to improve the speed of low-precision SGD. First, we propose software optimizations that can increase throughput on existing CPUs by up to 11×. Second, we propose architectural changes, including a new cache technique we call an obstinate cache, that increase throughput beyond the limits of current-generation hardware. We also implement and analyze low-precision SGD on the FPGA, which is a promising alternative to the CPU for future SGD systems. PMID:29391770

  11. Cache Scheme Based on Pre-Fetch Operation in ICN

    PubMed Central

    Duan, Jie; Wang, Xiong; Xu, Shizhong; Liu, Yuanni; Xu, Chuan; Zhao, Guofeng

    2016-01-01

    Many recent researches focus on ICN (Information-Centric Network), in which named content becomes the first citizen instead of end-host. In ICN, Named content can be further divided into many small sized chunks, and chunk-based communication has merits over content-based communication. The universal in-network cache is one of the fundamental infrastructures for ICN. In this work, a chunk-level cache mechanism based on pre-fetch operation is proposed. The main idea is that, routers with cache store should pre-fetch and cache the next chunks which may be accessed in the near future according to received requests and cache policy for reducing the users’ perceived latency. Two pre-fetch driven modes are present to answer when and how to pre-fetch. The LRU (Least Recently Used) is employed for the cache replacement. Simulation results show that the average user perceived latency and hops can be decreased by employed this cache mechanism based on pre-fetch operation. Furthermore, we also demonstrate that the results are influenced by many factors, such as the cache capacity, Zipf parameters and pre-fetch window size. PMID:27362478

  12. Accelerating a Particle-in-Cell Simulation Using a Hybrid Counting Sort

    NASA Astrophysics Data System (ADS)

    Bowers, K. J.

    2001-11-01

    In this article, performance limitations of the particle advance in a particle-in-cell (PIC) simulation are discussed. It is shown that the memory subsystem and cache-thrashing severely limit the speed of such simulations. Methods to implement a PIC simulation under such conditions are explored. An algorithm based on a counting sort is developed which effectively eliminates PIC simulation cache thrashing. Sustained performance gains of 40 to 70 percent are measured on commodity workstations for a minimal 2d2v electrostatic PIC simulation. More complete simulations are expected to have even better results as larger simulations are usually even more memory subsystem limited.

  13. Side Channel Attacks on STTRAM and Low Overhead Countermeasures

    DTIC Science & Technology

    2017-03-20

    introduce security vulnerabilities and expose the cache memory to side channel attacks. In this paper, we propose a side channel attack (SCA) model...where the adversary can monitor the supply current of the memory array to partially identify the sensi- tive cache data that is being read or written. We...propose solutions such as short retention STTRAM, obfuscation of SCA using 1-bit parity, multi-bit random write, and, neutral- izing the SCA using

  14. A High-Precision Counter Using the DSP Technique

    DTIC Science & Technology

    2004-09-01

    DSP is not good enough to process all the 1-second samples. The cache memory is also not sufficient to store all the sampling data. So we cut the...sampling number in a cycle is not good enough to achieve an accuracy less than 2×10-11. For this reason, a correlation operation is performed for... not good enough to process all the 1-second samples. The cache memory is also not sufficient to store all the sampling data. We will solve this

  15. Compiler-directed cache management in multiprocessors

    NASA Technical Reports Server (NTRS)

    Cheong, Hoichi; Veidenbaum, Alexander V.

    1990-01-01

    The necessity of finding alternatives to hardware-based cache coherence strategies for large-scale multiprocessor systems is discussed. Three different software-based strategies sharing the same goals and general approach are presented. They consist of a simple invalidation approach, a fast selective invalidation scheme, and a version control scheme. The strategies are suitable for shared-memory multiprocessor systems with interconnection networks and a large number of processors. Results of trace-driven simulations conducted on numerical benchmark routines to compare the performance of the three schemes are presented.

  16. Collaborative video caching scheme over OFDM-based long-reach passive optical networks

    NASA Astrophysics Data System (ADS)

    Li, Yan; Dai, Shifang; Chang, Xiangmao

    2018-07-01

    Long-reach passive optical networks (LR-PONs) are now considered as a desirable access solution for cost-efficiently delivering broadband services by integrating metro network with access network, among which orthogonal frequency division multiplexing (OFDM)-based LR-PONs gain greater research interests due to their good robustness and high spectrum efficiency. In such attractive OFDM-based LR-PONs, however, it is still challenging to effectively provide video service, which is one of the most popular and profitable broadband services, for end users. Given that more video requesters (i.e., end users) far away from optical line terminal (OLT) are served in OFDM-based LR-PONs, it is efficiency-prohibitive to use traditional video delivery model, which relies on the OLT to transmit videos to requesters, for providing video service, due to the model will incur not only larger video playback delay but also higher downstream bandwidth consumption. In this paper, we propose a novel video caching scheme that to collaboratively cache videos on distributed optical network units (ONUs) which are closer to end users, and thus to timely and cost-efficiently provide videos for requesters by ONUs over OFDM-based LR-PONs. We firstly construct an OFDM-based LR-PON architecture to enable the cooperation among ONUs while caching videos. Given a limited storage capacity of each ONU, we then propose collaborative approaches to cache videos on ONUs with the aim to maximize the local video hit ratio (LVHR), i.e., the proportion of video requests that can be directly satisfied by ONUs, under diverse resources requirements and requests distributions of videos. Simulations are finally conducted to evaluate the efficiency of our proposed scheme.

  17. Programmable partitioning for high-performance coherence domains in a multiprocessor system

    DOEpatents

    Blumrich, Matthias A [Ridgefield, CT; Salapura, Valentina [Chappaqua, NY

    2011-01-25

    A multiprocessor computing system and a method of logically partitioning a multiprocessor computing system are disclosed. The multiprocessor computing system comprises a multitude of processing units, and a multitude of snoop units. Each of the processing units includes a local cache, and the snoop units are provided for supporting cache coherency in the multiprocessor system. Each of the snoop units is connected to a respective one of the processing units and to all of the other snoop units. The multiprocessor computing system further includes a partitioning system for using the snoop units to partition the multitude of processing units into a plurality of independent, memory-consistent, adjustable-size processing groups. Preferably, when the processor units are partitioned into these processing groups, the partitioning system also configures the snoop units to maintain cache coherency within each of said groups.

  18. Research on mixed network architecture collaborative application model

    NASA Astrophysics Data System (ADS)

    Jing, Changfeng; Zhao, Xi'an; Liang, Song

    2009-10-01

    When facing complex requirements of city development, ever-growing spatial data, rapid development of geographical business and increasing business complexity, collaboration between multiple users and departments is needed urgently, however conventional GIS software (such as Client/Server model or Browser/Server model) are not support this well. Collaborative application is one of the good resolutions. Collaborative application has four main problems to resolve: consistency and co-edit conflict, real-time responsiveness, unconstrained operation, spatial data recoverability. In paper, application model called AMCM is put forward based on agent and multi-level cache. AMCM can be used in mixed network structure and supports distributed collaborative. Agent is an autonomous, interactive, initiative and reactive computing entity in a distributed environment. Agent has been used in many fields such as compute science and automation. Agent brings new methods for cooperation and the access for spatial data. Multi-level cache is a part of full data. It reduces the network load and improves the access and handle of spatial data, especially, in editing the spatial data. With agent technology, we make full use of its characteristics of intelligent for managing the cache and cooperative editing that brings a new method for distributed cooperation and improves the efficiency.

  19. Cache as point of coherence in multiprocessor system

    DOEpatents

    Blumrich, Matthias A.; Ceze, Luis H.; Chen, Dong; Gara, Alan; Heidelberger, Phlip; Ohmacht, Martin; Steinmacher-Burow, Burkhard; Zhuang, Xiaotong

    2016-11-29

    In a multiprocessor system, a conflict checking mechanism is implemented in the L2 cache memory. Different versions of speculative writes are maintained in different ways of the cache. A record of speculative writes is maintained in the cache directory. Conflict checking occurs as part of directory lookup. Speculative versions that do not conflict are aggregated into an aggregated version in a different way of the cache. Speculative memory access requests do not go to main memory.

  20. A Comparison of Three Programming Models for Adaptive Applications

    NASA Technical Reports Server (NTRS)

    Shan, Hong-Zhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswa, Rupak; Kwak, Dochan (Technical Monitor)

    2000-01-01

    We study the performance and programming effort for two major classes of adaptive applications under three leading parallel programming models. We find that all three models can achieve scalable performance on the state-of-the-art multiprocessor machines. The basic parallel algorithms needed for different programming models to deliver their best performance are similar, but the implementations differ greatly, far beyond the fact of using explicit messages versus implicit loads/stores. Compared with MPI and SHMEM, CC-SAS (cache-coherent shared address space) provides substantial ease of programming at the conceptual and program orchestration level, which often leads to the performance gain. However it may also suffer from the poor spatial locality of physically distributed shared data on large number of processors. Our CC-SAS implementation of the PARMETIS partitioner itself runs faster than in the other two programming models, and generates more balanced result for our application.

  1. Shared Memory Parallelization of an Implicit ADI-type CFD Code

    NASA Technical Reports Server (NTRS)

    Hauser, Th.; Huang, P. G.

    1999-01-01

    A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.

  2. A Performance Evaluation of the Cray X1 for Scientific Applications

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David

    2004-01-01

    The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.

  3. A Refreshable, On-line Cache for HST Data Retrieval

    NASA Astrophysics Data System (ADS)

    Fraquelli, Dorothy A.; Ellis, Tracy A.; Ridgaway, Michael; DPAS Team

    2016-01-01

    We discuss upgrades to the HST Data Processing System, with an emphasis on the changes Hubble Space Telescope (HST) Archive users will experience. In particular, data are now held on-line (in a cache) removing the need to reprocess the data every time they are requested from the Archive. OTFR (on the fly reprocessing) has been replaced by a reprocessing system, which runs in the background. Data in the cache are automatically placed in the reprocessing queue when updated calibration reference files are received or when an improved calibration algorithm is installed. Data in the on-line cache are expected to be the most up to date version. These changes were phased in throughout 2015 for all active instruments.The on-line cache was populated instrument by instrument over the course of 2015. As data were placed in the cache, the flag that triggers OTFR was reset so that OTFR no longer runs on these data. "Hybrid" requests to the Archive are handled transparently, with data not yet in the cache provided via OTFR and the remaining data provided from the cache. Users do not need to make separate requests.Users of the MAST Portal will be able to download data from the cache immediately. For data not in the cache, the Portal will send the user to the standard "Retrieval Options Page," allowing the user to direct the Archive to process and deliver the data.The classic MAST Search and Retrieval interface has the same look and feel as previously. Minor changes, unrelated to the cache, have been made to the format of the Retrieval Options Page.

  4. An Analysis of Instruction-Cached SIMD Computer Architecture

    DTIC Science & Technology

    1993-12-01

    ASSEBLE SIMULATE SCHEDULE VERIFY :t og ... . .. ... V~JSRUCTONSFOR PECIIEDCOMPARE ASSEMBLEI SIMULATE Ift*U1II ~ ~ SCHEDULEIinw ;. & VERIFY...Cache to Place Blocks ................. 70 4.5.4 Step 4: Schedule Cache Blocks ............................. 70 4.5.5 Step 5: Store Cache Blocks...167 B.4 Scheduler .............................................. 167 B.4.1 Basic Block Definition

  5. Architectural Techniques For Managing Non-volatile Caches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh

    As chip power dissipation becomes a critical challenge in scaling processor performance, computer architects are forced to fundamentally rethink the design of modern processors and hence, the chip-design industry is now at a major inflection point in its hardware roadmap. The high leakage power and low density of SRAM poses serious obstacles in its use for designing large on-chip caches and for this reason, researchers are exploring non-volatile memory (NVM) devices, such as spin torque transfer RAM, phase change RAM and resistive RAM. However, since NVMs are not strictly superior to SRAM, effective architectural techniques are required for making themmore » a universal memory solution. This book discusses techniques for designing processor caches using NVM devices. It presents algorithms and architectures for improving their energy efficiency, performance and lifetime. It also provides both qualitative and quantitative evaluation to help the reader gain insights and motivate them to explore further. This book will be highly useful for beginners as well as veterans in computer architecture, chip designers, product managers and technical marketing professionals.« less

  6. Empirical study of parallel LRU simulation algorithms

    NASA Technical Reports Server (NTRS)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithm are more complex, but have costs that are independent on the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithm implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.

  7. Effects of simulated mountain lion caching on decomposition of ungulate carcasses

    USGS Publications Warehouse

    Bischoff-Mattson, Z.; Mattson, D.

    2009-01-01

    Caching of animal remains is common among carnivorous species of all sizes, yet the effects of caching on larger prey are unstudied. We conducted a summer field experiment designed to test the effects of simulated mountain lion (Puma concolor) caching on mass loss, relative temperature, and odor dissemination of 9 prey-like carcasses. We deployed all but one of the carcasses in pairs, with one of each pair exposed and the other shaded and shallowly buried (cached). Caching substantially reduced wastage during dry and hot (drought) but not wet and cool (monsoon) periods, and it also reduced temperature and discernable odor to some degree during both seasons. These results are consistent with the hypotheses that caching serves to both reduce competition from arthropods and microbes and reduce odds of detection by larger vertebrates such as bears (Ursus spp.), wolves (Canis lupus), or other lions.

  8. Fault Tolerant Cache Schemes

    NASA Astrophysics Data System (ADS)

    Tu, H.-Yu.; Tasneem, Sarah

    Most of modern microprocessors employ on—chip cache memories to meet the memory bandwidth demand. These caches are now occupying a greater real es tate of chip area. Also, continuous down scaling of transistors increases the possi bility of defects in the cache area which already starts to occupies more than 50% of chip area. For this reason, various techniques have been proposed to tolerate defects in cache blocks. These techniques can be classified into three different cat egories, namely, cache line disabling, replacement with spare block, and decoder reconfiguration without spare blocks. This chapter examines each of those fault tol erant techniques with a fixed typical size and organization of L1 cache, through extended simulation using SPEC2000 benchmark on individual techniques. The de sign and characteristics of each technique are summarized with a view to evaluate the scheme. We then present our simulation results and comparative study of the three different methods.

  9. Conditional load and store in a shared memory

    DOEpatents

    Blumrich, Matthias A; Ohmacht, Martin

    2015-02-03

    A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.

  10. Quantifying animal movement for caching foragers: the path identification index (PII) and cougars, Puma concolor.

    PubMed

    Ironside, Kirsten E; Mattson, David J; Theimer, Tad; Jansen, Brian; Holton, Brandon; Arundel, Terence; Peters, Michael; Sexton, Joseph O; Edwards, Thomas C

    2017-01-01

    Many studies of animal movement have focused on directed versus area-restricted movement, which rely on correlations between step-length and turn-angles and on stationarity through time to define behavioral states. Although these approaches might apply well to grazing in patchy landscapes, species that either feed for short periods on large, concentrated food sources or cache food exhibit movements that are difficult to model using the traditional metrics of turn-angle and step-length alone. We used GPS telemetry collected from a prey-caching predator, the cougar ( Puma concolor, Linnaeus ), to test whether combining metrics of site recursion, spatiotemporal clustering, speed, and turning into an index of movement using partial sums, improves the ability to identify caching behavior. The index was used to identify changes in movement characteristics over time and segment paths into behavioral classes. The identification of behaviors from the Path Identification Index (PII) was evaluated using field investigations of cougar activities at GPS locations. We tested for statistical stationarity across behaviors for use of topographic view-sheds. Changes in the frequency and duration of PII were useful for identifying seasonal activities such as migration, gestation, and denning. The comparison of field investigations of cougar activities to behavioral PII classes resulted in an overall classification accuracy of 81%. Changes in behaviors were reflected in cougars' use of topographic view-sheds, resulting in statistical nonstationarity over time, and revealed important aspects of hunting behavior. Incorporating metrics of site recursion and spatiotemporal clustering revealed the temporal structure in movements of a caching forager. The movement index PII, shows promise for identifying behaviors in species that frequently return to specific locations such as food caches, watering holes, or dens, and highlights the potential role memory and cognitive abilities play in determining animal movements.

  11. Cache coherency without line exclusivity in MP systems having store-in caches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pomerene, J.H.; Puzak, T.R.; Rechtschaffen, R.N.

    1983-11-01

    By modifying the function of the storage control unit, a multiprocessor (MP) system having store-in caches is enabled to operate with the same versatility as an MP system having store-through caches, thereby eliminating the requirement for line exclusivity and greatly reducing the occurrence of cross-interrogates.

  12. Nature as a treasure map! Teaching geoscience with the help of earth caches?!

    NASA Astrophysics Data System (ADS)

    Zecha, Stefanie; Schiller, Thomas

    2015-04-01

    This presentation looks at how earth caches are influence the learning process in the field of geo science in non-formal education. The development of mobile technologies using Global Positioning System (GPS) data to point geographical location together with the evolving Web 2.0 supporting the creation and consumption of content, suggest a potential for collaborative informal learning linked to location. With the help of the GIS in smartphones you can go directly in nature, search for information by your smartphone, and learn something about nature. Earth caches are a very good opportunity, which are organized and supervised geocaches with special information about physical geography high lights. Interested people can inform themselves about aspects in geoscience area by earth caches. The main question of this presentation is how these caches are created in relation to learning processes. As is not possible, to analyze all existing earth caches, there was focus on Bavaria and a certain feature of earth caches. At the end the authors show limits and potentials for the use of earth caches and give some remark for the future.

  13. On the Feasibility of Prefetching and Caching for Online TV Services: A Measurement Study on Hulu

    NASA Astrophysics Data System (ADS)

    Krishnappa, Dilip Kumar; Khemmarat, Samamon; Gao, Lixin; Zink, Michael

    Lately researchers are looking at ways to reduce the delay on video playback through mechanisms like prefetching and caching for Video-on-Demand (VoD) services. The usage of prefetching and caching also has the potential to reduce the amount of network bandwidth usage, as most popular requests are served from a local cache rather than the server containing the original content. In this paper, we investigate the advantages of having such a prefetching and caching scheme for a free hosting service of professionally created video (movies and TV shows) named "hulu". We look into the advantages of using a prefetching scheme where the most popular videos of the week, as provided by the hulu website, are prefetched and compare this approach with a conventional LRU caching scheme with limited storage space and a combined scheme of prefetching and caching. Results from our measurement and analysis shows that employing a basic caching scheme at the proxy yields a hit ratio of up to 77.69%, but requires storage of about 236GB. Further analysis shows that a prefetching scheme where the top-100 popular videos of the week are downloaded to the proxy yields a hit ratio of 44% with a storage requirement of 10GB. A LRU caching scheme with a storage limitation of 20GB can achieve a hit ratio of 55% but downloads 4713 videos to achieve such high hit ratio compared to 100 videos in prefetching scheme, whereas a scheme with both prefetching and caching with the same storage yields a hit ratio of 59% with download requirement of 4439 videos. We find that employing a scheme of prefetching along with caching with trade-off on the storage will yield a better hit ratio and bandwidth saving than individual caching or prefetching schemes.

  14. Performance Modeling of Network-Attached Storage Device Based Hierarchical Mass Storage Systems

    NASA Technical Reports Server (NTRS)

    Menasce, Daniel A.; Pentakalos, Odysseas I.

    1995-01-01

    Network attached storage devices improve I/O performance by separating control and data paths and eliminating host intervention during the data transfer phase. Devices are attached to both a high speed network for data transfer and to a slower network for control messages. Hierarchical mass storage systems use disks to cache the most recently used files and a combination of robotic and manually mounted tapes to store the bulk of the files in the file system. This paper shows how queuing network models can be used to assess the performance of hierarchical mass storage systems that use network attached storage devices as opposed to host attached storage devices. Simulation was used to validate the model. The analytic model presented here can be used, among other things, to evaluate the protocols involved in 1/0 over network attached devices.

  15. 44 CFR 208.24 - Purchase and maintenance of items not listed on Equipment Cache List.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... items not listed on Equipment Cache List. 208.24 Section 208.24 Emergency Management and Assistance... of items not listed on Equipment Cache List. (a) Requests for purchase or maintenance of equipment and supplies not appearing on the Equipment Cache List, or that exceed the number specified in the...

  16. A VLSI VAX chip set

    NASA Astrophysics Data System (ADS)

    Johnson, W. N.; Herrick, W. V.; Grundmann, W. J.

    1984-10-01

    For the first time, VLSI technology is used to compress the full functinality and comparable performance of the VAX 11/780 super-minicomputer into a 1.2 M transistor microprocessor chip set. There was no subsetting of the 304 instruction set and the 17 data types, nor reduction in hardware support for the 4 Gbyte virtual memory management architecture. The chipset supports an integral 8 kbyte memory cache, a 13.3 Mbyte/s system bus, and sophisticated multiprocessing. High performance is achieved through microcode optimizations afforded by the large control store, tightly coupled address and data caches, the use of internal and external 32 bit datapaths, the extensive aplication of both microlevel and macrolevel pipelining, and the use of specialized hardware assists.

  17. A Performance Evaluation of the Cray X1 for Scientific Applications

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David

    2003-01-01

    The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and capacity computers because of their generality, scalability, and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently-released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.

  18. Performance of Service-Discovery Architectures in Response to Node Failures

    DTIC Science & Technology

    2003-06-01

    cache manager ( SCM ). Multiple SCMs can be used to mitigate the effect of SCM failure. In both architectures, service discovery occurs passively, via...employed, the SCM operates as an intermediary, matching advertised SDs of SMs to SD requirements provided by SUs. In this study, each SM manages one SP...architecture in our experimental topology: with 12 SMs, one SU, and up to three SCMs . To animate our three-party model, we chose discovery behaviors from the

  19. Forest resources of the Wasatch-Cache National Forest

    Treesearch

    Renee A. O' Brien; Jesse Pope

    1997-01-01

    The 1,215,219 acres in the Wasatch-Cache National Forest encompass 863,906 acres of forest land, made up of 90 percent (776,239 acres) "timberland" and 10 percent (87,667 acres) "woodland." The other 351,313 acres of the Wasatch-Cache are nonforest or water (fig. 1). This report discusses forest land only. In the Wasatch-Cache, 26 percent...

  20. Problems faced by food-caching corvids and the evolution of cognitive solutions

    PubMed Central

    Grodzinski, Uri; Clayton, Nicola S.

    2010-01-01

    The scatter hoarding of food, or caching, is a widespread and well-studied behaviour. Recent experiments with caching corvids have provided evidence for episodic-like memory, future planning and possibly mental attribution, all cognitive abilities that were thought to be unique to humans. In addition to the complexity of making flexible, informed decisions about caching and recovering, this behaviour is underpinned by a motivationally controlled compulsion to cache. In this review, we shall first discuss the compulsive side of caching both during ontogeny and in the caching behaviour of adult corvids. We then consider some of the problems that these birds face and review the evidence for the cognitive abilities they use to solve them. Thus, the emergence of episodic-like memory is viewed as a solution for coping with food perishability, while the various cache-protection and pilfering strategies may be sophisticated tools to deprive competitors of information, either by reducing the quality of information they can gather, or invalidating the information they already have. Finally, we shall examine whether such future-oriented behaviour involves future planning and ask why this and other cognitive abilities might have evolved in corvids. PMID:20156820

  1. A Morphometric Assessment of the Intended Function of Cached Clovis Points

    PubMed Central

    Buchanan, Briggs; Kilby, J. David; Huckell, Bruce B.; O'Brien, Michael J.; Collard, Mark

    2012-01-01

    A number of functions have been proposed for cached Clovis points. The least complicated hypothesis is that they were intended to arm hunting weapons. It has also been argued that they were produced for use in rituals or in connection with costly signaling displays. Lastly, it has been suggested that some cached Clovis points may have been used as saws. Here we report a study in which we morphometrically compared Clovis points from caches with Clovis points recovered from kill and camp sites to test two predictions of the hypothesis that cached Clovis points were intended to arm hunting weapons: 1) cached points should be the same shape as, but generally larger than, points from kill/camp sites, and 2) cached points and points from kill/camp sites should follow the same allometric trajectory. The results of the analyses are consistent with both predictions and therefore support the hypothesis. A follow-up review of the fit between the results of the analyses and the predictions of the other hypotheses indicates that the analyses support only the hunting equipment hypothesis. We conclude from this that cached Clovis points were likely produced with the intention of using them to arm hunting weapons. PMID:22348012

  2. Do Clark's nutcrackers demonstrate what-where-when memory on a cache-recovery task?

    PubMed

    Gould, Kristy L; Ort, Amy J; Kamil, Alan C

    2012-01-01

    What-where-when (WWW) memory during cache recovery was investigated in six Clark's nutcrackers. During caching, both red- and blue-colored pine seeds were cached by the birds in holes filled with sand. Either a short (3 day) retention interval (RI) or a long (9 day) RI was followed by a recovery session during which caches were replaced with either a single seed or wooden bead depending upon the color of the cache and length of the retention interval. Knowledge of what was in the cache (seed or bead), where it was located, and when the cache had been made (3 or 9 days ago) were the three WWW memory components under investigation. Birds recovered items (bead or seed) at above chance levels, demonstrating accurate spatial memory. They also recovered seeds more than beads after the long RI, but not after the short RI, when they recovered seeds and beads equally often. The differential recovery after the long RI demonstrates that nutcrackers may have the capacity for WWW memory during this task, but it is not clear why it was influenced by RI duration.

  3. Problems faced by food-caching corvids and the evolution of cognitive solutions.

    PubMed

    Grodzinski, Uri; Clayton, Nicola S

    2010-03-27

    The scatter hoarding of food, or caching, is a widespread and well-studied behaviour. Recent experiments with caching corvids have provided evidence for episodic-like memory, future planning and possibly mental attribution, all cognitive abilities that were thought to be unique to humans. In addition to the complexity of making flexible, informed decisions about caching and recovering, this behaviour is underpinned by a motivationally controlled compulsion to cache. In this review, we shall first discuss the compulsive side of caching both during ontogeny and in the caching behaviour of adult corvids. We then consider some of the problems that these birds face and review the evidence for the cognitive abilities they use to solve them. Thus, the emergence of episodic-like memory is viewed as a solution for coping with food perishability, while the various cache-protection and pilfering strategies may be sophisticated tools to deprive competitors of information, either by reducing the quality of information they can gather, or invalidating the information they already have. Finally, we shall examine whether such future-oriented behaviour involves future planning and ask why this and other cognitive abilities might have evolved in corvids.

  4. Multicast for savings in cache-based video distribution

    NASA Astrophysics Data System (ADS)

    Griwodz, Carsten; Zink, Michael; Liepert, Michael; On, Giwon; Steinmetz, Ralf

    1999-12-01

    Internet video-on-demand (VoD) today streams videos directly from server to clients, because re-distribution is not established yet. Intranet solutions exist but are typically managed centrally. Caching may overcome these management needs, however existing web caching strategies are not applicable because they work in different conditions. We propose movie distribution by means of caching, and study the feasibility from the service providers' point of view. We introduce the combination of our reliable multicast protocol LCRTP for caching hierarchies combined with our enhancement to the patching technique for bandwidth friendly True VoD, not depending on network resource guarantees.

  5. Evidence against observational spatial memory for cache locations of conspecifics in marsh tits Poecile palustris.

    PubMed

    Urhan, A Utku; Emilsson, Ellen; Brodin, Anders

    2017-01-01

    Many species in the family Paridae, such as marsh tits Poecile palustris , are large-scale scatter hoarders of food that make cryptic caches and disperse these in large year-round territories. The perhaps most well-known species in the family, the great tit Parus major , does not store food itself but is skilled in stealing caches from the other species. We have previously demonstrated that great tits are able to memorise positions of caches they have observed marsh tits make and later return and steal the food. As great tits are explorative in nature and unusually good learners, it is possible that such "memorisation of caches from a distance" is a unique ability of theirs. The other possibility is that this ability is general in the parid family. Here, we tested marsh tits in the same experimental set-up as where we previously have tested great tits. We allowed caged marsh tits to observe a caching conspecific in a specially designed indoor arena. After a retention interval of 1 or 24 h, we allowed the observer to enter the arena and search for the caches. The marsh tits showed no evidence of such observational memorization ability, and we believe that such ability is more useful for a non-hoarding species. Why should a marsh tit that memorises hundreds of their own caches in the field bother with the difficult task of memorising other individuals' caches? We argue that the close-up memorisation procedure that marsh tits use at their own caches may be a different type of observational learning than memorisation of caches made by others. For example, the latter must be done from a distance and hence may require the ability to adopt an allocentric perspective, i.e. the ability to visualise the cache from the hoarder's perspective. Members of the Paridae family are known to possess foraging techniques that are cognitively advanced. Previously, we have demonstrated that a non-hoarding parid species, the great tit P. major , is able to memorise positions of caches that they have observed marsh tits P. palustris make. However, it is unknown whether this cognitively advanced foraging strategy is unique to great tits or if it occurs also in other parids. Here, we demonstrated that "pilfering by observational memorization strategy" is not a general strategy in parids. We believe that such ability is important for a non-hoarding species such as the great tit and, most likely, birds owning many caches do not need this foraging strategy.

  6. Cache directory look-up re-use as conflict check mechanism for speculative memory requests

    DOEpatents

    Ohmacht, Martin

    2013-09-10

    In a cache memory, energy and other efficiencies can be realized by saving a result of a cache directory lookup for sequential accesses to a same memory address. Where the cache is a point of coherence for speculative execution in a multiprocessor system, with directory lookups serving as the point of conflict detection, such saving becomes particularly advantageous.

  7. Efficient Aho-Corasick String Matching on Emerging Multicore Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tumeo, Antonino; Villa, Oreste; Secchi, Simone

    String matching algorithms are critical to several scientific fields. Beside text processing and databases, emerging applications such as DNA protein sequence analysis, data mining, information security software, antivirus, ma- chine learning, all exploit string matching algorithms [3]. All these applica- tions usually process large quantity of textual data, require high performance and/or predictable execution times. Among all the string matching algorithms, one of the most studied, especially for text processing and security applica- tions, is the Aho-Corasick algorithm. 1 2 Book title goes here Aho-Corasick is an exact, multi-pattern string matching algorithm which performs the search in a time linearlymore » proportional to the length of the input text independently from pattern set size. However, depending on the imple- mentation, when the number of patterns increase, the memory occupation may raise drastically. In turn, this can lead to significant variability in the performance, due to the memory access times and the caching effects. This is a significant concern for many mission critical applications and modern high performance architectures. For example, security applications such as Network Intrusion Detection Systems (NIDS), must be able to scan network traffic against very large dictionaries in real time. Modern Ethernet links reach up to 10 Gbps, and malicious threats are already well over 1 million, and expo- nentially growing [28]. When performing the search, a NIDS should not slow down the network, or let network packets pass unchecked. Nevertheless, on the current state-of-the-art cache based processors, there may be a large per- formance variability when dealing with big dictionaries and inputs that have different frequencies of matching patterns. In particular, when few patterns are matched and they are all in the cache, the procedure is fast. Instead, when they are not in the cache, often because many patterns are matched and the caches are continuously thrashed, they should be retrieved from the system memory and the procedure is slowed down by the increased latency. Efficient implementations of string matching algorithms have been the fo- cus of several works, targeting Field Programmable Gate Arrays [4, 25, 15, 5], highly multi-threaded solutions like the Cray XMT [34], multicore proces- sors [19] or heterogeneous processors like the Cell Broadband Engine [35, 22]. Recently, several researchers have also started to investigate the use Graphic Processing Units (GPUs) for string matching algorithms in security applica- tions [20, 10, 32, 33]. Most of these approaches mainly focus on reaching high peak performance, or try to optimize the memory occupation, rather than looking at performance stability. However, hardware solutions supports only small dictionary sizes due to lack of memory and are difficult to customize, while platforms such as the Cell/B.E. are very complex to program.« less

  8. Advantages of masting in European beech: timing of granivore satiation and benefits of seed caching support the predator dispersal hypothesis.

    PubMed

    Zwolak, Rafał; Bogdziewicz, Michał; Wróbel, Aleksandra; Crone, Elizabeth E

    2016-03-01

    The predator satiation and predator dispersal hypotheses provide alternative explanations for masting. Both assume satiation of seed-eating vertebrates. They differ in whether satiation occurs before or after seed removal and caching by granivores (predator satiation and predator dispersal, respectively). This difference is largely unrecognized, but it is demographically important because cached seeds are dispersed and often have a microsite advantage over nondispersed seeds. We conducted rodent exclosure experiments in two mast and two nonmast years to test predictions of the predator dispersal hypothesis in our study system of yellow-necked mice (Apodemus flavicollis) and European beech (Fagus sylvatica). Specifically, we tested whether the fraction of seeds removed from the forest floor is similar during mast and nonmast years (i.e., lack of satiation before seed caching), whether masting decreases the removal of cached seeds (i.e., satiation after seed storage), and whether seed caching increases the probability of seedling emergence. We found that masting did not result in satiation at the seed removal stage. However, masting decreased the removal of cached seeds, and seed caching dramatically increased the probability of seedling emergence relative to noncached seeds. European beech thus benefits from masting through the satiation of scatterhoarders that occurs only after seeds are removed and cached. Although these findings do not exclude other evolutionary advantages of beech masting, they indicate that fitness benefits of masting extend beyond the most commonly considered advantages of predator satiation and increased pollination efficiency.

  9. Modifying dementia risk and trajectories of cognitive decline in aging: the Cache County Memory Study.

    PubMed

    Welsh-Bohmer, Kathleen A; Breitner, John C S; Hayden, Kathleen M; Lyketsos, Constantine; Zandi, Peter P; Tschanz, Joann T; Norton, Maria C; Munger, Ron

    2006-07-01

    The Cache County Study of Memory, Health, and Aging, more commonly referred to as the "Cache County Memory Study (CCMS)" is a longitudinal investigation of aging and Alzheimer's disease (AD) based in an exceptionally long-lived population residing in northern Utah. The study begun in 1994 has followed an initial cohort of 5,092 older individuals (many over age 84) and has examined the development of cognitive impairment and dementia in relation to genetic and environmental antecedents. This article summarizes the major contributions of the CCMS towards the understanding of mild cognitive disorders and AD across the lifespan, underscoring the role of common health exposures in modifying dementia risk and trajectories of cognitive change. The study now in its fourth wave of ascertainment illustrates the role of population-based approaches in informing testable models of cognitive aging and Alzheimer's disease.

  10. Locality-Aware CTA Clustering For Modern GPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Ang; Song, Shuaiwen; Liu, Weifeng

    2017-04-08

    In this paper, we proposed a novel clustering technique for tapping into the performance potential of a largely ignored type of locality: inter-CTA locality. We first demonstrated the capability of the existing GPU hardware to exploit such locality, both spatially and temporally, on L1 or L1/Tex unified cache. To verify the potential of this locality, we quantified its existence in a broad spectrum of applications and discussed its sources of origin. Based on these insights, we proposed the concept of CTA-Clustering and its associated software techniques. Finally, We evaluated these techniques on all modern generations of NVIDIA GPU architectures. Themore » experimental results showed that our proposed clustering techniques could significantly improve on-chip cache performance.« less

  11. Ink Wash Painting Style Rendering With Physically-based Ink Dispersion Model

    NASA Astrophysics Data System (ADS)

    Wang, Yifan; Li, Weiran; Zhu, Qing

    2018-04-01

    This paper presents a real-time rendering method based on the GPU programmable pipeline for rendering the 3D scene in ink wash painting style. The method is divided into main three parts: First, render the ink properties of 3D model by calculating its vertex curvature. Then, cached the ink properties to a paper structure and using an ink dispersion model which is defined by referencing the theory of porous media to simulate the dispersion of ink. Finally, convert the ink properties to the pixel color information and render it to the screen. This method has a better performance than previous methods in visual quality.

  12. Checkpoint repair for high-performance out-of-order execution machines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hwu, W.M.W.; Patt, Y.N.

    Out-or-order execution and branch prediction are two mechanisms that can be used profitably in the design of supercomputers to increase performance. Proper exception handling and branch prediction miss handling in an out-of-order execution machine to require some kind of repair mechanism which can restore the machine to a known previous state. In this paper the authors present a class of repair mechanisms using the concept of checkpointing. The authors derive several properties of checkpoint repair mechanisms. In addition, they provide algorithms for performing checkpoint repair that incur little overhead in time and modest cost in hardware, which also require nomore » additional complexity or time for use with write-back cache memory systems than they do with write-through cache memory systems, contrary to statements made by previous researchers.« less

  13. A novel cache mechanism

    NASA Technical Reports Server (NTRS)

    Gunawardena, J. A.

    1992-01-01

    This cache mechanism is transparent but does not contain associative circuits. It does not rely on locality of reference of instructions or data. No redundant instructions or data are encached. Items in the cache are accessed without address arithmetic. A cache miss is detected by the simplest test; compare two bits. These features would result in faster access, higher hit rate, reduced chip area, and less power dissipation in comparison with associative systems of similar size.

  14. Load Balancing in Distributed Web Caching: A Novel Clustering Approach

    NASA Astrophysics Data System (ADS)

    Tiwari, R.; Kumar, K.; Khan, G.

    2010-11-01

    The World Wide Web suffers from scaling and reliability problems due to overloaded and congested proxy servers. Caching at local proxy servers helps, but cannot satisfy more than a third to half of requests; more requests are still sent to original remote origin servers. In this paper we have developed an algorithm for Distributed Web Cache, which incorporates cooperation among proxy servers of one cluster. This algorithm uses Distributed Web Cache concepts along with static hierarchies with geographical based clusters of level one proxy server with dynamic mechanism of proxy server during the congestion of one cluster. Congestion and scalability problems are being dealt by clustering concept used in our approach. This results in higher hit ratio of caches, with lesser latency delay for requested pages. This algorithm also guarantees data consistency between the original server objects and the proxy cache objects.

  15. Generating Performance Models for Irregular Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Friese, Ryan D.; Tallent, Nathan R.; Vishnu, Abhinav

    2017-05-30

    Many applications have irregular behavior --- non-uniform input data, input-dependent solvers, irregular memory accesses, unbiased branches --- that cannot be captured using today's automated performance modeling techniques. We describe new hierarchical critical path analyses for the \\Palm model generation tool. To create a model's structure, we capture tasks along representative MPI critical paths. We create a histogram of critical tasks with parameterized task arguments and instance counts. To model each task, we identify hot instruction-level sub-paths and model each sub-path based on data flow, instruction scheduling, and data locality. We describe application models that generate accurate predictions for strong scalingmore » when varying CPU speed, cache speed, memory speed, and architecture. We present results for the Sweep3D neutron transport benchmark; Page Rank on multiple graphs; Support Vector Machine with pruning; and PFLOTRAN's reactive flow/transport solver with domain-induced load imbalance.« less

  16. High Performance Computing and Visualization Infrastructure for Simultaneous Parallel Computing and Parallel Visualization Research

    DTIC Science & Technology

    2016-11-09

    Total Number: Sub Contractors (DD882) Names of Personnel receiving masters degrees Names of personnel receiving PHDs Names of other research staff...Broadcom 5720 QP 1Gb Network Daughter Card (2) Intel Xeon E5-2680 v3 2.5GHz, 30M Cache, 9.60GT/s QPI, Turbo, HT , 12C/24T (120W...Broadcom 5720 QP 1Gb Network Daughter Card (2) Intel Xeon E5-2680 v3 2.5GHz, 30M Cache, 9.60GT/s QPI, Turbo, HT , 12C/24T (120W

  17. EarthCache as a Tool to Promote Earth-Science in Public School Classrooms

    NASA Astrophysics Data System (ADS)

    Gochis, E. E.; Rose, W. I.; Klawiter, M.; Vye, E. C.; Engelmann, C. A.

    2011-12-01

    Geoscientists often find it difficult to bridge the gap in communication between university research and what is learned in the public schools. Today's schools operate in a high stakes environment that only allow instruction based on State and National Earth Science curriculum standards. These standards are often unknown by academics or are written in a style that obfuscates the transfer of emerging scientific research to students in the classroom. Earth Science teachers are in an ideal position to make this link because they have a background in science as well as a solid understanding of the required curriculum standards for their grade and the pedagogical expertise to pass on new information to their students. As part of the Michigan Teacher Excellence Program (MiTEP), teachers from Grand Rapids, Kalamazoo, and Jackson school districts participate in 2 week field courses with Michigan Tech University to learn from earth science experts about how the earth works. This course connects Earth Science Literacy Principles' Big Ideas and common student misconceptions with standards-based education. During the 2011 field course, we developed and began to implement a three-phase EarthCache model that will provide a geospatial interactive medium for teachers to translate the material they learn in the field to the students in their standards based classrooms. MiTEP participants use GPS and Google Earth to navigate to Michigan sites of geo-significance. At each location academic experts aide participants in making scientific observations about the locations' geologic features, and "reading the rocks" methodology to interpret the area's geologic history. The participants are then expected to develop their own EarthCache site to be used as pedagogical tool bridging the gap between standards-based classroom learning, contemporary research and unique outdoor field experiences. The final phase supports teachers in integrating inquiry based, higher-level learning student activities to EarthCache sites near their own urban communities, or in regional areas such as nature preserves and National Parks. By working together, MiTEP participants are developing a network of regional EarthCache sites and shared lesson plans which explore places that are meaningful to students while simultaneously connecting them to geologic concepts they are learning in school. We believe that the MiTEP EarthCaching model will help participants emerge as leaders of inquiry style, and virtual place-based educators within their districts.

  18. MELOC - Memory and Location Optimized Caching for Mobile Ad Hoc Networks

    DTIC Science & Technology

    2011-01-01

    required for such environments. Moreover, nodes located at centre have to be chosen as cache location, since it reduces the chance of being attacked...Figure 1.1. MANET Formed by Armed Forces 47 Example 3: Sharing of music and videos are famous among mobile users. Instead of downloading...The two tier caching scheme discussed in this paper is acoustic . The characteristics of two-tier caching are as follows, the content of data to be

  19. Visual landmark-directed scatter-hoarding of Siberian chipmunks Tamias sibiricus.

    PubMed

    Zhang, Dongyuan; Li, Jia; Wang, Zhenyu; Yi, Xianfeng

    2016-05-01

    Spatial memory of cached food items plays an important role in cache recovery by scatter-hoarding animals. However, whether scatter-hoarding animals intentionally select cache sites with respect to visual landmarks in the environment and then rely on them to recover their cached seeds for later use has not been extensively explored. Furthermore, there is a lack of evidence on whether there are sex differences in visual landmark-based food-hoarding behaviors in small rodents even though male and female animals exhibit different spatial abilities. In the present study, we used a scatter-hoarding animal, the Siberian chipmunk, Tamias sibiricus to explore these questions in semi-natural enclosures. Our results showed that T. sibiricus preferred to establish caches in the shallow pits labeled with visual landmarks (branches of Pinus sylvestris, leaves of Athyrium brevifrons and PVC tubes). In addition, visual landmarks of P. sylvestris facilitated cache recovery by T. sibiricus. We also found significant sex differences in visual landmark-based food-hoarding strategies in Siberian chipmunks. Males, rather than females, chipmunks tended to establish their caches with respect to the visual landmarks. Our studies show that T. sibiricus rely on visual landmarks to establish and recover their caches, and that sex differences exist in visual landmark-based food hoarding in Siberian chipmunks. © 2015 International Society of Zoological Sciences, Institute of Zoology/Chinese Academy of Sciences and John Wiley & Sons Australia, Ltd.

  20. Early post-tsunami disaster medical assistance to Banda Aceh: a personal account.

    PubMed

    Garner, Alan A; Harrison, Ken

    2006-02-01

    The south Asian tsunami on 26 December, 2004, saw Australia deploy civilian teams to an international disaster in large numbers for the first time. The logistics of supporting such teams in both a self sustainability capacity and medical equipment had not previously been planned for or tested. For the first Australian team deployed to Banda Aceh, which arrived on the fourth day after the tsunami, equipment sourced from the New South Wales Fire Brigades Urban Search and Rescue (US&R) cache supplied all food, water, tents, generators and sleeping equipment. The medical equipment was largely sourced from the CareFlight US&R medical cache. There were significant deficits in surgical equipment as the medical cache had not been designed to provide a stand alone surgical capability. This resulted in the need for substantial improvisation by the surgical teams during the deployment. Despite this, the team performed nearly 140 major procedures in austere circumstances and significantly contributed to the early international response to this major humanitarian disaster.

  1. EMR Database Upgrade from MUMPS to CACHE: Lessons Learned.

    PubMed

    Alotaibi, Abduallah; Emshary, Mshary; Househ, Mowafa

    2014-01-01

    Over the past few years, Saudi hospitals have been implementing and upgrading Electronic Medical Record Systems (EMRs) to ensure secure data transfer and exchange between EMRs.This paper focuses on the process and lessons learned in upgrading the MUMPS database to a the newer Caché database to ensure the integrity of electronic data transfer within a local Saudi hospital. This paper examines the steps taken by the departments concerned, their action plans and how the change process was managed. Results show that user satisfaction was achieved after the upgrade was completed. The system was stable and offered better healthcare quality to patients as a result of the data exchange. Hardware infrastructure upgrades improved scalability and software upgrades to Caché improved stability. The overall performance was enhanced and new functions were added (CPOE) during the upgrades. The essons learned were: 1) Involve higher management; 2) Research multiple solutions available in the market; 3) Plan for a variety of implementation scenarios.

  2. Methods for compressible fluid simulation on GPUs using high-order finite differences

    NASA Astrophysics Data System (ADS)

    Pekkilä, Johannes; Väisälä, Miikka S.; Käpylä, Maarit J.; Käpylä, Petri J.; Anjum, Omer

    2017-08-01

    We focus on implementing and optimizing a sixth-order finite-difference solver for simulating compressible fluids on a GPU using third-order Runge-Kutta integration. Since graphics processing units perform well in data-parallel tasks, this makes them an attractive platform for fluid simulation. However, high-order stencil computation is memory-intensive with respect to both main memory and the caches of the GPU. We present two approaches for simulating compressible fluids using 55-point and 19-point stencils. We seek to reduce the requirements for memory bandwidth and cache size in our methods by using cache blocking and decomposing a latency-bound kernel into several bandwidth-bound kernels. Our fastest implementation is bandwidth-bound and integrates 343 million grid points per second on a Tesla K40t GPU, achieving a 3 . 6 × speedup over a comparable hydrodynamics solver benchmarked on two Intel Xeon E5-2690v3 processors. Our alternative GPU implementation is latency-bound and achieves the rate of 168 million updates per second.

  3. The Development of Caching and Object Permanence in Western Scrub-Jays (Aphelocoma californica): Which Emerges First?

    PubMed Central

    Salwiczek, Lucie H.; Schlinger, Barney; Emery, Nathan J.; Clayton, Nicola S.

    2010-01-01

    Recent studies on the food-caching behavior of corvids have revealed complex physical and social skills, yet little is known about the ontogeny of food caching in relation to the development of cognitive capacities. Piagetian object permanence is the understanding that objects continue to exist even when they are no longer visible. Here, the authors focus on Piagetian Stages 3 and 4, because they are hallmarks in the cognitive development of both young children and animals. Our aim is to determine in a food-caching corvid, the Western scrub-jay, whether (1) Piagetian Stage 4 competence and tentative caching (i.e., hiding an item invisibly and retrieving it without delay), emerge concomitantly or consecutively; (2) whether experiencing the reappearance of hidden objects enhances the timing of the appearance of object permanence; and (3) discuss how the development of object permanence is related to behavioral development and sensorimotor intelligence. Our findings suggest that object permanence Stage 4 emerges before tentative caching, and independent of environmental influences, but that once the birds have developed simple object-permanence, then social learning might advance the interval after which tentative caching commences. PMID:19685971

  4. The development of caching and object permanence in Western scrub-jays (Aphelocoma californica): which emerges first?

    PubMed

    Salwiczek, Lucie H; Emery, Nathan J; Schlinger, Barney; Clayton, Nicola S

    2009-08-01

    Recent studies on the food-caching behavior of corvids have revealed complex physical and social skills, yet little is known about the ontogeny of food caching in relation to the development of cognitive capacities. Piagetian object permanence is the understanding that objects continue to exist even when they are no longer visible. Here, the authors focus on Piagetian Stages 3 and 4, because they are hallmarks in the cognitive development of both young children and animals. Our aim is to determine in a food-caching corvid, the Western scrub-jay, whether (1) Piagetian Stage 4 competence and tentative caching (i.e., hiding an item invisibly and retrieving it without delay), emerge concomitantly or consecutively; (2) whether experiencing the reappearance of hidden objects enhances the timing of the appearance of object permanence; and (3) discuss how the development of object permanence is related to behavioral development and sensorimotor intelligence. Our findings suggest that object permanence Stage 4 emerges before tentative caching, and independent of environmental influences, but that once the birds have developed simple object-permanence, then social learning might advance the interval after which tentative caching commences. Copyright 2009 APA, all rights reserved.

  5. Fox Squirrels Match Food Assessment and Cache Effort to Value and Scarcity

    PubMed Central

    Delgado, Mikel M.; Nicholas, Molly; Petrie, Daniel J.; Jacobs, Lucia F.

    2014-01-01

    Scatter hoarders must allocate time to assess items for caching, and to carry and bury each cache. Such decisions should be driven by economic variables, such as the value of the individual food items, the scarcity of these items, competition for food items and risk of pilferage by conspecifics. The fox squirrel, an obligate scatter-hoarder, assesses cacheable food items using two overt movements, head flicks and paw manipulations. These behaviors allow an examination of squirrel decision processes when storing food for winter survival. We measured wild squirrels' time allocations and frequencies of assessment and investment behaviors during periods of food scarcity (summer) and abundance (fall), giving the squirrels a series of 15 items (alternating five hazelnuts and five peanuts). Assessment and investment per cache increased when resource value was higher (hazelnuts) or resources were scarcer (summer), but decreased as scarcity declined (end of sessions). This is the first study to show that assessment behaviors change in response to factors that indicate daily and seasonal resource abundance, and that these factors may interact in complex ways to affect food storing decisions. Food-storing tree squirrels may be a useful and important model species to understand the complex economic decisions made under natural conditions. PMID:24671221

  6. 76 FR 26981 - Proposed Flood Elevation Determinations

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-10

    ... table provided here represents the flooding sources, location of referenced elevations, effective and.... Specifically, it addresses the following flooding sources: Cache Creek, Cache Creek Left Bank Overflow, and... ``Unincorporated Areas of Yolo County, California'' addressed the flooding source Cache Creek Settling Basin. That...

  7. Norms for CERAD Constructional Praxis Recall

    PubMed Central

    Fillenbaum, Gerda G.; Burchett, Bruce M.; Unverzagt, Frederick W.; Rexroth, Daniel F.; Welsh-Bohmer, Kathleen

    2012-01-01

    Recall of the 4-item constructional praxis measure was a later addition to the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) neuropsychological battery. Norms for this measure, based on cognitively intact African Americans age ≥70 (Indianapolis-Ibadan Dementia Project, N=372), European American participants age ≥66 (Cache County Study of Memory, Health and Aging, N=507), and European American CERAD clinic controls age ≥50 (N=182), are presented here. Performance varied by site; by sex, education and age (African Americans in Indianapolis); education and age (Cache County European Americans; and only age (CERAD European American controls). Performance declined with increased age, within age with less education, and was poorer for women. Means, standard deviations, and percentiles are presented separately for each sample. PMID:21992077

  8. Optimizing raid performance with cache

    NASA Technical Reports Server (NTRS)

    Bouzari, Alex

    1994-01-01

    We live in a world of increasingly complex applications and operating systems. Information is increasing at a mind-boggling rate. The consolidation of text, voice, and imaging represents an even greater challenge for our information systems. Which forced us to address three important questions: Where do we store all this information? How do we access it? And, how do we protect it against the threat of loss or damage? Introduced in the 1980s, RAID (Redundant Arrays of Independent Disks) represents a cost-effective solution to the needs of the information age. While fulfilling expectations for high storage, and reliability, RAID is sometimes subject to criticisms in the area of performance. However, there are design elements that can significantly enhance performance. They can be subdivided into two areas: (1) RAID levels or basic architecture. And, (2) enhancement schemes such as intelligent caching, support of tagged command queuing, and use of SCSI-2 Fast and Wide features.

  9. The Mixed Instrumental Controller: Using Value of Information to Combine Habitual Choice and Mental Simulation

    PubMed Central

    Pezzulo, Giovanni; Rigoli, Francesco; Chersi, Fabian

    2013-01-01

    Instrumental behavior depends on both goal-directed and habitual mechanisms of choice. Normative views cast these mechanisms in terms of model-free and model-based methods of reinforcement learning, respectively. An influential proposal hypothesizes that model-free and model-based mechanisms coexist and compete in the brain according to their relative uncertainty. In this paper we propose a novel view in which a single Mixed Instrumental Controller produces both goal-directed and habitual behavior by flexibly balancing and combining model-based and model-free computations. The Mixed Instrumental Controller performs a cost-benefits analysis to decide whether to chose an action immediately based on the available “cached” value of actions (linked to model-free mechanisms) or to improve value estimation by mentally simulating the expected outcome values (linked to model-based mechanisms). Since mental simulation entails cognitive effort and increases the reward delay, it is activated only when the associated “Value of Information” exceeds its costs. The model proposes a method to compute the Value of Information, based on the uncertainty of action values and on the distance of alternative cached action values. Overall, the model by default chooses on the basis of lighter model-free estimates, and integrates them with costly model-based predictions only when useful. Mental simulation uses a sampling method to produce reward expectancies, which are used to update the cached value of one or more actions; in turn, this updated value is used for the choice. The key predictions of the model are tested in different settings of a double T-maze scenario. Results are discussed in relation with neurobiological evidence on the hippocampus – ventral striatum circuit in rodents, which has been linked to goal-directed spatial navigation. PMID:23459512

  10. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.

  11. Accessing Data Federations with CVMFS

    DOE PAGES

    Weitzel, Derek; Bockelman, Brian; Dykstra, Dave; ...

    2017-11-23

    Data federations have become an increasingly common tool for large collaborations such as CMS and Atlas to efficiently distribute large data files. Unfortunately, these typically are implemented with weak namespace semantics and a non-POSIX API. On the other hand, CVMFS has provided a POSIX-compliant read-only interface for use cases with a small working set size (such as software distribution). The metadata required for the CVMFS POSIX interface is distributed through a caching hierarchy, allowing it to scale to the level of about a hundred thousand hosts. In this paper, we will describe our contributions to CVMFS that merges the datamore » scalability of XRootD-based data federations (such as AAA) with metadata scalability and POSIX interface of CVMFS. We modified CVMFS so it can serve unmodified files without copying them to the repository server. CVMFS 2.2.0 is also able to redirect requests for data files to servers outside of the CVMFS content distribution network. Finally, we added the ability to manage authorization and authentication using security credentials such as X509 proxy certificates. We combined these modifications with the OSGs StashCache regional XRootD caching infrastructure to create a cached data distribution network. Here, we will show performance metrics accessing the data federation through CVMFS compared to direct data federation access. Additionally, we will discuss the improved user experience of providing access to a data federation through a POSIX filesystem.« less

  12. Accessing Data Federations with CVMFS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weitzel, Derek; Bockelman, Brian; Dykstra, Dave

    Data federations have become an increasingly common tool for large collaborations such as CMS and Atlas to efficiently distribute large data files. Unfortunately, these typically are implemented with weak namespace semantics and a non-POSIX API. On the other hand, CVMFS has provided a POSIX-compliant read-only interface for use cases with a small working set size (such as software distribution). The metadata required for the CVMFS POSIX interface is distributed through a caching hierarchy, allowing it to scale to the level of about a hundred thousand hosts. In this paper, we will describe our contributions to CVMFS that merges the datamore » scalability of XRootD-based data federations (such as AAA) with metadata scalability and POSIX interface of CVMFS. We modified CVMFS so it can serve unmodified files without copying them to the repository server. CVMFS 2.2.0 is also able to redirect requests for data files to servers outside of the CVMFS content distribution network. Finally, we added the ability to manage authorization and authentication using security credentials such as X509 proxy certificates. We combined these modifications with the OSGs StashCache regional XRootD caching infrastructure to create a cached data distribution network. Here, we will show performance metrics accessing the data federation through CVMFS compared to direct data federation access. Additionally, we will discuss the improved user experience of providing access to a data federation through a POSIX filesystem.« less

  13. Accessing Data Federations with CVMFS

    NASA Astrophysics Data System (ADS)

    Weitzel, Derek; Bockelman, Brian; Dykstra, Dave; Blomer, Jakob; Meusel, Ren

    2017-10-01

    Data federations have become an increasingly common tool for large collaborations such as CMS and Atlas to efficiently distribute large data files. Unfortunately, these typically are implemented with weak namespace semantics and a non-POSIX API. On the other hand, CVMFS has provided a POSIX-compliant read-only interface for use cases with a small working set size (such as software distribution). The metadata required for the CVMFS POSIX interface is distributed through a caching hierarchy, allowing it to scale to the level of about a hundred thousand hosts. In this paper, we will describe our contributions to CVMFS that merges the data scalability of XRootD-based data federations (such as AAA) with metadata scalability and POSIX interface of CVMFS. We modified CVMFS so it can serve unmodified files without copying them to the repository server. CVMFS 2.2.0 is also able to redirect requests for data files to servers outside of the CVMFS content distribution network. Finally, we added the ability to manage authorization and authentication using security credentials such as X509 proxy certificates. We combined these modifications with the OSGs StashCache regional XRootD caching infrastructure to create a cached data distribution network. We will show performance metrics accessing the data federation through CVMFS compared to direct data federation access. Additionally, we will discuss the improved user experience of providing access to a data federation through a POSIX filesystem.

  14. Interference Lattice-based Loop Nest Tilings for Stencil Computations

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Frumkin, Michael

    2000-01-01

    A common method for improving performance of stencil operations on structured multi-dimensional discretization grids is loop tiling. Tile shapes and sizes are usually determined heuristically, based on the size of the primary data cache. We provide a lower bound on the numbers of cache misses that must be incurred by any tiling, and a close achievable bound using a particular tiling based on the grid interference lattice. The latter tiling is used to derive highly efficient loop orderings. The total number of cache misses of a code is the sum of (necessary) cold misses and misses caused by elements being dropped from the cache between successive loads (replacement misses). Maximizing temporal locality is equivalent to minimizing replacement misses. Temporal locality of loop nests implementing stencil operations is optimized by tilings that avoid data conflicts. We divide the loop nest iteration space into conflict-free tiles, derived from the cache miss equation. The tiling involves the definition of the grid interference lattice an equivalence class of grid points whose images in main memory map to the same location in the cache-and the construction of a special basis for the lattice. Conflicts only occur on the boundaries of the tiles, unless the tiles are too thin. We show that the surface area of the tiles is bounded for grids of any dimensionality, and for caches of any associativity, provided the eccentricity of the fundamental parallelepiped (the tile spanned by the basis) of the lattice is bounded. Eccentricity is determined by two factors, aspect ratio and skewness. The aspect ratio of the parallelepiped can be bounded by appropriate array padding. The skewness can be bounded by the choice of a proper basis. Combining these two strategies ensures that pathologically thin tiles are avoided. They do not, however, minimize replacement misses per se. The reason is that tile visitation order influences the number of data conflicts on the tile boundaries. If two adjacent tiles are visited successively, there will be no replacement misses on the shared boundary. The iteration space may be covered with pencils larger than the size of the cache while avoiding data conflicts if the pencils are traversed by a scanning-face method. Replacement misses are incurred only on the boundaries of the pencils, and the number of misses is minimized by maximizing the volume of the scanning face, not the volume of the tile. We present an algorithm for constructing the most efficient scanning face for a given grid and stencil operator. In two dimensions it is based on a continued fraction algorithm. In three dimensions it follows Voronoi's successive minima algorithm. We show experimental results of using the scanning face, and compare with canonical loop orderings.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dykstra, D.; Blomer, J.

    Both the CernVM File System (CVMFS) and the Frontier Distributed Database Caching System (Frontier) distribute centrally updated data worldwide for LHC experiments using http proxy caches. Neither system provides privacy or access control on reading the data, but both control access to updates of the data and can guarantee the authenticity and integrity of the data transferred to clients over the internet. CVMFS has since its early days required digital signatures and secure hashes on all distributed data, and recently Frontier has added X.509-based authenticity and integrity checking. In this paper we detail and compare the security models of CVMFSmore » and Frontier.« less

  16. Design Alternatives to Improve Access Time Performance of Disk Drives Under DOS and UNIX

    NASA Astrophysics Data System (ADS)

    Hospodor, Andy

    For the past 25 years, improvements in CPU performance have overshadowed improvements in the access time performance of disk drives. CPU performance has been slanted towards greater instruction execution rates, measured in millions of instructions per second (MIPS). However, the slant for performance of disk storage has been towards capacity and corresponding increased storage densities. The IBM PC, introduced in 1982, processed only a fraction of a MIP. Follow-on CPUs, such as the 80486 and 80586, sported 5-10 MIPS by 1992. Single user PCs and workstations, with one CPU and one disk drive, became the dominant application, as implied by their production volumes. However, disk drives did not enjoy a corresponding improvement in access time performance, although the potential still exists. The time to access a disk drive improves (decreases) in two ways: by altering the mechanical properties of the drive or by adding cache to the drive. This paper explores the improvement to access time performance of disk drives using cache, prefetch, faster rotation rates, and faster seek acceleration.

  17. Locality in Search Engine Queries and Its Implications for Caching

    DTIC Science & Technology

    2001-05-01

    in the question of whether caching might be effective for search engines as well. They study two real search engine traces by examining query...locality and its implications for caching. The two search engines studied are Vivisimo and Excite. Their trace analysis results show that queries have

  18. Mammal caching of oak acorns in a red pine and a mixed oak stand

    Treesearch

    E.R. Thorn; W.M. Tzilkowski

    1991-01-01

    Small mammal caching of oak (Quercus spp.) acorns in adjacent red pine (Pinus resinosa) and mixed-oak stands was investigated at The Penn State Experimental Forest, Huntingdon Co., Pennsylvania. Gray squirrels (Sciurus carolinensis) and mice (Peromyscus spp.) were the most common acorn-caching...

  19. Population substructure in Cache County, Utah: the Cache County study

    PubMed Central

    2014-01-01

    Background Population stratification is a key concern for genetic association analyses. In addition, extreme homogeneity of ethnic origins of a population can make it difficult to interpret how genetic associations in that population may translate into other populations. Here we have evaluated the genetic substructure of samples from the Cache County study relative to the HapMap Reference populations and data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Results Our findings show that the Cache County study is similar in ethnic diversity to the self-reported "Whites" in the ADNI sample and less homogenous than the HapMap CEU population. Conclusions We conclude that the Cache County study is genetically representative of the general European American population in the USA and is an appropriate population for conducting broadly applicable genetic studies. PMID:25078123

  20. Improving Internet Archive Service through Proxy Cache.

    ERIC Educational Resources Information Center

    Yu, Hsiang-Fu; Chen, Yi-Ming; Wang, Shih-Yong; Tseng, Li-Ming

    2003-01-01

    Discusses file transfer protocol (FTP) servers for downloading archives (files with particular file extensions), and the change to HTTP (Hypertext transfer protocol) with increased Web use. Topics include the Archie server; proxy cache servers; and how to improve the hit rate of archives by a combination of caching and better searching mechanisms.…

  1. Distributed Name Servers: Naming and Caching in Large Distributed Computing Environments

    DTIC Science & Technology

    1985-12-01

    transmission rate of the communication medium1, transmission over a 56K bps line costs approx- imately 54r, and similarly, communication over a 9.6K...memories for modem computer systems attempt to maximize the hit ratio for a fixed-size cache by utilizing intelligent cache replacement algorithms

  2. Winter prey caching by northern hawk owls in Minnesota

    Treesearch

    Richard R. Schaefer; D. Craig Rudolph; Jesse F. Fagan

    2007-01-01

    Northern Hawk Owls (Surnia ulula) have been reported to cache prey during the breeding season for later consumption, but detailed reports of prey caching during the non-breeding season are comparatively rare. We provided prey to four individual Northern Hawk Owls in wintering areas in northeastern Minnesota during 2001 and 2005 and observed their...

  3. Visits, Hits, Caching and Counting on the World Wide Web: Old Wine in New Bottles?

    ERIC Educational Resources Information Center

    Berthon, Pierre; Pitt, Leyland; Prendergast, Gerard

    1997-01-01

    Although web browser caching speeds up retrieval, reduces network traffic, and decreases the load on servers and browser's computers, an unintended consequence for marketing research is that Web servers undercount hits. This article explores counting problems, caching, proxy servers, trawler software and presents a series of correction factors…

  4. Episodic-like memory during cache recovery by scrub jays.

    PubMed

    Clayton, N S; Dickinson, A

    1998-09-17

    The recollection of past experiences allows us to recall what a particular event was, and where and when it occurred, a form of memory that is thought to be unique to humans. It is known, however, that food-storing birds remember the spatial location and contents of their caches. Furthermore, food-storing animals adapt their caching and recovery strategies to the perishability of food stores, which suggests that they are sensitive to temporal factors. Here we show that scrub jays (Aphelocoma coerulescens) remember 'when' food items are stored by allowing them to recover perishable 'wax worms' (wax-moth larvae) and non-perishable peanuts which they had previously cached in visuospatially distinct sites. Jays searched preferentially for fresh wax worms, their favoured food, when allowed to recover them shortly after caching. However, they rapidly learned to avoid searching for worms after a longer interval during which the worms had decayed. The recovery preference of jays demonstrates memory of where and when particular food items were cached, thereby fulfilling the behavioural criteria for episodic-like memory in non-human animals.

  5. Parallelization Issues and Particle-In Codes.

    NASA Astrophysics Data System (ADS)

    Elster, Anne Cathrine

    1994-01-01

    "Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field, show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks, have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies, becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache -line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. A consideration of the input data's effect on the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters, which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.

  6. Ordering of guarded and unguarded stores for no-sync I/O

    DOEpatents

    Gara, Alan; Ohmacht, Martin

    2013-06-25

    A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. The third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushing only the first queue.

  7. A survey of techniques for architecting and managing GPU register file

    DOE PAGES

    Mittal, Sparsh

    2016-04-07

    To support their massively-multithreaded architecture, GPUs use very large register file (RF) which has a capacity higher than even L1 and L2 caches. In total contrast, traditional CPUs use tiny RF and much larger caches to optimize latency. Due to these differences, along with the crucial impact of RF in determining GPU performance, novel and intelligent techniques are required for managing GPU RF. In this paper, we survey the techniques for designing and managing GPU RF. We discuss techniques related to performance, energy and reliability aspects of RF. To emphasize the similarities and differences between the techniques, we classify themmore » along several parameters. Lastly, the aim of this paper is to synthesize the state-of-art developments in RF management and also stimulate further research in this area.« less

  8. A survey of techniques for architecting and managing GPU register file

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh

    To support their massively-multithreaded architecture, GPUs use very large register file (RF) which has a capacity higher than even L1 and L2 caches. In total contrast, traditional CPUs use tiny RF and much larger caches to optimize latency. Due to these differences, along with the crucial impact of RF in determining GPU performance, novel and intelligent techniques are required for managing GPU RF. In this paper, we survey the techniques for designing and managing GPU RF. We discuss techniques related to performance, energy and reliability aspects of RF. To emphasize the similarities and differences between the techniques, we classify themmore » along several parameters. Lastly, the aim of this paper is to synthesize the state-of-art developments in RF management and also stimulate further research in this area.« less

  9. Improve Performance of Data Warehouse by Query Cache

    NASA Astrophysics Data System (ADS)

    Gour, Vishal; Sarangdevot, S. S.; Sharma, Anand; Choudhary, Vinod

    2010-11-01

    The primary goal of data warehouse is to free the information locked up in the operational database so that decision makers and business analyst can make queries, analysis and planning regardless of the data changes in operational database. As the number of queries is large, therefore, in certain cases there is reasonable probability that same query submitted by the one or multiple users at different times. Each time when query is executed, all the data of warehouse is analyzed to generate the result of that query. In this paper we will study how using query cache improves performance of Data Warehouse and try to find the common problems faced. These kinds of problems are faced by Data Warehouse administrators which are minimizes response time and improves the efficiency of query in data warehouse overall, particularly when data warehouse is updated at regular interval.

  10. Improved cache performance in Monte Carlo transport calculations using energy banding

    NASA Astrophysics Data System (ADS)

    Siegel, A.; Smith, K.; Felker, K.; Romano, P.; Forget, B.; Beckman, P.

    2014-04-01

    We present an energy banding algorithm for Monte Carlo (MC) neutral particle transport simulations which depend on large cross section lookup tables. In MC codes, read-only cross section data tables are accessed frequently, exhibit poor locality, and are typically too much large to fit in fast memory. Thus, performance is often limited by long latencies to RAM, or by off-node communication latencies when the data footprint is very large and must be decomposed on a distributed memory machine. The proposed energy banding algorithm allows maximal temporal reuse of data in band sizes that can flexibly accommodate different architectural features. The energy banding algorithm is general and has a number of benefits compared to the traditional approach. In the present analysis we explore its potential to achieve improvements in time-to-solution on modern cache-based architectures.

  11. Operational Experience with the Frontier System in CMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blumenfeld, Barry; Dykstra, Dave; Kreuzer, Peter

    2012-06-20

    The Frontier framework is used in the CMS experiment at the LHC to deliver conditions data to processing clients worldwide, including calibration, alignment, and configuration information. Each central server at CERN, called a Frontier Launchpad, uses tomcat as a servlet container to establish the communication between clients and the central Oracle database. HTTP-proxy Squid servers, located close to clients, cache the responses to queries in order to provide high performance data access and to reduce the load on the central Oracle database. Each Frontier Launchpad also has its own reverse-proxy Squid for caching. The three central servers have been deliveringmore » about 5 million responses every day since the LHC startup, containing about 40 GB data in total, to more than one hundred Squid servers located worldwide, with an average response time on the order of 10 milliseconds. The Squid caches deployed worldwide process many more requests per day, over 700 million, and deliver over 40 TB of data. Several monitoring tools of the tomcat log files, the accesses of the Squids on the central Launchpad servers, and the availability of remote Squids have been developed to guarantee the performance of the service and make the system easily maintainable. Following a brief introduction of the Frontier framework, we describe the performance of this highly reliable and stable system, detail monitoring concerns and their deployment, and discuss the overall operational experience from the first two years of LHC data-taking.« less

  12. Operational Experience with the Frontier System in CMS

    NASA Astrophysics Data System (ADS)

    Blumenfeld, Barry; Dykstra, Dave; Kreuzer, Peter; Du, Ran; Wang, Weizhen

    2012-12-01

    The Frontier framework is used in the CMS experiment at the LHC to deliver conditions data to processing clients worldwide, including calibration, alignment, and configuration information. Each central server at CERN, called a Frontier Launchpad, uses tomcat as a servlet container to establish the communication between clients and the central Oracle database. HTTP-proxy Squid servers, located close to clients, cache the responses to queries in order to provide high performance data access and to reduce the load on the central Oracle database. Each Frontier Launchpad also has its own reverse-proxy Squid for caching. The three central servers have been delivering about 5 million responses every day since the LHC startup, containing about 40 GB data in total, to more than one hundred Squid servers located worldwide, with an average response time on the order of 10 milliseconds. The Squid caches deployed worldwide process many more requests per day, over 700 million, and deliver over 40 TB of data. Several monitoring tools of the tomcat log files, the accesses of the Squids on the central Launchpad servers, and the availability of remote Squids have been developed to guarantee the performance of the service and make the system easily maintainable. Following a brief introduction of the Frontier framework, we describe the performance of this highly reliable and stable system, detail monitoring concerns and their deployment, and discuss the overall operational experience from the first two years of LHC data-taking.

  13. Pattern recognition for cache management in distributed medical imaging environments.

    PubMed

    Viana-Ferreira, Carlos; Ribeiro, Luís; Matos, Sérgio; Costa, Carlos

    2016-02-01

    Traditionally, medical imaging repositories have been supported by indoor infrastructures with huge operational costs. This paradigm is changing thanks to cloud outsourcing which not only brings technological advantages but also facilitates inter-institutional workflows. However, communication latency is one main problem in this kind of approaches, since we are dealing with tremendous volumes of data. To minimize the impact of this issue, cache and prefetching are commonly used. The effectiveness of these mechanisms is highly dependent on their capability of accurately selecting the objects that will be needed soon. This paper describes a pattern recognition system based on artificial neural networks with incremental learning to evaluate, from a set of usage pattern, which one fits the user behavior at a given time. The accuracy of the pattern recognition model in distinct training conditions was also evaluated. The solution was tested with a real-world dataset and a synthesized dataset, showing that incremental learning is advantageous. Even with very immature initial models, trained with just 1 week of data samples, the overall accuracy was very similar to the value obtained when using 75% of the long-term data for training the models. Preliminary results demonstrate an effective reduction in communication latency when using the proposed solution to feed a prefetching mechanism. The proposed approach is very interesting for cache replacement and prefetching policies due to the good results obtained since the first deployment moments.

  14. A new framework for the analysis of continental-scale convection-resolving climate simulations

    NASA Astrophysics Data System (ADS)

    Leutwyler, D.; Charpilloz, C.; Arteaga, A.; Ban, N.; Di Girolamo, S.; Fuhrer, O.; Hoefler, T.; Schulthess, T. C.; Christoph, S.

    2017-12-01

    High-resolution climate simulations at horizontal resolution of O(1-4 km) allow explicit treatment of deep convection (thunderstorms and rain showers). Explicitly treating convection by the governing equations reduces uncertainties associated with parametrization schemes and allows a model formulation closer to physical first principles [1,2]. But kilometer-scale climate simulations with long integration periods and large computational domains are expensive and data storage becomes unbearably voluminous. Hence new approaches to perform analysis are required. In the crCLIM project we propose a new climate modeling framework that allows scientists to conduct analysis at high spatial and temporal resolution. We tackle the computational cost by using the largest available supercomputers such as hybrid CPU-GPU architectures. For this the COSMO model has been adapted to run on such architectures [2]. We then alleviate the I/O-bottleneck by employing a simulation data-virtualizer (SDaVi) that allows to trade-off storage (space) for computational effort (time). This is achieved by caching the simulation outputs and efficiently launching re-simulations in case of cache misses. All this is done transparently from the analysis applications [3]. For the re-runs this approach requires a bit-reproducible version of COSMO. That is to say a model that produces identical results on different architectures to ensure coherent recomputation of the requested data [4]. In this contribution we present a version of SDaVi, a first performance model, and a strategy to obtain bit-reproducibility across hardware architectures.[1] N. Ban, J. Schmidli, C. Schär. Evaluation of the convection-resolving regional climate modeling approach in decade-long simulations. J. Geophys. Res. Atmos., 7889-7907, 2014.[2] D. Leutwyler, O. Fuhrer, X. Lapillonne, D. Lüthi, C. Schär. Towards European-scale convection-resolving climate simulations with GPUs: a study with COSMO 4.19. Geosci. Model Dev, 3393-3412, 2016.[3] S. Di Girolamo, P. Schmid, T. Schulthess, T. Hoefler. Virtualized Big Data: Reproducing Simulation Output on Demand. Submit. to the 23rd ACM Symposium on PPoPP 18, Vienna, Austria.[4] A. Arteaga, O. Fuhrer, T. Hoefler. Designing Bit-Reproducible Portable High-Performance Applications. IEEE 28th IPDPS, 2014.

  15. dCache, Sync-and-Share for Big Data

    NASA Astrophysics Data System (ADS)

    Millar, AP; Fuhrmann, P.; Mkrtchyan, T.; Behrmann, G.; Bernardt, C.; Buchholz, Q.; Guelzow, V.; Litvintsev, D.; Schwank, K.; Rossi, A.; van der Reest, P.

    2015-12-01

    The availability of cheap, easy-to-use sync-and-share cloud services has split the scientific storage world into the traditional big data management systems and the very attractive sync-and-share services. With the former, the location of data is well understood while the latter is mostly operated in the Cloud, resulting in a rather complex legal situation. Beside legal issues, those two worlds have little overlap in user authentication and access protocols. While traditional storage technologies, popular in HEP, are based on X.509, cloud services and sync-and-share software technologies are generally based on username/password authentication or mechanisms like SAML or Open ID Connect. Similarly, data access models offered by both are somewhat different, with sync-and-share services often using proprietary protocols. As both approaches are very attractive, dCache.org developed a hybrid system, providing the best of both worlds. To avoid reinventing the wheel, dCache.org decided to embed another Open Source project: OwnCloud. This offers the required modern access capabilities but does not support the managed data functionality needed for large capacity data storage. With this hybrid system, scientists can share files and synchronize their data with laptops or mobile devices as easy as with any other cloud storage service. On top of this, the same data can be accessed via established mechanisms, like GridFTP to serve the Globus Transfer Service or the WLCG FTS3 tool, or the data can be made available to worker nodes or HPC applications via a mounted filesystem. As dCache provides a flexible authentication module, the same user can access its storage via different authentication mechanisms; e.g., X.509 and SAML. Additionally, users can specify the desired quality of service or trigger media transitions as necessary, thus tuning data access latency to the planned access profile. Such features are a natural consequence of using dCache. We will describe the design of the hybrid dCache/OwnCloud system, report on several months of operations experience running it at DESY, and elucidate the future road-map.

  16. Design to monitor trend in abundance and presence of American beaver (Castor canadensis) at the national forest scale.

    PubMed

    Beck, Jeffrey L; Dauwalter, Daniel C; Gerow, Kenneth G; Hayward, Gregory D

    2010-05-01

    Wildlife conservationists design monitoring programs to assess population dynamics, project future population states, and evaluate the impacts of management actions on populations. Because agency mandates and conservation laws call for monitoring data to elicit management responses, it is imperative to design programs that match the administrative scale for which management decisions are made. We describe a program to monitor population trends in American beaver (Castor canadensis) on the US Department of Agriculture, Black Hills National Forest (BHNF) in southwestern South Dakota and northeastern Wyoming, USA. Beaver have been designated as a management indicator species on the BHNF because of their association with riparian and aquatic habitats and its status as a keystone species. We designed our program to monitor the density of beaver food caches (abundance) within sampling units with beaver and the proportion of sampling units with beavers present at the scale of a national forest. We designated watersheds as sampling units in a stratified random sampling design that we developed based on habitat modeling results. Habitat modeling indicated that the most suitable beaver habitat was near perennial water, near aspen (Populus tremuloides) and willow (Salix spp.), and in low gradient streams at lower elevations. Results from the initial monitoring period in October 2007 allowed us to assess costs and logistical considerations, validate our habitat model, and conduct power analyses to assess whether our sampling design could detect the level of declines in beaver stated in the monitoring objectives. Beaver food caches were located in 20 of 52 sampled watersheds. Monitoring 20 to 25 watersheds with beaver should provide sufficient power to detect 15-40% declines in the beaver food cache index as well as a twofold decline in the odds of beaver being present in watersheds. Indices of abundance, such as the beaver food cache index, provide a practical measure of population status to conduct long-term monitoring across broad landscapes such as national forests.

  17. dCache on Steroids - Delegated Storage Solutions

    DOE PAGES

    Mkrtchyan, Tigran; Adeyemi, F.; Ashish, A.; ...

    2017-11-23

    For over a decade, dCache.org has delivered a robust software used at more than 80 Universities and research institutes around the world, allowing these sites to provide reliable storage services for the WLCG experiments as well as many other scientific communities. The flexible architecture of dCache allows running it in a wide variety of configurations and platforms - from a SoC based all-in-one Raspberry-Pi up to hundreds of nodes in a multipetabyte installation. Due to lack of managed storage at the time, dCache implemented data placement, replication and data integrity directly. Today, many alternatives are available: S3, GlusterFS, CEPH andmore » others. While such solutions position themselves as scalable storage systems, they cannot be used by many scientific communities out of the box. The absence of community-accepted authentication and authorization mechanisms, the use of product specific protocols and the lack of namespace are some of the reasons that prevent wide-scale adoption of these alternatives. Most of these limitations are already solved by dCache. By delegating low-level storage management functionality to the above-mentioned new systems and providing the missing layer through dCache, we provide a solution which combines the benefits of both worlds - industry standard storage building blocks with the access protocols and authentication required by scientific communities. In this paper, we focus on CEPH, a popular software for clustered storage that supports file, block and object interfaces. CEPH is often used in modern computing centers, for example as a backend to OpenStack services. We will show prototypes of dCache running with a CEPH backend and discuss the benefits and limitations of such an approach. As a result, we will also outline the roadmap for supporting ‘delegated storage’ within the dCache releases.« less

  18. dCache on Steroids - Delegated Storage Solutions

    NASA Astrophysics Data System (ADS)

    Mkrtchyan, T.; Adeyemi, F.; Ashish, A.; Behrmann, G.; Fuhrmann, P.; Litvintsev, D.; Millar, P.; Rossi, A.; Sahakyan, M.; Starek, J.

    2017-10-01

    For over a decade, dCache.org has delivered a robust software used at more than 80 Universities and research institutes around the world, allowing these sites to provide reliable storage services for the WLCG experiments as well as many other scientific communities. The flexible architecture of dCache allows running it in a wide variety of configurations and platforms - from a SoC based all-in-one Raspberry-Pi up to hundreds of nodes in a multipetabyte installation. Due to lack of managed storage at the time, dCache implemented data placement, replication and data integrity directly. Today, many alternatives are available: S3, GlusterFS, CEPH and others. While such solutions position themselves as scalable storage systems, they cannot be used by many scientific communities out of the box. The absence of community-accepted authentication and authorization mechanisms, the use of product specific protocols and the lack of namespace are some of the reasons that prevent wide-scale adoption of these alternatives. Most of these limitations are already solved by dCache. By delegating low-level storage management functionality to the above-mentioned new systems and providing the missing layer through dCache, we provide a solution which combines the benefits of both worlds - industry standard storage building blocks with the access protocols and authentication required by scientific communities. In this paper, we focus on CEPH, a popular software for clustered storage that supports file, block and object interfaces. CEPH is often used in modern computing centers, for example as a backend to OpenStack services. We will show prototypes of dCache running with a CEPH backend and discuss the benefits and limitations of such an approach. We will also outline the roadmap for supporting ‘delegated storage’ within the dCache releases.

  19. dCache on Steroids - Delegated Storage Solutions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mkrtchyan, Tigran; Adeyemi, F.; Ashish, A.

    For over a decade, dCache.org has delivered a robust software used at more than 80 Universities and research institutes around the world, allowing these sites to provide reliable storage services for the WLCG experiments as well as many other scientific communities. The flexible architecture of dCache allows running it in a wide variety of configurations and platforms - from a SoC based all-in-one Raspberry-Pi up to hundreds of nodes in a multipetabyte installation. Due to lack of managed storage at the time, dCache implemented data placement, replication and data integrity directly. Today, many alternatives are available: S3, GlusterFS, CEPH andmore » others. While such solutions position themselves as scalable storage systems, they cannot be used by many scientific communities out of the box. The absence of community-accepted authentication and authorization mechanisms, the use of product specific protocols and the lack of namespace are some of the reasons that prevent wide-scale adoption of these alternatives. Most of these limitations are already solved by dCache. By delegating low-level storage management functionality to the above-mentioned new systems and providing the missing layer through dCache, we provide a solution which combines the benefits of both worlds - industry standard storage building blocks with the access protocols and authentication required by scientific communities. In this paper, we focus on CEPH, a popular software for clustered storage that supports file, block and object interfaces. CEPH is often used in modern computing centers, for example as a backend to OpenStack services. We will show prototypes of dCache running with a CEPH backend and discuss the benefits and limitations of such an approach. As a result, we will also outline the roadmap for supporting ‘delegated storage’ within the dCache releases.« less

  20. Mapping virtual addresses to different physical addresses for value disambiguation for thread memory access requests

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gala, Alan; Ohmacht, Martin

    A multiprocessor system includes nodes. Each node includes a data path that includes a core, a TLB, and a first level cache implementing disambiguation. The system also includes at least one second level cache and a main memory. For thread memory access requests, the core uses an address associated with an instruction format of the core. The first level cache uses an address format related to the size of the main memory plus an offset corresponding to hardware thread meta data. The second level cache uses a physical main memory address plus software thread meta data to store the memorymore » access request. The second level cache accesses the main memory using the physical address with neither the offset nor the thread meta data after resolving speculation. In short, this system includes mapping of a virtual address to a different physical addresses for value disambiguation for different threads.« less

  1. Efficient Cache use for Stencil Operations on Structured Discretization Grids

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; VanderWijngaart, Rob F.

    2001-01-01

    We derive tight bounds on the cache misses for evaluation of explicit stencil operators on structured grids. Our lower bound is based on the isoperimetrical property of the discrete octahedron. Our upper bound is based on a good surface to volume ratio of a parallelepiped spanned by a reduced basis of the interference lattice of a grid. Measurements show that our algorithm typically reduces the number of cache misses by a factor of three, relative to a compiler optimized code. We show that stencil calculations on grids whose interference lattice have a short vector feature abnormally high numbers of cache misses. We call such grids unfavorable and suggest to avoid these in computations by appropriate padding. By direct measurements on a MIPS R10000 processor we show a good correlation between abnormally high numbers of cache misses and unfavorable three-dimensional grids.

  2. Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; VanderWijngaart, Rob

    2003-01-01

    The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the perfor- mance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.

  3. Dietary folate, vitamin B-12, vitamin B-6 and incident Alzheimer's disease: the cache county memory, health and aging study.

    PubMed

    Nelson, C; Wengreen, H J; Munger, R G; Corcoran, C D

    2009-12-01

    To examine associations between dietary and supplemental folate, vitamin B-12 and vitamin B-6 and incident Alzheimer's disease (AD) among elderly men and women. Data collected were from participants of the Cache County Memory, Health and Aging Study, a longitudinal study of 5092 men and women 65 years and older who were residents of Cache County, Utah in 1995. Multistage clinical assessment procedures were used to identify incident cases of AD. Dietary data were collected using a 142-item food frequency questionnaire. Cox Proportional Hazards (CPH) modeling was used to determine hazard ratios across quintiles of micronutrient intake. 202 participants were diagnosed with incident AD during follow-up (1995-2004). In multivariable CPH models that controlled for the effects of gender, age, education, and other covariates there were no observed differences in risk of AD or dementia by increasing quintiles of total intake of folate, vitamin B-12, or vitamin B-6. Similarly, there were no observed differences in risk of AD by regular use of either folate or B6 supplements. Dietary intake of B-vitamins from food and supplemental sources appears unrelated to incidence of dementia and AD. Further studies examining associations between dietary intakes of B-vitamins, biomarkers of B-vitamin status and cognitive endpoints are warranted.

  4. Cache directory lookup reader set encoding for partial cache line speculation support

    DOEpatents

    Gara, Alan; Ohmacht, Martin

    2014-10-21

    In a multiprocessor system, with conflict checking implemented in a directory lookup of a shared cache memory, a reader set encoding permits dynamic recordation of read accesses. The reader set encoding includes an indication of a portion of a line read, for instance by indicating boundaries of read accesses. Different encodings may apply to different types of speculative execution.

  5. Killing and caching of an adult White-tailed deer, Odocoileus virginianus, by a single Gray Wolf, Canis lupus

    USGS Publications Warehouse

    Nelson, Michael E.

    2011-01-01

    A single Gray Wolf (Canis lupus) killed an adult male White-tailed Deer (Odocoileus virginianus) and cached the intact carcass in 76 cm of snow. The carcass was revisited and entirely consumed between four and seven days later. This is the first recorded observation of a Gray Wolf caching an entire adult deer.

  6. Image matrix processor for fast multi-dimensional computations

    DOEpatents

    Roberson, George P.; Skeate, Michael F.

    1996-01-01

    An apparatus for multi-dimensional computation which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination.

  7. Joshua tree (Yucca brevifolia) seeds are dispersed by seed-caching rodents

    USGS Publications Warehouse

    Vander Wall, S.B.; Esque, T.; Haines, D.; Garnett, M.; Waitman, B.A.

    2006-01-01

    Joshua tree (Yucca brevifolia) is a distinctive and charismatic plant of the Mojave Desert. Although floral biology and seed production of Joshua tree and other yuccas are well understood, the fate of Joshua tree seeds has never been studied. We tested the hypothesis that Joshua tree seeds are dispersed by seed-caching rodents. We radioactively labelled Joshua tree seeds and followed their fates at five source plants in Potosi Wash, Clark County, Nevada, USA. Rodents made a mean of 30.6 caches, usually within 30 m of the base of source plants. Caches contained a mean of 5.2 seeds buried 3-30 nun deep. A variety of rodent species appears to have prepared the caches. Three of the 836 Joshua tree seeds (0.4%) cached germinated the following spring. Seed germination using rodent exclosures was nearly 15%. More than 82% of seeds in open plots were removed by granivores, and neither microsite nor supplemental water significantly affected germination. Joshua tree produces seeds in indehiscent pods or capsules, which rodents dismantle to harvest seeds. Because there is no other known means of seed dispersal, it is possible that the Joshua tree-rodent seed dispersal interaction is an obligate mutualism for the plant.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shoopman, J. D.

    This report documents Livermore Computing (LC) activities in support of ASC L2 milestone 5589: Modernization and Expansion of LLNL Archive Disk Cache, due March 31, 2016. The full text of the milestone is included in Attachment 1. The description of the milestone is: Description: Configuration of archival disk cache systems will be modernized to reduce fragmentation, and new, higher capacity disk subsystems will be deployed. This will enhance archival disk cache capability for ASC archive users, enabling files written to the archives to remain resident on disk for many (6–12) months, regardless of file size. The milestone was completed inmore » three phases. On August 26, 2015 subsystems with 6PB of disk cache were deployed for production use in LLNL’s unclassified HPSS environment. Following that, on September 23, 2015 subsystems with 9 PB of disk cache were deployed for production use in LLNL’s classified HPSS environment. On January 31, 2016, the milestone was fully satisfied when the legacy Data Direct Networks (DDN) archive disk cache subsystems were fully retired from production use in both LLNL’s unclassified and classified HPSS environments, and only the newly deployed systems were in use.« less

  9. Smart Collaborative Caching for Information-Centric IoT in Fog Computing.

    PubMed

    Song, Fei; Ai, Zheng-Yang; Li, Jun-Jie; Pau, Giovanni; Collotta, Mario; You, Ilsun; Zhang, Hong-Ke

    2017-11-01

    The significant changes enabled by the fog computing had demonstrated that Internet of Things (IoT) urgently needs more evolutional reforms. Limited by the inflexible design philosophy; the traditional structure of a network is hard to meet the latest demands. However, Information-Centric Networking (ICN) is a promising option to bridge and cover these enormous gaps. In this paper, a Smart Collaborative Caching (SCC) scheme is established by leveraging high-level ICN principles for IoT within fog computing paradigm. The proposed solution is supposed to be utilized in resource pooling, content storing, node locating and other related situations. By investigating the available characteristics of ICN, some challenges of such combination are reviewed in depth. The details of building SCC, including basic model and advanced algorithms, are presented based on theoretical analysis and simplified examples. The validation focuses on two typical scenarios: simple status inquiry and complex content sharing. The number of clusters, packet loss probability and other parameters are also considered. The analytical results demonstrate that the performance of our scheme, regarding total packet number and average transmission latency, can outperform that of the original ones. We expect that the SCC will contribute an efficient solution to the related studies.

  10. Smart Collaborative Caching for Information-Centric IoT in Fog Computing

    PubMed Central

    Song, Fei; Ai, Zheng-Yang; Li, Jun-Jie; Zhang, Hong-Ke

    2017-01-01

    The significant changes enabled by the fog computing had demonstrated that Internet of Things (IoT) urgently needs more evolutional reforms. Limited by the inflexible design philosophy; the traditional structure of a network is hard to meet the latest demands. However, Information-Centric Networking (ICN) is a promising option to bridge and cover these enormous gaps. In this paper, a Smart Collaborative Caching (SCC) scheme is established by leveraging high-level ICN principles for IoT within fog computing paradigm. The proposed solution is supposed to be utilized in resource pooling, content storing, node locating and other related situations. By investigating the available characteristics of ICN, some challenges of such combination are reviewed in depth. The details of building SCC, including basic model and advanced algorithms, are presented based on theoretical analysis and simplified examples. The validation focuses on two typical scenarios: simple status inquiry and complex content sharing. The number of clusters, packet loss probability and other parameters are also considered. The analytical results demonstrate that the performance of our scheme, regarding total packet number and average transmission latency, can outperform that of the original ones. We expect that the SCC will contribute an efficient solution to the related studies. PMID:29104219

  11. Vectorization, threading, and cache-blocking considerations for hydrocodes on emerging architectures

    DOE PAGES

    Fung, J.; Aulwes, R. T.; Bement, M. T.; ...

    2015-07-14

    This work reports on considerations for improving computational performance in preparation for current and expected changes to computer architecture. The algorithms studied will include increasingly complex prototypes for radiation hydrodynamics codes, such as gradient routines and diffusion matrix assembly (e.g., in [1-6]). The meshes considered for the algorithms are structured or unstructured meshes. The considerations applied for performance improvements are meant to be general in terms of architecture (not specifically graphical processing unit (GPUs) or multi-core machines, for example) and include techniques for vectorization, threading, tiling, and cache blocking. Out of a survey of optimization techniques on applications such asmore » diffusion and hydrodynamics, we make general recommendations with a view toward making these techniques conceptually accessible to the applications code developer. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.« less

  12. A highly efficient 3D level-set grain growth algorithm tailored for ccNUMA architecture

    NASA Astrophysics Data System (ADS)

    Mießen, C.; Velinov, N.; Gottstein, G.; Barrales-Mora, L. A.

    2017-12-01

    A highly efficient simulation model for 2D and 3D grain growth was developed based on the level-set method. The model introduces modern computational concepts to achieve excellent performance on parallel computer architectures. Strong scalability was measured on cache-coherent non-uniform memory access (ccNUMA) architectures. To achieve this, the proposed approach considers the application of local level-set functions at the grain level. Ideal and non-ideal grain growth was simulated in 3D with the objective to study the evolution of statistical representative volume elements in polycrystals. In addition, microstructure evolution in an anisotropic magnetic material affected by an external magnetic field was simulated.

  13. The impact of Moore's Law and loss of Dennard scaling: Are DSP SoCs an energy efficient alternative to x86 SoCs?

    NASA Astrophysics Data System (ADS)

    Johnsson, L.; Netzer, G.

    2016-10-01

    Moore's law, the doubling of transistors per unit area for each CMOS technology generation, is expected to continue throughout the decade, while Dennard voltage scaling resulting in constant power per unit area stopped about a decade ago. The semiconductor industry's response to the loss of Dennard scaling and the consequent challenges in managing power distribution and dissipation has been leveled off clock rates, a die performance gain reduced from about a factor of 2.8 to 1.4 per technology generation, and multi-core processor dies with increased cache sizes. Increased caches sizes offers performance benefits for many applications as well as energy savings. Accessing data in cache is considerably more energy efficient than main memory accesses. Further, caches consume less power than a corresponding amount of functional logic. As feature sizes continue to be scaled down an increasing fraction of the die must be “underutilized” or “dark” due to power constraints. With power being a prime design constraint there is a concerted effort to find significantly more energy efficient chip architectures than dominant in servers today, with chips potentially incorporating several types of cores to cover a range of applications, or different functions in an application, as is already common for the mobile processor market. Digital Signal Processors (DSPs), largely targeting the embedded and mobile processor markets, typically have been designed for a power consumption of 10% or less of a typical x86 CPU, yet with much more than 10% of the floating-point capability of the same technology generation x86 CPUs. Thus, DSPs could potentially offer an energy efficient alternative to x86 CPUs. Here we report an assessment of the Texas Instruments TMS320C6678 DSP in regards to its energy efficiency for two common HPC benchmarks: STREAM (memory system benchmark) and HPL (CPU benchmark)

  14. Urban Geocaching: what Happened in Lisbon during the Last Decade?

    NASA Astrophysics Data System (ADS)

    Nogueira Mendes, R.; Rodrigues, T.; Rodrigues, A. M.

    2013-05-01

    Created in 2000 in the United States of America, Geocaching has become a major phenomenon all around the world, counting actually with millions of Geocaches (or caches) that work as a recreational motivation for millions of users, called Geocachers. During the last 30 days over 5,000,000 new logs have been submitted worldwide, disseminating individual experiences, motivations, emotions and photos through the official Geocaching website (www.geocaching.com), and several official or informal national web forums. The activity itself can be compared with modern treasure hunting that uses handheld GPS, Smartphones or Tablets, WEB 2.0, wiki features and technologies to keep Geocachers engaged with their activity, in a strong social-network. All these characteristics make Geocaching an activity with a strong geographic component that deals closely with the surrounding environment where each cache has been hidden. From previous work, significance correlation has been found regarding hides and natural/rural environments, but metropolitan and urban areas like Lisbon municipality (that holds 3.23% of the total 27534 Portuguese caches), still registers the higher density of Geocaches, and logs numbers. Lacking "natural/rural" environment, Geocaching in cities tend to happen in symbolic areas, like public parks and places, sightseeing spots and historical neighborhoods. The present study looks to Geocaching within the city of Lisbon, in order to understand how it works, and if this activity reflects the city itself, promoting its image and cultural heritage. From a freely available dataset that includes all Geocaches that have been placed in Lisbon since February 2001, spatial analysis has been conducted, showing the informal preferences of this activity. Results show a non-random distribution of caches within the study area, similar to the land use distribution. Preferable locations tend to be in iconic places of the city, usually close to the Tagus River, that concentrates 25% of the total caches. Since most of these places are known to be touristic destinations, the TOP15 logged Caches were also analyzed regarding their description and logs in order to understand if Geocaching reflects tourism and if it works as a tourist promotion tool within urban environments. Final results also reflect the Geocaching performance and major trends within urban environments providing new insights regarding this activity impacts and implications.

  15. Authenticating cache

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Tyler Barratt; Urrea, Jorge Mario

    2012-06-01

    The aim of the Authenticating Cache architecture is to ensure that machine instructions in a Read Only Memory (ROM) are legitimate from the time the ROM image is signed (immediately after compilation) to the time they are placed in the cache for the processor to consume. The proposed architecture allows the detection of ROM image modifications during distribution or when it is loaded into memory. It also ensures that modified instructions will not execute in the processor-as the cache will not be loaded with a page that fails an integrity check. The authenticity of the instruction stream can also bemore » verified in this architecture. The combination of integrity and authenticity assurance greatly improves the security profile of a system.« less

  16. SciDAC-Data, A Project to Enabling Data Driven Modeling of Exascale Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mubarak, M.; Ding, P.; Aliaga, L.

    The SciDAC-Data project is a DOE funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab Data Center on the organization, movement, and consumption of High Energy Physics data. The project will analyze the analysis patterns and data organization that have been used by the NOvA, MicroBooNE, MINERvA and other experiments, to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations aremore » designed to address questions of data handling, cache optimization and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership class exascale computing facilities. We will address the use of the SciDAC-Data distributions acquired from Fermilab Data Center’s analysis workflows and corresponding to around 71,000 HEP jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in HPC environments. In particular we describe in detail how the Sequential Access via Metadata (SAM) data handling system in combination with the dCache/Enstore based data archive facilities have been analyzed to develop the radically different models of the analysis of HEP data. We present how the simulation may be used to analyze the impact of design choices in archive facilities.« less

  17. Magpies can use local cues to retrieve their food caches.

    PubMed

    Feenders, Gesa; Smulders, Tom V

    2011-03-01

    Much importance has been placed on the use of spatial cues by food-hoarding birds in the retrieval of their caches. In this study, we investigate whether food-hoarding birds can be trained to use local cues ("beacons") in their cache retrieval. We test magpies (Pica pica) in an active hoarding-retrieval paradigm, where local cues are always reliable, while spatial cues are not. Our results show that the birds use the local cues to retrieve their caches, even when occasionally contradicting spatial information is available. The design of our study does not allow us to test rigorously whether the birds prefer using local over spatial cues, nor to investigate the process through which they learn to use local cues. We furthermore provide evidence that magpies develop landmark preferences, which improve their retrieval accuracy. Our findings support the hypothesis that birds are flexible in their use of memory information, using a combination of the most reliable or salient information to retrieve their caches. © Springer-Verlag 2010

  18. A Survey Of Architectural Approaches for Managing Embedded DRAM and Non-volatile On-chip Caches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mittal, Sparsh; Vetter, Jeffrey S; Li, Dong

    Recent trends of CMOS scaling and increasing number of on-chip cores have led to a large increase in the size of on-chip caches. Since SRAM has low density and consumes large amount of leakage power, its use in designing on-chip caches has become more challenging. To address this issue, researchers are exploring the use of several emerging memory technologies, such as embedded DRAM, spin transfer torque RAM, resistive RAM, phase change RAM and domain wall memory. In this paper, we survey the architectural approaches proposed for designing memory systems and, specifically, caches with these emerging memory technologies. To highlight theirmore » similarities and differences, we present a classification of these technologies and architectural approaches based on their key characteristics. We also briefly summarize the challenges in using these technologies for architecting caches. We believe that this survey will help the readers gain insights into the emerging memory device technologies, and their potential use in designing future computing systems.« less

  19. Software Exploit Prevention and Remediation via Software Memory Protection

    DTIC Science & Technology

    2009-05-01

    trampolines that are necessary. Trampolines are pieces of code emitted into the fragment cache to transfer con- trol back to Strata. Most control...transfer instructions (CTIs) are initially linked to trampolines (unless the transfer target already exists in the fragment cache). Once a CTI’s target...instruction becomes available in the fragment cache, the CTI is linked directly to the destination, avoiding future uses of the trampoline . This

  20. Image matrix processor for fast multi-dimensional computations

    DOEpatents

    Roberson, G.P.; Skeate, M.F.

    1996-10-15

    An apparatus for multi-dimensional computation is disclosed which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination. 10 figs.

  1. A Novel Two-Tier Cooperative Caching Mechanism for the Optimization of Multi-Attribute Periodic Queries in Wireless Sensor Networks

    PubMed Central

    Zhou, ZhangBing; Zhao, Deng; Shu, Lei; Tsang, Kim-Fung

    2015-01-01

    Wireless sensor networks, serving as an important interface between physical environments and computational systems, have been used extensively for supporting domain applications, where multiple-attribute sensory data are queried from the network continuously and periodically. Usually, certain sensory data may not vary significantly within a certain time duration for certain applications. In this setting, sensory data gathered at a certain time slot can be used for answering concurrent queries and may be reused for answering the forthcoming queries when the variation of these data is within a certain threshold. To address this challenge, a popularity-based cooperative caching mechanism is proposed in this article, where the popularity of sensory data is calculated according to the queries issued in recent time slots. This popularity reflects the possibility that sensory data are interested in the forthcoming queries. Generally, sensory data with the highest popularity are cached at the sink node, while sensory data that may not be interested in the forthcoming queries are cached in the head nodes of divided grid cells. Leveraging these cooperatively cached sensory data, queries are answered through composing these two-tier cached data. Experimental evaluation shows that this approach can reduce the network communication cost significantly and increase the network capability. PMID:26131665

  2. Analysis of power gating in different hierarchical levels of 2MB cache, considering variation

    NASA Astrophysics Data System (ADS)

    Jafari, Mohsen; Imani, Mohsen; Fathipour, Morteza

    2015-09-01

    This article reintroduces power gating technique in different hierarchical levels of static random-access memory (SRAM) design including cell, row, bank and entire cache memory in 16 nm Fin field effect transistor. Different structures of SRAM cells such as 6T, 8T, 9T and 10T are used in design of 2MB cache memory. The power reduction of the entire cache memory employing cell-level optimisation is 99.7% with the expense of area and other stability overheads. The power saving of the cell-level optimisation is 3× (1.2×) higher than power gating in cache (bank) level due to its superior selectivity. The access delay times are allowed to increase by 4% in the same energy delay product to achieve the best power reduction for each supply voltages and optimisation levels. The results show the row-level power gating is the best for optimising the power of the entire cache with lowest drawbacks. Comparisons of cells show that the cells whose bodies have higher power consumption are the best candidates for power gating technique in row-level optimisation. The technique has the lowest percentage of saving in minimum energy point (MEP) of the design. The power gating also improves the variation of power in all structures by at least 70%.

  3. Respiratory hospital admissions associated with PM10 pollution in Utah, Salt Lake, and Cache Valleys

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pope CA, I.I.I.

    This study assessed the association between respiratory hospital admissions and PM10 pollution in Utah, Salt Lake, and Cache valleys during April 1985 through March 1989. Utah and Salt Lake valleys had high levels of PM10 pollution that violated both the annual and 24-h standards issued by the Environmental Protection Agency (EPA). Much lower PM10 levels occurred in the Cache Valley. Utah Valley experienced the intermittent operation of its primary source of PM10 pollution: an integrated steel mill. Bronchitis and asthma admissions for preschool-age children were approximately twice as frequent in Utah Valley when the steel mill was operating versus whenmore » it was not. Similar differences were not observed in Salt Lake or Cache valleys. Even though Cache Valley had higher smoking rates and lower temperatures in winter than did Utah Valley, per capita bronchitis and asthma admissions for all ages were approximately twice as high in Utah Valley. During the period when the steel mill was closed, differences in per capita admissions between Utah and Cache valleys narrowed considerably. Regression analysis also demonstrated a statistical association between respiratory hospital admissions and PM10 pollution. The results suggest that PM10 pollution plays a role in the incidence and severity of respiratory disease.« less

  4. Ecosystem services from keystone species: diversionary seeding and seed-caching desert rodents can enhance Indian ricegrass seedling establishment

    USGS Publications Warehouse

    Longland, William; Ostoja, Steven M.

    2013-01-01

    Seeds of Indian ricegrass (Achnatherum hymenoides), a native bunchgrass common to sandy soils on arid western rangelands, are naturally dispersed by seed-caching rodent species, particularly Dipodomys spp. (kangaroo rats). These animals cache large quantities of seeds when mature seeds are available on or beneath plants and recover most of their caches for consumption during the remainder of the year. Unrecovered seeds in caches account for the vast majority of Indian ricegrass seedling recruitment. We applied three different densities of white millet (Panicum miliaceum) seeds as “diversionary foods” to plots at three Great Basin study sites in an attempt to reduce rodents' over-winter cache recovery so that more Indian ricegrass seeds would remain in soil seedbanks and potentially establish new seedlings. One year after diversionary seed application, a moderate level of Indian ricegrass seedling recruitment occurred at two of our study sites in western Nevada, although there was no recruitment at the third site in eastern California. At both Nevada sites, the number of Indian ricegrass seedlings sampled along transects was significantly greater on all plots treated with diversionary seeds than on non-seeded control plots. However, the density of diversionary seeds applied to plots had a marginally non-significant effect on seedling recruitment, and it was not correlated with recruitment patterns among plots. Results suggest that application of a diversionary seed type that is preferred by seed-caching rodents provides a promising passive restoration strategy for target plant species that are dispersed by these rodents.

  5. Cognitive caching promotes flexibility in task switching: evidence from event-related potentials.

    PubMed

    Lange, Florian; Seer, Caroline; Müller, Dorothea; Kopp, Bruno

    2015-12-08

    Time-consuming processes of task-set reconfiguration have been shown to contribute to the costs of switching between cognitive tasks. We describe and probe a novel mechanism serving to reduce the costs of task-set reconfiguration. We propose that when individuals are uncertain about the currently valid task, one task set is activated for execution while other task sets are maintained at a pre-active state in cognitive cache. We tested this idea by assessing an event-related potential (ERP) index of task-set reconfiguration in a three-rule task-switching paradigm involving varying degrees of task uncertainty. In high-uncertainty conditions, two viable tasks were equally likely to be correct whereas in low-uncertainty conditions, one task was more likely than the other. ERP and performance measures indicated substantial costs of task-set reconfiguration when participants were required to switch away from a task that had been likely to be correct. In contrast, task-set-reconfiguration costs were markedly reduced when the previous task set was chosen under high task uncertainty. These results suggest that cognitive caching of alternative task sets adds to human cognitive flexibility under high task uncertainty.

  6. Optimizing transformations of stencil operations for parallel cache-based architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bassetti, F.; Davis, K.

    This paper describes a new technique for optimizing serial and parallel stencil- and stencil-like operations for cache-based architectures. This technique takes advantage of the semantic knowledge implicity in stencil-like computations. The technique is implemented as a source-to-source program transformation; because of its specificity it could not be expected of a conventional compiler. Empirical results demonstrate a uniform factor of two speedup. The experiments clearly show the benefits of this technique to be a consequence, as intended, of the reduction in cache misses. The test codes are based on a 5-point stencil obtained by the discretization of the Poisson equation andmore » applied to a two-dimensional uniform grid using the Jacobi method as an iterative solver. Results are presented for a 1-D tiling for a single processor, and in parallel using 1-D data partition. For the parallel case both blocking and non-blocking communication are tested. The same scheme of experiments has bee n performed for the 2-D tiling case. However, for the parallel case the 2-D partitioning is not discussed here, so the parallel case handled for 2-D is 2-D tiling with 1-D data partitioning.« less

  7. On the evolutionary and ontogenetic origins of tool-oriented behaviour in New Caledonian crows (Corvus moneduloides)

    PubMed Central

    KENWARD, BEN; SCHLOEGL, CHRISTIAN; RUTZ, CHRISTIAN; WEIR, ALEXANDER A. S.; BUGNYAR, THOMAS; KACELNIK, ALEX

    2015-01-01

    New Caledonian crows (Corvus moneduloides) are prolific tool users in captivity and in the wild, and have an inherited predisposition to express tool-oriented behaviours. To further understand the evolution and development of tool use, we compared the development of object manipulation in New Caledonian crows and common ravens (Corvus corax), which do not routinely use tools. We found striking qualitative similarities in the ontogeny of tool-oriented behaviour in New Caledonian crows and food-caching behaviour in ravens. Given that the common ancestor of New Caledonian crows and ravens was almost certainly a caching species, we therefore propose that the basic action patterns for tool use in New Caledonian crows may have their evolutionary origins in caching behaviour. Noncombinatorial object manipulations had similar frequencies in the two species. However, frequencies of object combinations that are precursors to functional behaviour increased in New Caledonian crows and decreased in ravens throughout the study period, ending 6 weeks post-fledging. These quantitative observations are consistent with the hypothesis that New Caledonian crows develop tool-oriented behaviour because of an increased motivation to perform object combinations that facilitate the necessary learning. PMID:25892825

  8. An Intelligent Cloud Storage Gateway for Medical Imaging.

    PubMed

    Viana-Ferreira, Carlos; Guerra, António; Silva, João F; Matos, Sérgio; Costa, Carlos

    2017-09-01

    Historically, medical imaging repositories have been supported by indoor infrastructures. However, the amount of diagnostic imaging procedures has continuously increased over the last decades, imposing several challenges associated with the storage volume, data redundancy and availability. Cloud platforms are focused on delivering hardware and software services over the Internet, becoming an appealing solution for repository outsourcing. Although this option may bring financial and technological benefits, it also presents new challenges. In medical imaging scenarios, communication latency is a critical issue that still hinders the adoption of this paradigm. This paper proposes an intelligent Cloud storage gateway that optimizes data access times. This is achieved through a new cache architecture that combines static rules and pattern recognition for eviction and prefetching. The evaluation results, obtained from experiments over a real-world dataset, show that cache hit ratios can reach around 80%, leading to reductions of image retrieval times by over 60%. The combined use of eviction and prefetching policies proposed can significantly reduce communication latency, even when using a small cache in comparison to the total size of the repository. Apart from the performance gains, the proposed system is capable of adjusting to specific workflows of different institutions.

  9. WriteShield: A Pseudo Thin Client for Prevention of Information Leakage

    NASA Astrophysics Data System (ADS)

    Kirihata, Yasuhiro; Sameshima, Yoshiki; Onoyama, Takashi; Komoda, Norihisa

    While thin-client systems are diffusing as an effective security method in enterprises and organizations, there is a new approach called pseudo thin-client system. In this system, local disks of clients are write-protected and user data is forced to save on the central file server to realize the same security effect of conventional thin-client systems. Since it takes purely the software-based simple approach, it does not require the hardware enhancement of network and servers to reduce the installation cost. However there are several problems such as no write control to external media, memory depletion possibility, and lower security because of the exceptional write permission to the system processes. In this paper, we propose WriteShield, a pseudo thin-client system which solves these issues. In this system, the local disks are write-protected with volume filter driver and it has a virtual cache mechanism to extend the memory cache size for the write protection. This paper presents design and implementation details of WriteShield. Besides we describe the security analysis and simulation evaluation of paging algorithms for virtual cache mechanism and measure the disk I/O performance to verify its feasibility in the actual environment.

  10. Cognitive caching promotes flexibility in task switching: evidence from event-related potentials

    PubMed Central

    Lange, Florian; Seer, Caroline; Müller, Dorothea; Kopp, Bruno

    2015-01-01

    Time-consuming processes of task-set reconfiguration have been shown to contribute to the costs of switching between cognitive tasks. We describe and probe a novel mechanism serving to reduce the costs of task-set reconfiguration. We propose that when individuals are uncertain about the currently valid task, one task set is activated for execution while other task sets are maintained at a pre-active state in cognitive cache. We tested this idea by assessing an event-related potential (ERP) index of task-set reconfiguration in a three-rule task-switching paradigm involving varying degrees of task uncertainty. In high-uncertainty conditions, two viable tasks were equally likely to be correct whereas in low-uncertainty conditions, one task was more likely than the other. ERP and performance measures indicated substantial costs of task-set reconfiguration when participants were required to switch away from a task that had been likely to be correct. In contrast, task-set-reconfiguration costs were markedly reduced when the previous task set was chosen under high task uncertainty. These results suggest that cognitive caching of alternative task sets adds to human cognitive flexibility under high task uncertainty. PMID:26643146

  11. Interacting Cache memories: evidence for flexible memory use by Western Scrub-Jays (Aphelocoma californica).

    PubMed

    Clayton, Nicola S; Yu, Kara Shirley; Dickinson, Anthony

    2003-01-01

    When Western Scrub-Jays (Aphelocoma californica) cached and recovered perishable crickets, N. S. Clayton, K. S. Yu, and A. Dickinson (2001) reported that the jays rapidly learned to search for fresh crickets after a 1-day retention interval (RI) between caching and recovery but to avoid searching for perished crickets after a 4-day RI. In the present experiments, the jays generalized their search preference for crickets to intermediate RIs and used novel information about the rate of decay of crickets presented during the RI to reverse these search preferences at recovery. The authors interpret this reversal as evidence that the birds can integrate information about the caching episode with new information presented during the RI.

  12. DIETARY FOLATE, VITAMIN B-12, VITAMIN B-6 AND INCIDENT ALZHEIMER’S DISEASE: THE CACHE COUNTY MEMORY, HEALTH, AND AGING STUDY

    PubMed Central

    NELSON, C.; WENGREEN, H.J.; MUNGER, R.G.; CORCORAN, C.D.

    2013-01-01

    Objective To examine associations between dietary and supplemental folate, vitamin B-12 and vitamin B-6 and incident Alzheimer’s disease (AD) among elderly men and women. Design, Setting and Participants Data collected were from participants of the Cache County Memory, Health and Aging Study, a longitudinal study of 5092 men and women 65 years and older who were residents of Cache County, Utah in 1995. Measurements Multistage clinical assessment procedures were used to identify incident cases of AD. Dietary data were collected using a 142-item food frequency questionnaire. Cox Proportional Hazards (CPH) modeling was used to determine hazard ratios across quintiles of micronutrient intake. Results 202 participants were diagnosed with incident AD during follow-up (1995–2004). In multivariable CPH models that controlled for the effects of gender, age, education, and other covariates there were no observed differences in risk of AD or dementia by increasing quintiles of total intake of folate, vitamin B-12, or vitamin B-6. Similarly, there were no observed differences in risk of AD by regular use of either folate or B6 supplements. Conclusion Dietary intake of B-vitamins from food and supplemental sources appears unrelated to incidence of dementia and AD. Further studies examining associations between dietary intakes of B-vitamins, biomarkers of B-vitamin status and cognitive endpoints are warranted. PMID:19924351

  13. Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sczyrba, Alex; Pratap, Abhishek; Canon, Shane

    2011-03-22

    Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor, based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey?s de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86more » servers. Convey?s highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models.JGI is comparing the performance of Convey?s graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.« less

  14. Parallel 3D-TLM algorithm for simulation of the Earth-ionosphere cavity

    NASA Astrophysics Data System (ADS)

    Toledo-Redondo, Sergio; Salinas, Alfonso; Morente-Molinera, Juan Antonio; Méndez, Antonio; Fornieles, Jesús; Portí, Jorge; Morente, Juan Antonio

    2013-03-01

    A parallel 3D algorithm for solving time-domain electromagnetic problems with arbitrary geometries is presented. The technique employed is the Transmission Line Modeling (TLM) method implemented in Shared Memory (SM) environments. The benchmarking performed reveals that the maximum speedup depends on the memory size of the problem as well as multiple hardware factors, like the disposition of CPUs, cache, or memory. A maximum speedup of 15 has been measured for the largest problem. In certain circumstances of low memory requirements, superlinear speedup is achieved using our algorithm. The model is employed to model the Earth-ionosphere cavity, thus enabling a study of the natural electromagnetic phenomena that occur in it. The algorithm allows complete 3D simulations of the cavity with a resolution of 10 km, within a reasonable timescale.

  15. Cache Sharing and Isolation Tradeoffs in Multicore Mixed-Criticality Systems

    DTIC Science & Technology

    2015-05-01

    of lockdown registers, to provide way-based partitioning. These alternatives are illustrated in Fig. 1 with respect to a quad-core ARM Cortex A9...presented a cache-partitioning scheme that allows multiple tasks to share the same cache partition on a single processor (as we do for Level-A and...sets and determined the fraction that were schedulable on our target hardware platform, the quad-core ARM Cortex A9 machine mentioned earlier, the LLC

  16. Software-Controlled Caches in the VMP Multiprocessor

    DTIC Science & Technology

    1986-03-01

    programming system level that Processors is tuned for the VMP design. In this vein, we are interested in exploring how far the software support can go to ...handled in software, analogously to the handling agement of the shared program state is familiar and of virtual memory page faults. Hardware support for...ensure good behavior, as opposed to how Each cache miss results in bus traffic. Table 2 pro- vides the bus cost for the "average" cache miss. Fig

  17. Turbidity and Total Suspended Solids on the Lower Cache River Watershed, AR.

    PubMed

    Rosado-Berrios, Carlos A; Bouldin, Jennifer L

    2016-06-01

    The Cache River Watershed (CRW) in Arkansas is part of one of the largest remaining bottomland hardwood forests in the US. Although wetlands are known to improve water quality, the Cache River is listed as impaired due to sedimentation and turbidity. This study measured turbidity and total suspended solids (TSS) in seven sites of the lower CRW; six sites were located on the Bayou DeView tributary of the Cache River. Turbidity and TSS levels ranged from 1.21 to 896 NTU, and 0.17 to 386.33 mg/L respectively and had an increasing trend over the 3-year study. However, a decreasing trend from upstream to downstream in the Bayou DeView tributary was noted. Sediment loading calculated from high precipitation events and mean TSS values indicate that contributions from the Cache River main channel was approximately 6.6 times greater than contributions from Bayou DeView. Land use surrounding this river channel affects water quality as wetlands provide a filter for sediments in the Bayou DeView channel.

  18. PEM public key certificate cache server

    NASA Astrophysics Data System (ADS)

    Cheung, T.

    1993-12-01

    Privacy Enhanced Mail (PEM) provides privacy enhancement services to users of Internet electronic mail. Confidentiality, authentication, message integrity, and non-repudiation of origin are provided by applying cryptographic measures to messages transferred between end systems by the Message Transfer System. PEM supports both symmetric and asymmetric key distribution. However, the prevalent implementation uses a public key certificate-based strategy, modeled after the X.509 directory authentication framework. This scheme provides an infrastructure compatible with X.509. According to RFC 1422, public key certificates can be stored in directory servers, transmitted via non-secure message exchanges, or distributed via other means. Directory services provide a specialized distributed database for OSI applications. The directory contains information about objects and then provides structured mechanisms for accessing that information. Since directory services are not widely available now, a good approach is to manage certificates in a centralized certificate server. This document describes the detailed design of a centralized certificate cache serve. This server manages a cache of certificates and a cache of Certificate Revocation Lists (CRL's) for PEM applications. PEMapplications contact the server to obtain/store certificates and CRL's. The server software is programmed in C and ELROS. To use this server, ISODE has to be configured and installed properly. The ISODE library 'libisode.a' has to be linked together with this library because ELROS uses the transport layer functions provided by 'libisode.a.' The X.500 DAP library that is included with the ELROS distribution has to be linked in also, since the server uses the DAP library functions to communicate with directory servers.

  19. Simplifying and speeding the management of intra-node cache coherence

    DOEpatents

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Phillip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

    2012-04-17

    A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

  20. Using dCache in Archiving Systems oriented to Earth Observation

    NASA Astrophysics Data System (ADS)

    Garcia Gil, I.; Perez Moreno, R.; Perez Navarro, O.; Platania, V.; Ozerov, D.; Leone, R.

    2012-04-01

    The object of LAST activity (Long term data Archive Study on new Technologies) is to perform an independent study on best practices and assessment of different archiving technologies mature for operation in the short and mid-term time frame, or available in the long-term with emphasis on technologies better suited to satisfy the requirements of ESA, LTDP and other European and Canadian EO partners in terms of digital information preservation and data accessibility and exploitation. During the last phase of the project, a testing of several archiving solutions has been performed in order to evaluate their suitability. In particular, dCache, aimed to provide a file system tree view of the data repository exchanging this data with backend (tertiary) Storage Systems as well as space management, pool attraction, dataset replication, hot spot determination and recovery from disk or node failures. Connected to a tertiary storage system, dCache simulates unlimited direct access storage space. Data exchanges to and from the underlying HSM are performed automatically and invisibly to the user Dcache was created to solve the requirements of big computer centers and universities with big amounts of data, putting their efforts together and founding EMI (European Middleware Initiative). At the moment being, Dcache is mature enough to be implemented, being used by several research centers of relevance (e.g. LHC storing up to 50TB/day). This solution has been not used so far in Earth Observation and the results of the study are summarized in this article, focusing on the capacities over a simulated environment to get in line with the ESA requirements for a geographically distributed storage. The challenge of a geographically distributed storage system can be summarized as the way to provide a maximum quality for storage and dissemination services with the minimum cost.

  1. dCache: Big Data storage for HEP communities and beyond

    NASA Astrophysics Data System (ADS)

    Millar, A. P.; Behrmann, G.; Bernardt, C.; Fuhrmann, P.; Litvintsev, D.; Mkrtchyan, T.; Petersen, A.; Rossi, A.; Schwank, K.

    2014-06-01

    With over ten years in production use dCache data storage system has evolved to match ever changing lansdcape of continually evolving storage technologies with new solutions to both existing problems and new challenges. In this paper, we present three areas of innovation in dCache: providing efficient access to data with NFS v4.1 pNFS, adoption of CDMI and WebDAV as an alternative to SRM for managing data, and integration with alternative authentication mechanisms.

  2. Wolves, Canis lupus, carry and cache the collars of radio-collared White-tailed Deer, Odocoileus virginianus, they killed

    USGS Publications Warehouse

    Nelson, Michael E.; Mech, L. David

    2011-01-01

    Wolves (Canis lupus) in northeastern Minnesota cached six radio-collars (four in winter, two in spring-summer) of 202 radio-collared White-tailed Deer (Odocoileus virginianus) they killed or consumed from 1975 to 2010. A Wolf bedded on top of one collar cached in snow. We found one collar each at a Wolf den and Wolf rendezvous site, 2.5 km and 0.5 km respectively, from each deer's previous locations.

  3. Landscape pattern of seed banks and anthropogenic impacts in forested wetlands of the northern Mississippi River Alluvial Valley

    USGS Publications Warehouse

    Middleton, B.; Wu, X.B.

    2008-01-01

    Agricultural development on floodplains contributes to hydrologic alteration and forest fragmentation, which may alter landscape-level processes. These changes may be related to shifts in the seed bank composition of floodplain wetlands. We examined the patterns of seed bank composition across a floodplain watershed by looking at the number of seeds germinating per m2 by species in 60 farmed and intact forested wetlands along the Cache River watershed in Illinois. The seed bank composition was compared above and below a water diversion (position), which artificially subdivides the watershed. Position of these wetlands represented the most variability of Axis I in a Nonmetric Multidimensional Scaling (NMS) analysis of site environmental variables and their relationship to seed bank composition (coefficient of determination for Axis 1: r2 = 0.376; Pearson correlation of position to Axis 1: r = 0.223). The 3 primary axes were also represented by other site environmental variables, including farming status (farmed or unfarmed), distance from the mouth of the river, latitude, and longitude. Spatial analysis based on Mantel correlograms showed that both water-dispersed and wind/water-dispersed seed assemblages had strong spatial structure in the upper Cache (above the water diversion), bur the spatial structure of water-dispersed seed assemblage was diminished in the lower Cache (below the water diversion), which lost floodpulsing. Bearing analysis also Suggested that water-dispersal process had a stronger influence on the overall spatial pattern of seed assemblage in the upper Cache, while wind/water-dispersal process had a stronger influence in the lower Cache. An analysis of the landscapes along the river showed that the mid-lower Cache (below the water diversion) had undergone greater land cover changes associated with agriculture than did the upper Cache watershed. Thus, the combination of forest fragmentation and hydrologic changes in the surrounding landscape may have had an influence on the seed bank composition and spatial distribution of the seed banks of the Cache River watershed. Our study suggests that the spatial pattern of seed bank composition may be influenced by landscape-level factors and processes.

  4. The Caregiver Contribution to Heart Failure Self-Care (CACHS): Further Psychometric Testing of a Novel Instrument.

    PubMed

    Buck, Harleah G; Harkness, Karen; Ali, Muhammad Usman; Carroll, Sandra L; Kryworuchko, Jennifer; McGillion, Michael

    2017-04-01

    Caregivers (CGs) contribute important assistance with heart failure (HF) self-care, including daily maintenance, symptom monitoring, and management. Until CGs' contributions to self-care can be quantified, it is impossible to characterize it, account for its impact on patient outcomes, or perform meaningful cost analyses. The purpose of this study was to conduct psychometric testing and item reduction on the recently developed 34-item Caregiver Contribution to Heart Failure Self-care (CACHS) instrument using classical and item response theory methods. Fifty CGs (mean age 63 years ±12.84; 70% female) recruited from a HF clinic completed the CACHS in 2014 and results evaluated using classical test theory and item response theory. Items would be deleted for low (<.05) or high (>.95) endorsement, low (<.3) or high (>.7) corrected item-total correlations, significant pairwise correlation coefficients, floor or ceiling effects, relatively low latent trait and item information function levels (<1.5 and p > .5), and differential item functioning. After analysis, 14 items were excluded, resulting in a 20-item instrument (self-care maintenance eight items; monitoring seven items; and management five items). Most items demonstrated moderate to high discrimination (median 2.13, minimum .77, maximum 5.05), and appropriate item difficulty (-2.7 to 1.4). Internal consistency reliability was excellent (Cronbach α = .94, average inter-item correlation = .41) with no ceiling effects. The newly developed 20-item version of the CACHS is supported by rigorous instrument development and represents a novel instrument to measure CGs' contribution to HF self-care. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  5. Distributed Saturation

    NASA Technical Reports Server (NTRS)

    Chung, Ming-Ying; Ciardo, Gianfranco; Siminiceanu, Radu I.

    2007-01-01

    The Saturation algorithm for symbolic state-space generation, has been a recent break-through in the exhaustive veri cation of complex systems, in particular globally-asyn- chronous/locally-synchronous systems. The algorithm uses a very compact Multiway Decision Diagram (MDD) encoding for states and the fastest symbolic exploration algo- rithm to date. The distributed version of Saturation uses the overall memory available on a network of workstations (NOW) to efficiently spread the memory load during the highly irregular exploration. A crucial factor in limiting the memory consumption during the symbolic state-space generation is the ability to perform garbage collection to free up the memory occupied by dead nodes. However, garbage collection over a NOW requires a nontrivial communication overhead. In addition, operation cache policies become critical while analyzing large-scale systems using the symbolic approach. In this technical report, we develop a garbage collection scheme and several operation cache policies to help on solving extremely complex systems. Experiments show that our schemes improve the performance of the original distributed implementation, SmArTNow, in terms of time and memory efficiency.

  6. Windowed multipole for cross section Doppler broadening

    NASA Astrophysics Data System (ADS)

    Josey, C.; Ducru, P.; Forget, B.; Smith, K.

    2016-02-01

    This paper presents an in-depth analysis on the accuracy and performance of the windowed multipole Doppler broadening method. The basic theory behind cross section data is described, along with the basic multipole formalism followed by the approximations leading to windowed multipole method and the algorithm used to efficiently evaluate Doppler broadened cross sections. The method is tested by simulating the BEAVRS benchmark with a windowed multipole library composed of 70 nuclides. Accuracy of the method is demonstrated on a single assembly case where total neutron production rates and 238U capture rates compare within 0.1% to ACE format files at the same temperature. With regards to performance, clock cycle counts and cache misses were measured for single temperature ACE table lookup and for windowed multipole. The windowed multipole method was found to require 39.6% more clock cycles to evaluate, translating to a 7.9% performance loss overall. However, the algorithm has significantly better last-level cache performance, with 3 fewer misses per evaluation, or a 65% reduction in last-level misses. This is due to the small memory footprint of the windowed multipole method and better memory access pattern of the algorithm.

  7. Parallel performance optimizations on unstructured mesh-based simulations

    DOE PAGES

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; ...

    2015-06-01

    This paper addresses two key parallelization challenges the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns, that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intranode data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches.more » We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.« less

  8. Dispersal Mutualism Incorporated into Large-Scale, Infrequent Disturbances

    PubMed Central

    Parker, V. Thomas

    2015-01-01

    Because of their influence on succession and other community interactions, large-scale, infrequent natural disturbances also should play a major role in mutualistic interactions. Using field data and experiments, I test whether mutualisms have been incorporated into large-scale wildfire by whether the outcomes of a mutualism depend on disturbance. In this study a seed dispersal mutualism is shown to depend on infrequent, large-scale disturbances. A dominant shrubland plant (Arctostaphylos species) produces seeds that make up a persistent soil seed bank and requires fire to germinate. In post-fire stands, I show that seedlings emerging from rodent caches dominate sites experiencing higher fire intensity. Field experiments show that rodents (Perimyscus californicus, P. boylii) do cache Arctostaphylos fruit and bury most seed caches to a sufficient depth to survive a killing heat pulse that a fire might drive into the soil. While the rodent dispersal and caching behavior itself has not changed compared to other habitats, the environmental transformation caused by wildfire converts the caching burial of seed from a dispersal process to a plant fire adaptive trait, and provides the context for stimulating subsequent life history evolution in the plant host. PMID:26151560

  9. Hydrology of Cache Valley, Cache County, Utah, and adjacent part of Idaho, with emphasis on simulation of ground-water flow

    USGS Publications Warehouse

    Kariya, Kim A.; Roark, D. Michael; Hanson, Karen M.

    1994-01-01

    A hydrologic investigation of Cache Valley was done to better understand the ground-water system in unconsolidated basin-fill deposits and the interaction between ground water and surface water. Ground-water recharge occurs by infiltration of precipitation and unconsumed irrigation water, seepage from canals and streams, and subsurface inflow from adjacent consolidated rock and adjacent unconsolidated basin-fill deposit ground-water systems. Ground-water discharge occurs as seepage to streams and reservoirs, spring discharge, evapotranspiration, and withdrawal from wells.Water levels declined during 1984-90. Less-than-average precipitation during 1987-90 and increased pumping from irrigation and public-supply wells contributed to the declines.A ground-water-flow model was used to simulate flow in the unconsolidated basin-fill deposits. Data primarily from 1969 were used to calibrate the model to steady-state conditions. Transient-state calibration was done by simulating ground-water conditions on a yearly basis for 1982-90.A hypothetical simulation in which the dry conditions of 1990 were continued for 5 years projected an average lO-foot water-level decline between Richmond and Hyrum. When increased pumpage was simulated by adding three well fields, each pumping 10 cubic feet per second, in the Logan, Smithfield, and College Ward areas, water-level declines greater than 10 feet were projected in most of the southeastern part of the valley and discharge from springs and seepage to streams and reservoirs decreased.

  10. Reader set encoding for directory of shared cache memory in multiprocessor system

    DOEpatents

    Ahn, Dnaiel; Ceze, Luis H.; Gara, Alan; Ohmacht, Martin; Xiaotong, Zhuang

    2014-06-10

    In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.

  11. Design Trade-off Between Performance and Fault-Tolerance of Space Onboard Computers

    NASA Astrophysics Data System (ADS)

    Gorbunov, M. S.; Antonov, A. A.

    2017-01-01

    It is well known that there is a trade-off between performance and power consumption in onboard computers. The fault-tolerance is another important factor affecting performance, chip area and power consumption. Involving special SRAM cells and error-correcting codes is often too expensive with relation to the performance needed. We discuss the possibility of finding the optimal solutions for modern onboard computer for scientific apparatus focusing on multi-level cache memory design.

  12. Memory and the hippocampus in food-storing birds: a comparative approach.

    PubMed

    Clayton, N S

    1998-01-01

    Comparative studies provide a unique source of evidence for the role of the hippocampus in learning and memory. Within birds and mammals, the hippocampal volume of scatter-hoarding species that cache food in many different locations is enlarged, relative to the remainder of the telencephalon, when compared with than that of species which cache food in one larder, or do not cache at all. Do food-storing species show enhanced memory function in association with the volumetric enlargement of the hippocampus? Comparative studies within the parids (titmice and chickadees) and corvids (jays, nutcrackers and magpies), two families of birds which show natural variation in food-storing behavior, suggest that there may be two kinds of memory specialization associated with scatter-hoarding. First, in terms of spatial memory, several scatter-hoarding species have a more accurate and enduring spatial memory, and a preference to rely more heavily upon spatial cues, than that of closely related species which store less food, or none at all. Second, some scatter-hoarding parids and corvids are also more resistant to memory interference. While the most critical component about a cache site may be its spatial location, there is mounting evidence that food-storing birds remember additional information about the contents and status of cache sites. What is the underlying neural mechanism by which the hippocampus learns and remembers cache sites? The current mammalian dogma is that the neural mechanisms of learning and memory are achieved primarily by variations in synaptic number and efficacy. Recent work on the concomitant development of food-storing, memory and the avian hippocampus illustrates that the avian hippocampus may swell or shrivel by as much as 30% in response to presence or absence of food-storing experience. Memory for food caches triggers a dramatic increase in the total number of number of neurons within the avian hippocampus by altering the rate at which these cells are born and die.

  13. Cross-layer model design in wireless ad hoc networks for the Internet of Things.

    PubMed

    Yang, Xin; Wang, Ling; Xie, Jian; Zhang, Zhaolin

    2018-01-01

    Wireless ad hoc networks can experience extreme fluctuations in transmission traffic in the Internet of Things, which is widely used today. Currently, the most crucial issues requiring attention for wireless ad hoc networks are making the best use of low traffic periods, reducing congestion during high traffic periods, and improving transmission performance. To solve these problems, the present paper proposes a novel cross-layer transmission model based on decentralized coded caching in the physical layer and a content division multiplexing scheme in the media access control layer. Simulation results demonstrate that the proposed model effectively addresses these issues by substantially increasing the throughput and successful transmission rate compared to existing protocols without a negative influence on delay, particularly for large scale networks under conditions of highly contrasting high and low traffic periods.

  14. Cross-layer model design in wireless ad hoc networks for the Internet of Things

    PubMed Central

    Wang, Ling; Xie, Jian; Zhang, Zhaolin

    2018-01-01

    Wireless ad hoc networks can experience extreme fluctuations in transmission traffic in the Internet of Things, which is widely used today. Currently, the most crucial issues requiring attention for wireless ad hoc networks are making the best use of low traffic periods, reducing congestion during high traffic periods, and improving transmission performance. To solve these problems, the present paper proposes a novel cross-layer transmission model based on decentralized coded caching in the physical layer and a content division multiplexing scheme in the media access control layer. Simulation results demonstrate that the proposed model effectively addresses these issues by substantially increasing the throughput and successful transmission rate compared to existing protocols without a negative influence on delay, particularly for large scale networks under conditions of highly contrasting high and low traffic periods. PMID:29734355

  15. NIC atomic operation unit with caching and bandwidth mitigation

    DOEpatents

    Hemmert, Karl Scott; Underwood, Keith D.; Levenhagen, Michael J.

    2016-03-01

    A network interface controller atomic operation unit and a network interface control method comprising, in an atomic operation unit of a network interface controller, using a write-through cache and employing a rate-limiting functional unit.

  16. Soccer science and the Bayes community: exploring the cognitive implications of modern scientific communication.

    PubMed

    Shrager, Jeff; Billman, Dorrit; Convertino, Gregorio; Massar, J P; Pirolli, Peter

    2010-01-01

    Science is a form of distributed analysis involving both individual work that produces new knowledge and collaborative work to exchange information with the larger community. There are many particular ways in which individual and community can interact in science, and it is difficult to assess how efficient these are, and what the best way might be to support them. This paper reports on a series of experiments in this area and a prototype implementation using a research platform called CACHE. CACHE both supports experimentation with different structures of interaction between individual and community cognition and serves as a prototype for computational support for those structures. We particularly focus on CACHE-BC, the Bayes community version of CACHE, within which the community can break up analytical tasks into "mind-sized" units and use provenance tracking to keep track of the relationship between these units. Copyright © 2009 Cognitive Science Society, Inc.

  17. Efficient image data distribution and management with application to web caching architectures

    NASA Astrophysics Data System (ADS)

    Han, Keesook J.; Suter, Bruce W.

    2003-03-01

    We present compact image data structures and associated packet delivery techniques for effective Web caching architectures. Presently, images on a web page are inefficiently stored, using a single image per file. Our approach is to use clustering to merge similar images into a single file in order to exploit the redundancy between images. Our studies indicate that a 30-50% image data size reduction can be achieved by eliminating the redundancies of color indexes. Attached to this file is new metadata to permit an easy extraction of images. This approach will permit a more efficient use of the cache, since a shorter list of cache references will be required. Packet and transmission delays can be reduced by 50% eliminating redundant TCP/IP headers and connection time. Thus, this innovative paradigm for the elimination of redundancy may provide valuable benefits for optimizing packet delivery in IP networks by reducing latency and minimizing the bandwidth requirements.

  18. Adaptive Caching Concept

    NASA Image and Video Library

    2015-06-10

    This diagram, superimposed on a photo of Martian landscape, illustrates a concept called "adaptive caching," which is in development for NASA's 2020 Mars rover mission. In addition to the investigations that the Mars 2020 rover will conduct on Mars, the rover will collect carefully selected samples of Mars rock and soil and cache them to be available for possible return to Earth if a Mars sample-return mission is scheduled and flown. Each sample will be stored in a sealed tube. Adaptive caching would result in a set of samples, up to the maximum number of tubes carried on the rover, being placed on the surface at the discretion of the mission operators. The tubes holding the collected samples would not go into a surrounding container. In this illustration, green dots indicate "regions of interest," where samples might be collected. The green diamond indicates one region of interest serving as the depot for the cache. The green X at upper right represents the landing site. The solid black line indicates the rover's route during its prime mission, and the dashed black line indicates its route during an extension of the mission. The base image is a portion of the "Everest Panorama" taken by the panoramic camera on NASA's Mars Exploration Rover Spirit at the top of Husband Hill in 2005. http://photojournal.jpl.nasa.gov/catalog/PIA19150

  19. Use of the sun as a heading indicator when caching and recovering in a wild rodent

    PubMed Central

    Samson, Jamie; Manser, Marta B.

    2016-01-01

    A number of diurnal species have been shown to use directional information from the sun to orientate. The use of the sun in this way has been suggested to occur in either a time-dependent (relying on specific positional information) or a time-compensated manner (a compass that adjusts itself over time with the shifts in the sun’s position). However, some interplay may occur between the two where a species could also use the sun in a time-limited way, whereby animals acquire certain information about the change of position, but do not show full compensational abilities. We tested whether Cape ground squirrels (Xerus inauris) use the sun as an orientation marker to provide information for caching and recovery. This species is a social sciurid that inhabits arid, sparsely vegetated habitats in Southern Africa, where the sun is nearly always visible during the diurnal period. Due to the lack of obvious landmarks, we predicted that they might use positional cues from the sun in the sky as a reference point when caching and recovering food items. We provide evidence that Cape ground squirrels use information from the sun’s position while caching and reuse this information in a time-limited way when recovering these caches. PMID:27580797

  20. Solutions and debugging for data consistency in multiprocessors with noncoherent caches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bernstein, D.; Mendelson, B.; Breternitz, M. Jr.

    1995-02-01

    We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on softwaremore » methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.« less

  1. Design of a memory-access controller with 3.71-times-enhanced energy efficiency for Internet-of-Things-oriented nonvolatile microcontroller unit

    NASA Astrophysics Data System (ADS)

    Natsui, Masanori; Hanyu, Takahiro

    2018-04-01

    In realizing a nonvolatile microcontroller unit (MCU) for sensor nodes in Internet-of-Things (IoT) applications, it is important to solve the data-transfer bottleneck between the central processing unit (CPU) and the nonvolatile memory constituting the MCU. As one circuit-oriented approach to solving this problem, we propose a memory access minimization technique for magnetoresistive-random-access-memory (MRAM)-embedded nonvolatile MCUs. In addition to multiplexing and prefetching of memory access, the proposed technique realizes efficient instruction fetch by eliminating redundant memory access while considering the code length of the instruction to be fetched and the transition of the memory address to be accessed. As a result, the performance of the MCU can be improved while relaxing the performance requirement for the embedded MRAM, and compact and low-power implementation can be performed as compared with the conventional cache-based one. Through the evaluation using a system consisting of a general purpose 32-bit CPU and embedded MRAM, it is demonstrated that the proposed technique increases the peak efficiency of the system up to 3.71 times, while a 2.29-fold area reduction is achieved compared with the cache-based one.

  2. Roofline Analysis in the Intel® Advisor to Deliver Optimized Performance for applications on Intel® Xeon Phi™ Processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koskela, Tuomas S.; Lobet, Mathieu; Deslippe, Jack

    In this session we show, in two case studies, how the roofline feature of Intel Advisor has been utilized to optimize the performance of kernels of the XGC1 and PICSAR codes in preparation for Intel Knights Landing architecture. The impact of the implemented optimizations and the benefits of using the automatic roofline feature of Intel Advisor to study performance of large applications will be presented. This demonstrates an effective optimization strategy that has enabled these science applications to achieve up to 4.6 times speed-up and prepare for future exascale architectures. # Goal/Relevance of Session The roofline model [1,2] is amore » powerful tool for analyzing the performance of applications with respect to the theoretical peak achievable on a given computer architecture. It allows one to graphically represent the performance of an application in terms of operational intensity, i.e. the ratio of flops performed and bytes moved from memory in order to guide optimization efforts. Given the scale and complexity of modern science applications, it can often be a tedious task for the user to perform the analysis on the level of functions or loops to identify where performance gains can be made. With new Intel tools, it is now possible to automate this task, as well as base the estimates of peak performance on measurements rather than vendor specifications. The goal of this session is to demonstrate how the roofline feature of Intel Advisor can be used to balance memory vs. computation related optimization efforts and effectively identify performance bottlenecks. A series of typical optimization techniques: cache blocking, structure refactoring, data alignment, and vectorization illustrated by the kernel cases will be addressed. # Description of the codes ## XGC1 The XGC1 code [3] is a magnetic fusion Particle-In-Cell code that uses an unstructured mesh for its Poisson solver that allows it to accurately resolve the edge plasma of a magnetic fusion device. After recent optimizations to its collision kernel [4], most of the computing time is spent in the electron push (pushe) kernel, where these optimization efforts have been focused. The kernel code scaled well with MPI+OpenMP but had almost no automatic compiler vectorization, in part due to indirect memory addresses and in part due to low trip counts of low-level loops that would be candidates for vectorization. Particle blocking and sorting have been implemented to increase trip counts of low-level loops and improve memory locality, and OpenMP directives have been added to vectorize compute-intensive loops that were identified by Advisor. The optimizations have improved the performance of the pushe kernel 2x on Haswell processors and 1.7x on KNL. The KNL node-for-node performance has been brought to within 30% of a NERSC Cori phase I Haswell node and we expect to bridge this gap by reducing the memory footprint of compute intensive routines to improve cache reuse. ## PICSAR is a Fortran/Python high-performance Particle-In-Cell library targeting at MIC architectures first designed to be coupled with the PIC code WARP for the simulation of laser-matter interaction and particle accelerators. PICSAR also contains a FORTRAN stand-alone kernel for performance studies and benchmarks. A MPI domain decomposition is used between NUMA domains and a tile decomposition (cache-blocking) handled by OpenMP has been added for shared-memory parallelism and better cache management. The so-called current deposition and field gathering steps that compose the PIC time loop constitute major hotspots that have been rewritten to enable more efficient vectorization. Particle communications between tiles and MPI domain has been merged and parallelized. All considered, these improvements provide speedups of 3.1 for order 1 and 4.6 for order 3 interpolation shape factors on KNL configured in SNC4 quadrant flat mode. Performance is similar between a node of cori phase 1 and KNL at order 1 and better on KNL by a factor 1.6 at order 3 with the considered test case (homogeneous thermal plasma).« less

  3. 78 FR 19304 - Notice of Intent To Repatriate Cultural Items: The Colorado College, Colorado Springs, CO

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-29

    ..., Armstrong Hall, Room 201, 14 E. Cache La Poudre, Colorado Springs, CO 80903, telephone (719) 389-6201..., Room 201, 14 E. Cache La Poudre, Colorado Springs, CO 80903, telephone (719) 389-6201, before April 29...

  4. A set-associative, fault-tolerant cache design

    NASA Technical Reports Server (NTRS)

    Lamet, Dan; Frenzel, James F.

    1992-01-01

    The design of a defect-tolerant control circuit for a set-associative cache memory is presented. The circuit maintains the stack ordering necessary for implementing the Least Recently Used (LRU) replacement algorithm. A discussion of programming techniques for bypassing defective blocks is included.

  5. Performance of redundant disk array organizations in transaction processing environments

    NASA Technical Reports Server (NTRS)

    Mourad, Antoine N.; Fuchs, W. K.; Saab, Daniel G.

    1993-01-01

    A performance evaluation is conducted for two redundant disk-array organizations in a transaction-processing environment, relative to the performance of both mirrored disk organizations and organizations using neither striping nor redundancy. The proposed parity-striping alternative to striping with rotated parity is shown to furnish rapid recovery from failure at the same low storage cost without interleaving the data over multiple disks. Both noncached systems and systems using a nonvolatile cache as the controller are considered.

  6. The history of scatter hoarding studies.

    PubMed

    Brodin, Anders

    2010-03-27

    In this review, I will present an overview of the development of the field of scatter hoarding studies. Scatter hoarding is a conspicuous behaviour and it has been observed by humans for a long time. Apart from an exceptional experimental study already published in 1720, it started with observational field studies of scatter hoarding birds in the 1940s. Driven by a general interest in birds, several ornithologists made large-scale studies of hoarding behaviour in species such as nutcrackers and boreal titmice. Scatter hoarding birds seem to remember caching locations accurately, and it was shown in the 1960s that successful retrieval is dependent on a specific part of the brain, the hippocampus. The study of scatter hoarding, spatial memory and the hippocampus has since then developed into a study system for evolutionary studies of spatial memory. In 1978, a game theoretical paper started the era of modern studies by establishing that a recovery advantage is necessary for individual hoarders for the evolution of a hoarding strategy. The same year, a combined theoretical and empirical study on scatter hoarding squirrels investigated how caches should be spaced out in order to minimize cache loss, a phenomenon sometimes called optimal cache density theory. Since then, the scatter hoarding paradigm has branched into a number of different fields: (i) theoretical and empirical studies of the evolution of hoarding, (ii) field studies with modern sampling methods, (iii) studies of the precise nature of the caching memory, (iv) a variety of studies of caching memory and its relationship to the hippocampus. Scatter hoarding has also been the subject of studies of (v) coevolution between scatter hoarding animals and the plants that are dispersed by these.

  7. Improved forest change detection with terrain illumination corrected landsat images

    USDA-ARS?s Scientific Manuscript database

    An illumination correction algorithm has been developed to improve the accuracy of forest change detection from Landsat reflectance data. This algorithm is based on an empirical rotation model and was tested on the Landsat imagery pair over Cherokee National Forest, Tennessee, Uinta-Wasatch-Cache N...

  8. BIRDS AS A MODEL TO STUDY ADULT NEUROGENESIS: BRIDGING EVOLUTIONARY, COMPARATIVE AND NEUROETHOLOGICAL APPROCHES

    PubMed Central

    BARNEA, ANAT; PRAVOSUDOV, VLADIMIR

    2011-01-01

    During the last few decades evidence has demonstrated that adult neurogenesis is a well-preserved feature throughout the animal kingdom. In birds, ongoing neuronal addition occurs rather broadly, to a number of brain regions. This review describes adult avian neurogenesis and neuronal recruitment, discusses factors that regulate these processes, and touches upon the question of their genetic control. Several attributes make birds an extremely advantageous model to study neurogenesis. First, song learning exhibits seasonal variation that is associated with seasonal variation in neuronal turnover in some song control brain nuclei, which seems to be regulated via adult neurogenesis. Second, food-caching birds naturally use memory-dependent behavior in learning locations of thousands of food caches scattered over their home ranges. In comparison with other birds, food-caching species have relatively enlarged hippocampi with more neurons and intense neurogenesis, which appears to be related to spatial learning. Finally, migratory behavior and naturally occurring social systems in birds also provide opportunities to investigate neurogenesis. Such diversity of naturally-occurring memory-based behaviors, combined with the fact that birds can be studied both in the wild and in the laboratory, make them ideal for investigation of neural processes underlying learning. This can be done by using various approaches, from evolutionary and comparative to neuroethological and molecular. Finally, we connect the avian arena to a broader view by providing a brief comparative and evolutionary overview of adult neurogenesis and by discussing the possible functional role of the new neurons. We conclude by indicating future directions and possible medical applications. PMID:21929623

  9. Experiences with http/WebDAV protocols for data access in high throughput computing

    NASA Astrophysics Data System (ADS)

    Bernabeu, Gerard; Martinez, Francisco; Acción, Esther; Bria, Arnau; Caubet, Marc; Delfino, Manuel; Espinal, Xavier

    2011-12-01

    In the past, access to remote storage was considered to be at least one order of magnitude slower than local disk access. Improvement on network technologies provide the alternative of using remote disk. For those accesses one can today reach levels of throughput similar or exceeding those of local disks. Common choices as access protocols in the WLCG collaboration are RFIO, [GSI]DCAP, GRIDFTP, XROOTD and NFS. HTTP protocol shows a promising alternative as it is a simple, lightweight protocol. It also enables the use of standard technologies such as http caching or load balancing which can be used to improve service resilience and scalability or to boost performance for some use cases seen in HEP such as the "hot files". WebDAV extensions allow writing data, giving it enough functionality to work as a remote access protocol. This paper will show our experiences with the WebDAV door for dCache, in terms of functionality and performance, applied to some of the HEP work flows in the LHC Tier1 at PIC.

  10. Corvids Outperform Pigeons and Primates in Learning a Basic Concept.

    PubMed

    Wright, Anthony A; Magnotti, John F; Katz, Jeffrey S; Leonard, Kevin; Vernouillet, Alizée; Kelly, Debbie M

    2017-04-01

    Corvids (birds of the family Corvidae) display intelligent behavior previously ascribed only to primates, but such feats are not directly comparable across species. To make direct species comparisons, we used a same/different task in the laboratory to assess abstract-concept learning in black-billed magpies ( Pica hudsonia). Concept learning was tested with novel pictures after training. Concept learning improved with training-set size, and test accuracy eventually matched training accuracy-full concept learning-with a 128-picture set; this magpie performance was equivalent to that of Clark's nutcrackers (a species of corvid) and monkeys (rhesus, capuchin) and better than that of pigeons. Even with an initial 8-item picture set, both corvid species showed partial concept learning, outperforming both monkeys and pigeons. Similar corvid performance refutes the hypothesis that nutcrackers' prolific cache-location memory accounts for their superior concept learning, because magpies rely less on caching. That corvids with "primitive" neural architectures evolved to equal primates in full concept learning and even to outperform them on the initial 8-item picture test is a testament to the shared (convergent) survival importance of abstract-concept learning.

  11. PKIX Certificate Status in Hybrid MANETs

    NASA Astrophysics Data System (ADS)

    Muñoz, Jose L.; Esparza, Oscar; Gañán, Carlos; Parra-Arnau, Javier

    Certificate status validation is a hard problem in general but it is particularly complex in Mobile Ad-hoc Networks (MANETs) because we require solutions to manage both the lack of fixed infrastructure inside the MANET and the possible absence of connectivity to trusted authorities when the certification validation has to be performed. In this sense, certificate acquisition is usually assumed as an initialization phase. However, certificate validation is a critical operation since the node needs to check the validity of certificates in real-time, that is, when a particular certificate is going to be used. In such MANET environments, it may happen that the node is placed in a part of the network that is disconnected from the source of status data at the moment the status checking is required. Proposals in the literature suggest the use of caching mechanisms so that the node itself or a neighbour node has some status checking material (typically on-line status responses or lists of revoked certificates). However, to the best of our knowledge the only criterion to evaluate the cached (obsolete) material is the time. In this paper, we analyse how to deploy a certificate status checking PKI service for hybrid MANET and we propose a new criterion based on risk to evaluate cached status data that is much more appropriate and absolute than time because it takes into account the revocation process.

  12. Efficient Maintenance and Update of Nonbonded Lists in Macromolecular Simulations.

    PubMed

    Chowdhury, Rezaul; Beglov, Dmitri; Moghadasi, Mohammad; Paschalidis, Ioannis Ch; Vakili, Pirooz; Vajda, Sandor; Bajaj, Chandrajit; Kozakov, Dima

    2014-10-14

    Molecular mechanics and dynamics simulations use distance based cutoff approximations for faster computation of pairwise van der Waals and electrostatic energy terms. These approximations traditionally use a precalculated and periodically updated list of interacting atom pairs, known as the "nonbonded neighborhood lists" or nblists, in order to reduce the overhead of finding atom pairs that are within distance cutoff. The size of nblists grows linearly with the number of atoms in the system and superlinearly with the distance cutoff, and as a result, they require significant amount of memory for large molecular systems. The high space usage leads to poor cache performance, which slows computation for large distance cutoffs. Also, the high cost of updates means that one cannot afford to keep the data structure always synchronized with the configuration of the molecules when efficiency is at stake. We propose a dynamic octree data structure for implicit maintenance of nblists using space linear in the number of atoms but independent of the distance cutoff. The list can be updated very efficiently as the coordinates of atoms change during the simulation. Unlike explicit nblists, a single octree works for all distance cutoffs. In addition, octree is a cache-friendly data structure, and hence, it is less prone to cache miss slowdowns on modern memory hierarchies than nblists. Octrees use almost 2 orders of magnitude less memory, which is crucial for simulation of large systems, and while they are comparable in performance to nblists when the distance cutoff is small, they outperform nblists for larger systems and large cutoffs. Our tests show that octree implementation is approximately 1.5 times faster in practical use case scenarios as compared to nblists.

  13. Programmable stream prefetch with resource optimization

    DOEpatents

    Boyle, Peter; Christ, Norman; Gara, Alan; Mawhinney, Robert; Ohmacht, Martin; Sugavanam, Krishnan

    2013-01-08

    A stream prefetch engine performs data retrieval in a parallel computing system. The engine receives a load request from at least one processor. The engine evaluates whether a first memory address requested in the load request is present and valid in a table. The engine checks whether there exists valid data corresponding to the first memory address in an array if the first memory address is present and valid in the table. The engine increments a prefetching depth of a first stream that the first memory address belongs to and fetching a cache line associated with the first memory address from the at least one cache memory device if there is not yet valid data corresponding to the first memory address in the array. The engine determines whether prefetching of additional data is needed for the first stream within its prefetching depth. The engine prefetches the additional data if the prefetching is needed.

  14. A Routing Mechanism for Cloud Outsourcing of Medical Imaging Repositories.

    PubMed

    Godinho, Tiago Marques; Viana-Ferreira, Carlos; Bastião Silva, Luís A; Costa, Carlos

    2016-01-01

    Web-based technologies have been increasingly used in picture archive and communication systems (PACS), in services related to storage, distribution, and visualization of medical images. Nowadays, many healthcare institutions are outsourcing their repositories to the cloud. However, managing communications between multiple geo-distributed locations is still challenging due to the complexity of dealing with huge volumes of data and bandwidth requirements. Moreover, standard methodologies still do not take full advantage of outsourced archives, namely because their integration with other in-house solutions is troublesome. In order to improve the performance of distributed medical imaging networks, a smart routing mechanism was developed. This includes an innovative cache system based on splitting and dynamic management of digital imaging and communications in medicine objects. The proposed solution was successfully deployed in a regional PACS archive. The results obtained proved that it is better than conventional approaches, as it reduces remote access latency and also the required cache storage space.

  15. A Scalable QoS-Aware VoD Resource Sharing Scheme for Next Generation Networks

    NASA Astrophysics Data System (ADS)

    Huang, Chenn-Jung; Luo, Yun-Cheng; Chen, Chun-Hua; Hu, Kai-Wen

    In network-aware concept, applications are aware of network conditions and are adaptable to the varying environment to achieve acceptable and predictable performance. In this work, a solution for video on demand service that integrates wireless and wired networks by using the network aware concepts is proposed to reduce the blocking probability and dropping probability of mobile requests. Fuzzy logic inference system is employed to select appropriate cache relay nodes to cache published video streams and distribute them to different peers through service oriented architecture (SOA). SIP-based control protocol and IMS standard are adopted to ensure the possibility of heterogeneous communication and provide a framework for delivering real-time multimedia services over an IP-based network to ensure interoperability, roaming, and end-to-end session management. The experimental results demonstrate that effectiveness and practicability of the proposed work.

  16. gpuSPHASE-A shared memory caching implementation for 2D SPH using CUDA

    NASA Astrophysics Data System (ADS)

    Winkler, Daniel; Meister, Michael; Rezavand, Massoud; Rauch, Wolfgang

    2017-04-01

    Smoothed particle hydrodynamics (SPH) is a meshless Lagrangian method that has been successfully applied to computational fluid dynamics (CFD), solid mechanics and many other multi-physics problems. Using the method to solve transport phenomena in process engineering requires the simulation of several days to weeks of physical time. Based on the high computational demand of CFD such simulations in 3D need a computation time of years so that a reduction to a 2D domain is inevitable. In this paper gpuSPHASE, a new open-source 2D SPH solver implementation for graphics devices, is developed. It is optimized for simulations that must be executed with thousands of frames per second to be computed in reasonable time. A novel caching algorithm for Compute Unified Device Architecture (CUDA) shared memory is proposed and implemented. The software is validated and the performance is evaluated for the well established dambreak test case.

  17. Geochemistry of mercury and other constituents in subsurface sediment—Analyses from 2011 and 2012 coring campaigns, Cache Creek Settling Basin, Yolo County, California

    USGS Publications Warehouse

    Arias, Michelle R.; Alpers, Charles N.; Marvin-DiPasquale, Mark C.; Fuller, Christopher C.; Agee, Jennifer L.; Sneed, Michelle; Morita, Andrew Y.; Salas, Antonia

    2017-10-31

    Cache Creek Settling Basin was constructed in 1937 to trap sediment from Cache Creek before delivery to the Yolo Bypass, a flood conveyance for the Sacramento River system that is tributary to the Sacramento–San Joaquin Delta. Sediment management options being considered by stakeholders in the Cache Creek Settling Basin include sediment excavation; however, that could expose sediments containing elevated mercury concentrations from historical mercury mining in the watershed. In cooperation with the California Department of Water Resources, the U.S. Geological Survey undertook sediment coring campaigns in 2011–12 (1) to describe lateral and vertical distributions of mercury concentrations in deposits of sediment in the Cache Creek Settling Basin and (2) to improve constraint of estimates of the rate of sediment deposition in the basin.Sediment cores were collected in the Cache Creek Settling Basin, Yolo County, California, during October 2011 at 10 locations and during August 2012 at 5 other locations. Total core depths ranged from approximately 4.6 to 13.7 meters (15 to 45 feet), with penetration to about 9.1 meters (30 feet) at most locations. Unsplit cores were logged for two geophysical parameters (gamma bulk density and magnetic susceptibility); then, selected cores were split lengthwise. One half of each core was then photographed and archived, and the other half was subsampled. Initial subsamples from the cores (20-centimeter composite samples from five predetermined depths in each profile) were analyzed for total mercury, methylmercury, total reduced sulfur, iron speciation, organic content (as the percentage of weight loss on ignition), and grain-size distribution. Detailed follow-up subsampling (3-centimeter intervals) was done at six locations along an east-west transect in the southern part of the Cache Creek Settling Basin and at one location in the northern part of the basin for analyses of total mercury; organic content; and cesium-137, which was used for dating. This report documents site characteristics; field and laboratory methods; and results of the analyses of each core section and subsample of these sediment cores, including associated quality-assurance and quality-control data.

  18. Preparation and characterization of porous bioceramic layers on pure titanium surfaces obtained by micro-arc oxidation process

    NASA Astrophysics Data System (ADS)

    Chien, Chi-Sheng; Hung, Yu-Chien; Hong, Ting-Fu; Wu, Chung-Chun; Kuo, Tsung-Yuan; Lee, Tzer-Min; Liao, Tze-Yuan; Lin, Huan-Chang; Chuang, Cheng-Hsin

    2017-03-01

    Fluorapatite (FA) has better chemical and thermal stability than hydroxyapatite (HA), and has thus attracted significant interest for biomaterial applications in recent years. In this study, porous bioceramic layers were prepared on pure titanium surfaces using a micro-arc oxidation (MAO) technique with an applied voltage of 450 V and an oxidation time of 5 min. The MAO process was performed using three different electrolyte solutions containing calcium fluoride (CaF2), calcium acetate monohydrate (Ca(CH3COO)2·H2O), and sodium phosphate monobasic dihydrate (NaH2PO4·2H2O) mixed in ratios of 0:2:1, 1:1:1, and 2:0:1, respectively. The surface morphology, composition, micro-hardness, porosity, and biological properties of the various MAO coatings were examined and compared. The results showed that as the CaF2/Ca(CH3COO)2·H2O ratio increased, the elemental composition of the MAO coating transformed from HA, A-TiO2 (Anatase) and R-TiO2 (Rutile); to A-TiO2, R-TiO2, and a small amount of HA; and finally A-TiO2, R-TiO2, CaF2, TiP2O5, and FA. The change in elemental composition was accompanied by a higher micro-hardness and a lower porosity. The coatings exhibited a similar in vitro bioactivity performance during immersion in simulated body fluid for 7-28 days. Furthermore, for in initial in vitro biocompatibility tests performed for 24 h using Dulbecco's Modified Eagle Medium (DMEM) supplement containing 10%Fetal bovine serum, the attachment and spreading of osteoblast-like osteosarcoma MG63 cells were found to increase slightly with an increasing CaF2/Ca(CH3COO)2·H2O ratio. In general, the results presented in this study show that all three MAO coatings possess a certain degree of in vitro bioactivity and biocompatibility.

  19. Can seed-caching enhance seedling survival of Indian ricegrass (Achnatherum hymenoides) through intraspecific facilitation?

    USDA-ARS?s Scientific Manuscript database

    Positive interactions among individual plants (facilitation) may often enhance seedling survival in stressful environments. Many granivorous small mammal species cache groups of seeds for future consumption in shallowly buried scatterhoards, and seeds of many plant species germinate and establish ag...

  20. 78 FR 2655 - Uinta-Wasatch-Cache National Forest; Utah; Ogden Travel Plan Project

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-01-14

    ...-Wasatch-Cache National Forest; Utah; Ogden Travel Plan Project AGENCY: Forest Service, USDA. ACTION... prepare a supplement to the Ogden Travel Plan Revision Final Supplemental Environmental Impact Statement (FSEIS). The Ogden Travel Plan Revision FSEIS evaluated six alternatives for possible travel management...

  1. Cache-site selection in Clark's Nutcracker (Nucifraga columbiana)

    Treesearch

    Teresa J. Lorenz; Kimberly A. Sullivan; Amanda V. Bakian; Carol A. Aubry

    2011-01-01

    Clark's Nutcracker (Nucifraga Columbiana) is one of the most specialized scatter-hoarding birds, considered a seed disperser for four species of pines (Pinus spp.), as well as an obligate coevolved mutualist of White bark Pine (P. albicaulis). Cache-site selection has not been formally studied in Clark...

  2. Population genetic structure and its implications for adaptive variation in memory and the hippocampus on a continental scale in food-caching black-capped chickadees.

    PubMed

    Pravosudov, V V; Roth, T C; Forister, M L; Ladage, L D; Burg, T M; Braun, M J; Davidson, B S

    2012-09-01

    Food-caching birds rely on stored food to survive the winter, and spatial memory has been shown to be critical in successful cache recovery. Both spatial memory and the hippocampus, an area of the brain involved in spatial memory, exhibit significant geographic variation linked to climate-based environmental harshness and the potential reliance on food caches for survival. Such geographic variation has been suggested to have a heritable basis associated with differential selection. Here, we ask whether population genetic differentiation and potential isolation among multiple populations of food-caching black-capped chickadees is associated with differences in memory and hippocampal morphology by exploring population genetic structure within and among groups of populations that are divergent to different degrees in hippocampal morphology. Using mitochondrial DNA and 583 AFLP loci, we found that population divergence in hippocampal morphology is not significantly associated with neutral genetic divergence or geographic distance, but instead is significantly associated with differences in winter climate. These results are consistent with variation in a history of natural selection on memory and hippocampal morphology that creates and maintains differences in these traits regardless of population genetic structure and likely associated gene flow. Published 2012. This article is a US Government work and is in the public domain in the USA.

  3. Harvesting implementation for the GI-cat distributed catalog

    NASA Astrophysics Data System (ADS)

    Boldrini, Enrico; Papeschi, Fabrizio; Bigagli, Lorenzo; Mazzetti, Paolo

    2010-05-01

    GI-cat framework implements a distributed catalog service supporting different international standards and interoperability arrangements in use by the geoscientific community. The distribution functionality in conjunction with the mediation functionality allows to seamlessly query remote heterogeneous data sources, including OGC Web Services - e.e. OGC CSW, WCS, WFS and WMS, community standards such as UNIDATA THREDDS/OPeNDAP, SeaDataNet CDI (Common Data Index), GBIF (Global Biodiversity Information Facility) services and OpenSearch engines. In the GI-cat modular architecture a distributor component carry out the distribution functionality by query delegation to the mediator components (one for each different data source). Each of these mediator components is able to query a specific data source and convert back the results by mapping of the foreign data model to the GI-cat internal one, based on ISO 19139. In order to cope with deployment scenarios in which local data is expected, an harvesting approach has been experimented. The new strategy comes in addition to the consolidated distributed approach, allowing the user to switch between a remote and a local search at will for each federated resource; this extends GI-cat configuration possibilities. The harvesting strategy is designed in GI-cat by the use at the core of a local cache component, implemented as a native XML database and based on eXist. The different heterogeneous sources are queried for the bulk of available data; this data is then injected into the cache component after being converted to the GI-cat data model. The query and conversion steps are performed by the mediator components that were are part of the GI-cat framework. Afterward each new query can be exercised against local data that have been stored in the cache component. Considering both advantages and shortcomings that affect harvesting and query distribution approaches, it comes out that a user driven tuning is required to take the best of them. This is often related to the specific user scenarios to be implemented. GI-cat proved to be a flexible framework to address user need. The GI-cat configurator tool was updated to make such a tuning possible: each data source can be configured to enable either harvesting or query distribution approaches; in the former case an appropriate harvesting interval can be set.

  4. Presenilin E318G variant and Alzheimer's disease risk: the Cache County study.

    PubMed

    Hippen, Ariel A; Ebbert, Mark T W; Norton, Maria C; Tschanz, JoAnn T; Munger, Ronald G; Corcoran, Christopher D; Kauwe, John S K

    2016-06-29

    Alzheimer's disease is the leading cause of dementia in the elderly and the third most common cause of death in the United States. A vast number of genes regulate Alzheimer's disease, including Presenilin 1 (PSEN1). Multiple studies have attempted to locate novel variants in the PSEN1 gene that affect Alzheimer's disease status. A recent study suggested that one of these variants, PSEN1 E318G (rs17125721), significantly affects Alzheimer's disease status in a large case-control dataset, particularly in connection with the APOEε4 allele. Our study looks at the same variant in the Cache County Study on Memory and Aging, a large population-based dataset. We tested for association between E318G genotype and Alzheimer's disease status by running a series of Fisher's exact tests. We also performed logistic regression to test for an additive effect of E318G genotype on Alzheimer's disease status and for the existence of an interaction between E318G and APOEε4. In our Fisher's exact test, it appeared that APOEε4 carriers with an E318G allele have slightly higher risk for AD than those without the allele (3.3 vs. 3.8); however, the 95 % confidence intervals of those estimates overlapped completely, indicating non-significance. Our logistic regression model found a positive but non-significant main effect for E318G (p = 0.895). The interaction term between E318G and APOEε4 was also non-significant (p = 0.689). Our findings do not provide significant support for E318G as a risk factor for AD in APOEε4 carriers. Our calculations indicated that the overall sample used in the logistic regression models was adequately powered to detect the sort of effect sizes observed previously. However, the power analyses of our Fisher's exact tests indicate that our partitioned data was underpowered, particularly in regards to the low number of E318G carriers, both AD cases and controls, in the Cache county dataset. Thus, the differences in types of datasets used may help to explain the difference in effect magnitudes seen. Analyses in additional case-control datasets will be required to understand fully the effect of E318G on Alzheimer's disease status.

  5. A Study on the Effectiveness of Lockup-Free Caches for a Reduced Instruction Set Computer (RISC) Processor

    DTIC Science & Technology

    1992-09-01

    to acquire or develop effective simulation tools to observe the behavior of a RISC implementation as it executes different types of programs . We choose...Performance Computer performance is measured by the amount of the time required to execute a program . Performance encompasses two types of time, elapsed time...and CPU time. Elapsed time is the time required to execute a program from start to finish. It includes latency of input/output activities such as

  6. Habitat selection is unaltered after severe insect infestation: Concerns for forest-dependent species

    Treesearch

    Claire A. Zugmeyer; John L. Koprowski

    2009-01-01

    Severe disturbance may alter or eliminate important habitat structure that helps preserve food caches of foodhoarding species. Recent recolonization of an insect-damaged forest by the endangered Mt. Graham red squirrel (Tamiasciurus hudsonicus grahamensis) provided an opportunity to examine habitat selection for midden (cache) sites following...

  7. 76 FR 14372 - Uinta-Wasatch-Cache National Forest Resource Advisory Committee

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-16

    ... Street, Salt Lake City, Utah. Written comments should be sent to Loyal Clark, Uinta-Wasatch-Cache... open to the public. The following business will be conducted: (1) Review Forest Service project approval letter, (2) discuss travel budget, and (3) review new proposals. Persons who wish to bring related...

  8. QCDLoop: A comprehensive framework for one-loop scalar integrals

    NASA Astrophysics Data System (ADS)

    Carrazza, Stefano; Ellis, R. Keith; Zanderighi, Giulia

    2016-12-01

    We present a new release of the QCDLoop library based on a modern object-oriented framework. We discuss the available new features such as the extension to the complex masses, the possibility to perform computations in double and quadruple precision simultaneously, and useful caching mechanisms to improve the computational speed. We benchmark the performance of the new library, and provide practical examples of phenomenological implementations by interfacing this new library to Monte Carlo programs.

  9. Challenge toward the prediction of typhoon behaviour and down pour

    NASA Astrophysics Data System (ADS)

    Takahashi, K.; Onishi, R.; Baba, Y.; Kida, S.; Matsuda, K.; Goto, K.; Fuchigami, H.

    2013-08-01

    Mechanisms of interactions among different scale phenomena play important roles for forecasting of weather and climate. Multi-scale Simulator for the Geoenvironment (MSSG), which deals with multi-scale multi-physics phenomena, is a coupled non-hydrostatic atmosphere-ocean model designed to be run efficiently on the Earth Simulator. We present simulation results with the world-highest 1.9km horizontal resolution for the entire globe and regional heavy rain with 1km horizontal resolution and 5m horizontal/vertical resolution for urban area simulation. To gain high performance by exploiting the system capabilities, we propose novel performance evaluation metrics introduced in previous studies that incorporate the effects of the data caching mechanism between CPU and memory. With a useful code optimization guideline based on such metrics, we demonstrate that MSSG can achieve an excellent peak performance ratio of 32.2% on the Earth Simulator with the single-core performance found to be a key to a reduced time-to-solution.

  10. Blackcomb: Hardware-Software Co-design for Non-Volatile Memory in Exascale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schreiber, Robert

    Summary of technical results of Blackcomb Memory Devices We explored various different memory technologies (STTRAM, PCRAM, FeRAM, and ReRAM). The progress can be classified into three categories, below. Modeling and Tool Releases Various modeling tools have been developed over the last decade to help in the design of SRAM or DRAM-based memory hierarchies. To explore new design opportunities that NVM technologies can bring to the designers, we have developed similar high-level models for NVM, including PCRAMsim [Dong 2009], NVSim [Dong 2012], and NVMain [Poremba 2012]. NVSim is a circuit-level model for NVM performance, energy, and area estimation, which supports variousmore » NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim is successfully validated against industrial NVM prototypes, and it is expected to help boost architecture-level NVM-related studies. On the other side, NVMain is a cycle accurate main memory simulator designed to simulate emerging nonvolatile memories at the architectural level. We have released these models as open source tools and provided contiguous support to them. We also proposed PS3-RAM, which is a fast, portable and scalable statistical STT-RAM reliability analysis model [Wen 2012]. Design Space Exploration and Optimization With the support of these models, we explore different device/circuit optimization techniques. For example, in [Niu 2012a] we studied the power reduction technique for the application of ECC scheme in ReRAM designs and proposed to use ECC code to relax the BER (Bit Error Rate) requirement of a single memory to improve the write energy consumption and latency for both 1T1R and cross-point ReRAM designs. In [Xu 2011], we proposed a methodology to design STT-RAM for different optimization goals such as read performance, write performance and write energy by leveraging the trade-off between write current and write time of MTJ. We also studied the tradeoffs in building a reliable crosspoint ReRAM array [Niu 2012b]. We have conducted an in depth analysis of the circuit and system level design implications of multi-layer cross-point Resistive RAM (MLCReRAM) from performance, power and reliability perspectives [Xu 2013]. The objective of this study is to understand the design trade-offs of this technology with respect to the MLC Phase Change Memory (MLCPCM).Our MLC ReRAM design at the circuit and system levels indicates that different resistance allocation schemes, programming strategies, peripheral designs, and material selections profoundly affect the area, latency, power, and reliability of MLC ReRAM. Based on this analysis, we conduct two case studies: first we compare MLC ReRAM design against MLC phase-change memory (PCM) and multi-layer cross-point ReRAM design, and point out why multi-level ReRAM is appealing; second we further explore the design space for MLC ReRAM. Architecture and Application We explored hybrid checkpointing using phase-change memory for future exascale systems [Dong 2011] and showed that the use of nonvolatile memory for local checkpointing significantly increases the number of faults covered by local checkpoints and reduces the probability of a global failure in the middle of a global checkpoint to less than 1%. We also proposed a technique called i2WAP to mitigate the write variations in NVM-based last-level cache for the improvement of the NVM lifetime [Wang 2013]. Our wear leveling technique attempts to work around the limitations of write endurance by arranging data access so that write operations can be distributed evenly across all the storage cells. During our intensive research on fault-tolerant NVM design, we found that ECC cannot effectively tolerate hard errors from limited write endurance and process imperfection. Therefore, we devised a novel Point and Discard (PAD) architecture in in [ 2012] as a hard-error-tolerant architecture for ReRAM-based Last Level Caches. PAD improves the lifetime of ReRAM caches by 1.6X-440X under different process variations without performance overhead in the system's early life. We have investigated the applicability of NVM for persistent memory design [Zhao 2013]. New byte addressable NVM enables fast persistent memory that allows in-memory persistent data objects to be updated with much higher throughput. Despite the significant improvement, the performance of these designs is only 50% of the native system with no persistence support, due to the logging or copy-on-write mechanisms used to update the persistent memory. A challenge in this approach is therefore how to efficiently enable atomic, consistent, and durable updates to ensure data persistence that survives application and/or system failures. We have designed a persistent memory system, called Klin, that can provide performance as close as that of the native system. The Klin design adopts a non-volatile cache and a non-volatile main memory for constructing a multi-versioned durable memory system, enabling atomic updates without logging or copy-on-write. Our evaluation shows that the proposed Kiln mechanism can achieve up to 2X of performance improvement to NVRAM-based persistent memory employing write-ahead logging. In addition, our design has numerous practical advantages: a simple and intuitive abstract interface, microarchitecture-level optimizations, fast recovery from failures, and no redundant writes to slow non-volatile storage media. The work was published in MICRO 2013 and received Best Paper Honorable Mentioned Award.« less

  11. Hydrologic data for the Cache Creek-Bear Thrust environmental impact statement near Jackson, Wyoming

    USGS Publications Warehouse

    Craig, G.S.; Ringen, B.H.; Cox, E.R.

    1981-01-01

    Information on the quantity and quality of surface and ground water in an area of concern for the Cache Creek-Bear Thrust Environmental Impact Statement in northwestern Wyoming is presented without interpretation. The environmental impact statement is being prepared jointly by the U.S. Geological Survey and the U.S. Forest Service and concerns proposed exploration and development of oil and gas on leased Federal land near Jackson, Wyoming. Information includes data from a gaging station on Cache Creek and from wells, springs, and miscellaneous sites on streams. Data include streamflow, chemical and suspended-sediment quality of streams, and the occurrence and chemical quality of ground water. (USGS)

  12. Education for sustainability and environmental education in National Geoparks. EarthCaching - a new method?

    NASA Astrophysics Data System (ADS)

    Zecha, Stefanie; Regelous, Anette

    2017-04-01

    National Geoparks are restricted areas incorporating educational resources of great importance in promoting education for sustainable development, mobilizing knowledge inherent to the EarthSciences. Different methods can be used to implement the education of sustainability. Here we present possibilities for National Geoparks to support sustainability focusing on new media and EarthCaches based on the data set of the "EarthCachers International EarthCaching" conference in Goslar in October 2015. Using an empirical study designed by ourselves we collected actual information about the environmental consciousness of Earthcachers. The data set was analyzed using SPSS and statistical methods. Here we present the results and their consequences for National Geoparks.

  13. The Cache County Study on Memory in Aging: Factors Affecting Risk of Alzheimer's disease and its Progression after Onset

    PubMed Central

    Tschanz, JoAnn T.; Norton, Maria C.; Zandi, Peter P.; Lyketsos, Constantine G.

    2014-01-01

    The Cache County Study on Memory in Aging is a longitudinal, population-based study of Alzheimer's disease (AD) and other dementias. Initiated in 1995 and extending to 2013, the study has followed over 5,000 elderly residents of Cache County, Utah (USA) for over twelve years. Achieving a 90% participation rate at enrollment, and spawning two ancillary projects, the study has contributed to the literature on genetic, psychosocial and environmental risk factors for AD, late life cognitive decline, and the clinical progression of dementia after its onset. This paper describes the major study contributions to the literature on AD and dementia. PMID:24423221

  14. The Cache County Study on Memory in Aging: factors affecting risk of Alzheimer's disease and its progression after onset.

    PubMed

    Tschanz, Joann T; Norton, Maria C; Zandi, Peter P; Lyketsos, Constantine G

    2013-12-01

    The Cache County Study on Memory in Aging is a longitudinal, population-based study of Alzheimer's disease (AD) and other dementias. Initiated in 1995 and extending to 2013, the study has followed over 5,000 elderly residents of Cache County, Utah (USA) for over twelve years. Achieving a 90% participation rate at enrolment, and spawning two ancillary projects, the study has contributed to the literature on genetic, psychosocial and environmental risk factors for AD, late-life cognitive decline, and the clinical progression of dementia after its onset. This paper describes the major study contributions to the literature on AD and dementia.

  15. Web based visualization of large climate data sets

    USGS Publications Warehouse

    Alder, Jay R.; Hostetler, Steven W.

    2015-01-01

    We have implemented the USGS National Climate Change Viewer (NCCV), which is an easy-to-use web application that displays future projections from global climate models over the United States at the state, county and watershed scales. We incorporate the NASA NEX-DCP30 statistically downscaled temperature and precipitation for 30 global climate models being used in the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC), and hydrologic variables we simulated using a simple water-balance model. Our application summarizes very large, complex data sets at scales relevant to resource managers and citizens and makes climate-change projection information accessible to users of varying skill levels. Tens of terabytes of high-resolution climate and water-balance data are distilled to compact binary format summary files that are used in the application. To alleviate slow response times under high loads, we developed a map caching technique that reduces the time it takes to generate maps by several orders of magnitude. The reduced access time scales to >500 concurrent users. We provide code examples that demonstrate key aspects of data processing, data exporting/importing and the caching technique used in the NCCV.

  16. HPC Profiling with the Sun Studio™ Performance Tools

    NASA Astrophysics Data System (ADS)

    Itzkowitz, Marty; Maruyama, Yukon

    In this paper, we describe how to use the Sun Studio Performance Tools to understand the nature and causes of application performance problems. We first explore CPU and memory performance problems for single-threaded applications, giving some simple examples. Then, we discuss multi-threaded performance issues, such as locking and false-sharing of cache lines, in each case showing how the tools can help. We go on to describe OpenMP applications and the support for them in the performance tools. Then we discuss MPI applications, and the techniques used to profile them. Finally, we present our conclusions.

  17. Retrospective Cognition by Food-Caching Western Scrub-Jays

    ERIC Educational Resources Information Center

    de Kort, S.R.; Dickinson, A.; Clayton, N.S.

    2005-01-01

    Episodic-like memory, the retrospective component of cognitive time travel in animals, needs to fulfil three criteria to meet the behavioral properties of episodic memory as defined for humans. Here, we review results obtained with the cache-recovery paradigm with western scrub-jays and conclude that they fulfil these three criteria. The jays…

  18. 76 FR 16640 - Petitions for Modification of Existing Mandatory Safety Standards

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-24

    ... standard to permit an alternative method of compliance to allow additional outby storage caches of Self.... The petitioner further states that: (a) Additional SCSR outby storage caches will be placed a maximum of 2,000 feet apart in beltlines and return air courses; (b) these additional SCSR outby storage...

  19. Acorn Caching in Tree Squirrels: Teaching Hypothesis Testing in the Park

    ERIC Educational Resources Information Center

    McEuen, Amy B.; Steele, Michael A.

    2012-01-01

    We developed an exercise for a university-level ecology class that teaches hypothesis testing by examining acorn preferences and caching behavior of tree squirrels (Sciurus spp.). This exercise is easily modified to teach concepts of behavioral ecology for earlier grades, particularly high school, and provides students with a theoretical basis for…

  20. Seed Selection by Desert Rodents: Implications for Enhancing Seedling Establishment of Indian Ricegrass (Oryzopsis hymenoides)

    USDA-ARS?s Scientific Manuscript database

    Seeds of many plant species are dispersed by seed-caching rodents that place groups of seeds in superficially-buried scatterhoard caches. A case in point is provided by an important forage plant on arid western rangelands, Indian ricegrass (Oryzopsis hymenoides), for which seedling recruitment comes...

  1. Openwebglobe 2: Visualization of Complex 3D-GEODATA in the (mobile) Webbrowser

    NASA Astrophysics Data System (ADS)

    Christen, M.

    2016-06-01

    Providing worldwide high resolution data for virtual globes consists of compute and storage intense tasks for processing data. Furthermore, rendering complex 3D-Geodata, such as 3D-City models with an extremely high polygon count and a vast amount of textures at interactive framerates is still a very challenging task, especially on mobile devices. This paper presents an approach for processing, caching and serving massive geospatial data in a cloud-based environment for large scale, out-of-core, highly scalable 3D scene rendering on a web based virtual globe. Cloud computing is used for processing large amounts of geospatial data and also for providing 2D and 3D map data to a large amount of (mobile) web clients. In this paper the approach for processing, rendering and caching very large datasets in the currently developed virtual globe "OpenWebGlobe 2" is shown, which displays 3D-Geodata on nearly every device.

  2. Ten dimensions of health and their relationships with overall self-reported health and survival in a predominately religiously active elderly population: the cache county memory study.

    PubMed

    Østbye, Truls; Krause, Katrina M; Norton, Maria C; Tschanz, JoAnn; Sanders, Linda; Hayden, Kathleen; Pieper, Carl; Welsh-Bohmer, Kathleen A

    2006-02-01

    To document the extent of healthy aging along 10 different dimensions in a population known for its longevity. A cohort study with baseline measures of overall self-reported health and health along 10 specific dimensions; analyses investigated the 10 dimensions as predictors of self-reported health and 10-year mortality. Cache County, Utah, which is among the areas with the highest conditional life expectancy at age 65 in the United States. Inhabitants of Cache County aged 65 and older (January 1, 1995). Self-reported overall health and 10 specific dimensions of healthy aging: independent living, vision, hearing, activities of daily living, instrumental activities of daily living, absence of physical illness, cognition, healthy mood, social support and participation, and religious participation and spirituality. This elderly population was healthy overall. With few exceptions, 80% to 90% of persons aged 65 to 75 were healthy according to each measure used. Prevalence of excellent and good self-reported health decreased with age, to approximately 60% in those aged 85 and older. Even in the oldest old, the majority of respondents were independent in activities of daily living. Although vision, hearing, and mood were significant predictors of overall self-reported health in the final models, age, sex, and cognition were significant only in the final survival models. This population has a high prevalence of most factors representing healthy aging. The predictors of overall self-reported health are distinct from the predictors of survival in this age group and, being potentially modifiable, are amenable to clinical and public health efforts.

  3. Neuropsychological Performance in Advanced Age- Influences of Demographic Factors and Apolipoprotein E: Findings from the Cache County Memory Study

    PubMed Central

    Welsh-Bohmer, Katheen A.; Østbye, Truls; Sanders, Linda; Pieper, Carl F.; Hayden, Kathleen M.; Tschanz, JoAnn T.; Norton, Maria C.

    2009-01-01

    The Cache County Study of Memory in Aging (CCMS) is an epidemiological study of Alzheimer’s disease (AD), mild cognitive disorders, and aging in a population of exceptionally long-lived individuals (7th to 11th decade). Observation of population members without dementia provides an opportunity for establishing the range of normal neurocognitive performance in a representative sample of the very old. We examined neurocognitive performance of the normal participants undergoing full clinical evaluations (n=507) and we tested the potential modifying effects of APOE genotype, a known genetic risk factor for the later development of AD. The results indicate that advanced age and low education are related to lower test scores across nearly all of the neurocognitive measures. Gender and APOE ε4 both had negligible and inconsistent influences, affecting only isolated measures of memory and expressive speech (in case of gender). The gender and APOE effects disappeared once age and education were controlled. The study of this exceptionally long-lived population provides useful normative information regarding the broad range of “normal” cognition seen in advanced age. Among elderly without dementia or other cognitive impairment, APOE does not appear to exert any major effects on cognition once other demographic influences are controlled. PMID:18609337

  4. Neuropsychological performance in advanced age: influences of demographic factors and Apolipoprotein E: findings from the Cache County Memory Study.

    PubMed

    Welsh-Bohmer, Katheen A; Ostbye, Truls; Sanders, Linda; Pieper, Carl F; Hayden, Kathleen M; Tschanz, JoAnn T; Norton, Maria C

    2009-01-01

    The Cache County Study of Memory in Aging (CCMS) is an epidemiological study of Alzheimer's disease (AD), mild cognitive disorders, and aging in a population of exceptionally long-lived individuals (7th to 11th decade). Observation of population members without dementia provides an opportunity for establishing the range of normal neurocognitive performance in a representative sample of the very old. We examined neurocognitive performance of the normal participants undergoing full clinical evaluations (n = 507) and we tested the potential modifying effects of apolipoprotein E (APOE) genotype, a known genetic risk factor for the later development of AD. The results indicate that advanced age and low education are related to lower test scores across nearly all of the neurocognitive measures. Gender and APOE epsilon4 both had negligible and inconsistent influences, affecting only isolated measures of memory and expressive speech (in case of gender). The gender and APOE effects disappeared once age and education were controlled. The study of this exceptionally long-lived population provides useful normative information regarding the broad range of "normal" cognition seen in advanced age. Among elderly without dementia or other cognitive impairment, APOE does not appear to exert any major effects on cognition once other demographic influences are controlled.

  5. The coaching process: an effective tool for professional development.

    PubMed

    Kowalski, Karren; Casper, Colleen

    2007-01-01

    A model for coaching in nursing is described. Criteria for selecting a coach are discussed. Competencies for a coach are recommended. In addition, guidelines for caching sessions are provided as well as an example of an action plan outline to help the coachee identify areas of desired growth and options for developing these areas.

  6. Exploitation of pocket gophers and their food caches by grizzly bears

    USGS Publications Warehouse

    Mattson, D.J.

    2004-01-01

    I investigated the exploitation of pocket gophers (Thomomys talpoides) by grizzly bears (Ursus arctos horribilis) in the Yellowstone region of the United States with the use of data collected during a study of radiomarked bears in 1977-1992. My analysis focused on the importance of pocket gophers as a source of energy and nutrients, effects of weather and site features, and importance of pocket gophers to grizzly bears in the western contiguous United States prior to historical extirpations. Pocket gophers and their food caches were infrequent in grizzly bear feces, although foraging for pocket gophers accounted for about 20-25% of all grizzly bear feeding activity during April and May. Compared with roots individually excavated by bears, pocket gopher food caches were less digestible but more easily dug out. Exploitation of gopher food caches by grizzly bears was highly sensitive to site and weather conditions and peaked during and shortly after snowmelt. This peak coincided with maximum success by bears in finding pocket gopher food caches. Exploitation was most frequent and extensive on gently sloping nonforested sites with abundant spring beauty (Claytonia lanceolata) and yampah (Perdieridia gairdneri). Pocket gophers are rare in forests, and spring beauty and yampah roots are known to be important foods of both grizzly bears and burrowing rodents. Although grizzly bears commonly exploit pocket gophers only in the Yellowstone region, this behavior was probably widespread in mountainous areas of the western contiguous United States prior to extirpations of grizzly bears within the last 150 years.

  7. Experimental evidence for a novel mechanism driving variation in habitat quality in a food-caching bird.

    PubMed

    Strickland, Dan; Kielstra, Brian; Ryan Norris, D

    2011-12-01

    Variation in habitat quality can have important consequences for fitness and population dynamics. For food-caching species, a critical determinant of habitat quality is normally the density of storable food, but it is also possible that quality is driven by the ability of habitats to preserve food items. The food-caching gray jay (Perisoreus canadensis) occupies year-round territories in the coniferous boreal and subalpine forests of North America, but does not use conifer seed crops as a source of food. Over the last 33 years, we found that the occupancy rate of territories in Algonquin Park (ON, Canada) has declined at a higher rate in territories with a lower proportion of conifers compared to those with a higher proportion. Individuals occupying territories with a low proportion of conifers were also less likely to successfully fledge young. Using chambers to simulate food caches, we conducted an experiment to examine the hypothesis that coniferous trees are better able to preserve the perishable food items stored in summer and fall than deciduous trees due to their antibacterial and antifungal properties. Over a 1-4 month exposure period, we found that mealworms, blueberries, and raisins all lost less weight when stored on spruce and pine trees compared to deciduous and other coniferous trees. Our results indicate a novel mechanism to explain how habitat quality may influence the fitness and population dynamics of food-caching animals, and has important implications for understanding range limits for boreal breeding animals.

  8. Quantifying animal movement for caching foragers: the path identification index (PII) and cougars, Puma concolor

    USGS Publications Warehouse

    Ironside, Kirsten E.; Mattson, David J.; Theimer, Tad; Jansen, Brian; Holton, Brandon; Arundel, Terry; Peters, Michael; Sexton, Joseph O.; Edwards, Thomas C.

    2017-01-01

    Relocation studies of animal movement have focused on directed versus area restricted movement, which rely on correlations between step-length and turn angles, along with a degree of stationarity through time to define behavioral states. Although these approaches may work well for grazing foraging strategies in a patchy landscape, species that do not spend a significant amount of time searching out and gathering small dispersed food items, but instead feed for short periods on large, concentrated sources or cache food result in movements that maybe difficult to analyze using turning and velocity alone. We use GPS telemetry collected from a prey-caching predator, the cougar (Puma concolor), to test whether adding additional movement metrics capturing site recursion, to the more traditional velocity and turning, improve the ability to identify behaviors. We evaluated our movement index’s ability to identify behaviors using field investigations. We further tested for statistical stationarity across behaviors for use of topographic view-sheds. We found little correlation between turn angle, velocity, tortuosity, and site fidelity and combined them into a movement index used to identify movement paths (temporally autocorrelated movements) related to fast directed movements (taxis), area restricted movements (search), and prey caching (foraging). Changes in the frequency and duration of these movements were helpful for identifying seasonal activities such as migration and denning in females. Comparison of field investigations of cougar activities to behavioral classes defined using the movement index and found an overall classification accuracy of 81%. Changes in behaviors resulted in changes in how cougars used topographic view-sheds, showing statistical non-stationarity over time. The movement index shows promise for identifying behaviors in species that frequently return to specific locations such as food caches, watering holes, or dens, and highlights the role memory and cognitive abilities may play in determining animal movements. With the addition of measures capturing site recursion the temporal structure in movements of a caching forager was revealed.

  9. Ground-water flow patterns and water budget of a bottomland forested wetland, Black Swamp, eastern Arkansas

    USGS Publications Warehouse

    Gonthier, G.J.; Kleiss, B.A.

    1996-01-01

    The U.S. Geological Survey, working in cooperation with the U.S. Army Corps of Engineers, Waterways Experiment Station, collected surface-water and ground-water data from 119 wells and 13 staff gages from September 1989 to September 1992 to describe ground-water flow patterns and water budget in the Black Swamp, a bottomland forested wetland in eastern Arkansas. The study area was between two streamflow gaging stations located about 30.5 river miles apart on the Cache River. Ground-water flow was from northwest to southeast with some diversion toward the Cache River. Hydraulic connection between the surface water and the alluvial aquifer is indicated by nearly equal changes in surface-water and ground-water levels near the Cache River. Diurnal fluctuations of hydraulic head ranged from more than 0 to 0.38 feet and were caused by evapotranspiration. Changes in hydraulic head of the alluvial aquifer beneath the wetland lagged behind stage fluctuations and created the potential for changes in ground-water movement. Differences between surface-water levels in the wetland and stage of the Cache River created a frequently occurring local ground-water flow condition in which surface water in the wetland seeped into the upper part of the alluvial aquifer and then seeped into the Cache River. When the Cache River flooded the wetland, ground water consistently seeped to the surface during falling surface-water stage and surface water seeped into the ground during rising surface-water stage. Ground-water flow was a minor component of the water budget, accounting for less than 1 percent of both inflow and outflow. Surface-water drainage from the study area through diversion canals was not accounted for in the water budget and may be the reason for a surplus of water in the budget. Even though ground-water flow volume is small compared to other water budget components, ground-water seepage to the wetland surface may still be vital to some wetland functions.

  10. Pragmatic open space box utilization: asteroid survey model using distributed objects management based articulation (DOMBA)

    NASA Astrophysics Data System (ADS)

    Mohammad, Atif Farid; Straub, Jeremy

    2015-05-01

    A multi-craft asteroid survey has significant data synchronization needs. Limited communication speeds drive exacting performance requirements. Tables have been used in Relational Databases, which are structure; however, DOMBA (Distributed Objects Management Based Articulation) deals with data in terms of collections. With this, no read/write roadblocks to the data exist. A master/slave architecture is created by utilizing the Gossip protocol. This facilitates expanding a mission that makes an important discovery via the launch of another spacecraft. The Open Space Box Framework facilitates the foregoing while also providing a virtual caching layer to make sure that continuously accessed data is available in memory and that, upon closing the data file, recharging is applied to the data.

  11. Extension of the AMBER molecular dynamics software to Intel's Many Integrated Core (MIC) architecture

    NASA Astrophysics Data System (ADS)

    Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.

    2016-04-01

    We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, that forms part of the AMBER v14 MD software package, that makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2 ×12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P-1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve the performance on Intel Xeon processors, while retaining simulation accuracy.

  12. NAFFS: network attached flash file system for cloud storage on portable consumer electronics

    NASA Astrophysics Data System (ADS)

    Han, Lin; Huang, Hao; Xie, Changsheng

    Cloud storage technology has become a research hotspot in recent years, while the existing cloud storage services are mainly designed for data storage needs with stable high speed Internet connection. Mobile Internet connections are often unstable and the speed is relatively low. These native features of mobile Internet limit the use of cloud storage in portable consumer electronics. The Network Attached Flash File System (NAFFS) presented the idea of taking the portable device built-in NAND flash memory as the front-end cache of virtualized cloud storage device. Modern portable devices with Internet connection have built-in more than 1GB NAND Flash, which is quite enough for daily data storage. The data transfer rate of NAND flash device is much higher than mobile Internet connections[1], and its non-volatile feature makes it very suitable as the cache device of Internet cloud storage on portable device, which often have unstable power supply and intermittent Internet connection. In the present work, NAFFS is evaluated with several benchmarks, and its performance is compared with traditional network attached file systems, such as NFS. Our evaluation results indicate that the NAFFS achieves an average accessing speed of 3.38MB/s, which is about 3 times faster than directly accessing cloud storage by mobile Internet connection, and offers a more stable interface than that of directly using cloud storage API. Unstable Internet connection and sudden power off condition are tolerable, and no data in cache will be lost in such situation.

  13. Statistical Inference-Based Cache Management for Mobile Learning

    ERIC Educational Resources Information Center

    Li, Qing; Zhao, Jianmin; Zhu, Xinzhong

    2009-01-01

    Supporting efficient data access in the mobile learning environment is becoming a hot research problem in recent years, and the problem becomes tougher when the clients are using light-weight mobile devices such as cell phones whose limited storage space prevents the clients from holding a large cache. A practical solution is to store the cache…

  14. Geo-Caching: Place-Based Discovery of Virginia State Parks and Museums

    ERIC Educational Resources Information Center

    Gray, Howard Richard

    2007-01-01

    The use of Global Positioning Systems (GPS) units has exploded in recent years along with the computer technology to access this data-based information. Geo-caching is an exciting game using GPS that provides place-based information regarding the public lands, facilities and cultural heritage programs within the Virginia Parks and Museum system.…

  15. 40 CFR 81.345 - Utah.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    .../Attainment. The area surrounding Brigham City, as described by the following Townships or the portions of the following Townships in Box Elder County: T9N 2W, T9N R1W, T8N 2W Cache County, UT (part): Cache County.... The area surrounding Grantsville, as described by the following Townships or the portions of the...

  16. 40 CFR 81.345 - Utah.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    .../Attainment. The area surrounding Brigham City, as described by the following Townships or the portions of the following Townships in Box Elder County: T9N 2W, T9N R1W, T8N 2W Cache County, UT (part): Cache County.... The area surrounding Grantsville, as described by the following Townships or the portions of the...

  17. Managing coherence via put/get windows

    DOEpatents

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

    2011-01-11

    A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

  18. Managing coherence via put/get windows

    DOEpatents

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

    2012-02-21

    A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

  19. XRootd, disk-based, caching proxy for optimization of data access, data placement and data replication

    NASA Astrophysics Data System (ADS)

    Bauerdick, L. A. T.; Bloom, K.; Bockelman, B.; Bradley, D. C.; Dasu, S.; Dost, J. M.; Sfiligoi, I.; Tadel, A.; Tadel, M.; Wuerthwein, F.; Yagil, A.; Cms Collaboration

    2014-06-01

    Following the success of the XRootd-based US CMS data federation, the AAA project investigated extensions of the federation architecture by developing two sample implementations of an XRootd, disk-based, caching proxy. The first one simply starts fetching a whole file as soon as a file open request is received and is suitable when completely random file access is expected or it is already known that a whole file be read. The second implementation supports on-demand downloading of partial files. Extensions to the Hadoop Distributed File System have been developed to allow for an immediate fallback to network access when local HDFS storage fails to provide the requested block. Both cache implementations are in pre-production testing at UCSD.

  20. High Performance Computing Multicast

    DTIC Science & Technology

    2012-02-01

    responsiveness, first-tier applications often implement replicated in- memory key-value stores , using them to store state or to cache data from services...alternative that replicates data , combines agreement on update ordering with amnesia freedom, and supports both good scalability and fast response. A...alternative that replicates data , combines agreement on update ordering with amnesia freedom, and supports both good scalability and fast response

  1. Different Scalable Implementations of Collision and Streaming for Optimal Computational Performance of Lattice Boltzmann Simulations

    NASA Astrophysics Data System (ADS)

    Geneva, Nicholas; Wang, Lian-Ping

    2015-11-01

    In the past 25 years, the mesoscopic lattice Boltzmann method (LBM) has become an increasingly popular approach to simulate incompressible flows including turbulent flows. While LBM solves more solution variables compared to the conventional CFD approach based on the macroscopic Navier-Stokes equation, it also offers opportunities for more efficient parallelization. In this talk we will describe several different algorithms that have been developed over the past 10 plus years, which can be used to represent the two core steps of LBM, collision and streaming, more effectively than standard approaches. The application of these algorithms spans LBM simulations ranging from basic channel to particle laden flows. We will cover the essential detail on the implementation of each algorithm for simple 2D flows, to the challenges one faces when using a given algorithm for more complex simulations. The key is to explore the best use of data structure and cache memory. Two basic data structures will be discussed and the importance of effective data storage to maximize a CPU's cache will be addressed. The performance of a 3D turbulent channel flow simulation using these different algorithms and data structures will be compared along with important hardware related issues.

  2. Parallel network simulations with NEURON.

    PubMed

    Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L

    2006-10-01

    The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.

  3. Parallel Network Simulations with NEURON

    PubMed Central

    Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.

    2009-01-01

    The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488

  4. GPU-accelerated phase-field simulation of dendritic solidification in a binary alloy

    NASA Astrophysics Data System (ADS)

    Yamanaka, Akinori; Aoki, Takayuki; Ogawa, Satoi; Takaki, Tomohiro

    2011-03-01

    The phase-field simulation for dendritic solidification of a binary alloy has been accelerated by using a graphic processing unit (GPU). To perform the phase-field simulation of the alloy solidification on GPU, a program code was developed with computer unified device architecture (CUDA). In this paper, the implementation technique of the phase-field model on GPU is presented. Also, we evaluated the acceleration performance of the three-dimensional solidification simulation by using a single NVIDIA TESLA C1060 GPU and the developed program code. The results showed that the GPU calculation for 5763 computational grids achieved the performance of 170 GFLOPS by utilizing the shared memory as a software-managed cache. Furthermore, it can be demonstrated that the computation with the GPU is 100 times faster than that with a single CPU core. From the obtained results, we confirmed the feasibility of realizing a real-time full three-dimensional phase-field simulation of microstructure evolution on a personal desktop computer.

  5. Headwaters, Wetlands, and Wildfires: Utilizing Landsat imagery, GIS, and Statistical Models for Mapping Wetlands in Northern Colorado's Cache la Poudre Watershed in the aftermath of the June 2012 High Park Fire

    NASA Astrophysics Data System (ADS)

    Chignell, S.; Skach, S.; Kessenich, B.; Weimer, A.; Luizza, M.; Birtwistle, A.; Evangelista, P.; Laituri, M.; Young, N.

    2013-12-01

    The June 2012 High Park Fire burned over 87,000 acres of forest and 259 homes to the west of Fort Collins, CO. The fire has had dramatic impacts on forest ecosystems; of particular concern are its effects on the Cache la Poudre watershed, as the Poudre River is one of the most important headwaters of the Colorado Front Range, providing important ecosystem and economic services before flowing into the South Platte, which in turn flows into the Missouri River. Within a week of the fire, the area received several days of torrential rains. This precipitation--in conjunction with steep riverbanks and the loss of vegetation by fire--caused soil and ash runoff to be deposited into the Poudre's channel, resulting in a river of choking mud and black sludge. Monitoring the effects of such wildfires is critical and requires establishing immediate baseline data to assess impacts over time. Of particular concern is the region's wetlands, which not only provide habitat for a rich array of flora and fauna, but help regulate river discharge, improve water quality, and aid in carbon sequestration. However, the high expense of field work and the changing nature of wetlands have left many of the area's wetland maps incomplete and in need of updating. Utilizing Landsat 5 and Landsat 8 imagery, ancillary GIS layers, and boosted regression trees modeling, the NASA DEVELOP team based at the North Central Climate Science Center at Colorado State University developed a methodology for wetland modeling within the Cache la Poudre watershed. These efforts produced a preliminary model of predicted wetlands across the landscape that correctly classified 89% of the withheld validation points and had a kappa value of approximately 0.78. This initial model is currently being refined and validated using the USGS Software for Assisted Habitat Modeling (SAHM) to run multiple models within three elevation-based 'life zones.' The ultimate goal of this ongoing project is to provide important spatial data for land managers in the watershed, as well as an efficient, economical, and accurate wetland mapping methodology that can be reproduced throughout the intermountain west region.

  6. Jefferson Lab Mass Storage and File Replication Services

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ian Bird; Ying Chen; Bryan Hess

    Jefferson Lab has implemented a scalable, distributed, high performance mass storage system - JASMine. The system is entirely implemented in Java, provides access to robotic tape storage and includes disk cache and stage manager components. The disk manager subsystem may be used independently to manage stand-alone disk pools. The system includes a scheduler to provide policy-based access to the storage systems. Security is provided by pluggable authentication modules and is implemented at the network socket level. The tape and disk cache systems have well defined interfaces in order to provide integration with grid-based services. The system is in production andmore » being used to archive 1 TB per day from the experiments, and currently moves over 2 TB per day total. This paper will describe the architecture of JASMine; discuss the rationale for building the system, and present a transparent 3rd party file replication service to move data to collaborating institutes using JASMine, XM L, and servlet technology interfacing to grid-based file transfer mechanisms.« less

  7. Improvements to Autoplot's HAPI Support

    NASA Astrophysics Data System (ADS)

    Faden, J.; Vandegriff, J. D.; Weigel, R. S.

    2017-12-01

    Autoplot handles data from a variety of data servers. These servers communicate data in different forms, each somewhat different in capabilities and each needing new software to interface. The Heliophysics Application Programmer's Interface (HAPI) attempts to ease this by providing a standard target for clients and servers to meet. Autoplot fully supports reading data from HAPI servers, and support continues to improve as the HAPI server spec matures. This collaboration has already produced robust clients and documentation which would be expensive for groups creating their own protocol. For example, client-side data caching is introduced where Autoplot maintains a cache of data for performance and off-line use. This is a feature we considered for previous data systems, but we could never afford the time to study and implement this carefully. Also, Autoplot itself can be used as a server, making the data it can read and the results of its processing available to other data systems. Autoplot use with other data transmission systems is reviewed as well, outlining features of each system.

  8. Mars Technology Rover with Arm-Mounted Percussive Coring Tool, Microimager, and Sample-Handling Encapsulation Containerization Subsystem

    NASA Technical Reports Server (NTRS)

    Younse, Paulo J.; Dicicco, Matthew A.; Morgan, Albert R.

    2012-01-01

    A report describes the PLuto (programmable logic) Mars Technology Rover, a mid-sized FIDO (field integrated design and operations) class rover with six fully drivable and steerable cleated wheels, a rocker-bogey suspension, a pan-tilt mast with panorama and navigation stereo camera pairs, forward and rear stereo hazcam pairs, internal avionics with motor drivers and CPU, and a 5-degrees-of-freedom robotic arm. The technology rover was integrated with an arm-mounted percussive coring tool, microimager, and sample handling encapsulation containerization subsystem (SHEC). The turret of the arm contains a percussive coring drill and microimager. The SHEC sample caching system mounted to the rover body contains coring bits, sample tubes, and sample plugs. The coring activities performed in the field provide valuable data on drilling conditions for NASA tasks developing and studying coring technology. Caching of samples using the SHEC system provide insight to NASA tasks investigating techniques to store core samples in the future.

  9. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

    1999-01-01

    As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.

  10. 120-MHz BiCMOS superscalar RISC processor

    NASA Astrophysics Data System (ADS)

    Tanaka, Shigeya; Hotta, Takashi; Murabayashi, Fumio; Yamada, Hiromichi; Yoshida, Shoji; Shimamura, Kotaro; Katsura, Koyo; Bandoh, Tadaaki; Ikeda, Koichi; Matsubara, Kenji

    1994-04-01

    A superscalar RISC processor contains 2.8 million transistors in a die size of 16.2 mm x 16.5 mm, and utilizes 3.3 V/0.5 micron BiCMOS technology. In order to take advantage of superscalar performance without incurring penalties from a slower clock or a longer pipeline, a tag bit is implemented in the instruction cache to indicate dependency between two instructions. A performance gain of up to 37% is obtained with only a 3.5% area overhead from our superscalar design.

  11. Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Biswas, Rupak

    2003-01-01

    This paper presents a detailed performance analysis of a multi-block overset grid compu- tational fluid dynamics app!ication on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse and fine-grain parallelism; the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SNIP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, namely the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single system image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.

  12. Optimizing Input/Output Using Adaptive File System Policies

    NASA Technical Reports Server (NTRS)

    Madhyastha, Tara M.; Elford, Christopher L.; Reed, Daniel A.

    1996-01-01

    Parallel input/output characterization studies and experiments with flexible resource management algorithms indicate that adaptivity is crucial to file system performance. In this paper we propose an automatic technique for selecting and refining file system policies based on application access patterns and execution environment. An automatic classification framework allows the file system to select appropriate caching and pre-fetching policies, while performance sensors provide feedback used to tune policy parameters for specific system environments. To illustrate the potential performance improvements possible using adaptive file system policies, we present results from experiments involving classification-based and performance-based steering.

  13. Assessment of watershed vulnerability to climate change for the Uinta-Wasatch-Cache and Ashley National Forests, Utah

    Treesearch

    Janine Rice; Tim Bardsley; Pete Gomben; Dustin Bambrough; Stacey Weems; Sarah Leahy; Christopher Plunkett; Charles Condrat; Linda A. Joyce

    2017-01-01

    Watersheds on the Uinta-Wasatch-Cache and Ashley National Forests provide many ecosystem services, and climate change poses a risk to these services. We developed a watershed vulnerability assessment to provide scientific information for land managers facing the challenge of managing these watersheds. Literature-based information and expert elicitation is used to...

  14. Hide And Seek GPS And Geocaching In The Classroom

    ERIC Educational Resources Information Center

    Lary, Lynn M.

    2004-01-01

    In short, geocaching is a high-tech, worldwide treasure hunt (geocaches can now be found in more than 180 countries) where a person hides a cache for others to find. Generally, the cache is some type of waterproof container that contains a log book and an assortment of goodies, such as lottery tickets, toys, photo books for cachers to fill with…

  15. Dynamically programmable cache

    NASA Astrophysics Data System (ADS)

    Nakkar, Mouna; Harding, John A.; Schwartz, David A.; Franzon, Paul D.; Conte, Thomas

    1998-10-01

    Reconfigurable machines have recently been used as co- processors to accelerate the execution of certain algorithms or program subroutines. The problems with the above approach include high reconfiguration time and limited partial reconfiguration. By far the most critical problems are: (1) the small on-chip memory which results in slower execution time, and (2) small FPGA areas that cannot implement large subroutines. Dynamically Programmable Cache (DPC) is a novel architecture for embedded processors which offers solutions to the above problems. To solve memory access problems, DPC processors merge reconfigurable arrays with the data cache at various cache levels to create a multi-level reconfigurable machines. As a result DPC machines have both higher data accessibility and FPGA memory bandwidth. To solve the limited FPGA resource problem, DPC processors implemented multi-context switching (Virtualization) concept. Virtualization allows implementation of large subroutines with fewer FPGA cells. Additionally, DPC processors can parallelize the execution of several operations resulting in faster execution time. In this paper, the speedup improvement for DPC machines are shown to be 5X faster than an Altera FLEX10K FPGA chip and 2X faster than a Sun Ultral SPARC station for two different algorithms (convolution and motion estimation).

  16. Storage resource manager

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perelmutov, T.; Bakken, J.; Petravick, D.

    Storage Resource Managers (SRMs) are middleware components whose function is to provide dynamic space allocation and file management on shared storage components on the Grid[1,2]. SRMs support protocol negotiation and reliable replication mechanism. The SRM standard supports independent SRM implementations, allowing for a uniform access to heterogeneous storage elements. SRMs allow site-specific policies at each location. Resource Reservations made through SRMs have limited lifetimes and allow for automatic collection of unused resources thus preventing clogging of storage systems with ''orphan'' files. At Fermilab, data handling systems use the SRM management interface to the dCache Distributed Disk Cache [5,6] and themore » Enstore Tape Storage System [15] as key components to satisfy current and future user requests [4]. The SAM project offers the SRM interface for its internal caches as well.« less

  17. Managing coherence via put/get windows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blumrich, Matthias A; Chen, Dong; Coteus, Paul W

    A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an areamore » of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.« less

  18. Comparisons between Intel 386 and i486 microprecessors

    NASA Technical Reports Server (NTRS)

    Liu, Yuan-Kwei

    1989-01-01

    A quick and preliminary comparison is made between the Intel 386 and i486 microprocessors. The following topics are discussed: the i486 key elements, comparison of instruction set architecture, the i486 on-chip cache characteristics, the i486 multiprocessor support, comparison of performance, comparison of power consumption, comparison of radiation hardening potential, and recommendations for the Space Station Freedom (SSF) Data Management System (DMS).

  19. The Adventures of Team Fantastic: A Practical Guide for Team Leaders and Members.

    ERIC Educational Resources Information Center

    Hallam, Glenn L.

    This publication looks at the ways in which one who is part of a team can help improve the team's performance. The successes and failures of a fictional team are used to illustrate real-life team skills. Examples are drawn from a number of imaginary scenarios--for example, looking for a cache of diamonds in the Brazilian jungle, straightening ties…

  20. Tracking Seed Fates of Tropical Tree Species: Evidence for Seed Caching in a Tropical Forest in North-East India

    PubMed Central

    Sidhu, Swati; Datta, Aparajita

    2015-01-01

    Rodents affect the post-dispersal fate of seeds by acting either as on-site seed predators or as secondary dispersers when they scatter-hoard seeds. The tropical forests of north-east India harbour a high diversity of little-studied terrestrial murid and hystricid rodents. We examined the role played by these rodents in determining the seed fates of tropical evergreen tree species in a forest site in north-east India. We selected ten tree species (3 mammal-dispersed and 7 bird-dispersed) that varied in seed size and followed the fates of 10,777 tagged seeds. We used camera traps to determine the identity of rodent visitors, visitation rates and their seed-handling behavior. Seeds of all tree species were handled by at least one rodent taxon. Overall rates of seed removal (44.5%) were much higher than direct on-site seed predation (9.9%), but seed-handling behavior differed between the terrestrial rodent groups: two species of murid rodents removed and cached seeds, and two species of porcupines were on-site seed predators. In addition, a true cricket, Brachytrupes sp., cached seeds of three species underground. We found 309 caches formed by the rodents and the cricket; most were single-seeded (79%) and seeds were moved up to 19 m. Over 40% of seeds were re-cached from primary cache locations, while about 12% germinated in the primary caches. Seed removal rates varied widely amongst tree species, from 3% in Beilschmiedia assamica to 97% in Actinodaphne obovata. Seed predation was observed in nine species. Chisocheton cumingianus (57%) and Prunus ceylanica (25%) had moderate levels of seed predation while the remaining species had less than 10% seed predation. We hypothesized that seed traits that provide information on resource quantity would influence rodent choice of a seed, while traits that determine resource accessibility would influence whether seeds are removed or eaten. Removal rates significantly decreased (p < 0.001) while predation rates increased (p = 0.06) with seed size. Removal rates were significantly lower for soft seeds (p = 0.002), whereas predation rates were significantly higher on soft seeds (p = 0.01). Our results show that murid rodents play a very important role in affecting the seed fates of tropical trees in the Eastern Himalayas. We also found that the different rodent groups differed in their seed handling behavior and responses to changes in seed characteristics. PMID:26247616

  1. VLBI-resolution radio-map algorithms: Performance analysis of different levels of data-sharing on multi-socket, multi-core architectures

    NASA Astrophysics Data System (ADS)

    Tabik, S.; Romero, L. F.; Mimica, P.; Plata, O.; Zapata, E. L.

    2012-09-01

    A broad area in astronomy focuses on simulating extragalactic objects based on Very Long Baseline Interferometry (VLBI) radio-maps. Several algorithms in this scope simulate what would be the observed radio-maps if emitted from a predefined extragalactic object. This work analyzes the performance and scaling of this kind of algorithms on multi-socket, multi-core architectures. In particular, we evaluate a sharing approach, a privatizing approach and a hybrid approach on systems with complex memory hierarchy that includes shared Last Level Cache (LLC). In addition, we investigate which manual processes can be systematized and then automated in future works. The experiments show that the data-privatizing model scales efficiently on medium scale multi-socket, multi-core systems (up to 48 cores) while regardless of algorithmic and scheduling optimizations, the sharing approach is unable to reach acceptable scalability on more than one socket. However, the hybrid model with a specific level of data-sharing provides the best scalability over all used multi-socket, multi-core systems.

  2. Environmental Impact Analysis Process, Groom Mountain Range, Lincoln County, Nevada

    DTIC Science & Technology

    1985-10-01

    bases clustered around springs, temporary camps, rock shelters , quarries, lithic scatters, rock art, pinyon caches, pot drops, isolates, and historic...include pinyon caches and rock shelters with associated historic artifacts and many of the spring sites. These sites provide an unusual research...Management. (b) Proposed Action: Renewed Withdrawal of Groom Mountain Range Addition to Nellis Air Force Bombing and Gunnery Range, Lincoln County, Nevada. (c

  3. Presettlement Forests of the Black Swamp Area, Cache River,Woodruff County, Arkansas, from Notes of the First Land Survey

    Treesearch

    Thomas L. Foti

    2001-01-01

    Relationships between forest vegetation and soil were reconstructed from field notes of the 1846 Public Land Survey (PLS) along a portion of the Cache River including Black Swamp. Locations of corners were digitized long with species,diameter,and distance from section or quarter-section corners. Trees were grouped for analysis according to occurrence on groups of...

  4. Killing of a muskox, Ovibus moschatus, by two wolves, Canis lupis, and subsequent caching

    USGS Publications Warehouse

    Mech, L. David; Adams, Layne G.

    1999-01-01

    The killing of a cow Muskox (Ovibos moschatus) by two Wolves (Canis lupus) in 5 minutes during summer on Ellesmere Island is described. After two of the four feedings observed, one Wolf cached a leg and regurgitated food as far as 2.3 km away and probably farther. The implications of this behavior for deriving food-consumption estimates are discussed.

  5. Evaluating Fragment Construction Policies for SDT Systems

    DTIC Science & Technology

    2006-01-01

    allocates a fragment and begins translation. Once a termination condition is met, Strata emits any trampolines that are necessary. Trampolines are pieces... trampolines (unless its target previously exists in the fragment cache). Once a CTI’s target instruction becomes available in the fragment cache, the CTI is...linked directly to the destination, avoiding future uses of the trampoline . This mechanism is called Fragment Linking and avoids significant overhead

  6. (Re)engineering Earth System Models to Expose Greater Concurrency for Ultrascale Computing: Practice, Experience, and Musings

    NASA Astrophysics Data System (ADS)

    Mills, R. T.

    2014-12-01

    As the high performance computing (HPC) community pushes towards the exascale horizon, the importance and prevalence of fine-grained parallelism in new computer architectures is increasing. This is perhaps most apparent in the proliferation of so-called "accelerators" such as the Intel Xeon Phi or NVIDIA GPGPUs, but the trend also holds for CPUs, where serial performance has grown slowly and effective use of hardware threads and vector units are becoming increasingly important to realizing high performance. This has significant implications for weather, climate, and Earth system modeling codes, many of which display impressive scalability across MPI ranks but take relatively little advantage of threading and vector processing. In addition to increasing parallelism, next generation codes will also need to address increasingly deep hierarchies for data movement: NUMA/cache levels, on node vs. off node, local vs. wide neighborhoods on the interconnect, and even in the I/O system. We will discuss some approaches (grounded in experiences with the Intel Xeon Phi architecture) for restructuring Earth science codes to maximize concurrency across multiple levels (vectors, threads, MPI ranks), and also discuss some novel approaches for minimizing expensive data movement/communication.

  7. Analysing the performance of personal computers based on Intel microprocessors for sequence aligning bioinformatics applications.

    PubMed

    Nair, Pradeep S; John, Eugene B

    2007-01-01

    Aligning specific sequences against a very large number of other sequences is a central aspect of bioinformatics. With the widespread availability of personal computers in biology laboratories, sequence alignment is now often performed locally. This makes it necessary to analyse the performance of personal computers for sequence aligning bioinformatics benchmarks. In this paper, we analyse the performance of a personal computer for the popular BLAST and FASTA sequence alignment suites. Results indicate that these benchmarks have a large number of recurring operations and use memory operations extensively. It seems that the performance can be improved with a bigger L1-cache.

  8. Towards ubiquitous access of computer-assisted surgery systems.

    PubMed

    Liu, Hui; Lufei, Hanping; Shi, Weishong; Chaudhary, Vipin

    2006-01-01

    Traditional stand-alone computer-assisted surgery (CAS) systems impede the ubiquitous and simultaneous access by multiple users. With advances in computing and networking technologies, ubiquitous access to CAS systems becomes possible and promising. Based on our preliminary work, CASMIL, a stand-alone CAS server developed at Wayne State University, we propose a novel mobile CAS system, UbiCAS, which allows surgeons to retrieve, review and interpret multimodal medical images, and to perform some critical neurosurgical procedures on heterogeneous devices from anywhere at anytime. Furthermore, various optimization techniques, including caching, prefetching, pseudo-streaming-model, and compression, are used to guarantee the QoS of the UbiCAS system. UbiCAS enables doctors at remote locations to actively participate remote surgeries, share patient information in real time before, during, and after the surgery.

  9. Applications Performance on NAS Intel Paragon XP/S - 15#

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Simon, Horst D.; Copper, D. M. (Technical Monitor)

    1994-01-01

    The Numerical Aerodynamic Simulation (NAS) Systems Division received an Intel Touchstone Sigma prototype model Paragon XP/S- 15 in February, 1993. The i860 XP microprocessor with an integrated floating point unit and operating in dual -instruction mode gives peak performance of 75 million floating point operations (NIFLOPS) per second for 64 bit floating point arithmetic. It is used in the Paragon XP/S-15 which has been installed at NAS, NASA Ames Research Center. The NAS Paragon has 208 nodes and its peak performance is 15.6 GFLOPS. Here, we will report on early experience using the Paragon XP/S- 15. We have tested its performance using both kernels and applications of interest to NAS. We have measured the performance of BLAS 1, 2 and 3 both assembly-coded and Fortran coded on NAS Paragon XP/S- 15. Furthermore, we have investigated the performance of a single node one-dimensional FFT, a distributed two-dimensional FFT and a distributed three-dimensional FFT Finally, we measured the performance of NAS Parallel Benchmarks (NPB) on the Paragon and compare it with the performance obtained on other highly parallel machines, such as CM-5, CRAY T3D, IBM SP I, etc. In particular, we investigated the following issues, which can strongly affect the performance of the Paragon: a. Impact of the operating system: Intel currently uses as a default an operating system OSF/1 AD from the Open Software Foundation. The paging of Open Software Foundation (OSF) server at 22 MB to make more memory available for the application degrades the performance. We found that when the limit of 26 NIB per node out of 32 MB available is reached, the application is paged out of main memory using virtual memory. When the application starts paging, the performance is considerably reduced. We found that dynamic memory allocation can help applications performance under certain circumstances. b. Impact of data cache on the i860/XP: We measured the performance of the BLAS both assembly coded and Fortran coded. We found that the measured performance of assembly-coded BLAS is much less than what memory bandwidth limitation would predict. The influence of data cache on different sizes of vectors is also investigated using one-dimensional FFTs. c. Impact of processor layout: There are several different ways processors can be laid out within the two-dimensional grid of processors on the Paragon. We have used the FFT example to investigate performance differences based on processors layout.

  10. A Temporal Locality-Aware Page-Mapped Flash Translation Layer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Youngjae; Gupta, Aayush; Urgaonkar, Bhuvan

    2013-01-01

    The poor performance of random writes has been a cause of major concern which needs to be addressed to better utilize the potential of flash in enterprise-scale environments. We examine one of the important causes of this poor performance: the design of the flash translation layer (FTL) which performs the virtual-to-physical address translations and hides the erase-before-write characteristics of flash. We propose a complete paradigm shift in the design of the core FTL engine from the existing techniques with our Demand-Based Flash Translation Layer (DFTL) which selectively caches page- level address mappings. Our experimental evaluation using FlashSim with realistic enterprise-scalemore » workloads endorses the utility of DFTL in enterprise-scale storage systems by demonstrating: 1) improved performance, 2) reduced garbage collection overhead and 3) better overload behavior compared with hybrid FTL schemes which are the most popular implementation methods. For example, a predominantly random-write dominant I/O trace from an OLTP application running at a large financial institution shows a 78% improvement in average response time (due to a 3-fold reduction in operations of the garbage collector), compared with the hybrid FTL scheme. Even for the well-known read-dominant TPC-H benchmark, for which DFTL introduces additional overheads, we improve system response time by 56%. Moreover, interestingly, when write-back cache on DFTL-based SSD is enabled, DFTL even outperforms the page-based FTL scheme, improving their response time by 72% in Financial trace.« less

  11. Scatter-hoarding rodents as secondary seed dispersers of a frugivore-dispersed tree Scleropyrum wallichianum in a defaunated Xishuangbanna tropical forest, China.

    PubMed

    Cao, Lin; Xiao, Zhishu; Guo, Cong; Chen, Jin

    2011-09-01

    Local extinction or population decline of large frugivorous vertebrates as primary seed dispersers, caused by human disturbance and habitat change, might lead to dispersal limitation of many large-seeded fruit trees. However, it is not known whether or not scatter-hoarding rodents as secondary seed dispersers can help maintain natural regeneration (e.g. seed dispersal) of these frugivore-dispersed trees in the face of the functional reduction or loss of primary seed dispersers. In the present study, we investigated how scatter-hoarding rodents affect the fate of tagged seeds of a large-seeded fruit tree (Scleropyrum wallichianum Arnott, 1838, Santalaceae) from seed fall to seedling establishment in a heavily defaunated tropical forest in the Xishuangbanna region of Yunnan Province, in southwest China, in 2007 and 2008. Our results show that: (i) rodents removed nearly all S. wallichianum seeds in both years; (ii) a large proportion (2007, 75%; 2008, 67.5%) of the tagged seeds were cached individually in the surface soil or under leaf litters; (iii) dispersal distance of primary caches was further in 2007 (19.6 ± 14.6 m) than that in 2008 (14.1 ± 11.6 m), and distance increased as rodents recovered and moved seeds from primary caches into subsequent caching sites; and (iv) part of the cached seeds (2007, 3.2%; 2008, 2%) survived to the seedling stage each year. Our study suggests that by taking roles of both primary and secondary seed dispersers, scatter-hoarding rodents can play a significant role in maintaining seedling establishment of S. wallichianum, and are able to at least partly compensate for the loss of large frugivorous vertebrates in seed dispersal. © 2011 ISZS, Blackwell Publishing and IOZ/CAS.

  12. Free Factories: Unified Infrastructure for Data Intensive Web Services

    PubMed Central

    Zaranek, Alexander Wait; Clegg, Tom; Vandewege, Ward; Church, George M.

    2010-01-01

    We introduce the Free Factory, a platform for deploying data-intensive web services using small clusters of commodity hardware and free software. Independently administered virtual machines called Freegols give application developers the flexibility of a general purpose web server, along with access to distributed batch processing, cache and storage services. Each cluster exploits idle RAM and disk space for cache, and reserves disks in each node for high bandwidth storage. The batch processing service uses a variation of the MapReduce model. Virtualization allows every CPU in the cluster to participate in batch jobs. Each 48-node cluster can achieve 4-8 gigabytes per second of disk I/O. Our intent is to use multiple clusters to process hundreds of simultaneous requests on multi-hundred terabyte data sets. Currently, our applications achieve 1 gigabyte per second of I/O with 123 disks by scheduling batch jobs on two clusters, one of which is located in a remote data center. PMID:20514356

  13. Key design elements of a data utility for national biosurveillance: event-driven architecture, caching, and Web service model.

    PubMed

    Tsui, Fu-Chiang; Espino, Jeremy U; Weng, Yan; Choudary, Arvinder; Su, Hoah-Der; Wagner, Michael M

    2005-01-01

    The National Retail Data Monitor (NRDM) has monitored over-the-counter (OTC) medication sales in the United States since December 2002. The NRDM collects data from over 18,600 retail stores and processes over 0.6 million sales records per day. This paper describes key architectural features that we have found necessary for a data utility component in a national biosurveillance system. These elements include event-driven architecture to provide analyses of data in near real time, multiple levels of caching to improve query response time, high availability through the use of clustered servers, scalable data storage through the use of storage area networks and a web-service function for interoperation with affiliated systems. The methods and architectural principles are relevant to the design of any production data utility for public health surveillance-systems that collect data from multiple sources in near real time for use by analytic programs and user interfaces that have substantial requirements for time-series data aggregated in multiple dimensions.

  14. A population study of Alzheimer's disease: findings from the Cache County Study on Memory, Health, and Aging.

    PubMed

    Tschanz, Joann T; Treiber, Katherine; Norton, Maria C; Welsh-Bohmer, Kathleen A; Toone, Leslie; Zandi, Peter P; Szekely, Christine A; Lyketsos, Constantine; Breitner, John C S

    2005-01-01

    There are several population-based studies of aging, memory, and dementia being conducted worldwide. Of these, the Cache County Study on Memory, Health and Aging is noteworthy for its large number of "oldest-old" members. This study, which has been following an initial cohort of 5,092 seniors since 1995, has reported among its major findings the role of the Apolipoprotein E gene on modifying the risk for Alzheimer's disease (AD) in males and females and identifying pharmacologic compounds that may act to reduce AD risk. This article summarizes the major findings of the Cache County study to date, describes ongoing investigations, and reports preliminary analyses on the outcome of the oldest-old in this population, the subgroup of participants who were over age 84 at the study's inception.

  15. Understanding Consistency Maintenance in Service Discovery Architectures in Response to Message Loss

    DTIC Science & Technology

    2002-07-01

    manager (SM), and (3) service cache manager ( SCM ). The SCM is an optional element not supported by all discovery protocols. These components participate...the SCM operates as an intermediary, matching advertised SDs of SMs to requirements provided by SUs. Table 1 shows how these general concepts map...Service DescriptionService ItemService Description (SD) Directory Service Agent (optional) not applicableLookup ServiceService Cache Manager ( SCM

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Millar, A. P.; Behrmann, G.; Bernardt, C.

    With over ten years in production use dCache data storage system has evolved to match ever changing lansdcape of continually evolving storage technologies with new solutions to both existing problems and new challenges. In this paper, we present three areas of innovation in dCache: providing efficient access to data with NFS v4.1 pNFS, adoption of CDMI and WebDAV as an alternative to SRM for managing data, and integration with alternative authentication mechanisms.

  17. Graded Mirror Self-Recognition by Clark's Nutcrackers.

    PubMed

    Clary, Dawson; Kelly, Debbie M

    2016-11-04

    The traditional 'mark test' has shown some large-brained species are capable of mirror self-recognition. During this test a mark is inconspicuously placed on an animal's body where it can only be seen with the aid of a mirror. If the animal increases the number of actions directed to the mark region when presented with a mirror, the animal is presumed to have recognized the mirror image as its reflection. However, the pass/fail nature of the mark test presupposes self-recognition exists in entirety or not at all. We developed a novel mirror-recognition task, to supplement the mark test, which revealed gradation in the self-recognition of Clark's nutcrackers, a large-brained corvid. To do so, nutcrackers cached food alone, observed by another nutcracker, or with a regular or blurry mirror. The nutcrackers suppressed caching with a regular mirror, a behavioural response to prevent cache theft by conspecifics, but did not suppress caching with a blurry mirror. Likewise, during the mark test, most nutcrackers made more self-directed actions to the mark with a blurry mirror than a regular mirror. Both results suggest self-recognition was more readily achieved with the blurry mirror and that self-recognition may be more broadly present among animals than currently thought.

  18. Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

    NASA Astrophysics Data System (ADS)

    Georgiev, K.; Zlatev, Z.

    2010-11-01

    The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model computational domain covers Europe and some neighbour parts belong to the Atlantic Ocean, Asia and Africa. If DEM model is to be applied by using fine grids, then its discretization leads to a huge computational problem. This implies that such a model as DEM must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, some comparison results of running of this model on different kind of vector (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) will be presented. The main idea in the parallel version of DEM is domain partitioning approach. Discussions according to the effective use of the cache and hierarchical memories of the modern computers as well as the performance, speed-ups and efficiency achieved will be done. The parallel code of DEM, created by using MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kind of vector and parallel computers. Some important applications of the computer model output are presented in short.

  19. A Content Standard for Computational Models; Digital Rights Management (DRM) Architectures; A Digital Object Approach to Interoperable Rights Management: Finely-Grained Policy Enforcement Enabled by a Digital Object Infrastructure; LOCKSS: A Permanent Web Publishing and Access System; Tapestry of Time and Terrain.

    ERIC Educational Resources Information Center

    Hill, Linda L.; Crosier, Scott J.; Smith, Terrence R.; Goodchild, Michael; Iannella, Renato; Erickson, John S.; Reich, Vicky; Rosenthal, David S. H.

    2001-01-01

    Includes five articles. Topics include requirements for a content standard to describe computational models; architectures for digital rights management systems; access control for digital information objects; LOCKSS (Lots of Copies Keep Stuff Safe) that allows libraries to run Web caches for specific journals; and a Web site from the U.S.…

  20. Wintertime Ambient Ammonia Concentrations in Northern Utah's Urban Valleys

    NASA Astrophysics Data System (ADS)

    Hammond, I. A.; Martin, R. S.; Silva, P.; Baasandorj, M.

    2017-12-01

    Many of the population centers in northern Utah are currently classified as non-attainment or serious non-attainment, Wasatch Front, for PM2.5 and previous studies have shown ammonium nitrate to often be the largest contributor to the particulate mass. Furthermore, measurements have shown several of the Wasatch Front cities and Cache Valley (UT/ID) consistently recorded some of the highest ambient ammonia (NH3) concentrations in the continental United States. As a part of the multi-organization 2017 Utah Winter Fine Particulate Study real-time NH3 concentrations were monitored in the Cache Valley at the Logan, UT site, collocated at an EPA sampling trailer near the Utah State University (USU) campus. A Picarro model G2508 was to used collect 5-sec averaged concentrations of NH3, carbon dioxide (CO2), and methane (CH4) from January 16th to February 14th, 2017. Parts of three inversion events, wherein the PM2.5 concentrations approached or exceeded the National Ambient Air Quality Standards, were captured during the sampling period, including a 10-day event from January 25th to February 4th. Concentrations of all three of the observed species showed significant accumulation during the events, with NH3 concentrations ranging from below the detection limit (<0.5 ppb) to >70 ppb. Preliminary analysis suggested the temporal NH3 changes tracked the increase in PM2.5 throughout the inversion events; however, a one-day period of NH3 depletion during the main inversion event was observed while PM2.5 continued to increase. Additionally, a network of passive NH3 samplers (Ogawa Model 3300) were arrayed at 25 sites throughout the Cache Valley and at 11 sites located along the Wasatch Front. These networks sampled for three 7-day periods, during the same study time frame. Ion chromatographic (IC) analyses of the sample pads are not yet finalized; however, preliminary results show concentrations in the tens of ppb and seemingly spatially correlate with previous studies showing elevated wintertime values.

  1. Simulation of flow and habitat conditions under ice, Cache la Poudre River - January 2006

    USGS Publications Warehouse

    Waddle, Terry

    2007-01-01

    The objectives of this study are (1) to describe the extent and thickness of ice cover, (2) simulate depth and velocity under ice at the study site for observed and reduced flows, and (3) to quantify fish habitat in this portion of the mainstem Cache la Poudre River for the current winter release schedule as well as for similar conditions without the 0.283 m3/s winter release.

  2. The Named-State Register File

    DTIC Science & Technology

    1993-08-01

    on the Lempel - Ziv [44] algo- rithm. Zip is compressing a single 8,017 byte file. " RTLSim An register transfer language simulator for the Message...package. gordoni@cs.adelaide.edu.au, Wynn Vale, 5127, Australia, 1.0 edition, October 1991. [44] Ziv J. and Lempel A. "A universal algorithm for...fixed hardware algorithm . Some data caches allow the program to explicitly allocate cache lines [68]. This allocation is only useful in writing new data

  3. Multicluster

    DTIC Science & Technology

    1993-03-01

    CLUSTER A CLUSTER B .UDP D "Orequeqes ProxyDistribute 0 Figure 4-4: HOSTALL Implementation HOST_ALL is implemented as follows. The kernel looks up the...it includes the HOSTALL request as an argument. The generic CronusHost object is managed by the Cronus Kernel. A kernel that receives a ProxyDistnbute...request uses its cached service information to send the HOSTALL request to each host in its cluster via UDP. If the kernel has no cached information

  4. An integrated GIS/remote sensing data base in North Cache soil conservation district, Utah: A pilot project for the Utah Department of Agriculture's RIMS (Resource Inventory and Monitoring System)

    NASA Technical Reports Server (NTRS)

    Wheeler, D. J.; Ridd, M. K.; Merola, J. A.

    1984-01-01

    A basic geographic information system (GIS) for the North Cache Soil Conservation District (SCD) was sought for selected resource problems. Since the resource management issues in the North Cache SCD are very complex, it is not feasible in the initial phase to generate all the physical, socioeconomic, and political baseline data needed for resolving all management issues. A selection of critical varables becomes essential. Thus, there are foud specific objectives: (1) assess resource management needs and determine which resource factors ae most fundamental for building a beginning data base; (2) evaluate the variety of data gathering and analysis techniques for the resource factors selected; (3) incorporate the resulting data into a useful and efficient digital data base; and (4) demonstrate the application of the data base to selected real world resoource management issues.

  5. Josephson 4 K-bit cache memory design for a prototype signal processor. I - General overview

    NASA Astrophysics Data System (ADS)

    Henkels, W. H.; Geppert, L. M.; Kadlec, J.; Epperlein, P. W.; Beha, H.

    1985-09-01

    In the early stages of thg Josephson computer project conducted at an American computer company, it was recognized that a very fast cache memory was needed to complement Josephson logic. A subnanosecond access time memory was implemented experimentally on the basis of a 2.5-micron Pb-alloy technology. It was then decided to switch over to a Nb-base-electrode technology with the objective to alleviate problems with the long-term reliability and aging of Pb-based junctions. The present paper provides a general overview of the status of a 4 x 1 K-bit Josephson cache design employing a 2.5-micron Nb-edge-junction technology. Attention is given to the fabrication process and its implications, aspects of circuit design methodology, an overview of system environment and chip components, design changes and status, and various difficulties and uncertainties.

  6. Study on data acquisition system based on reconfigurable cache technology

    NASA Astrophysics Data System (ADS)

    Zhang, Qinchuan; Li, Min; Jiang, Jun

    2018-03-01

    Waveform capture rate is one of the key features of digital acquisition systems, which represents the waveform processing capability of the system in a unit time. The higher the waveform capture rate is, the larger the chance to capture elusive events is and the more reliable the test result is. First, this paper analyzes the impact of several factors on the waveform capture rate of the system, then the novel technology based on reconfigurable cache is further proposed to optimize system architecture, and the simulation results show that the signal-to-noise ratio of signal, capacity, and structure of cache have significant effects on the waveform capture rate. Finally, the technology is demonstrated by the engineering practice, and the results show that the waveform capture rate of the system is improved substantially without significant increase of system's cost, and the technology proposed has a broad application prospect.

  7. Better Cognitive Performance in Elderly Taking Antioxidant Vitamins E and C Supplements in Combination with NSAIDs: The Cache County Study

    PubMed Central

    Fotuhi, Majid; Zandi, Peter P.; Hayden, Kathleen M.; Khachaturian, Ara S.; Szekely, Christine A.; Wengreen, Heidi; Munger, Ronald G.; Norton, Maria C.; Tschanz, JoAnn T.; Lyketsos, Constantine G.; Breitner, John C.S.; Welsh-Bohmer, Kathleen A.

    2009-01-01

    Studies have shown less cognitive decline and lower risk of Alzheimer's disease in elderly individuals consuming either antioxidant vitamins or non-steroidal anti-inflammatory drugs (NSAIDs). The potential of added benefit from their combined use has not been studied. We therefore analyzed data from 3,376 elderly participants of the Cache County Study who were given the Modified Mini-Mental State (3MS) examination up to three times over eight years. Those who used a combination of Vitamins E and C supplements and NSAIDs at baseline declined by an average 0.96 fewer points every 3 years than non-users (p<0.05). This apparent effect was attributable entirely to participants with the APOE ε4 allele, whose users declined by 2.25 fewer points than non-users every 3 years (p<0.05). These results suggest that among elderly individuals with an APOE ε4 allele, there is an association between using antioxidant supplements in combination with NSAIDs and less cognitive decline over time. PMID:18631971

  8. The 'last mile' of data handling: Fermilab's IFDH tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lyon, Adam L.; Mengel, Marc W.

    2014-01-01

    IFDH (Intensity Frontier Data Handling), is a suite of tools for data movement tasks for Fermilab experiments and is an important part of the FIFE[2] (Fabric for Intensity Frontier [1] Experiments) initiative described at this conference. IFDH encompasses moving input data from caches or storage elements to compute nodes (the 'last mile' of data movement) and moving output data potentially to those caches as part of the journey back to the user. IFDH also involves throttling and locking to ensure that large numbers of jobs do not cause data movement bottlenecks. IFDH is realized as an easy to use layermore » that users call in their job scripts (e.g. 'ifdh cp'), hiding the low level data movement tools. One advantage of this layer is that the underlying low level tools can be selected or changed without the need for the user to alter their scripts. Logging and performance monitoring can also be added easily. This system will be presented in detail as well as its impact on the ease of data handling at Fermilab experiments.« less

  9. Advanced texture filtering: a versatile framework for reconstructing multi-dimensional image data on heterogeneous architectures

    NASA Astrophysics Data System (ADS)

    Zellmann, Stefan; Percan, Yvonne; Lang, Ulrich

    2015-01-01

    Reconstruction of 2-d image primitives or of 3-d volumetric primitives is one of the most common operations performed by the rendering components of modern visualization systems. Because this operation is often aided by GPUs, reconstruction is typically restricted to first-order interpolation. With the advent of in situ visualization, the assumption that rendering algorithms are in general executed on GPUs is however no longer adequate. We thus propose a framework that provides versatile texture filtering capabilities: up to third-order reconstruction using various types of cubic filtering and interpolation primitives; cache-optimized algorithms that integrate seamlessly with GPGPU rendering or with software rendering that was optimized for cache-friendly "Structure of Array" (SoA) access patterns; a memory management layer (MML) that gracefully hides the complexities of extra data copies necessary for memory access optimizations such as swizzling, for rendering on GPGPUs, or for reconstruction schemes that rely on pre-filtered data arrays. We prove the effectiveness of our software architecture by integrating it into and validating it using the open source direct volume rendering (DVR) software DeskVOX.

  10. Better cognitive performance in elderly taking antioxidant vitamins E and C supplements in combination with nonsteroidal anti-inflammatory drugs: the Cache County Study.

    PubMed

    Fotuhi, Majid; Zandi, Peter P; Hayden, Kathleen M; Khachaturian, Ara S; Szekely, Christine A; Wengreen, Heidi; Munger, Ronald G; Norton, Maria C; Tschanz, Joann T; Lyketsos, Constantine G; Breitner, John C S; Welsh-Bohmer, Kathleen

    2008-05-01

    Studies have shown less cognitive decline and lower risk of Alzheimer's disease in elderly individuals consuming either antioxidant vitamins or nonsteroidal anti-inflammatory drugs (NSAIDs). The potential of added benefit from their combined use has not been studied. We therefore analyzed data from 3,376 elderly participants of the Cache County Study who were given the Modified Mini-Mental State examination up to three times during a period of 8 years. Those who used a combination of vitamins E and C supplements and NSAIDs at baseline declined by an average 0.96 fewer points every 3 years than nonusers (P < .05). This apparent effect was attributable entirely to participants with the APOE epsilon4 allele, whose users declined by 2.25 fewer points than nonusers every 3 years (P < .05). These results suggest that among elderly individuals with an APOE epsilon4 allele, there is an association between using antioxidant supplements in combination with NSAIDs and less cognitive decline over time.

  11. Cyber-Physical Multi-Core Optimization for Resource and Cache Effects (C2ORES)

    DTIC Science & Technology

    2014-03-01

    DoD-sponsored ATAACK mobile cloud testbed funded through the DURIP program, which is deployed at Virginia Tech and Vanderbilt University to conduct...0.9.2. Jug was configured to use a filesystem (network file system (nfs)) backend for locking and task synchronization. 4.1.7.2 Experiment 1...and performance-aware virtual machine placement technique that is realized as cloud infrastructure middleware. The key contributions of iPlace include

  12. Feasibility Report and Environmental Statement for Water Resources Development, Cache Creek Basin, California

    DTIC Science & Technology

    1979-02-01

    classified as Porno , Lake Miwok, and Patwin. Recent surveys within the Clear Lake-Cache Creek Basin have located 28 archeological sites, some of which...additional 8,400 acre-feet annually to the Lakeport area. Porno Reservoir on Kelsey Creek, being studied by Lake County, also would supplement M&l water...project on Scotts Creek could provide 9,100 acre- feet annually of irrigation water. Also, as previously discussed, Porno Reservoir would furnish

  13. Optimizing physician access to surgical intensive care unit laboratory information through mobile computing.

    PubMed

    Strain, J J; Felciano, R M; Seiver, A; Acuff, R; Fagan, L

    1996-01-01

    Approximately 30 minutes of computer access time are required by surgical residents at Stanford University Medical Center (SUMC) to examine the lab values of all patients on a surgical intensive care unit (ICU) service, a task that must be performed several times a day. To reduce the time accessing this information and simultaneously increase the readability and currency of the data, we have created a mobile, pen-based user interface and software system that delivers lab results to surgeons in the ICU. The ScroungeMaster system, loaded on a portable tablet computer, retrieves lab results for a subset of patients from the central laboratory computer and stores them in a local database cache. The cache can be updated on command; this update takes approximately 2.7 minutes for all ICU patients being followed by the surgeon, and can be performed as a background task while the user continues to access selected lab results. The user interface presents lab results according to physiologic system. Which labs are displayed first is governed by a layout selection algorithm based on previous accesses to the patient's lab information, physician preferences, and the nature of the patient's medical condition. Initial evaluation of the system has shown that physicians prefer the ScroungeMaster interface to that of existing systems at SUMC and are satisfied with the system's performance. We discuss the evolution of ScroungeMaster and make observations on changes to physician work flow with the presence of mobile, pen-based computing in the ICU.

  14. Multithreaded Stochastic PDES for Reactions and Diffusions in Neurons.

    PubMed

    Lin, Zhongwei; Tropper, Carl; Mcdougal, Robert A; Patoary, Mohammand Nazrul Ishlam; Lytton, William W; Yao, Yiping; Hines, Michael L

    2017-07-01

    Cells exhibit stochastic behavior when the number of molecules is small. Hence a stochastic reaction-diffusion simulator capable of working at scale can provide a more accurate view of molecular dynamics within the cell. This paper describes a parallel discrete event simulator, Neuron Time Warp-Multi Thread (NTW-MT), developed for the simulation of reaction diffusion models of neurons. To the best of our knowledge, this is the first parallel discrete event simulator oriented towards stochastic simulation of chemical reactions in a neuron. The simulator was developed as part of the NEURON project. NTW-MT is optimistic and thread-based, which attempts to capitalize on multi-core architectures used in high performance machines. It makes use of a multi-level queue for the pending event set and a single roll-back message in place of individual anti-messages to disperse contention and decrease the overhead of processing rollbacks. Global Virtual Time is computed asynchronously both within and among processes to get rid of the overhead for synchronizing threads. Memory usage is managed in order to avoid locking and unlocking when allocating and de-allocating memory and to maximize cache locality. We verified our simulator on a calcium buffer model. We examined its performance on a calcium wave model, comparing it to the performance of a process based optimistic simulator and a threaded simulator which uses a single priority queue for each thread. Our multi-threaded simulator is shown to achieve superior performance to these simulators. Finally, we demonstrated the scalability of our simulator on a larger CICR model and a more detailed CICR model.

  15. Interactive distributed hardware-accelerated LOD-sprite terrain rendering with stable frame rates

    NASA Astrophysics Data System (ADS)

    Swan, J. E., II; Arango, Jesus; Nakshatrala, Bala K.

    2002-03-01

    A stable frame rate is important for interactive rendering systems. Image-based modeling and rendering (IBMR) techniques, which model parts of the scene with image sprites, are a promising technique for interactive systems because they allow the sprite to be manipulated instead of the underlying scene geometry. However, with IBMR techniques a frequent problem is an unstable frame rate, because generating an image sprite (with 3D rendering) is time-consuming relative to manipulating the sprite (with 2D image resampling). This paper describes one solution to this problem, by distributing an IBMR technique into a collection of cooperating threads and executable programs across two computers. The particular IBMR technique distributed here is the LOD-Sprite algorithm. This technique uses a multiple level-of-detail (LOD) scene representation. It first renders a keyframe from a high-LOD representation, and then caches the frame as an image sprite. It renders subsequent spriteframes by texture-mapping the cached image sprite into a lower-LOD representation. We describe a distributed architecture and implementation of LOD-Sprite, in the context of terrain rendering, which takes advantage of graphics hardware. We present timing results which indicate we have achieved a stable frame rate. In addition to LOD-Sprite, our distribution method holds promise for other IBMR techniques.

  16. A Comprehensive and Cost-Effective Computer Infrastructure for K-12 Schools

    NASA Technical Reports Server (NTRS)

    Warren, G. P.; Seaton, J. M.

    1996-01-01

    Since 1993, NASA Langley Research Center has been developing and implementing a low-cost Internet connection model, including system architecture, training, and support, to provide Internet access for an entire network of computers. This infrastructure allows local area networks which exceed 50 machines per school to independently access the complete functionality of the Internet by connecting to a central site, using state-of-the-art commercial modem technology, through a single standard telephone line. By locating high-cost resources at this central site and sharing these resources and their costs among the school districts throughout a region, a practical, efficient, and affordable infrastructure for providing scale-able Internet connectivity has been developed. As the demand for faster Internet access grows, the model has a simple expansion path that eliminates the need to replace major system components and re-train personnel. Observations of optical Internet usage within an environment, particularly school classrooms, have shown that after an initial period of 'surfing,' the Internet traffic becomes repetitive. By automatically storing requested Internet information on a high-capacity networked disk drive at the local site (network based disk caching), then updating this information only when it changes, well over 80 percent of the Internet traffic that leaves a location can be eliminated by retrieving the information from the local disk cache.

  17. Performance optimization of internet firewalls

    NASA Astrophysics Data System (ADS)

    Chiueh, Tzi-cker; Ballman, Allen

    1997-01-01

    Internet firewalls control the data traffic in and out of an enterprise network by checking network packets against a set of rules that embodies an organization's security policy. Because rule checking is computationally more expensive than routing-table look-up, it could become a potential bottleneck for scaling up the performance of IP routers, which typically implement firewall functions in software. in this paper, we analyzed the performance problems associated with firewalls, particularly packet filters, propose a good connection cache to amortize the costly security check over the packets in a connection, and report the preliminary performance results of a trace-driven simulation that show the average packet check time can be reduced by a factor of 2.5 at the least.

  18. Achieving Sub-Second Search in the CMR

    NASA Astrophysics Data System (ADS)

    Gilman, J.; Baynes, K.; Pilone, D.; Mitchell, A. E.; Murphy, K. J.

    2014-12-01

    The Common Metadata Repository (CMR) is the next generation Earth Science Metadata catalog for NASA's Earth Observing data. It joins together the holdings from the EOS Clearing House (ECHO) and the Global Change Master Directory (GCMD), creating a unified, authoritative source for EOSDIS metadata. The CMR allows ingest in many different formats while providing consistent search behavior and retrieval in any supported format. Performance is a critical component of the CMR, ensuring improved data discovery and client interactivity. The CMR delivers sub-second search performance for any of the common query conditions (including spatial) across hundreds of millions of metadata granules. It also allows the addition of new metadata concepts such as visualizations, parameter metadata, and documentation. The CMR's goals presented many challenges. This talk will describe the CMR architecture, design, and innovations that were made to achieve its goals. This includes: * Architectural features like immutability and backpressure. * Data management techniques such as caching and parallel loading that give big performance gains. * Open Source and COTS tools like Elasticsearch search engine. * Adoption of Clojure, a functional programming language for the Java Virtual Machine. * Development of a custom spatial search plugin for Elasticsearch and why it was necessary. * Introduction of a unified model for metadata that maps every supported metadata format to a consistent domain model.

  19. The Acceleration of Structural Microarchitectural Simulation via Scheduling

    DTIC Science & Technology

    2006-11-01

    193 viii List of Tables 1.1 Size of Intel R ©Processors...Table 1.1 shows the total and estimated non-cache transistor counts in succeeding generations of Intel R ©microprocessors. (Cache array transistors are...Intel486TM 1989 1,200,000 800,000 Intel R ©Pentium R © 1993 3,100,000 2,300,000 Intel R ©Pentium R ©II 1997 7,500,000 5,500,000 Intel R ©Pentium R ©III 1999

  20. Developing Tools and Technologies to Meet MSR Planetary Protection Requirements

    NASA Technical Reports Server (NTRS)

    Lin, Ying

    2013-01-01

    This paper describes the tools and technologies that need to be developed for a Caching Rover mission in order to meet the overall Planetary Protection requirements for future Mars Sample Return (MSR) campaign. This is the result of an eight-month study sponsored by the Mars Exploration Program Office. The goal of this study is to provide a future MSR project with a focused technology development plan for achieving the necessary planetary protection and sample integrity capabilities for a Mars Caching Rover mission.

Top