prefix cache memory: Topics by Science.gov

Sample records for prefix cache memory

Consolidation and reconsolidation of memory in black-capped chickadees (Poecile atricapillus).

PubMed

Barrett, Matthew C; Sherry, David F

2012-12-01

Multiple phases of protein synthesis are necessary for the synaptic modifications that consolidate long-term memory. The reconsolidation hypothesis supposes that information in long-term memory becomes labile and subject to change when retrieved and must be reconsolidated into long-term memory. The current study used the protein synthesis inhibitor anisomycin to examine memory consolidation in birds and to test the reconsolidation hypothesis. Black-capped chickadees store food and usually remember which of their caches they have emptied and which they have left full. In Experiment 1, anisomycin was injected either immediately and 2 hr after food caching, or 4 and 6 hr after food caching. Inhibition of protein synthesis impaired memory for cache sites 24 and 48 hr later. In Experiment 2, it was hypothesized that long-term memory for food caches becomes labile as predicted by the reconsolidation hypothesis when birds search for caches. Anisomycin was administered immediately after chickadees had searched for their caches. Inhibition of protein synthesis should disrupt memory for caches left full if these sites are retrieved from long-term memory and require reconsolidation. Control birds were later more likely to revisit full caches than caches they had emptied. Birds given anisomycin revisited both kinds of caches and did not distinguish between them. This result shows that reconsolidation of full caches into long-term memory is not necessary following search for cache sites, but also shows that protein synthesis-dependent consolidation is required for updating the status of emptied caches.
Optoelectronic-cache memory system architecture.

PubMed

Chiarulli, D M; Levitan, S P

1996-05-10

We present an investigation of the architecture of an optoelectronic cache that can integrate terabit optical memories with the electronic caches associated with high-performance uniprocessors and multiprocessors. The use of optoelectronic-cache memories enables these terabit technologies to provide transparently low-latency secondary memory with frame sizes comparable with disk pages but with latencies that approach those of electronic secondary-cache memories. This enables the implementation of terabit memories with effective access times comparable with the cycle times of current microprocessors. The cache design is based on the use of a smart-pixel array and combines parallel free-space optical input-output to-and-from optical memory with conventional electronic communication to the processor caches. This cache and the optical memory system to which it will interface provide a large random-access memory space that has a lower overall latency than that of magnetic disks and disk arrays. In addition, as a consequence of the high-bandwidth parallel input-output capabilities of optical memories, fault service times for the optoelectronic cache are substantially less than those currently achievable with any rotational media.
Cache write generate for parallel image processing on shared memory architectures.

PubMed

Wittenbrink, C M; Somani, A K; Chen, C H

1996-01-01

We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.
Elements of episodic-like memory in animals.

PubMed

Clayton, N S; Griffiths, D P; Emery, N J; Dickinson, A

2001-09-29

A number of psychologists have suggested that episodic memory is a uniquely human phenomenon and, until recently, there was little evidence that animals could recall a unique past experience and respond appropriately. Experiments on food-caching memory in scrub jays question this assumption. On the basis of a single caching episode, scrub jays can remember when and where they cached a variety of foods that differ in the rate at which they degrade, in a way that is inexplicable by relative familiarity. They can update their memory of the contents of a cache depending on whether or not they have emptied the cache site, and can also remember where another bird has hidden caches, suggesting that they encode rich representations of the caching event. They make temporal generalizations about when perishable items should degrade and also remember the relative time since caching when the same food is cached in distinct sites at different times. These results show that jays form integrated memories for the location, content and time of caching. This memory capability fulfils Tulving's behavioural criteria for episodic memory and is thus termed 'episodic-like'. We suggest that several features of episodic memory may not be unique to humans.
Conditional load and store in a shared memory

DOEpatents

Blumrich, Matthias A; Ohmacht, Martin

2015-02-03

A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
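The reservation-register check described in this abstract can be sketched in a few lines. This is a toy software model, not the patented hardware: the class name `SharedCache`, one register per processor unit, and clearing competing reservations after a successful store are all assumptions made for illustration.

```python
class SharedCache:
    """Toy shared cache with one reservation register per processor unit."""

    def __init__(self, num_units):
        self.data = {}                          # address -> value
        self.reservations = [None] * num_units  # hypothetical: one register per unit

    def load_reserve(self, unit, addr):
        # Record the reserved address for this unit, then return the value.
        self.reservations[unit] = addr
        return self.data.get(addr, 0)

    def store_conditional(self, unit, addr, value):
        # The store succeeds only if this unit still holds a reservation on addr.
        if self.reservations[unit] != addr:
            return False
        self.data[addr] = value
        # Assumption: a successful store clears every reservation on addr,
        # so competing units must reload and retry.
        for u, reserved in enumerate(self.reservations):
            if reserved == addr:
                self.reservations[u] = None
        return True
```

If two units reserve the same address, only the first store-conditional succeeds in this model; the second unit has lost its reservation and must issue a new load-reserve.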
DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Lingda; Hayes, Ari; Song, Shuaiwen

Modern GPUs employ caches to improve memory system efficiency. However, a large amount of cache space is underutilized due to the irregular memory accesses and poor spatial locality commonly exhibited by GPU applications. Our experiments show that using smaller cache lines could improve cache space utilization, but it also frequently suffers from significant performance loss by introducing a large number of extra cache requests. In this work, we propose a novel cache design named tag-split cache (TSC) that enables fine-grained cache storage to address the problem of cache space underutilization while keeping the number of memory requests unchanged. TSC divides the tag into two parts to reduce storage overhead, and it supports multiple cache line replacement in one cycle.
Generalized enhanced suffix array construction in external memory.

PubMed

Louza, Felipe A; Telles, Guilherme P; Hoffmann, Steve; Ciferri, Cristina D A

2017-01-01

Suffix arrays, augmented by additional data structures, allow solving efficiently many string processing problems. The external memory construction of the generalized suffix array for a string collection is a fundamental task when the size of the input collection or the data structure exceeds the available internal memory. In this article we present and analyze [Formula: see text] [introduced in CPM (External memory generalized suffix and [Formula: see text] arrays construction. In: Proceedings of CPM. pp 201-10, 2013)], the first external memory algorithm to construct generalized suffix arrays augmented with the longest common prefix array for a string collection. Our algorithm relies on a combination of buffers, induced sorting and a heap to avoid direct string comparisons. We performed experiments that covered different aspects of our algorithm, including running time, efficiency, external memory access, internal phases and the influence of different optimization strategies. On real datasets of size up to 24 GB and using 2 GB of internal memory, [Formula: see text] showed a competitive performance when compared to [Formula: see text] and [Formula: see text], which are efficient algorithms for a single string according to the related literature. We also show the effect of disk caching managed by the operating system on our algorithm. The proposed algorithm was validated through performance tests using real datasets from different domains, in various combinations, and showed a competitive performance. Our algorithm can also construct the generalized Burrows-Wheeler transform of a string collection with no additional cost except by the output time.
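To make the two data structures named in this abstract concrete, here is a minimal in-memory sketch for a single string. It is not the external-memory algorithm of the paper: the suffix array is built by naive sorting, and the longest common prefix (LCP) array uses Kasai's well-known linear-time method. The function names are illustrative.

```python
def suffix_array(s):
    # Naive construction: sort suffix start positions by the suffix text.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # Kasai's algorithm: lcp[k] is the length of the common prefix of the
    # suffixes starting at sa[k-1] and sa[k]; lcp[0] is 0 by convention.
    n = len(s)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1      # the next suffix shares at least h-1 characters
        else:
            h = 0
    return lcp
```

For the string "banana", the sorted suffixes are a, ana, anana, banana, na, nana, so `suffix_array` gives `[5, 3, 1, 0, 4, 2]` and `lcp_array` gives `[0, 1, 3, 0, 0, 2]`.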
Cost aware cache replacement policy in shared last-level cache for hybrid memory based fog computing

NASA Astrophysics Data System (ADS)

Jia, Gangyong; Han, Guangjie; Wang, Hao; Wang, Feng

2018-04-01

Fog computing requires a large main memory capacity to decrease latency and increase the Quality of Service (QoS). However, dynamic random access memory (DRAM), the commonly used random access memory, cannot be included in a fog computing system due to its high power consumption. In recent years, non-volatile memories (NVM) such as Phase-Change Memory (PCM) and Spin-transfer torque RAM (STT-RAM) with their low power consumption have emerged to replace DRAM. Moreover, the currently proposed hybrid main memory, consisting of both DRAM and NVM, has shown promising advantages in terms of scalability and power consumption. However, the drawbacks of NVM, such as long read/write latency, give rise to potential problems leading to asymmetric cache misses in the hybrid main memory. Current last level cache (LLC) policies are based on the unified miss cost, and result in poor performance in LLC and add to the cost of using NVM. In order to minimize the cache miss cost in the hybrid main memory, we propose a cost aware cache replacement policy (CACRP) that reduces the number of cache misses from NVM and improves the cache performance for a hybrid memory system. Experimental results show that our CACRP behaves better in LLC performance, improving performance up to 43.6% (15.5% on average) compared to LRU.
Long-term moderate elevation of corticosterone facilitates avian food-caching behaviour and enhances spatial memory.

PubMed

Pravosudov, Vladimir V

2003-12-22

It is widely assumed that chronic stress and corresponding chronic elevations of glucocorticoid levels have deleterious effects on animals' brain functions such as learning and memory. Some animals, however, appear to maintain moderately elevated levels of glucocorticoids over long periods of time under natural energetically demanding conditions, and it is not clear whether such chronic but moderate elevations may be adaptive. I implanted wild-caught food-caching mountain chickadees (Poecile gambeli), which rely at least in part on spatial memory to find their caches, with 90-day continuous time-release corticosterone pellets designed to approximately double the baseline corticosterone levels. Corticosterone-implanted birds cached and consumed significantly more food and showed more efficient cache recovery and superior spatial memory performance compared with placebo-implanted birds. Thus, contrary to prevailing assumptions, long-term moderate elevations of corticosterone appear to enhance spatial memory in food-caching mountain chickadees. These results suggest that moderate chronic elevation of corticosterone may serve as an adaptation to unpredictable environments by facilitating feeding and food-caching behaviour and by improving cache-retrieval efficiency in food-caching birds.
Don’t make cache too complex: A simple probability-based cache management scheme for SSDs

PubMed Central

Cho, Sangyeun; Choi, Jongmoo

2017-01-01

Solid-state drives (SSDs) have recently become a common storage component in computer systems, and they are fueled by continued bit cost reductions achieved with smaller feature sizes and multiple-level cell technologies. However, as the flash memory stores more bits per cell, the performance and reliability of the flash memory degrade substantially. To solve this problem, a fast non-volatile memory (NVM-)based cache has been employed within SSDs to reduce the long latency required to write data. Absorbing small writes in a fast NVM cache can also reduce the number of flash memory erase operations. To maximize the benefits of an NVM cache, it is important to increase the NVM cache utilization. In this paper, we propose and study ProCache, a simple NVM cache management scheme, that makes cache-entrance decisions based on random probability testing. Our scheme is motivated by the observation that frequently written hot data will eventually enter the cache with a high probability, and that infrequently accessed cold data will not enter the cache easily. Owing to its simplicity, ProCache is easy to implement at a substantially smaller cost than similar previously studied techniques. We evaluate ProCache and conclude that it achieves comparable performance compared to a more complex reference counter-based cache-management scheme. PMID:28358897
Don't make cache too complex: A simple probability-based cache management scheme for SSDs.

PubMed

Baek, Seungjae; Cho, Sangyeun; Choi, Jongmoo

2017-01-01

Solid-state drives (SSDs) have recently become a common storage component in computer systems, and they are fueled by continued bit cost reductions achieved with smaller feature sizes and multiple-level cell technologies. However, as the flash memory stores more bits per cell, the performance and reliability of the flash memory degrade substantially. To solve this problem, a fast non-volatile memory (NVM-)based cache has been employed within SSDs to reduce the long latency required to write data. Absorbing small writes in a fast NVM cache can also reduce the number of flash memory erase operations. To maximize the benefits of an NVM cache, it is important to increase the NVM cache utilization. In this paper, we propose and study ProCache, a simple NVM cache management scheme, that makes cache-entrance decisions based on random probability testing. Our scheme is motivated by the observation that frequently written hot data will eventually enter the cache with a high probability, and that infrequently accessed cold data will not enter the cache easily. Owing to its simplicity, ProCache is easy to implement at a substantially smaller cost than similar previously studied techniques. We evaluate ProCache and conclude that it achieves comparable performance compared to a more complex reference counter-based cache-management scheme.
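The cache-entrance decision described in this abstract can be illustrated with a small sketch. This is not the ProCache implementation: the class name, the `admit_probability` parameter, and the FIFO eviction rule are assumptions chosen to keep the example short. The point it shows is that a block written many times gets many chances to pass the random test, while a block written once usually bypasses the cache.

```python
import random


class ProbabilisticWriteCache:
    """Toy write cache that admits new blocks only with a fixed probability."""

    def __init__(self, capacity, admit_probability, rng=None):
        self.capacity = capacity
        self.p = admit_probability
        self.rng = rng or random.Random()
        self.entries = {}   # dict keeps insertion order; oldest entry first

    def write(self, block):
        """Return True if the write is absorbed by the cache."""
        if block in self.entries:
            return True                  # already cached: absorb the write
        if self.rng.random() >= self.p:
            return False                 # failed the random test: bypass to flash
        if len(self.entries) >= self.capacity:
            oldest = next(iter(self.entries))
            del self.entries[oldest]     # assumption: FIFO eviction
        self.entries[block] = True
        return True
```

With `admit_probability` set to 0.1, for example, a block written 100 times fails to enter the cache with probability 0.9^100, which is well below 0.01%, whereas a block written once enters only 10% of the time.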
Effects of cacheing on multitasking efficiency and programming strategy on an ELXSI 6400

DOE Office of Scientific and Technical Information (OSTI.GOV)

Montry, G.R.; Benner, R.E.

1985-12-01

The impact of a cache/shared memory architecture, and, in particular, the cache coherency problem, upon concurrent algorithm and program development is discussed. In this context, a simple set of programming strategies is proposed; these strategies streamline code development and improve code performance when multitasking in a cache/shared memory or distributed memory environment.
Cache as point of coherence in multiprocessor system

DOEpatents

Blumrich, Matthias A.; Ceze, Luis H.; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin; Steinmacher-Burow, Burkhard; Zhuang, Xiaotong

2016-11-29

In a multiprocessor system, a conflict checking mechanism is implemented in the L2 cache memory. Different versions of speculative writes are maintained in different ways of the cache. A record of speculative writes is maintained in the cache directory. Conflict checking occurs as part of directory lookup. Speculative versions that do not conflict are aggregated into an aggregated version in a different way of the cache. Speculative memory access requests do not go to main memory.
OS friendly microprocessor architecture: Hardware level computer security

NASA Astrophysics Data System (ADS)

Jungwirth, Patrick; La Fratta, Patrick

2016-05-01

We present an introduction to the patented OS Friendly Microprocessor Architecture (OSFA) and hardware level computer security. Conventional microprocessors have not tried to balance hardware performance and OS performance at the same time. Conventional microprocessors have depended on the Operating System for computer security and information assurance. The goal of the OS Friendly Architecture is to provide a high performance and secure microprocessor and OS system. We are interested in cyber security, information technology (IT), and SCADA control professionals reviewing the hardware level security features. The OS Friendly Architecture is a switched set of cache memory banks in a pipeline configuration. For light-weight threads, the memory pipeline configuration provides near instantaneous context switching times. The pipelining and parallelism provided by the cache memory pipeline provides for background cache read and write operations while the microprocessor's execution pipeline is running instructions. The cache bank selection controllers provide arbitration to prevent the memory pipeline and microprocessor's execution pipeline from accessing the same cache bank at the same time. This separation allows the cache memory pages to transfer to and from level 1 (L1) caching while the microprocessor pipeline is executing instructions. Computer security operations are implemented in hardware. By extending Unix file permissions bits to each cache memory bank and memory address, the OSFA provides hardware level computer security.
Cache directory look-up re-use as conflict check mechanism for speculative memory requests

DOEpatents

Ohmacht, Martin

2013-09-10

In a cache memory, energy and other efficiencies can be realized by saving a result of a cache directory lookup for sequential accesses to a same memory address. Where the cache is a point of coherence for speculative execution in a multiprocessor system, with directory lookups serving as the point of conflict detection, such saving becomes particularly advantageous.
A cache-aided multiprocessor rollback recovery scheme

NASA Technical Reports Server (NTRS)

Wu, Kun-Lung; Fuchs, W. Kent

1989-01-01

This paper demonstrates how previous uniprocessor cache-aided recovery schemes can be applied to multiprocessor architectures, for recovering from transient processor failures, utilizing private caches and a global shared memory. As with cache-aided uniprocessor recovery, the multiprocessor cache-aided recovery scheme of this paper can be easily integrated into standard bus-based snoopy cache coherence protocols. A consistent shared memory state is maintained without the necessity of global check-pointing.
Changes in spatial memory mediated by experimental variation in food supply do not affect hippocampal anatomy in mountain chickadees (Poecile gambeli).

PubMed

Pravosudov, V V; Lavenex, P; Clayton, N S

2002-05-01

Earlier reports suggested that seasonal variation in food-caching behavior (caching intensity and cache retrieval accuracy) might correlate with morphological changes in the hippocampal formation, a brain structure thought to play a role in remembering cache locations. We demonstrated that changes in cache retrieval accuracy can also be triggered by experimental variation in food supply: captive mountain chickadees (Poecile gambeli) maintained on limited and unpredictable food supply were more accurate at recovering their caches and performed better on spatial memory tests than birds maintained on ad libitum food. In this study, we investigated whether these two treatment groups also differed in the volume and neuron number of the hippocampal formation. If variation in memory for food caches correlates with hippocampal size, then our birds with enhanced cache recovery and spatial memory performance should have larger hippocampal volumes and total neuron numbers. Contrary to this prediction we found no significant differences in volume or total neuron number of the hippocampal formation between the two treatment groups. Our results therefore indicate that changes in food-caching behavior and spatial memory performance, as mediated by experimental variations in food supply, are not necessarily accompanied by morphological changes in volume or neuron number of the hippocampal formation in fully developed, experienced food-caching birds. Copyright 2002 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh; Zhang, Zhao

With each CMOS technology generation, leakage energy consumption has been dramatically increasing and hence, managing leakage power consumption of large last-level caches (LLCs) has become a critical issue in modern processor design. In this paper, we present EnCache, a novel software-based technique which uses dynamic profiling-based cache reconfiguration for saving cache leakage energy. EnCache uses a simple hardware component called profiling cache, which dynamically predicts energy efficiency of an application for 32 possible cache configurations. Using these estimates, system software reconfigures the cache to the most energy efficient configuration. EnCache uses dynamic cache reconfiguration and hence, it does not require offline profiling or tuning the parameter for each application. Furthermore, EnCache optimizes directly for the overall memory subsystem (LLC and main memory) energy efficiency instead of the LLC energy efficiency alone. The experiments performed with an x86-64 simulator and workloads from SPEC2006 suite confirm that EnCache provides larger energy saving than a conventional energy saving scheme. For single core and dual-core system configurations, the average savings in memory subsystem energy over a shared baseline configuration are 30.0% and 27.3%, respectively.
Do Clark's nutcrackers demonstrate what-where-when memory on a cache-recovery task?

PubMed

Gould, Kristy L; Ort, Amy J; Kamil, Alan C

2012-01-01

What-where-when (WWW) memory during cache recovery was investigated in six Clark's nutcrackers. During caching, both red- and blue-colored pine seeds were cached by the birds in holes filled with sand. Either a short (3 day) retention interval (RI) or a long (9 day) RI was followed by a recovery session during which caches were replaced with either a single seed or wooden bead depending upon the color of the cache and length of the retention interval. Knowledge of what was in the cache (seed or bead), where it was located, and when the cache had been made (3 or 9 days ago) were the three WWW memory components under investigation. Birds recovered items (bead or seed) at above chance levels, demonstrating accurate spatial memory. They also recovered seeds more than beads after the long RI, but not after the short RI, when they recovered seeds and beads equally often. The differential recovery after the long RI demonstrates that nutcrackers may have the capacity for WWW memory during this task, but it is not clear why it was influenced by RI duration.
Practical Algorithms for the Longest Common Extension Problem

NASA Astrophysics Data System (ADS)

Ilie, Lucian; Tinta, Liviu

The Longest Common Extension problem considers a string s and computes, for each of a number of pairs (i,j), the longest substring of s that starts at both i and j. It appears as a subproblem in many fundamental string problems and can be solved by linear-time preprocessing of the string that allows (worst-case) constant-time computation for each pair. The two known approaches use powerful algorithms: either constant-time computation of the Lowest Common Ancestor in trees or constant-time computation of Range Minimum Queries (RMQ) in arrays. We show here that, from a practical point of view, such complicated approaches are not needed. We give two very simple algorithms for this problem that require no preprocessing. The first needs only the string and is significantly faster than all previous algorithms on the average. The second combines the first with a direct RMQ computation on the Longest Common Prefix array. It takes advantage of the superior speed of the cache memory and is the fastest on virtually all inputs.
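The problem statement in this abstract can be illustrated with direct character comparison, which needs only the string and no preprocessing. Whether this matches the authors' first algorithm in detail is an assumption; the function name `lce` is illustrative.

```python
def lce(s, i, j):
    """Length of the longest substring of s that starts at both i and j."""
    n = len(s)
    k = 0
    # Scan forward from both positions until a mismatch or the end of s.
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k
```

For example, `lce("banana", 1, 3)` compares "anana" with "ana" and returns 3. The scan reads two contiguous regions of the string, which is the kind of access pattern that tends to suit cache memory.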

Single-pass memory system evaluation for multiprogramming workloads

NASA Technical Reports Server (NTRS)

Conte, Thomas M.; Hwu, Wen-Mei W.

1990-01-01

Modern memory systems are composed of levels of cache memories, a virtual memory system, and a backing store. Varying more than a few design parameters and measuring the performance of such systems has traditionally been constrained by the high cost of simulation. Models of cache performance recently introduced reduce the cost of simulation but at the expense of accuracy of performance prediction. Stack-based methods predict performance accurately using one pass over the trace for all cache sizes, but these techniques have been limited to fully-associative organizations. This paper presents a stack-based method of evaluating the performance of cache memories using a recurrence/conflict model for the miss ratio. Unlike previous work, the performance of realistic cache designs, such as direct-mapped caches, is predicted by the method. The method also includes a new approach to the problem of the effects of multiprogramming. This new technique separates the characteristics of the individual program from that of the workload. The recurrence/conflict method is shown to be practical, general, and powerful by comparing its performance to that of a popular traditional cache simulator. The authors expect that the availability of such a tool will have a large impact on future architectural studies of memory systems.
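The classic stack-based idea that this abstract builds on (one pass over the trace yields miss counts for every fully-associative LRU cache size) can be sketched as follows. This is the traditional stack-distance technique, not the paper's recurrence/conflict model, and the function names are illustrative.

```python
def stack_distances(trace):
    """LRU stack distance of each reference (None for a first-time reference)."""
    stack = []        # most recently used address is at the end
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            distances.append(len(stack) - pos)   # 1 means most recently used
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(addr)
    return distances


def misses_for_size(distances, size):
    # A fully-associative LRU cache with `size` lines hits exactly when the
    # stack distance is at most `size`; first-time references always miss.
    return sum(1 for d in distances if d is None or d > size)
```

For the trace a, b, c, a, b, d, a the distances are `[None, None, None, 3, 3, None, 3]`, so a 3-line cache takes 4 misses and a 2-line cache takes 7, both read off the same single pass.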
Mapping virtual addresses to different physical addresses for value disambiguation for thread memory access requests

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gara, Alan; Ohmacht, Martin

A multiprocessor system includes nodes. Each node includes a data path that includes a core, a TLB, and a first level cache implementing disambiguation. The system also includes at least one second level cache and a main memory. For thread memory access requests, the core uses an address associated with an instruction format of the core. The first level cache uses an address format related to the size of the main memory plus an offset corresponding to hardware thread meta data. The second level cache uses a physical main memory address plus software thread meta data to store the memory access request. The second level cache accesses the main memory using the physical address with neither the offset nor the thread meta data after resolving speculation. In short, this system includes mapping of a virtual address to different physical addresses for value disambiguation for different threads.
A Cache Design Method for Spatial Information Visualization in 3D Real-Time Rendering Engine

NASA Astrophysics Data System (ADS)

Dai, X.; Xiong, H.; Zheng, X.

2012-07-01

A well-designed cache system has a positive impact on a 3D real-time rendering engine, and the effect becomes more obvious as the amount of visualization data grows. Caches are the basis on which the 3D real-time rendering engine smoothly browses data that is out of core memory or comes from the internet. In this article, a new kind of cache based on multiple threads and large files is introduced. The memory cache consists of three parts: the rendering cache, the pre-rendering cache and the elimination cache. The rendering cache stores the data that is being rendered in the engine; the pre-rendering cache stores the data that is dispatched according to the position of the view point in the horizontal and vertical directions; the elimination cache stores the data that is eliminated from the previous caches and is going to be written to the disk cache. Multiple large files are used in the disk cache. When a disk cache file reaches the size limit (128M is the maximum in the experiment), no item is eliminated from the file; instead, a new large cache file is created. If the number of large files exceeds the preset maximum, the earliest file is deleted from the disk. In this way, only one file is open for writing and reading, and the rest are read-only, so the disk cache can be used in a highly asynchronous way. The size of the large file is limited so that it can be mapped to core memory to save loading time. Multiple threads are used to update the cache data. The threads load data to the rendering cache as soon as possible for rendering, load data to the pre-rendering cache for rendering the next few frames, and move data that is not necessary for the moment to the elimination cache. In our experiment, two threads are designed. The first thread organizes the memory cache according to the view point and creates two lists, the adding list and the deleting list: the adding list indexes the data that should be loaded to the pre-rendering cache immediately, and the deleting list indexes the data that is no longer visible in the rendering scene and should be moved to the elimination cache. The other thread moves the data in the memory and disk caches according to the adding and deleting lists, and creates download requests when data indexed in the adding list cannot be found in either the memory cache or the disk cache; elimination cache data is moved to the disk cache when the adding and deleting lists are empty. In our experiment, the cache designed as described above proves reliable and efficient, and the data loading time and file I/O time decrease sharply, especially as the amount of rendering data grows.
Is random access memory random?

NASA Technical Reports Server (NTRS)

Denning, P. J.

1986-01-01

Most software is constructed on the assumption that the programs and data are stored in random access memory (RAM). Physical limitations on the relative speeds of processor and memory elements lead to a variety of memory organizations that match processor addressing rate with memory service rate. These include interleaved and cached memory. A very high fraction of a processor's address requests can be satisfied from the cache without reference to the main memory. The cache requests information from main memory in blocks that can be transferred at the full memory speed. Programmers who organize algorithms for locality can realize the highest performance from these computers.
Ordering of guarded and unguarded stores for no-sync I/O

DOEpatents

Gara, Alan; Ohmacht, Martin

2013-06-25

A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. A third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushes only the first queue.
An area model for on-chip memories and its application

NASA Technical Reports Server (NTRS)

Mulder, Johannes M.; Quach, Nhon T.; Flynn, Michael J.

1991-01-01

An area model suitable for comparing data buffers of different organizations and arbitrary sizes is described. The area model considers the supplied bandwidth of a memory cell and includes such buffer overhead as control logic, driver logic, and tag storage. The model gave less than 10 percent error when verified against real caches and register files. It is shown that, comparing caches and register files in terms of area for the same storage capacity, caches generally occupy more area per bit than register files for small caches because the overhead dominates the cache area at these sizes. For larger caches, the smaller storage cells in the cache provide a smaller total cache area per bit than the register set. Studying cache performance (traffic ratio) as a function of area, it is shown that, for small caches, direct-mapped caches perform significantly better than four-way set-associative caches and, for caches of medium areas, both direct-mapped and set-associative caches perform better than fully associative caches.
Experimental evaluation of multiprocessor cache-based error recovery

NASA Technical Reports Server (NTRS)

Janssens, Bob; Fuchs, W. K.

1991-01-01

Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have been recently developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, different in the manner in which they avoid rollback propagation, are evaluated. By simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, the performance effect of integrating the recovery schemes in the cache coherence protocol is evaluated. The results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead but uncontrollable high variability in the checkpoint interval.
Efficacy of Code Optimization on Cache-based Processors

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

The current common wisdom in the U.S. is that the powerful, cost-effective supercomputers of tomorrow will be based on commodity (RISC) micro-processors with cache memories. Already, most distributed systems in the world use such hardware as building blocks. This shift away from vector supercomputers and towards cache-based systems has brought about a change in programming paradigm, even when ignoring issues of parallelism. Vector machines require inner-loop independence and regular, non-pathological memory strides (usually this means: non-power-of-two strides) to allow efficient vectorization of array operations. Cache-based systems require spatial and temporal locality of data, so that data once read from main memory and stored in high-speed cache memory is used optimally before being written back to main memory. This means that the most cache-friendly array operations are those that feature zero or unit stride, so that each unit of data read from main memory (a cache line) contains information for the next iteration in the loop. Moreover, loops ought to be 'fat', meaning that as many operations as possible are performed on cache data, provided instruction caches do not overflow and enough registers are available. If unit stride is not possible, for example because of some data dependency, then care must be taken to avoid pathological strides, just as on vector computers. For cache-based systems the issues are more complex, due to the effects of associativity and of non-unit block (cache line) size. But there is more to the story. Most modern micro-processors are superscalar, which means that they can issue several (arithmetic) instructions per clock cycle, provided that there are enough independent instructions in the loop body. This is another argument for providing fat loop bodies. With these restrictions, it appears fairly straightforward to produce code that will run efficiently on any cache-based system.
It can be argued that although some of the important computational algorithms employed at NASA Ames require different programming styles on vector machines and cache-based machines, respectively, neither architecture class appeared to be favored by particular algorithms in principle. Practice tells us that the situation is more complicated. This report presents observations and some analysis of performance tuning for cache-based systems. We point out several counterintuitive results that serve as a cautionary reminder that memory accesses are not the only factors that determine performance, and that within the class of cache-based systems, significant differences exist.
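The point above about pathological strides and associativity can be illustrated with a toy cache model. The following is a minimal sketch, not taken from the report: the class `SetAssociativeCache`, the helper `count_misses`, and the cache geometry (64 sets, 2 ways, 64-byte lines) are all illustrative assumptions, and LRU replacement is assumed.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache with LRU replacement (hypothetical model)."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set; the oldest entry is the LRU line.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.misses = 0

    def access(self, addr):
        """Access a byte address; return True on a hit, False on a miss."""
        line = addr // self.line_size
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the LRU line
        cache_set[line] = None
        return False


def count_misses(stride_bytes, n_accesses, passes=2,
                 num_sets=64, ways=2, line_size=64):
    """Sweep n_accesses addresses with a fixed stride, `passes` times."""
    cache = SetAssociativeCache(num_sets, ways, line_size)
    for _ in range(passes):
        for i in range(n_accesses):
            cache.access(i * stride_bytes)
    return cache.misses


if __name__ == "__main__":
    print("unit stride (8 B):   ", count_misses(8, 512))
    print("stride 4096 B (2^12):", count_misses(4096, 8))
    print("stride 4160 B:       ", count_misses(4160, 8))
```

In this model a 4096-byte stride maps every access to the same set, so eight lines compete for two ways and the second pass misses on every access, whereas padding the stride by one line spreads the accesses over eight sets and the second pass hits every time. Real hardware differs in indexing, replacement, and prefetching, so the counts are only qualitative.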
Formal verification of an MMU and MMU cache

NASA Technical Reports Server (NTRS)

Schubert, E. T.

1991-01-01

We describe the formal verification of a hardware subsystem consisting of a memory management unit and a cache. These devices are verified independently and then shown to interact correctly when composed. The MMU authorizes memory requests and translates virtual addresses to real addresses. The cache improves performance by maintaining an LRU (least recently used) list from the memory resident segment table.
Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

NASA Astrophysics Data System (ADS)

Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

2014-03-01

The processor-memory performance gap, commonly referred to as the "Memory Wall" problem, stems from the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.
Store-operate-coherence-on-value

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Dong; Heidelberger, Philip; Kumar, Sameer

A system, method and computer program product for performing various store-operate instructions in a parallel computing environment that includes a plurality of processors and at least one cache memory device. A queue in the system receives, from a processor, a store-operate instruction that specifies under which condition a cache coherence operation is to be invoked. A hardware unit in the system runs the received store-operate instruction. The hardware unit evaluates whether a result of running the received store-operate instruction satisfies the condition. The hardware unit invokes a cache coherence operation on a cache memory address associated with the received store-operate instruction if the result satisfies the condition. Otherwise, the hardware unit does not invoke the cache coherence operation on the cache memory device.
Effects of demanding foraging conditions on cache retrieval accuracy in food-caching mountain chickadees (Poecile gambeli).

PubMed

Pravosudov, V V; Clayton, N S

2001-02-22

Birds rely, at least in part, on spatial memory for recovering previously hidden caches but accurate cache recovery may be more critical for birds that forage in harsh conditions where the food supply is limited and unpredictable. Failure to find caches in these conditions may potentially result in death from starvation. In order to test this hypothesis we compared the cache recovery behaviour of 24 wild-caught mountain chickadees (Poecile gambeli), half of which were maintained on a limited and unpredictable food supply while the rest were maintained on an ad libitum food supply for 60 days. We then tested their cache retrieval accuracy by allowing birds from both groups to cache seeds in the experimental room and recover them 5 hours later. Our results showed that birds maintained on a limited and unpredictable food supply made significantly fewer visits to non-cache sites when recovering their caches compared to birds maintained on ad libitum food. We found the same difference in performance in two versions of a one-trial associative learning task in which the birds had to rely on memory to find previously encountered hidden food. In a non-spatial memory version of the task, in which the baited feeder was clearly marked, there were no significant differences between the two groups. We therefore concluded that the two groups differed in their efficiency at cache retrieval. We suggest that this difference is more likely to be attributable to a difference in memory (encoding or recall) than to a difference in their motivation to search for hidden food, although the possibility of some motivational differences still exists. Overall, our results suggest that demanding foraging conditions favour more accurate cache retrieval in food-caching birds.
Tuning the cache memory usage in tomographic reconstruction on standard computers with Advanced Vector eXtensions (AVX)

PubMed Central

Agulleiro, Jose-Ignacio; Fernandez, Jose-Jesus

2015-01-01

Cache blocking is a technique widely used in scientific computing to minimize the exchange of information with main memory by reusing the data kept in cache memory. In tomographic reconstruction on standard computers using vector instructions, cache blocking turns out to be central to optimize performance. To this end, sinograms of the tilt-series and slices of the volumes to be reconstructed have to be divided into small blocks that fit into the different levels of cache memory. The code is then reorganized so as to operate with a block as much as possible before proceeding with another one. This data article is related to the research article titled Tomo3D 2.0 – Exploitation of Advanced Vector eXtensions (AVX) for 3D reconstruction (Agulleiro and Fernandez, 2015) [1]. Here we present data of a thorough study of the performance of tomographic reconstruction by varying cache block sizes, which allows derivation of expressions for their automatic quasi-optimal tuning. PMID:26217710
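The loop reorganization that cache blocking implies can be sketched on a simpler kernel than tomographic reconstruction. The example below is an illustrative assumption, not code from Tomo3D: the function `transpose_blocked` and its `block` parameter are hypothetical, and a matrix transpose stands in for the real sinogram and slice loops.

```python
def transpose_blocked(a, block=32):
    """Transpose a rectangular list-of-lists matrix one tile at a time.

    `block` is the tile edge length; in a compiled implementation it would
    be tuned so that a source tile and a destination tile fit in cache.
    """
    n_rows = len(a)
    n_cols = len(a[0]) if n_rows else 0
    out = [[0] * n_rows for _ in range(n_cols)]
    # Outer loops walk over tiles; inner loops finish one tile completely
    # before moving on, which is the essence of cache blocking.
    for i0 in range(0, n_rows, block):
        for j0 in range(0, n_cols, block):
            for i in range(i0, min(i0 + block, n_rows)):
                row = a[i]
                for j in range(j0, min(j0 + block, n_cols)):
                    out[j][i] = row[j]
    return out


if __name__ == "__main__":
    matrix = [[10 * r + c for c in range(5)] for r in range(3)]
    for row in transpose_blocked(matrix, block=2):
        print(row)
```

Pure Python lists do not expose the memory layout that makes blocking pay off, so this only shows the loop structure; the benefit appears when the same tiling is applied to contiguous arrays in a compiled or vectorized implementation, with the block size chosen per cache level.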
Tuning the cache memory usage in tomographic reconstruction on standard computers with Advanced Vector eXtensions (AVX).

PubMed

Agulleiro, Jose-Ignacio; Fernandez, Jose-Jesus

2015-06-01

Cache blocking is a technique widely used in scientific computing to minimize the exchange of information with main memory by reusing the data kept in cache memory. In tomographic reconstruction on standard computers using vector instructions, cache blocking turns out to be central to optimize performance. To this end, sinograms of the tilt-series and slices of the volumes to be reconstructed have to be divided into small blocks that fit into the different levels of cache memory. The code is then reorganized so as to operate with a block as much as possible before proceeding with another one. This data article is related to the research article titled Tomo3D 2.0 - Exploitation of Advanced Vector eXtensions (AVX) for 3D reconstruction (Agulleiro and Fernandez, 2015) [1]. Here we present data of a thorough study of the performance of tomographic reconstruction by varying cache block sizes, which allows derivation of expressions for their automatic quasi-optimal tuning.
Rapid effects of corticosterone on cache recovery in mountain chickadees (Parus gambeli).

PubMed

Saldanha, C J; Schlinger, B A; Clayton, N S

2000-03-01

Environmental perturbations increase adrenal activity in several vertebrates. Increases in corticosterone may serve as a proximate trigger whereby organisms can rapidly adapt their behavior to survive environmental fluctuations. In food-caching songbirds, inclement weather may present the need to alter caching and/or retrieval behaviors to ensure food supplies. We hypothesized that corticosterone may increase the rate of caching and/or retrieval behaviors in the mountain chickadee, a food-storing songbird, and tested if these potential effects were mediated by alterations in appetite, activity, or memory for cache sites. Corticosterone or vehicle was administered to subjects 5 min prior to either caching or recovery in a naturalistic laboratory paradigm during which we recorded the number of caching events, sites visited, and seeds eaten (caching) or caches recovered, total sites visited, cache-related visits, and non-cache-related visits (recovery). Data were analyzed using nested ANOVA for treatment within sequential trial. There was no effect on any caching behaviors following treatment. However, birds treated with corticosterone during retrieval recovered more seeds and tended to visit more cache-related sites than did controls. Since groups did not differ in the number of seeds eaten or the total number of sites visited, it seems unlikely that corticosterone affected appetite or activity. Rapid surges in corticosterone may increase the efficacy of an underlying memory process for cache sites which is reflected in higher cache recovery in corticosterone-treated birds than in controls. Thus, rapid alterations in plasma corticosterone following environmental change may alter memory-reliant behaviors which promote survival in the food-caching mountain chickadee. Copyright 2000 Academic Press.
Comparison between BIDE, PrefixSpan, and TRuleGrowth for Mining of Indonesian Text

NASA Astrophysics Data System (ADS)

Sa'adillah Maylawati, Dian; Irfan, Mohamad; Budiawan Zulfikar, Wildan

2017-01-01

Text mining for the Indonesian language remains an interesting research topic. Multiple-word representations have been claimed to preserve the meaning of text better than bag-of-words representations. In this paper, we compare several sequential pattern mining algorithms, including BIDE (BIDirectional Extension), PrefixSpan, and TRuleGrowth. All of these algorithms produce frequent word sequences that preserve the meaning of the text. The experimental results, obtained on 14,006 Indonesian tweets from Twitter, show that BIDE produces more efficient frequent word sequences than PrefixSpan and TRuleGrowth without losing the meaning of the text. PrefixSpan has a shorter average processing time than BIDE and TRuleGrowth. On the other hand, PrefixSpan and TRuleGrowth use memory more efficiently than BIDE.
A Survey Of Architectural Approaches for Managing Embedded DRAM and Non-volatile On-chip Caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh; Vetter, Jeffrey S; Li, Dong

Recent trends of CMOS scaling and increasing number of on-chip cores have led to a large increase in the size of on-chip caches. Since SRAM has low density and consumes a large amount of leakage power, its use in designing on-chip caches has become more challenging. To address this issue, researchers are exploring the use of several emerging memory technologies, such as embedded DRAM, spin transfer torque RAM, resistive RAM, phase change RAM and domain wall memory. In this paper, we survey the architectural approaches proposed for designing memory systems and, specifically, caches with these emerging memory technologies. To highlight their similarities and differences, we present a classification of these technologies and architectural approaches based on their key characteristics. We also briefly summarize the challenges in using these technologies for architecting caches. We believe that this survey will help the readers gain insights into the emerging memory device technologies, and their potential use in designing future computing systems.
Cache-based error recovery for shared memory multiprocessor systems

NASA Technical Reports Server (NTRS)

Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.

1989-01-01

A multiprocessor cache-based checkpointing and recovery scheme for recovering from transient processor errors in a shared-memory multiprocessor with private caches is presented. New implementation techniques that use checkpoint identifiers and recovery stacks to reduce performance degradation in processor utilization during normal execution are examined. This cache-based checkpointing technique prevents rollback propagation, provides for rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions that take error latency into account are presented.
Binary mesh partitioning for cache-efficient visualization.

PubMed

Tchiboukdjian, Marc; Danjean, Vincent; Raffin, Bruno

2010-01-01

One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take into consideration the memory hierarchy to design cache efficient algorithms. CO approaches have the advantage to adapt to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve performance of previous approaches, but they lack theoretical performance guarantees. We present in this paper an $O(N \log N)$ algorithm to compute a CO layout for unstructured but well shaped meshes. We prove that a coherent traversal of a N-size mesh in dimension d induces less than $N/B + O(N/M^{1/d})$ cache-misses where B and M are the block size and the cache size, respectively. Experiments show that our layout computation is faster and significantly less memory consuming than the best known CO algorithm. Performance is comparable to this algorithm for classical visualization algorithm access patterns, or better when the BSP tree produced while computing the layout is used as an acceleration data structure adjusted to the layout. We also show that cache oblivious approaches lead to significant performance increases on recent GPU architectures.
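To get a feel for the size of the two terms in the cache-miss bound, here is a short worked example. The numbers are illustrative assumptions, not values from the paper, and N, B and M are assumed to be measured in the same unit (mesh elements); the constant hidden in the O-term is not given in the abstract, so only orders of magnitude can be compared.

```latex
\[
  \text{misses} \;<\; \frac{N}{B} + O\!\left(\frac{N}{M^{1/d}}\right),
  \qquad N = 10^{6},\quad B = 16,\quad M = 4096,\quad d = 3 .
\]
\[
  \frac{N}{B} = \frac{10^{6}}{16} = 62\,500,
  \qquad
  M^{1/d} = 4096^{1/3} = 16,
  \qquad
  \frac{N}{M^{1/d}} = \frac{10^{6}}{16} = 62\,500 .
\]
```

With these assumed values the two terms are of the same order. Since the second term shrinks as M grows, a larger cache brings the bound closer to the N/B misses needed to read the mesh once.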
An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches

NASA Astrophysics Data System (ADS)

Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur

2018-03-01

Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also arise from drawbacks of the commonly used Least Recently Used (LRU) policy in multiprocessor systems, under which cache lines reside in the cache longer than required. In image processing analysis of, for example, extrapulmonary tuberculosis (TB), an accurate diagnosis of tissue specimens is required. Therefore, a fast and reliable shared memory management system is needed to execute algorithms that process vast numbers of specimen images. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near distance promotion, and the concept of ownership in the eviction policy to effectively reduce cache thrashing and to avoid resource stealing among the processors.
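One plausible reading of "middle insertion" and "2 positions promotion" can be sketched as follows. This is an interpretation built only from the wording of the abstract, not the authors' implementation: the class `MiddleInsertPromoteSet`, the choice of `ways // 2` as the insertion point, and eviction from the LRU end are assumptions, and the ownership-based eviction across partitions is left out.

```python
class MiddleInsertPromoteSet:
    """One cache set managed as a recency stack (hypothetical sketch).

    Index 0 is the MRU end; the last index is the LRU end (next victim).
    """

    def __init__(self, ways, promote_by=2):
        self.ways = ways
        self.promote_by = promote_by
        self.stack = []

    def access(self, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.stack:
            # Hit: move the line a fixed, short distance towards MRU
            # instead of jumping straight to the MRU position.
            pos = self.stack.index(tag)
            self.stack.pop(pos)
            self.stack.insert(max(0, pos - self.promote_by), tag)
            return True
        # Miss: evict from the LRU end if the set is full ...
        if len(self.stack) >= self.ways:
            self.stack.pop()
        # ... and insert at a static middle position rather than at MRU.
        self.stack.insert(min(len(self.stack), self.ways // 2), tag)
        return False


if __name__ == "__main__":
    s = MiddleInsertPromoteSet(ways=4)
    for tag in ["a", "b", "c", "d", "e", "d"]:
        outcome = "hit" if s.access(tag) else "miss"
        print(tag, outcome, s.stack)
```

Compared with plain LRU, a newly inserted line starts halfway down the stack and must be reused to climb, so a burst of single-use lines displaces fewer of the lines near the MRU end.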

Memory and the hippocampus in food-storing birds: a comparative approach.

PubMed

Clayton, N S

1998-01-01

Comparative studies provide a unique source of evidence for the role of the hippocampus in learning and memory. Within birds and mammals, the hippocampal volume of scatter-hoarding species that cache food in many different locations is enlarged, relative to the remainder of the telencephalon, when compared with that of species which cache food in one larder, or do not cache at all. Do food-storing species show enhanced memory function in association with the volumetric enlargement of the hippocampus? Comparative studies within the parids (titmice and chickadees) and corvids (jays, nutcrackers and magpies), two families of birds which show natural variation in food-storing behavior, suggest that there may be two kinds of memory specialization associated with scatter-hoarding. First, in terms of spatial memory, several scatter-hoarding species have a more accurate and enduring spatial memory, and a preference to rely more heavily upon spatial cues, than that of closely related species which store less food, or none at all. Second, some scatter-hoarding parids and corvids are also more resistant to memory interference. While the most critical component about a cache site may be its spatial location, there is mounting evidence that food-storing birds remember additional information about the contents and status of cache sites. What is the underlying neural mechanism by which the hippocampus learns and remembers cache sites? The current mammalian dogma is that the neural mechanisms of learning and memory are achieved primarily by variations in synaptic number and efficacy. Recent work on the concomitant development of food-storing, memory and the avian hippocampus illustrates that the avian hippocampus may swell or shrivel by as much as 30% in response to presence or absence of food-storing experience.
Memory for food caches triggers a dramatic increase in the total number of neurons within the avian hippocampus by altering the rate at which these cells are born and die.
Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

DOEpatents

Gara, Alan; Ohmacht, Martin

2014-09-16

In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write-through, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.
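The write path described above (write through to the second level, then drop the line from the first level) can be sketched in a few lines. This is a simplified illustration and not the patented design: the class `EvictOnWriteL1` is hypothetical, the second-level cache is modeled as a plain dictionary, and speculation, versioning, and the prefetch unit are omitted.

```python
class EvictOnWriteL1:
    """Toy first-level cache that evicts a line whenever it is written."""

    def __init__(self, l2):
        self.l2 = l2        # second-level cache, modeled as addr -> value
        self.lines = {}     # first-level contents, addr -> value
        self.l2_reads = 0   # counts reads that had to go to the second level

    def read(self, addr):
        if addr in self.lines:
            return self.lines[addr]      # first-level hit
        self.l2_reads += 1
        value = self.l2[addr]            # fetch from the second level
        self.lines[addr] = value         # fill the first level
        return value

    def write(self, addr, value):
        self.l2[addr] = value            # write through to the second level
        self.lines.pop(addr, None)       # then evict the first-level copy


if __name__ == "__main__":
    l2 = {0x40: 1}
    l1 = EvictOnWriteL1(l2)
    print(l1.read(0x40), l1.l2_reads)    # first read fills the first level
    l1.write(0x40, 2)                    # write-through, then eviction
    print(l1.read(0x40), l1.l2_reads)    # read is served by the second level
```

Because the first level never keeps a written line, the read that follows a write always goes to the second level, which in the described system is the level that tracks the speculative versions.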
The relationship between dominance, corticosterone, memory, and food caching in mountain chickadees (Poecile gambeli).

PubMed

Pravosudov, Vladimir V; Mendoza, Sally P; Clayton, Nicola S

2003-08-01

It has been hypothesized that in avian social groups subordinate individuals should maintain more energy reserves than dominants, as an insurance against increased perceived risk of starvation. Subordinates might also have elevated baseline corticosterone levels because corticosterone is known to facilitate fattening in birds. Recent experiments showed that moderately elevated corticosterone levels resulting from unpredictable food supply are correlated with enhanced cache retrieval efficiency and more accurate performance on a spatial memory task. Given the correlation between corticosterone and memory, a further prediction is that subordinates might be more efficient at cache retrieval and show more accurate performance on spatial memory tasks. We tested these predictions in dominant-subordinate pairs of mountain chickadees (Poecile gambeli). Each pair was housed in the same cage but caching behavior was tested individually in an adjacent aviary to avoid the confounding effects of small spaces in which birds could unnaturally and directly influence each other's behavior. In sharp contrast to our hypothesis, we found that subordinate chickadees cached less food, showed less efficient cache retrieval, and performed significantly worse on the spatial memory task than dominants. Although the behavioral differences could have resulted from social stress of subordination, and dominant birds reached significantly higher levels of corticosterone during their response to acute stress compared to subordinates, there were no significant differences between dominants and subordinates in baseline levels or in the pattern of adrenocortical stress response. We find no evidence, therefore, to support the hypothesis that subordinate mountain chickadees maintain elevated baseline corticosterone levels whereas lower caching rates and inferior cache retrieval efficiency might contribute to reduced survival of subordinates commonly found in food-caching parids.
Improving the performance of heterogeneous multi-core processors by modifying the cache coherence protocol

NASA Astrophysics Data System (ADS)

Fang, Juan; Hao, Xiaoting; Fan, Qingwen; Chang, Zeqing; Song, Shuying

2017-05-01

In a heterogeneous multi-core architecture, CPU and GPU processors are integrated on the same chip, which poses a new challenge for last-level cache management. In this architecture, CPU and GPU applications execute concurrently and both access the last-level cache (LLC). CPUs and GPUs have different memory access characteristics, so they differ in their sensitivity to LLC capacity. For many CPU applications, a reduced share of the LLC could lead to significant performance degradation. In contrast, GPU applications can tolerate an increase in memory access latency when there is sufficient thread-level parallelism. Taking into account the memory latency tolerance of GPU programs, this paper presents a method that lets GPU applications access memory directly, leaving much of the LLC space for CPU applications; this improves the performance of CPU applications without affecting the performance of GPU applications. When the CPU application is cache sensitive and the GPU application is insensitive to the cache, the overall performance of the system is improved significantly.
Episodic-like memory during cache recovery by scrub jays.

PubMed

Clayton, N S; Dickinson, A

1998-09-17

The recollection of past experiences allows us to recall what a particular event was, and where and when it occurred, a form of memory that is thought to be unique to humans. It is known, however, that food-storing birds remember the spatial location and contents of their caches. Furthermore, food-storing animals adapt their caching and recovery strategies to the perishability of food stores, which suggests that they are sensitive to temporal factors. Here we show that scrub jays (Aphelocoma coerulescens) remember 'when' food items are stored by allowing them to recover perishable 'wax worms' (wax-moth larvae) and non-perishable peanuts which they had previously cached in visuospatially distinct sites. Jays searched preferentially for fresh wax worms, their favoured food, when allowed to recover them shortly after caching. However, they rapidly learned to avoid searching for worms after a longer interval during which the worms had decayed. The recovery preference of jays demonstrates memory of where and when particular food items were cached, thereby fulfilling the behavioural criteria for episodic-like memory in non-human animals.
Temperature and leakage aware techniques to improve cache reliability

NASA Astrophysics Data System (ADS)

Akaaboune, Adil

Decreasing power consumption in small devices such as handhelds, cell phones and high-performance processors is now one of the most critical design concerns. On-chip cache memories dominate the chip area in microprocessors, hence the need for power-efficient cache memories. Cache is the simplest cost-effective method to attain a high-speed memory hierarchy, and its performance is extremely critical for high-speed computers. Cache is used by the microprocessor to bridge the performance gap between processor and main memory (RAM); hence the memory bandwidth is frequently a bottleneck which can affect the peak throughput significantly. In the design of any cache system, the tradeoffs of area/cost, performance, power consumption, and thermal management must be taken into consideration. Previous work has mainly concentrated on performance and area/cost constraints. More recent works have focused on low-power design, especially for portable devices and media-processing systems; however, less research has been done on the relationship between heat management, leakage power and cost per die. Lately, the focus of power dissipation in the new generations of microprocessors has shifted from dynamic power to idle power, a previously underestimated form of power loss that causes battery charge to drain and devices to shut down too early due to the waste of energy. The problem has been aggravated by aggressive process scaling, a device-level method originally used by designers to enhance performance, reduce dissipation, and shrink digital circuits that are increasingly dense. This dissertation studies the impact of hotspots in the cache memory on leakage consumption and on microprocessor reliability and durability. The work will first prove that by eliminating hotspots in the cache memory, leakage power will be reduced and, therefore, the reliability will be improved.
The second technique studied is data quality management that improves the quality of the data stored in the cache to reduce power consumption. The initial work done on this subject focuses on the type of data that increases leakage consumption and ways to manage without impacting the performance of the microprocessor. The second phase of the project focuses on managing the data storage in different blocks of the cache to smooth the leakage power as well as dynamic power consumption. The last technique is a voltage controlled cache to reduce the leakage consumption of the cache while in execution and even in idle state. Two blocks of the 4-way set associative cache go through a voltage regulator before getting to the voltage well, and the other two are directly connected to the voltage well. The idea behind this technique is to use the replacement algorithm information to increase or decrease voltage of the two blocks depending on the need of the information stored on them.
Short-term observational spatial memory in Jackdaws (Corvus monedula) and Ravens (Corvus corax).

PubMed

Scheid, Christelle; Bugnyar, Thomas

2008-10-01

Observational spatial memory (OSM) refers to the ability to remember food caches made by other individuals, enabling observers to find and pilfer the others' caches. Within birds, OSM has only been demonstrated in corvids, with more social species such as Mexican jays (Aphelocoma ultramarine) showing a higher accuracy of finding conspecifics' caches than less social species such as Clark's nutcrackers (Nucifraga columbiana). However, socially dynamic corvids such as ravens (Corvus corax) are capable of sophisticated pilfering manoeuvres based on OSM. We here compared the performance of ravens and jackdaws (Corvus monedula) in a short-term OSM task. In contrast to ravens, jackdaws are socially cohesive but hardly cache and compete over food caches. Birds had to recover food pieces after watching a human experimenter hiding them in 2, 4 or 6 out of 10 possible locations. Results showed that for tests with two, four and six caches, ravens performed more accurately than expected by chance whereas jackdaws did not. Moreover, ravens made fewer re-visits to already inspected cache sites than jackdaws. These findings suggest that the development of observational spatial memory skills is linked with the species' reliance on food caches rather than with a social life style per se.
Performance of defect-tolerant set-associative cache memories

NASA Technical Reports Server (NTRS)

Frenzel, J. F.

1991-01-01

The increased use of on-chip cache memories has led researchers to investigate their performance in the presence of manufacturing defects. Several techniques for yield improvement are discussed and results are presented which indicate that set-associativity may be used to provide defect tolerance as well as improve the cache performance. Tradeoffs between several cache organizations and replacement strategies are investigated and it is shown that token-based replacement may be a suitable alternative to the widely-used LRU strategy.
A test of the adaptive specialization hypothesis: population differences in caching, memory, and the hippocampus in black-capped chickadees (Poecile atricapilla).

PubMed

Pravosudov, Vladimir V; Clayton, Nicola S

2002-08-01

To test the hypothesis that accurate cache recovery is more critical for birds that live in harsh conditions where the food supply is limited and unpredictable, the authors compared food caching, memory, and the hippocampus of black-capped chickadees (Poecile atricapilla) from Alaska and Colorado. Under identical laboratory conditions, Alaska chickadees (a) cached significantly more food; (b) were more efficient at cache recovery; (c) performed more accurately on one-trial associative learning tasks in which birds had to rely on spatial memory, but did not differ when tested on a nonspatial version of this task; and (d) had significantly larger hippocampal volumes containing more neurons compared with Colorado chickadees. The results support the hypothesis that these population differences may reflect adaptations to a harsh environment.
The Optimization of In-Memory Space Partitioning Trees for Cache Utilization

NASA Astrophysics Data System (ADS)

Yeo, Myung Ho; Min, Young Soo; Bok, Kyoung Soo; Yoo, Jae Soo

In this paper, a novel cache-conscious indexing technique based on space partitioning trees is proposed. Many researchers have recently investigated cache-conscious indexing techniques that improve the retrieval performance of in-memory database management systems. However, most studies considered data partitioning and targeted fast information retrieval. Existing data partitioning-based index structures significantly degrade performance due to the redundant accesses of overlapped spaces. In particular, R-tree-based index structures suffer from the propagation of MBR (Minimum Bounding Rectangle) information when data is updated frequently. In this paper, we propose an in-memory space partitioning index structure for optimal cache utilization. The proposed index structure is compared with the existing index structures in terms of update performance, insertion performance and cache-utilization rate in a variety of environments. The results demonstrate that the proposed index structure offers better performance than existing index structures.
Pilfering Eurasian jays use visual and acoustic information to locate caches.

PubMed

Shaw, Rachael C; Clayton, Nicola S

2014-11-01

Pilfering corvids use observational spatial memory to accurately locate caches that they have seen another individual make. Accordingly, many corvid cache-protection strategies limit the transfer of visual information to potential thieves. Eurasian jays (Garrulus glandarius) employ strategies that reduce the amount of visual and auditory information that is available to competitors. Here, we test whether or not the jays recall and use both visual and auditory information when pilfering other birds' caches. When jays had no visual or acoustic information about cache locations, the proportion of available caches that they found did not differ from the proportion expected if jays were searching at random. By contrast, after observing and listening to a conspecific caching in gravel or sand, jays located a greater proportion of caches, searched more frequently in the correct substrate type and searched in fewer empty locations to find the first cache than expected. After only listening to caching in gravel and sand, jays also found a larger proportion of caches and searched in the substrate type where they had heard caching take place more frequently than expected. These experiments demonstrate that Eurasian jays possess observational spatial memory and indicate that pilfering jays may gain information about cache location merely by listening to caching. This is the first evidence that a corvid may use recalled acoustic information to locate and pilfer caches.
Sex, estradiol, and spatial memory in a food-caching corvid.

PubMed

Rensel, Michelle A; Ellis, Jesse M S; Harvey, Brigit; Schlinger, Barney A

2015-09-01

Estrogens significantly impact spatial memory function in mammalian species. Songbirds express the estrogen synthetic enzyme aromatase at relatively high levels in the hippocampus and there is evidence from zebra finches that estrogens facilitate performance on spatial learning and/or memory tasks. It is unknown, however, whether estrogens influence hippocampal function in songbirds that naturally exhibit memory-intensive behaviors, such as cache recovery observed in many corvid species. To address this question, we examined the impact of estradiol on spatial memory in non-breeding Western scrub-jays, a species that routinely participates in food caching and retrieval in nature and in captivity. We also asked if there were sex differences in performance or responses to estradiol. Utilizing a combination of an aromatase inhibitor, fadrozole, with estradiol implants, we found that while overall cache recovery rates were unaffected by estradiol, several other indices of spatial memory, including searching efficiency and efficiency to retrieve the first item, were impaired in the presence of estradiol. In addition, males and females differed in some performance measures, although these differences appeared to be a consequence of the nature of the task as neither sex consistently out-performed the other. Overall, our data suggest that a sustained estradiol elevation in a food-caching bird impairs some, but not all, aspects of spatial memory on an innate behavioral task, at times in a sex-specific manner. Copyright © 2015 Elsevier Inc. All rights reserved.
Reader set encoding for directory of shared cache memory in multiprocessor system

DOEpatents

Ahn, Daniel; Ceze, Luis H.; Gara, Alan; Ohmacht, Martin; Xiaotong, Zhuang

2014-06-10

In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
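A bitset reader set of the kind described above can be sketched in a few lines. The class and method names are hypothetical, and the conflict rule used here (any other thread that has read the line conflicts with a writer) is a simplifying assumption; the patent's actual rules for ordering speculative threads are not reproduced.

```python
class ReaderSetDirectory:
    """Hypothetical directory keeping one reader bitset per cache line."""

    def __init__(self):
        self.readers = {}  # line address -> int used as a bitset of thread ids

    def record_read(self, line, thread_id):
        # Set the bit belonging to the speculative thread that read the line.
        self.readers[line] = self.readers.get(line, 0) | (1 << thread_id)

    def conflicting_readers(self, line, writer_id):
        # Every reader other than the writer itself is reported as a conflict.
        bits = self.readers.get(line, 0) & ~(1 << writer_id)
        return [t for t in range(bits.bit_length()) if (bits >> t) & 1]
```

Storing the set as a single integer keeps the per-line state compact and lets the conflict check run as a couple of bitwise operations.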
Fault Tolerant Cache Schemes

NASA Astrophysics Data System (ADS)

Tu, H.-Yu.; Tasneem, Sarah

Most modern microprocessors employ on-chip cache memories to meet the memory bandwidth demand. These caches now occupy a growing share of chip area. Also, continuous downscaling of transistors increases the possibility of defects in the cache area, which already occupies more than 50% of chip area. For this reason, various techniques have been proposed to tolerate defects in cache blocks. These techniques can be classified into three different categories, namely, cache line disabling, replacement with spare block, and decoder reconfiguration without spare blocks. This chapter examines each of those fault-tolerant techniques with a fixed typical size and organization of L1 cache, through extended simulation using SPEC2000 benchmark on individual techniques. The design and characteristics of each technique are summarized with a view to evaluate the scheme. We then present our simulation results and comparative study of the three different methods.
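Two of the three categories named above, replacement with a spare block and cache line disabling, can be combined in a small sketch: defective blocks are remapped to spares while spares last, and any remaining defective blocks are disabled. The function name and the first-come assignment order are assumptions for illustration only; the chapter's own schemes are not reproduced here.

```python
def build_remap(defective_blocks, spare_blocks):
    """Map defective block indices to spares; leftover blocks are disabled."""
    spares = list(spare_blocks)
    remap = {}      # defective block index -> spare block index
    disabled = []   # defective blocks for which no spare was left
    for block in sorted(defective_blocks):
        if spares:
            remap[block] = spares.pop(0)
        else:
            disabled.append(block)
    return remap, disabled
```

For example, `build_remap({5, 2, 9}, [100, 101])` assigns spares to blocks 2 and 5 and leaves block 9 disabled.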
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware

PubMed Central

Zheng, Da; Burns, Randal; Szalay, Alexander S.

2013-01-01

We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads. PMID:24402052
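The idea of a set-associative page cache that limits lock contention can be sketched as follows: each page maps to one small set, and each set has its own lock, so threads touching different sets never contend. The class name, the modulo mapping, the LRU eviction inside a set, and holding the lock while loading are all assumptions of this sketch rather than details of the authors' system.

```python
import threading
from collections import OrderedDict


class SetAssociativePageCache:
    """Hypothetical page cache with one lock per set instead of a global lock."""

    def __init__(self, num_sets, ways, load_page):
        self.ways = ways
        self.load_page = load_page  # callable: page number -> page data
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.locks = [threading.Lock() for _ in range(num_sets)]

    def read(self, page_number):
        """Return (data, hit) for the requested page."""
        index = page_number % len(self.sets)
        with self.locks[index]:          # only this set is locked
            pages = self.sets[index]
            if page_number in pages:
                pages.move_to_end(page_number)
                return pages[page_number], True
            data = self.load_page(page_number)
            if len(pages) >= self.ways:
                pages.popitem(last=False)  # evict least recently used page
            pages[page_number] = data
            return data, False
```

Because eviction decisions are local to a set, no global list of pages has to be maintained or locked.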
Integrating Cache Performance Modeling and Tuning Support in Parallelization Tools

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

1998-01-01

With the resurgence of distributed shared memory (DSM) systems based on cache-coherent Non Uniform Memory Access (ccNUMA) architectures and increasing disparity between memory and processors speeds, data locality overheads are becoming the greatest bottlenecks in the way of realizing potential high performance of these systems. While parallelization tools and compilers facilitate the users in porting their sequential applications to a DSM system, a lot of time and effort is needed to tune the memory performance of these applications to achieve reasonable speedup. In this paper, we show that integrating cache performance modeling and tuning support within a parallelization environment can alleviate this problem. The Cache Performance Modeling and Prediction Tool (CPMP), employs trace-driven simulation techniques without the overhead of generating and managing detailed address traces. CPMP predicts the cache performance impact of source code level "what-if" modifications in a program to assist a user in the tuning process. CPMP is built on top of a customized version of the Computer Aided Parallelization Tools (CAPTools) environment. Finally, we demonstrate how CPMP can be applied to tune a real Computational Fluid Dynamics (CFD) application.
A performance study of the time-varying cache behavior: a study on APEX, Mantevo, NAS, and PARSEC

DOE PAGES

Siddique, Nafiul A.; Grubel, Patricia A.; Badawy, Abdel-Hameed A.; ...

2017-09-20

Cache has long been used to minimize the latency of main memory accesses by storing frequently used data near the processor. Processor performance depends on the underlying cache performance. Therefore, significant research has been done to identify the most crucial metrics of cache performance. Although the majority of research focuses on measuring cache hit rates and data movement as the primary cache performance metrics, cache utilization is significantly important. We investigate the application’s locality using cache utilization metrics. In addition, we present cache utilization and traditional cache performance metrics as the program progresses providing detailed insights into the dynamic application behavior on parallel applications from four benchmark suites running on multiple cores. We explore cache utilization for APEX, Mantevo, NAS, and PARSEC, mostly scientific benchmark suites. Our results indicate that 40% of the data bytes in a cache line are accessed at least once before line eviction. Also, on average a byte is accessed two times before the cache line is evicted for these applications. Moreover, we present runtime cache utilization, as well as, conventional performance metrics that illustrate a holistic understanding of cache behavior. To facilitate this research, we build a memory simulator incorporated into the Structural Simulation Toolkit (Rodrigues et al. in SIGMETRICS Perform Eval Rev 38(4):37–42, 2011). Finally, our results suggest that variable cache line size can result in better performance and can also conserve power.
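One plausible reading of the two utilization figures quoted above (the share of bytes in a line touched before eviction, and the mean number of accesses per byte) is sketched below for a single cache line. The function name, the 64-byte default line size, and the choice to average accesses over all bytes of the line are assumptions; the paper's exact definitions may differ.

```python
def line_stats(byte_offsets, line_size=64):
    """Return (utilization, accesses_per_byte) for one line's lifetime.

    byte_offsets lists the offset within the line of every byte access
    observed between the line's fill and its eviction.
    """
    valid = [o for o in byte_offsets if 0 <= o < line_size]
    utilization = len(set(valid)) / line_size   # distinct bytes touched
    accesses_per_byte = len(valid) / line_size  # averaged over the whole line
    return utilization, accesses_per_byte
```

A line in which only half of the bytes are ever touched wastes half of the bandwidth spent fetching it, which is the motivation the abstract gives for variable line sizes.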
Analysis of power gating in different hierarchical levels of 2MB cache, considering variation

NASA Astrophysics Data System (ADS)

Jafari, Mohsen; Imani, Mohsen; Fathipour, Morteza

2015-09-01

This article reintroduces power gating technique in different hierarchical levels of static random-access memory (SRAM) design including cell, row, bank and entire cache memory in 16 nm Fin field effect transistor. Different structures of SRAM cells such as 6T, 8T, 9T and 10T are used in design of 2MB cache memory. The power reduction of the entire cache memory employing cell-level optimisation is 99.7% at the expense of area and other stability overheads. The power saving of the cell-level optimisation is 3× (1.2×) higher than power gating in cache (bank) level due to its superior selectivity. The access delay times are allowed to increase by 4% in the same energy delay product to achieve the best power reduction for each supply voltage and optimisation level. The results show the row-level power gating is the best for optimising the power of the entire cache with lowest drawbacks. Comparisons of cells show that the cells whose bodies have higher power consumption are the best candidates for power gating technique in row-level optimisation. The technique has the lowest percentage of saving in minimum energy point (MEP) of the design. The power gating also improves the variation of power in all structures by at least 70%.
Population genetic structure and its implications for adaptive variation in memory and the hippocampus on a continental scale in food-caching black-capped chickadees.

PubMed

Pravosudov, V V; Roth, T C; Forister, M L; Ladage, L D; Burg, T M; Braun, M J; Davidson, B S

2012-09-01

Food-caching birds rely on stored food to survive the winter, and spatial memory has been shown to be critical in successful cache recovery. Both spatial memory and the hippocampus, an area of the brain involved in spatial memory, exhibit significant geographic variation linked to climate-based environmental harshness and the potential reliance on food caches for survival. Such geographic variation has been suggested to have a heritable basis associated with differential selection. Here, we ask whether population genetic differentiation and potential isolation among multiple populations of food-caching black-capped chickadees is associated with differences in memory and hippocampal morphology by exploring population genetic structure within and among groups of populations that are divergent to different degrees in hippocampal morphology. Using mitochondrial DNA and 583 AFLP loci, we found that population divergence in hippocampal morphology is not significantly associated with neutral genetic divergence or geographic distance, but instead is significantly associated with differences in winter climate. These results are consistent with variation in a history of natural selection on memory and hippocampal morphology that creates and maintains differences in these traits regardless of population genetic structure and likely associated gene flow. Published 2012. This article is a US Government work and is in the public domain in the USA.
The Science of Computing: Virtual Memory

NASA Technical Reports Server (NTRS)

Denning, Peter J.

1986-01-01

In the March-April issue, I described how a computer's storage system is organized as a hierarchy consisting of cache, main memory, and secondary memory (e.g., disk). The cache and main memory form a subsystem that functions like main memory but attains speeds approaching cache. What happens if a program and its data are too large for the main memory? This is not a frivolous question. Every generation of computer users has been frustrated by insufficient memory. A new line of computers may have sufficient storage for the computations of its predecessor, but new programs will soon exhaust its capacity. In 1960, a long-range planning committee at MIT dared to dream of a computer with 1 million words of main memory. In 1985, the Cray-2 was delivered with 256 million words. Computational physicists dream of computers with 1 billion words. Computer architects have done an outstanding job of enlarging main memories yet they have never kept up with demand. Only the shortsighted believe they can.
No evidence for memory interference across sessions in food hoarding marsh tits Poecile palustris under laboratory conditions.

PubMed

Urhan, A Utku; Brodin, Anders

2015-05-01

Scatter hoarding birds are known for their accurate spatial memory. In a previous experiment, we tested the retrieval accuracy in marsh tits in a typical laboratory set-up for this species. We also tested the performance of humans in this experimental set-up. Somewhat unexpectedly, humans performed much better than marsh tits. In the first five attempts, humans relocated almost 90 % of the caches they had hidden 5 h earlier. Marsh tits only relocated 25 % in the first five attempts and just above 40 % in the first ten attempts. Typically, in this type of experiment, the birds will be caching and retrieving many times in the same sites in the same experimental room. This is very different from the conditions in nature where hoarding parids only cache once in a caching site. Hence, it is possible that memories from previous sessions will disturb the formation of new memories. If there is such proactive interference, the prediction is that success should decay over sessions. Here, we have designed an experiment to investigate whether there is such memory interference in this type of experiment. We allowed marsh tits and humans to cache and retrieve in three repeated sessions without prior experience of the arena. The performance did not change over sessions, and on average, marsh tits correctly visited around 25 % of the caches in the first five attempts. The corresponding success in humans was constant across sessions, and it was around 90 % on average. We conclude that the somewhat poor performance of the marsh tits did not depend on proactive memory interference. We also discuss other possible reasons for why marsh tits in general do not perform better in laboratory experiments.
Enabling MPEG-2 video playback in embedded systems through improved data cache efficiency

NASA Astrophysics Data System (ADS)

Soderquist, Peter; Leeser, Miriam E.

1999-01-01

Digital video decoding, enabled by the MPEG-2 Video standard, is an important future application for embedded systems, particularly PDAs and other information appliances. Many such systems require portability and wireless communication capabilities, and thus face severe limitations in size and power consumption. This places a premium on integration and efficiency, and favors software solutions for video functionality over specialized hardware. The processors in most embedded systems currently lack the computational power needed to perform video decoding, but a related and equally important problem is the required data bandwidth, and the need to cost-effectively ensure adequate data supply. MPEG data sets are very large, and generate significant amounts of excess memory traffic for standard data caches, up to 100 times the amount required for decoding. Meanwhile, cost and power limitations restrict cache sizes in embedded systems. Some systems, including many media processors, eliminate caches in favor of memories under direct, painstaking software control in the manner of digital signal processors. Yet MPEG data has locality which caches can exploit if properly optimized, providing fast, flexible, and automatic data supply. We propose a set of enhancements which target the specific needs of the heterogeneous types within the MPEG decoder working set. These optimizations significantly improve the efficiency of small caches, reducing cache-memory traffic by almost 70 percent, and can make an enhanced 4 KB cache perform better than a standard 1 MB cache. This performance improvement can enable high-resolution, full frame rate video playback in cheaper, smaller systems than would otherwise be possible.
Clark's nutcracker spatial memory: the importance of large, structural cues.

PubMed

Bednekoff, Peter A; Balda, Russell P

2014-02-01

Clark's nutcrackers, Nucifraga columbiana, cache and recover stored seeds in high alpine areas including areas where snowfall, wind, and rockslides may frequently obscure or alter cues near the cache site. Previous work in the laboratory has established that Clark's nutcrackers use spatial memory to relocate cached food. Following from aspects of this work, we performed experiments to test the importance of large, structural cues for Clark's nutcracker spatial memory. Birds were no more accurate in recovering caches when more objects were on the floor of a large experimental room nor when this room was subdivided with a set of panels. However, nutcrackers were consistently less accurate in this large room than in a small experimental room. Clark's nutcrackers probably use structural features of experimental rooms as important landmarks during recovery of cached food. This use of large, extremely stable cues may reflect the imperfect reliability of smaller, closer cues in the natural habitat of Clark's nutcrackers. This article is part of a Special Issue entitled: CO3 2013. Copyright © 2013 Elsevier B.V. All rights reserved.
Authenticating cache

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, Tyler Barratt; Urrea, Jorge Mario

2012-06-01

The aim of the Authenticating Cache architecture is to ensure that machine instructions in a Read Only Memory (ROM) are legitimate from the time the ROM image is signed (immediately after compilation) to the time they are placed in the cache for the processor to consume. The proposed architecture allows the detection of ROM image modifications during distribution or when it is loaded into memory. It also ensures that modified instructions will not execute in the processor, as the cache will not be loaded with a page that fails an integrity check. The authenticity of the instruction stream can also be verified in this architecture. The combination of integrity and authenticity assurance greatly improves the security profile of a system.
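The rule that a page failing its integrity check is never loaded into the cache can be sketched in software. In this sketch an HMAC-SHA256 tag per page stands in for the signature, which is an assumption: the report speaks of a signed ROM image but the abstract does not state the scheme. The page size, function names, and the dictionary used as a cache are likewise hypothetical.

```python
import hashlib
import hmac

PAGE_SIZE = 4096  # assumed page size for this sketch


def _tag(key, page_index, page):
    # Bind the page position into the tag so pages cannot be swapped around.
    message = page_index.to_bytes(8, "big") + page
    return hmac.new(key, message, hashlib.sha256).digest()


def sign_image(image, key):
    """Compute one tag per page right after the image is built."""
    return [_tag(key, i // PAGE_SIZE, image[i:i + PAGE_SIZE])
            for i in range(0, len(image), PAGE_SIZE)]


def load_page(image, page_index, tags, key, cache):
    """Place a page in the cache only if its tag verifies; return success."""
    start = page_index * PAGE_SIZE
    page = image[start:start + PAGE_SIZE]
    if not hmac.compare_digest(_tag(key, page_index, page), tags[page_index]):
        return False               # modified page: never reaches the cache
    cache[page_index] = page
    return True
```

Checking at fill time means the processor only ever fetches instructions from pages that have already been verified.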
Error recovery in shared memory multiprocessors using private caches

NASA Technical Reports Server (NTRS)

Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.

1990-01-01

The problem of recovering from processor transient faults in shared memory multiprocessor systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented.
Problems faced by food-caching corvids and the evolution of cognitive solutions

PubMed Central

Grodzinski, Uri; Clayton, Nicola S.

2010-01-01

The scatter hoarding of food, or caching, is a widespread and well-studied behaviour. Recent experiments with caching corvids have provided evidence for episodic-like memory, future planning and possibly mental attribution, all cognitive abilities that were thought to be unique to humans. In addition to the complexity of making flexible, informed decisions about caching and recovering, this behaviour is underpinned by a motivationally controlled compulsion to cache. In this review, we shall first discuss the compulsive side of caching both during ontogeny and in the caching behaviour of adult corvids. We then consider some of the problems that these birds face and review the evidence for the cognitive abilities they use to solve them. Thus, the emergence of episodic-like memory is viewed as a solution for coping with food perishability, while the various cache-protection and pilfering strategies may be sophisticated tools to deprive competitors of information, either by reducing the quality of information they can gather, or invalidating the information they already have. Finally, we shall examine whether such future-oriented behaviour involves future planning and ask why this and other cognitive abilities might have evolved in corvids. PMID:20156820
Addressing Inter-set Write-Variation for Improving Lifetime of Non-Volatile Caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh; Vetter, Jeffrey S

We propose a technique which minimizes inter-set write variation in NVM caches for improving its lifetime. Our technique uses cache coloring scheme to add a software-controlled mapping layer between groups of physical pages (called memory regions) and cache sets. Periodically, the number of writes to different colors of the cache is computed and based on this result, the mapping of a few colors is changed to channel the write traffic to least utilized cache colors. This change helps to achieve wear-leveling.
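The periodic remapping step described above can be sketched as a function that looks at per-color write counts and redirects traffic from the most-written color to the least-written one. Swapping exactly one pair of colors per period, and the function and parameter names, are assumptions of this sketch; the abstract only says that the mapping of a few colors is changed.

```python
def rebalance(region_to_color, writes_per_color):
    """Return a new region-to-color mapping with the hottest and coldest
    colors exchanged, so future writes land on the least-worn cache sets."""
    hot = max(writes_per_color, key=writes_per_color.get)
    cold = min(writes_per_color, key=writes_per_color.get)
    if hot == cold:
        return dict(region_to_color)   # nothing to balance
    swapped = {}
    for region, color in region_to_color.items():
        if color == hot:
            swapped[region] = cold
        elif color == cold:
            swapped[region] = hot
        else:
            swapped[region] = color
    return swapped
```

In a real system the lines held under the old mapping would also have to be flushed or migrated, which this sketch leaves out.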
Retrospective Cognition by Food-Caching Western Scrub-Jays

ERIC Educational Resources Information Center

de Kort, S.R.; Dickinson, A.; Clayton, N.S.

2005-01-01

Episodic-like memory, the retrospective component of cognitive time travel in animals, needs to fulfil three criteria to meet the behavioral properties of episodic memory as defined for humans. Here, we review results obtained with the cache-recovery paradigm with western scrub-jays and conclude that they fulfil these three criteria. The jays…
Interacting Cache memories: evidence for flexible memory use by Western Scrub-Jays (Aphelocoma californica).

PubMed

Clayton, Nicola S; Yu, Kara Shirley; Dickinson, Anthony

2003-01-01

When Western Scrub-Jays (Aphelocoma californica) cached and recovered perishable crickets, N. S. Clayton, K. S. Yu, and A. Dickinson (2001) reported that the jays rapidly learned to search for fresh crickets after a 1-day retention interval (RI) between caching and recovery but to avoid searching for perished crickets after a 4-day RI. In the present experiments, the jays generalized their search preference for crickets to intermediate RIs and used novel information about the rate of decay of crickets presented during the RI to reverse these search preferences at recovery. The authors interpret this reversal as evidence that the birds can integrate information about the caching episode with new information presented during the RI.
Corvid caching: Insights from a cognitive model.

PubMed

van der Vaart, Elske; Verbrugge, Rineke; Hemelrijk, Charlotte K

2011-07-01

Caching and recovery of food by corvids is well-studied, but some ambiguous results remain. To help clarify these, we built a computational cognitive model. It is inspired by similar models built for humans, and it assumes that memory strength depends on frequency and recency of use. We compared our model's behavior to that of real birds in previously published experiments. Our model successfully replicated the outcomes of two experiments on recovery behavior and two experiments on cache site choice. Our "virtual birds" reproduced declines in recovery accuracy across sessions, revisits to previously emptied cache sites, a lack of correlation between caching and recovery order, and a preference for caching in safe locations. The model also produced two new explanations. First, that Clark's nutcrackers may become less accurate as recovery progresses not because of differential memory for different cache sites, as was once assumed, but because of chance effects. And second, that Western scrub jays may choose their cache sites not on the basis of negative recovery experiences only, as was previously thought, but on the basis of positive recovery experiences instead. Alternatively, both "punishment" and "reward" may be playing a role. We conclude with a set of new insights, a testable prediction, and directions for future work.
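The assumption that memory strength depends on frequency and recency of use can be made concrete with a small sketch. The abstract does not give the formula; the form below is the base-level activation commonly used in human memory models of the ACT-R family, B = ln(sum over past uses of age^(-d)), and both that choice and the decay value d = 0.5 are assumptions here.

```python
import math


def base_level_activation(use_times, now, decay=0.5):
    """Memory strength from frequency and recency of use (assumed ACT-R-style form).

    use_times: times at which the memory (e.g. a cache site) was used.
    now: current time; only uses strictly before `now` count.
    """
    ages = [now - t for t in use_times if t < now]
    if not ages:
        return float("-inf")  # never used: no activation
    # Each past use contributes age**(-decay); older uses contribute less.
    return math.log(sum(age ** -decay for age in ages))


single_old_use = base_level_activation([0], now=4)
single_recent_use = base_level_activation([3], now=4)
two_uses = base_level_activation([0, 3], now=4)
```

In this sketch a more recent use gives a higher strength than an older one, and two uses give a higher strength than either alone, which is the qualitative behavior the abstract describes.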
Multiple channel data acquisition system

DOEpatents

Crawley, H. Bert; Rosenberg, Eli I.; Meyer, W. Thomas; Gorbics, Mark S.; Thomas, William D.; McKay, Roy L.; Homer, Jr., John F.

1990-05-22

A multiple channel data acquisition system for the transfer of large amounts of data from a multiplicity of data channels has a plurality of modules which operate in parallel to convert analog signals to digital data and transfer that data to a communications host via a FASTBUS. Each module has a plurality of submodules which include a front end buffer (FEB) connected to input circuitry having an analog to digital converter with cache memory for each of a plurality of channels. The submodules are interfaced with the FASTBUS via a FASTBUS coupler which controls a module bus and a module memory. The system is triggered to effect rapid parallel data samplings which are stored to the cache memories. The cache memories are uploaded to the FEBs during which zero suppression occurs. The data in the FEBs is reformatted and compressed by a local processor during transfer to the module memory. The FASTBUS coupler is used by the communications host to upload the compressed and formatted data from the module memory. The local processor executes programs which are downloaded to the module memory through the FASTBUS coupler.
PCM-Based Durable Write Cache for Fast Disk I/O

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Zhuo; Wang, Bin; Carpenter, Patrick

2012-01-01

Flash based solid-state devices (FSSDs) have been adopted within the memory hierarchy to improve the performance of hard disk drive (HDD) based storage systems. However, with the fast development of storage-class memories, new storage technologies with better performance and higher write endurance than FSSDs are emerging, e.g., phase-change memory (PCM). Understanding how to leverage these state-of-the-art storage technologies for modern computing systems is important to solve challenging data intensive computing problems. In this paper, we propose to leverage PCM for a hybrid PCM-HDD storage architecture. We identify the limitations of traditional LRU caching algorithms for PCM-based caches, and develop a novel hash-based write caching scheme called HALO to improve random write performance of hard disks. To address the limited durability of PCM devices and solve the degraded spatial locality in traditional wear-leveling techniques, we further propose novel PCM management algorithms that provide effective wear-leveling while maximizing access parallelism. We have evaluated this PCM-based hybrid storage architecture using applications with a diverse set of I/O access patterns. Our experimental results demonstrate that the HALO caching scheme leads to an average reduction of 36.8% in execution time compared to the LRU caching scheme, and that the SFC wear leveling extends the lifetime of PCM by a factor of 21.6.
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)

2002-01-01

The increasing gap between processor and memory performance has led to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and the ratio of computation to memory operation.
Distributed Name Servers: Naming and Caching in Large Distributed Computing Environments

DTIC Science & Technology

1985-12-01

transmission rate of the communication medium1, transmission over a 56K bps line costs approximately 54r, and similarly, communication over a 9.6K...memories for modern computer systems attempt to maximize the hit ratio for a fixed-size cache by utilizing intelligent cache replacement algorithms
Simplifying and speeding the management of intra-node cache coherence

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Phillip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2012-04-17

A method and apparatus for managing coherence between two processors of a two-processor node of a multi-processor computer system. Generally, the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

Initial Performance Results on IBM POWER6

NASA Technical Reports Server (NTRS)

Saini, Subhash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piyush

2008-01-01

The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the Power6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB - double that of the Power5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in POWER5+. In this paper, we evaluate the performance of a dual-core Power6 based IBM p6-570 system, and we compare its performance with that of a dual-core Power5+ based IBM p575+ system. In this evaluation, we have used the High-Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications--three from computational fluid dynamics and one from climate modeling.
The history of scatter hoarding studies.

PubMed

Brodin, Anders

2010-03-27

In this review, I will present an overview of the development of the field of scatter hoarding studies. Scatter hoarding is a conspicuous behaviour and it has been observed by humans for a long time. Apart from an exceptional experimental study already published in 1720, it started with observational field studies of scatter hoarding birds in the 1940s. Driven by a general interest in birds, several ornithologists made large-scale studies of hoarding behaviour in species such as nutcrackers and boreal titmice. Scatter hoarding birds seem to remember caching locations accurately, and it was shown in the 1960s that successful retrieval is dependent on a specific part of the brain, the hippocampus. The study of scatter hoarding, spatial memory and the hippocampus has since then developed into a study system for evolutionary studies of spatial memory. In 1978, a game theoretical paper started the era of modern studies by establishing that a recovery advantage is necessary for individual hoarders for the evolution of a hoarding strategy. The same year, a combined theoretical and empirical study on scatter hoarding squirrels investigated how caches should be spaced out in order to minimize cache loss, a phenomenon sometimes called optimal cache density theory. Since then, the scatter hoarding paradigm has branched into a number of different fields: (i) theoretical and empirical studies of the evolution of hoarding, (ii) field studies with modern sampling methods, (iii) studies of the precise nature of the caching memory, (iv) a variety of studies of caching memory and its relationship to the hippocampus. Scatter hoarding has also been the subject of studies of (v) coevolution between scatter hoarding animals and the plants that are dispersed by these.
Multiple core computer processor with globally-accessible local memories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shalf, John; Donofrio, David; Oliker, Leonid

A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores.
MELOC - Memory and Location Optimized Caching for Mobile Ad Hoc Networks

DTIC Science & Technology

2011-01-01

required for such environments. Moreover, nodes located at centre have to be chosen as cache location, since it reduces the chance of being attacked...Figure 1.1. MANET Formed by Armed Forces 47 Example 3: Sharing of music and videos are famous among mobile users. Instead of downloading...The two tier caching scheme discussed in this paper is acoustic . The characteristics of two-tier caching are as follows, the content of data to be
Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-Time Popularities

NASA Astrophysics Data System (ADS)

Sadeghi, Alireza; Sheikholeslami, Fatemeh; Giannakis, Georgios B.

2018-02-01

Small basestations (SBs) equipped with caching units have the potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours, and service them to the edge at peak periods. To intelligently prefetch, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this work, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allows for a simple, yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
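To show what learning a caching policy with Q-learning looks like when the request dynamics are unknown to the learner, here is a toy tabular sketch. It is far simpler than the scheme in the abstract and is not the authors' algorithm: the single-slot cache, the cyclic request pattern, the hit reward of 1, and every parameter value are assumptions chosen to keep the example small.

```python
import random


def learn_caching_policy(num_files=3, steps=6000, alpha=0.5, gamma=0.5,
                         epsilon=0.3, seed=0):
    """Tabular Q-learning for a one-slot cache (toy model, all settings assumed).

    State: the file that was just requested. Action: the file to cache next.
    Returns, for each state, the file the learned greedy policy would cache.
    """
    rng = random.Random(seed)
    # q[s][a]: estimated value of caching file a after file s was requested.
    q = [[0.0] * num_files for _ in range(num_files)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(num_files)  # explore
        else:
            action = max(range(num_files), key=lambda a: q[state][a])  # exploit
        # Toy request process, hidden from the learner: files are requested in a cycle.
        next_state = (state + 1) % num_files
        reward = 1.0 if action == next_state else 0.0  # hit if the cached file is requested
        # Standard Q-learning update toward the bootstrapped target.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return [max(range(num_files), key=lambda a: q[s][a]) for s in range(num_files)]


policy = learn_caching_policy()
```

The learner never sees the request model directly; it only observes which file is requested next and whether the cached file was a hit. In this toy setting the greedy policy it settles on is to cache the file that follows the current one in the cycle.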
Improving energy efficiency of Embedded DRAM Caches for High-end Computing Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh; Vetter, Jeffrey S; Li, Dong

2014-01-01

With increasing system core-count, the size of last level cache (LLC) has increased and since SRAM consumes high leakage power, power consumption of LLCs is becoming a significant fraction of processor power consumption. To address this, researchers have used embedded DRAM (eDRAM) LLCs which consume low-leakage power. However, eDRAM caches consume a significant amount of energy in the form of refresh energy. In this paper, we propose ESTEEM, an energy saving technique for embedded DRAM caches. ESTEEM uses dynamic cache reconfiguration to turn off a portion of the cache to save both leakage and refresh energy. It logically divides the cache sets into multiple modules and turns off a possibly different number of ways in each module. Microarchitectural simulations confirm that ESTEEM is effective in improving performance and energy efficiency and provides better results compared to a recently-proposed eDRAM cache energy saving technique, namely Refrint. For single and dual-core simulations, the average saving in memory subsystem (LLC+main memory) on using ESTEEM is 25.8% and 32.6%, respectively and average weighted speedup are 1.09X and 1.22X, respectively. Additional experiments confirm that ESTEEM works well for a wide-range of system parameters.
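The idea of turning off a different number of ways in each module can be sketched as follows. The abstract does not state how the number of ways is chosen, so the rule used here (keep the fewest ways that still capture a chosen fraction of the hits observed per LRU position) is purely a hypothetical stand-in, and the names `ways_to_keep` and `coverage` are illustrative.

```python
def ways_to_keep(hits_by_lru_position, coverage=0.95, min_ways=1):
    """Pick how many ways to leave powered on in one module (hypothetical rule).

    hits_by_lru_position[i] is the number of hits observed at LRU stack
    position i (0 = most recently used) during the last interval.
    """
    total = sum(hits_by_lru_position)
    if total == 0:
        return min_ways  # idle module: keep only the minimum
    running = 0
    for ways, hits in enumerate(hits_by_lru_position, start=1):
        running += hits
        if running >= coverage * total:
            return max(ways, min_ways)
    return len(hits_by_lru_position)


# Two modules of a 4-way cache with different locality.
module_profiles = [[80, 18, 1, 1], [30, 30, 25, 15]]
active_ways = [ways_to_keep(profile) for profile in module_profiles]
```

Under this assumed rule, the first module, whose hits concentrate in the most recently used positions, keeps two ways on, while the second module, whose hits are spread across all positions, keeps all four.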
Solutions and debugging for data consistency in multiprocessors with noncoherent caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bernstein, D.; Mendelson, B.; Breternitz, M. Jr.

1995-02-01

We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on software methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.
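The definition of false sharing given in this abstract (two processors updating different portions of the same block) can be turned into a small trace check. This is only an illustrative sketch, not the detection technique the authors propose: the trace format, the 64-byte block size, and the function name `find_false_sharing` are assumptions.

```python
from collections import defaultdict


def find_false_sharing(writes, block_size=64):
    """Flag blocks written by two processors at non-overlapping offsets.

    writes: iterable of (processor_id, byte_address) pairs (assumed trace format).
    Returns a list of (block_index, processor_a, processor_b) tuples.
    """
    # block index -> processor -> set of byte offsets written inside the block
    offsets = defaultdict(lambda: defaultdict(set))
    for proc, addr in writes:
        offsets[addr // block_size][proc].add(addr % block_size)

    flagged = []
    for block, per_proc in sorted(offsets.items()):
        procs = sorted(per_proc)
        for i, p in enumerate(procs):
            for q in procs[i + 1:]:
                # Same block, disjoint portions: candidate false sharing.
                if per_proc[p].isdisjoint(per_proc[q]):
                    flagged.append((block, p, q))
    return flagged


trace = [(0, 0), (1, 8), (0, 64), (1, 64), (2, 130)]
candidates = find_false_sharing(trace)
```

In the toy trace, processors 0 and 1 write different bytes of block 0, so that block is flagged. Block 1 is written at the same offset by both processors, which is true sharing, and block 2 is touched by only one processor.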
Dynamic Allocation of SPM Based on Time-Slotted Cache Conflict Graph for System Optimization

NASA Astrophysics Data System (ADS)

Wu, Jianping; Ling, Ming; Zhang, Yang; Mei, Chen; Wang, Huan

This paper proposes a novel dynamic Scratch-pad Memory (SPM) allocation strategy to optimize the energy consumption of the memory sub-system. Firstly, the whole program execution process is sliced into several time slots according to the temporal dimension; thereafter, a Time-Slotted Cache Conflict Graph (TSCCG) is introduced to model the behavior of Data Cache (D-Cache) conflicts within each time slot. Then, Integer Nonlinear Programming (INP), which avoids a time-consuming linearization process, is used to select the most profitable data pages. A Virtual Memory System (VMS) is adopted to remap to SPM those data pages that would cause severe cache conflicts within a time slot. In order to minimize the swapping overhead of dynamic SPM allocation, a novel SPM controller with a tightly coupled DMA is introduced to issue the swapping operations without the CPU's intervention. Last but not least, this paper quantitatively discusses the fluctuation of system energy profit based on different MMU page sizes as well as the time slot duration. According to our design space exploration, the proposed method can optimize all of the data segments, including global data, heap and stack data in general, and reduce the total energy consumption by 27.28% on average, up to 55.22%, with a marginal performance improvement. Compared to the conventional static CCG (Cache Conflict Graph), our approach can obtain 24.7% energy profit on average, up to 30.5%, with a slight boost in performance.
Re-caching by Western scrub-jays (Aphelocoma californica) cannot be attributed to stress.

PubMed

Thom, James M; Clayton, Nicola S

2013-01-01

Western scrub-jays (Aphelocoma californica) live double lives, storing food for the future while raiding the stores of other birds. One tactic scrub-jays employ to protect stores is "re-caching"-relocating caches out of sight of would-be thieves. Recent computational modelling work suggests that re-caching might be mediated not by complex cognition, but by a combination of memory failure and stress. The "Stress Model" asserts that re-caching is a manifestation of a general drive to cache, rather than a desire to protect existing stores. Here, we present evidence strongly contradicting the central assumption of these models: that stress drives caching, irrespective of social context. In Experiment (i), we replicate the finding that scrub-jays preferentially relocate food they were watched hiding. In Experiment (ii) we find no evidence that stress increases caching. In light of our results, we argue that the Stress Model cannot account for scrub-jay re-caching.
Store operations to maintain cache coherence

DOEpatents

Evangelinos, Constantinos; Nair, Ravi; Ohmacht, Martin

2017-08-01

In one embodiment, a computer-implemented method includes encountering a store operation during a compile-time of a program, where the store operation is applicable to a memory line. It is determined, by a computer processor, that no cache coherence action is necessary for the store operation. A store-without-coherence-action instruction is generated for the store operation, responsive to determining that no cache coherence action is necessary. The store-without-coherence-action instruction specifies that the store operation is to be performed without a cache coherence action, and cache coherence is maintained upon execution of the store-without-coherence-action instruction.
The Effects of Block Size on the Performance of Coherent Caches in Shared-Memory Multiprocessors

DTIC Science & Technology

1993-05-01

increase with the bandwidth and latency. For those applications with poor spatial locality, the best choice of cache line size is determined by the...observation was used in the design of two schemes: LimitLESS directories and Tag caches. LimitLESS directories [15] were designed for the ALEWIFE...small packets may be used to avoid network congestion. The most important factor influencing the choice of cache line size for a multiprocessor is the
Cache directory lookup reader set encoding for partial cache line speculation support

DOEpatents

Gara, Alan; Ohmacht, Martin

2014-10-21

In a multiprocessor system, with conflict checking implemented in a directory lookup of a shared cache memory, a reader set encoding permits dynamic recordation of read accesses. The reader set encoding includes an indication of a portion of a line read, for instance by indicating boundaries of read accesses. Different encodings may apply to different types of speculative execution.
A population study of Alzheimer's disease: findings from the Cache County Study on Memory, Health, and Aging.

PubMed

Tschanz, Joann T; Treiber, Katherine; Norton, Maria C; Welsh-Bohmer, Kathleen A; Toone, Leslie; Zandi, Peter P; Szekely, Christine A; Lyketsos, Constantine; Breitner, John C S

2005-01-01

There are several population-based studies of aging, memory, and dementia being conducted worldwide. Of these, the Cache County Study on Memory, Health and Aging is noteworthy for its large number of "oldest-old" members. This study, which has been following an initial cohort of 5,092 seniors since 1995, has reported among its major findings the role of the Apolipoprotein E gene on modifying the risk for Alzheimer's disease (AD) in males and females and identifying pharmacologic compounds that may act to reduce AD risk. This article summarizes the major findings of the Cache County study to date, describes ongoing investigations, and reports preliminary analyses on the outcome of the oldest-old in this population, the subgroup of participants who were over age 84 at the study's inception.
Combining instruction prefetching with partial cache locking to improve WCET in real-time systems.

PubMed

Ni, Fan; Long, Xiang; Wan, Han; Gao, Xiaopeng

2013-01-01

Caches play an important role in embedded systems by bridging the performance gap between fast processors and slow memory, and prefetching mechanisms have been proposed to further improve cache performance. In real-time systems, however, caches complicate Worst-Case Execution Time (WCET) analysis because of their unpredictable behavior. Modern embedded processors often provide a locking mechanism to improve the timing predictability of the instruction cache. However, locking the whole cache may degrade cache performance and increase the WCET of the real-time application. In this paper, we propose an instruction-prefetching combined partial cache locking mechanism, which combines an instruction prefetching mechanism (termed BBIP) with partial cache locking to improve the WCET estimates of real-time applications. BBIP is an instruction prefetching mechanism we previously proposed to improve the worst-case cache performance and, in turn, the worst-case execution time. The estimates on typical real-time applications show that the partial cache locking mechanism yields remarkable WCET improvement over static analysis and full cache locking.
Combining Instruction Prefetching with Partial Cache Locking to Improve WCET in Real-Time Systems

PubMed Central

Ni, Fan; Long, Xiang; Wan, Han; Gao, Xiaopeng

2013-01-01

Caches play an important role in embedded systems by bridging the performance gap between fast processors and slow memory, and prefetching mechanisms have been proposed to further improve cache performance. In real-time systems, however, caches complicate Worst-Case Execution Time (WCET) analysis because of their unpredictable behavior. Modern embedded processors often provide a locking mechanism to improve the timing predictability of the instruction cache. However, locking the whole cache may degrade cache performance and increase the WCET of the real-time application. In this paper, we propose an instruction-prefetching combined partial cache locking mechanism, which combines an instruction prefetching mechanism (termed BBIP) with partial cache locking to improve the WCET estimates of real-time applications. BBIP is an instruction prefetching mechanism we previously proposed to improve the worst-case cache performance and, in turn, the worst-case execution time. The estimates on typical real-time applications show that the partial cache locking mechanism yields remarkable WCET improvement over static analysis and full cache locking. PMID:24386133
Corvid re-caching without 'theory of mind': a model.

PubMed

van der Vaart, Elske; Verbrugge, Rineke; Hemelrijk, Charlotte K

2012-01-01

Scrub jays are thought to use many tactics to protect their caches. For instance, they predominantly bury food far away from conspecifics, and if they must cache while being watched, they often re-cache their worms later, once they are in private. Two explanations have been offered for such observations, and they are intensely debated. First, the birds may reason about their competitors' mental states, with a 'theory of mind'; alternatively, they may apply behavioral rules learned in daily life. Although this second hypothesis is cognitively simpler, it does seem to require a different, ad-hoc behavioral rule for every caching and re-caching pattern exhibited by the birds. Our new theory avoids this drawback by explaining a large variety of patterns as side-effects of stress and the resulting memory errors. Inspired by experimental data, we assume that re-caching is not motivated by a deliberate effort to safeguard specific caches from theft, but by a general desire to cache more. This desire is brought on by stress, which is determined by the presence and dominance of onlookers, and by unsuccessful recovery attempts. We study this theory in two experiments similar to those done with real birds with a kind of 'virtual bird', whose behavior depends on a set of basic assumptions about corvid cognition, and a well-established model of human memory. Our results show that the 'virtual bird' acts as the real birds did; its re-caching reflects whether it has been watched, how dominant its onlooker was, and how close to that onlooker it has cached. This happens even though it cannot attribute mental states, and it has only a single behavioral rule assumed to be previously learned. Thus, our simulations indicate that corvid re-caching can be explained without sophisticated social cognition. Given our specific predictions, our theory can easily be tested empirically.
Corvid Re-Caching without ‘Theory of Mind’: A Model

PubMed Central

van der Vaart, Elske; Verbrugge, Rineke; Hemelrijk, Charlotte K.

2012-01-01

Scrub jays are thought to use many tactics to protect their caches. For instance, they predominantly bury food far away from conspecifics, and if they must cache while being watched, they often re-cache their worms later, once they are in private. Two explanations have been offered for such observations, and they are intensely debated. First, the birds may reason about their competitors' mental states, with a ‘theory of mind’; alternatively, they may apply behavioral rules learned in daily life. Although this second hypothesis is cognitively simpler, it does seem to require a different, ad-hoc behavioral rule for every caching and re-caching pattern exhibited by the birds. Our new theory avoids this drawback by explaining a large variety of patterns as side-effects of stress and the resulting memory errors. Inspired by experimental data, we assume that re-caching is not motivated by a deliberate effort to safeguard specific caches from theft, but by a general desire to cache more. This desire is brought on by stress, which is determined by the presence and dominance of onlookers, and by unsuccessful recovery attempts. We study this theory in two experiments similar to those done with real birds with a kind of ‘virtual bird’, whose behavior depends on a set of basic assumptions about corvid cognition, and a well-established model of human memory. Our results show that the ‘virtual bird’ acts as the real birds did; its re-caching reflects whether it has been watched, how dominant its onlooker was, and how close to that onlooker it has cached. This happens even though it cannot attribute mental states, and it has only a single behavioral rule assumed to be previously learned. Thus, our simulations indicate that corvid re-caching can be explained without sophisticated social cognition. Given our specific predictions, our theory can easily be tested empirically. PMID:22396799
Cache-Aware Asymptotically-Optimal Sampling-Based Motion Planning

PubMed Central

Ichnowski, Jeffrey; Prins, Jan F.; Alterovitz, Ron

2014-01-01

We present CARRT* (Cache-Aware Rapidly Exploring Random Tree*), an asymptotically optimal sampling-based motion planner that significantly reduces motion planning computation time by effectively utilizing the cache memory hierarchy of modern central processing units (CPUs). CARRT* can account for the CPU’s cache size in a manner that keeps its working dataset in the cache. The motion planner progressively subdivides the robot’s configuration space into smaller regions as the number of configuration samples rises. By focusing configuration exploration in a region for periods of time, nearest neighbor searching is accelerated since the working dataset is small enough to fit in the cache. CARRT* also rewires the motion planning graph in a manner that complements the cache-aware subdivision strategy to more quickly refine the motion planning graph toward optimality. We demonstrate the performance benefit of our cache-aware motion planning approach for scenarios involving a point robot as well as the Rethink Robotics Baxter robot. PMID:25419474
Cache-Aware Asymptotically-Optimal Sampling-Based Motion Planning.

PubMed

Ichnowski, Jeffrey; Prins, Jan F; Alterovitz, Ron

2014-05-01

We present CARRT* (Cache-Aware Rapidly Exploring Random Tree*), an asymptotically optimal sampling-based motion planner that significantly reduces motion planning computation time by effectively utilizing the cache memory hierarchy of modern central processing units (CPUs). CARRT* can account for the CPU's cache size in a manner that keeps its working dataset in the cache. The motion planner progressively subdivides the robot's configuration space into smaller regions as the number of configuration samples rises. By focusing configuration exploration in a region for periods of time, nearest neighbor searching is accelerated since the working dataset is small enough to fit in the cache. CARRT* also rewires the motion planning graph in a manner that complements the cache-aware subdivision strategy to more quickly refine the motion planning graph toward optimality. We demonstrate the performance benefit of our cache-aware motion planning approach for scenarios involving a point robot as well as the Rethink Robotics Baxter robot.
The Cache County Study on Memory in Aging: Factors Affecting Risk of Alzheimer's disease and its Progression after Onset

PubMed Central

Tschanz, JoAnn T.; Norton, Maria C.; Zandi, Peter P.; Lyketsos, Constantine G.

2014-01-01

The Cache County Study on Memory in Aging is a longitudinal, population-based study of Alzheimer's disease (AD) and other dementias. Initiated in 1995 and extending to 2013, the study has followed over 5,000 elderly residents of Cache County, Utah (USA) for over twelve years. Achieving a 90% participation rate at enrollment, and spawning two ancillary projects, the study has contributed to the literature on genetic, psychosocial and environmental risk factors for AD, late life cognitive decline, and the clinical progression of dementia after its onset. This paper describes the major study contributions to the literature on AD and dementia. PMID:24423221

The Cache County Study on Memory in Aging: factors affecting risk of Alzheimer's disease and its progression after onset.

PubMed

Tschanz, Joann T; Norton, Maria C; Zandi, Peter P; Lyketsos, Constantine G

2013-12-01

The Cache County Study on Memory in Aging is a longitudinal, population-based study of Alzheimer's disease (AD) and other dementias. Initiated in 1995 and extending to 2013, the study has followed over 5,000 elderly residents of Cache County, Utah (USA) for over twelve years. Achieving a 90% participation rate at enrolment, and spawning two ancillary projects, the study has contributed to the literature on genetic, psychosocial and environmental risk factors for AD, late-life cognitive decline, and the clinical progression of dementia after its onset. This paper describes the major study contributions to the literature on AD and dementia.
Side Channel Attacks on STTRAM and Low Overhead Countermeasures

DTIC Science & Technology

2017-03-20

introduce security vulnerabilities and expose the cache memory to side channel attacks. In this paper, we propose a side channel attack (SCA) model...where the adversary can monitor the supply current of the memory array to partially identify the sensitive cache data that is being read or written. We...propose solutions such as short retention STTRAM, obfuscation of SCA using 1-bit parity, multi-bit random write, and, neutralizing the SCA using
Josephson 4 K-bit cache memory design for a prototype signal processor. I - General overview

NASA Astrophysics Data System (ADS)

Henkels, W. H.; Geppert, L. M.; Kadlec, J.; Epperlein, P. W.; Beha, H.

1985-09-01

In the early stages of the Josephson computer project conducted at an American computer company, it was recognized that a very fast cache memory was needed to complement Josephson logic. A subnanosecond access time memory was implemented experimentally on the basis of a 2.5-micron Pb-alloy technology. It was then decided to switch over to a Nb-base-electrode technology with the objective to alleviate problems with the long-term reliability and aging of Pb-based junctions. The present paper provides a general overview of the status of a 4 x 1 K-bit Josephson cache design employing a 2.5-micron Nb-edge-junction technology. Attention is given to the fabrication process and its implications, aspects of circuit design methodology, an overview of system environment and chip components, design changes and status, and various difficulties and uncertainties.
Modifying dementia risk and trajectories of cognitive decline in aging: the Cache County Memory Study.

PubMed

Welsh-Bohmer, Kathleen A; Breitner, John C S; Hayden, Kathleen M; Lyketsos, Constantine; Zandi, Peter P; Tschanz, Joann T; Norton, Maria C; Munger, Ron

2006-07-01

The Cache County Study of Memory, Health, and Aging, more commonly referred to as the "Cache County Memory Study (CCMS)", is a longitudinal investigation of aging and Alzheimer's disease (AD) based in an exceptionally long-lived population residing in northern Utah. The study, begun in 1994, has followed an initial cohort of 5,092 older individuals (many over age 84) and has examined the development of cognitive impairment and dementia in relation to genetic and environmental antecedents. This article summarizes the major contributions of the CCMS towards the understanding of mild cognitive disorders and AD across the lifespan, underscoring the role of common health exposures in modifying dementia risk and trajectories of cognitive change. The study, now in its fourth wave of ascertainment, illustrates the role of population-based approaches in informing testable models of cognitive aging and Alzheimer's disease.
The effect of environmental harshness on neurogenesis: a large-scale comparison.

PubMed

Chancellor, Leia V; Roth, Timothy C; LaDage, Lara D; Pravosudov, Vladimir V

2011-03-01

Harsh environmental conditions may produce strong selection pressure on traits, such as memory, that may enhance fitness. Enhanced memory may be crucial for survival in animals that use memory to find food and, thus, particularly important in environments where food sources may be unpredictable. For example, animals that cache and later retrieve their food may exhibit enhanced spatial memory in harsh environments compared with those in mild environments. One way that selection may enhance memory is via the hippocampus, a brain region involved in spatial memory. In a previous study, we established a positive relationship between environmental severity and hippocampal morphology in food-caching black-capped chickadees (Poecile atricapillus). Here, we expanded upon this previous work to investigate the relationship between environmental harshness and neurogenesis, a process that may support hippocampal cytoarchitecture. We report a significant and positive relationship between the degree of environmental harshness across several populations over a large geographic area and (1) the total number of immature hippocampal neurons, (2) the number of immature neurons relative to the hippocampal volume, and (3) the number of immature neurons relative to the total number of hippocampal neurons. Our results suggest that hippocampal neurogenesis may play an important role in environments where increased reliance on memory for cache recovery is critical. Copyright © 2010 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Evangelinos, Constantinos; Nair, Ravi; Ohmacht, Martin

In one embodiment, a computer-implemented method includes encountering a store operation during a compile-time of a program, where the store operation is applicable to a memory line. It is determined, by a computer processor, that no cache coherence action is necessary for the store operation. A store-without-coherence-action instruction is generated for the store operation, responsive to determining that no cache coherence action is necessary. The store-without-coherence-action instruction specifies that the store operation is to be performed without a cache coherence action, and cache coherence is maintained upon execution of the store-without-coherence-action instruction.
Cache and energy efficient algorithms for Nussinov's RNA Folding.

PubMed

Zhao, Chunchun; Sahni, Sartaj

2017-12-06

An RNA folding/RNA secondary structure prediction algorithm determines the non-nested/pseudoknot-free structure by maximizing the number of complementary base pairs and minimizing the energy. Several implementations of Nussinov's classical RNA folding algorithm have been proposed. Our focus is to obtain run time and energy efficiency by reducing the number of cache misses. Three cache-efficient algorithms, ByRow, ByRowSegment and ByBox, for Nussinov's RNA folding are developed. Using a simple LRU cache model, we show that the Classical algorithm of Nussinov has the highest number of cache misses followed by the algorithms Transpose (Li et al.), ByRow, ByRowSegment, and ByBox (in this order). Extensive experiments conducted on four computational platforms (Xeon E5, AMD Athlon 64 X2, Intel I7 and PowerPC A2) using two programming languages (C and Java) show that our cache efficient algorithms are also efficient in terms of run time and energy. Our benchmarking shows that, depending on the computational platform and programming language, either ByRow or ByBox gives the best run time and energy performance. The C versions of these algorithms reduce run time by as much as 97.2% and energy consumption by as much as 88.8% relative to Classical and by as much as 56.3% and 57.8% relative to Transpose. The Java versions reduce run time by as much as 98.3% relative to Classical and by as much as 75.2% relative to Transpose. Transpose achieves run time and energy efficiency at the expense of memory as it takes twice the memory required by Classical. The memory required by ByRow, ByRowSegment, and ByBox is the same as that of Classical. As a result, using the same amount of memory, the algorithms proposed by us can solve problems up to 40% larger than those solvable by Transpose.
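For orientation, the classical Nussinov recurrence that this record takes as its baseline can be sketched as below. This is a minimal illustration, not the authors' code: the function name `nussinov`, the restriction to Watson-Crick pairs, and the absence of a minimum loop length are all assumptions made for the example. It maximizes the base-pair count only and ignores energy.

```python
# Watson-Crick pairs only (an assumption; some formulations also allow G-U).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def nussinov(seq):
    """Return the maximum number of nested (pseudoknot-free) base pairs."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # leave i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                inner = dp[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)  # pair i with j
            for k in range(i + 1, j):
                # Split into two independent substructures. The read of
                # dp[k + 1][j] walks down a column of the table.
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))
```

In a row-major table the `dp[k + 1][j]` reads in the inner loop touch a different row on every step, which is the kind of strided access that a cache-aware reordering of the computation aims to avoid.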
Software Exploit Prevention and Remediation via Software Memory Protection

DTIC Science & Technology

2009-05-01

trampolines that are necessary. Trampolines are pieces of code emitted into the fragment cache to transfer control back to Strata. Most control...transfer instructions (CTIs) are initially linked to trampolines (unless the transfer target already exists in the fragment cache). Once a CTI’s target...instruction becomes available in the fragment cache, the CTI is linked directly to the destination, avoiding future uses of the trampoline. This
High Performance Analytics with the R3-Cache

NASA Astrophysics Data System (ADS)

Eavis, Todd; Sayeed, Ruhan

Contemporary data warehouses now represent some of the world’s largest databases. As these systems grow in size and complexity, however, it becomes increasingly difficult for brute force query processing approaches to meet the performance demands of end users. Certainly, improved indexing and more selective view materialization are helpful in this regard. Nevertheless, with warehouses moving into the multi-terabyte range, it is clear that the minimization of external memory accesses must be a primary performance objective. In this paper, we describe the R3-cache, a natively multi-dimensional caching framework designed specifically to support sophisticated warehouse/OLAP environments. R3-cache is based upon an in-memory version of the R-tree that has been extended to support buffer pages rather than disk blocks. A key strength of the R3-cache is that it is able to utilize multi-dimensional fragments of previous query results so as to significantly minimize the frequency and scale of disk accesses. Moreover, the new caching model directly accommodates the standard relational storage model and provides mechanisms for pro-active updates that exploit the existence of query “hot spots”. The current prototype has been evaluated as a component of the Sidera DBMS, a “shared nothing” parallel OLAP server designed for multi-terabyte analytics. Experimental results demonstrate significant performance improvements relative to simpler alternatives.
Dynamically programmable cache

NASA Astrophysics Data System (ADS)

Nakkar, Mouna; Harding, John A.; Schwartz, David A.; Franzon, Paul D.; Conte, Thomas

1998-10-01

Reconfigurable machines have recently been used as co-processors to accelerate the execution of certain algorithms or program subroutines. The problems with the above approach include high reconfiguration time and limited partial reconfiguration. By far the most critical problems are: (1) the small on-chip memory, which results in slower execution time, and (2) small FPGA areas that cannot implement large subroutines. Dynamically Programmable Cache (DPC) is a novel architecture for embedded processors which offers solutions to the above problems. To solve memory access problems, DPC processors merge reconfigurable arrays with the data cache at various cache levels to create multi-level reconfigurable machines. As a result, DPC machines have both higher data accessibility and FPGA memory bandwidth. To solve the limited FPGA resource problem, DPC processors implement a multi-context switching (virtualization) concept. Virtualization allows implementation of large subroutines with fewer FPGA cells. Additionally, DPC processors can parallelize the execution of several operations, resulting in faster execution time. In this paper, DPC machines are shown to be 5X faster than an Altera FLEX10K FPGA chip and 2X faster than a Sun Ultra1 SPARC station for two different algorithms (convolution and motion estimation).
Managing coherence via put/get windows

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2011-01-11

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2012-02-21

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
A set-associative, fault-tolerant cache design

NASA Technical Reports Server (NTRS)

Lamet, Dan; Frenzel, James F.

1992-01-01

The design of a defect-tolerant control circuit for a set-associative cache memory is presented. The circuit maintains the stack ordering necessary for implementing the Least Recently Used (LRU) replacement algorithm. A discussion of programming techniques for bypassing defective blocks is included.
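The stack ordering that LRU replacement relies on, together with one simple way of bypassing defective blocks, can be sketched in software as follows. This is an illustrative model only and does not describe the circuit in the record above: the class name `SetAssociativeCache`, the `defective` map, and the choice to treat a defect as a reduction in a set's usable capacity are all assumptions.

```python
class SetAssociativeCache:
    """Set-associative cache with one LRU stack per set (most recent first)."""

    def __init__(self, num_sets, ways, block_size, defective=None):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # Hypothetical defect map: set index -> number of unusable blocks.
        self.defective = defective or {}
        self.stacks = [[] for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one access and return True on a hit, False on a miss."""
        block = address // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        stack = self.stacks[index]
        capacity = max(0, self.ways - self.defective.get(index, 0))
        if tag in stack:
            # Hit: move the tag to the top of the stack.
            stack.remove(tag)
            stack.insert(0, tag)
            self.hits += 1
            return True
        self.misses += 1
        if capacity == 0:
            return False  # every block in this set is bypassed
        if len(stack) >= capacity:
            stack.pop()  # evict the least recently used tag
        stack.insert(0, tag)
        return False


if __name__ == "__main__":
    cache = SetAssociativeCache(num_sets=1, ways=2, block_size=1)
    for addr in [0, 1, 0, 2, 1, 0]:
        cache.access(addr)
    print(cache.hits, cache.misses)
```

Keeping the most recently used tag at the front of each list means the eviction victim is always the last element, which mirrors the stack property of LRU.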
Accurate low-cost methods for performance evaluation of cache memory systems

NASA Technical Reports Server (NTRS)

Laha, Subhasis; Patel, Janak H.; Iyer, Ravishankar K.

1988-01-01

Methods of simulation based on statistical techniques are proposed to decrease the need for large trace measurements and for predicting true program behavior. Sampling techniques are applied while the address trace is collected from a workload. This drastically reduces the space and time needed to collect the trace. Simulation techniques are developed to use the sampled data not only to predict the mean miss rate of the cache, but also to provide an empirical estimate of its actual distribution. Finally, a concept of primed cache is introduced to simulate large caches by the sampling-based method.
Spin-transfer torque magnetoresistive random-access memory technologies for normally off computing (invited)

NASA Astrophysics Data System (ADS)

Ando, K.; Fujita, S.; Ito, J.; Yuasa, S.; Suzuki, Y.; Nakatani, Y.; Miyazaki, T.; Yoda, H.

2014-05-01

Most parts of present computer systems are made of volatile devices, and the power to supply them to avoid information loss causes huge energy losses. We can eliminate this meaningless energy loss by utilizing the non-volatile function of advanced spin-transfer torque magnetoresistive random-access memory (STT-MRAM) technology and create a new type of computer, i.e., normally off computers. Critical tasks to achieve normally off computers are implementations of STT-MRAM technologies in the main memory and low-level cache memories. STT-MRAM technology for applications to the main memory has been successfully developed by using perpendicular STT-MRAMs, and faster STT-MRAM technologies for applications to the cache memory are now being developed. The present status of STT-MRAMs and challenges that remain for normally off computers are discussed.
Programmable stream prefetch with resource optimization

DOEpatents

Boyle, Peter; Christ, Norman; Gara, Alan; Mawhinney, Robert; Ohmacht, Martin; Sugavanam, Krishnan

2013-01-08

A stream prefetch engine performs data retrieval in a parallel computing system. The engine receives a load request from at least one processor. The engine evaluates whether a first memory address requested in the load request is present and valid in a table. The engine checks whether there exists valid data corresponding to the first memory address in an array if the first memory address is present and valid in the table. The engine increments a prefetching depth of a first stream that the first memory address belongs to and fetching a cache line associated with the first memory address from the at least one cache memory device if there is not yet valid data corresponding to the first memory address in the array. The engine determines whether prefetching of additional data is needed for the first stream within its prefetching depth. The engine prefetches the additional data if the prefetching is needed.
A High-Precision Counter Using the DSP Technique

DTIC Science & Technology

2004-09-01

DSP is not good enough to process all the 1-second samples. The cache memory is also not sufficient to store all the sampling data. So we cut the...sampling number in a cycle is not good enough to achieve an accuracy less than 2×10-11. For this reason, a correlation operation is performed for... not good enough to process all the 1-second samples. The cache memory is also not sufficient to store all the sampling data. We will solve this
A Measurement and Simulation Based Methodology for Cache Performance Modeling and Tuning

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

1998-01-01

We present a cache performance modeling methodology that facilitates the tuning of uniprocessor cache performance for applications executing on shared memory multiprocessors by accurately predicting the effects of source code level modifications. Measurements on a single processor are initially used for identifying parts of code where cache utilization improvements may significantly impact the overall performance. Cache simulation based on trace-driven techniques can be carried out without gathering detailed address traces. Minimal runtime information for modeling cache performance of a selected code block includes: base virtual addresses of arrays, virtual addresses of variables, and loop bounds for that code block. The rest of the information is obtained from the source code. We show that the cache performance predictions are as reliable as those obtained through trace-driven simulations. This technique is particularly helpful to the exploration of various "what-if" scenarios regarding the cache performance impact for alternative code structures. We explain and validate this methodology using a simple matrix-matrix multiplication program. We then apply this methodology to predict and tune the cache performance of two realistic scientific applications taken from the Computational Fluid Dynamics (CFD) domain.
Controlled replication: reduce the capacity occupied by redundant replicas in tiled chip multiprocessors

NASA Astrophysics Data System (ADS)

Li, Hao; Xie, Lunguo

2013-03-01

The design of cache systems for Chip Multiprocessors (CMPs) faces many challenges because future CMPs will have more cores and greater on-chip cache capacity. There are two basic design schemes for the L2 cache: a private scheme, in which each L2 slice is treated as a private L2 cache, and a shared scheme, in which all L2 slices are treated as one large L2 cache shared by all cores. Private caches provide the lowest hit latency but reduce the total effective cache capacity. A shared L2 cache increases the effective cache capacity but has long hit latencies when data is on a remote tile. This paper presents a new Controlled Replication (CR) policy to reduce the capacity occupied by redundant shared replicas. The new CR policy provides more effective capacity than the victim replication scheme and has lower hit latency than the shared scheme. We evaluate the various schemes using full-system simulation of parallel applications. Results show that CR reduces the average memory access latency of the shared scheme by an average of 13%, providing better overall performance than the victim replication and shared schemes.
AYUSH: A Technique for Extending Lifetime of SRAM-NVM Hybrid Caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh; Vetter, Jeffrey S

2014-01-01

Recently, researchers have explored way-based hybrid SRAM-NVM (non-volatile memory) last level caches (LLCs) to bring the best of SRAM and NVM together. However, the limited write endurance of NVMs restricts the lifetime of these hybrid caches. We present AYUSH, a technique to enhance the lifetime of hybrid caches, which works by using data-migration to preferentially use SRAM for storing frequently-reused data. Microarchitectural simulations confirm that AYUSH achieves larger improvement in lifetime than a previous technique and also maintains performance and energy efficiency. For single, dual and quad-core workloads, the average increase in cache lifetime with AYUSH is 6.90X, 24.06X and 47.62X, respectively.

Software-Controlled Caches in the VMP Multiprocessor

DTIC Science & Technology

1986-03-01

programming system level that Processors is tuned for the VMP design. In this vein, we are interested in exploring how far the software support can go to ...handled in software, analogously to the handling agement of the shared program state is familiar and of virtual memory page faults. Hardware support for...ensure good behavior, as opposed to how Each cache miss results in bus traffic. Table 2 pro- vides the bus cost for the "average" cache miss. Fig
Shared Memory Parallelization of an Implicit ADI-type CFD Code

NASA Technical Reports Server (NTRS)

Hauser, Th.; Huang, P. G.

1999-01-01

A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.
Methods for compressible fluid simulation on GPUs using high-order finite differences

NASA Astrophysics Data System (ADS)

Pekkilä, Johannes; Väisälä, Miikka S.; Käpylä, Maarit J.; Käpylä, Petri J.; Anjum, Omer

2017-08-01

We focus on implementing and optimizing a sixth-order finite-difference solver for simulating compressible fluids on a GPU using third-order Runge-Kutta integration. Since graphics processing units perform well in data-parallel tasks, this makes them an attractive platform for fluid simulation. However, high-order stencil computation is memory-intensive with respect to both main memory and the caches of the GPU. We present two approaches for simulating compressible fluids using 55-point and 19-point stencils. We seek to reduce the requirements for memory bandwidth and cache size in our methods by using cache blocking and decomposing a latency-bound kernel into several bandwidth-bound kernels. Our fastest implementation is bandwidth-bound and integrates 343 million grid points per second on a Tesla K40t GPU, achieving a 3.6× speedup over a comparable hydrodynamics solver benchmarked on two Intel Xeon E5-2690v3 processors. Our alternative GPU implementation is latency-bound and achieves the rate of 168 million updates per second.
Managing coherence via put/get windows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blumrich, Matthias A; Chen, Dong; Coteus, Paul W

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Mobile Thread Task Manager

NASA Technical Reports Server (NTRS)

Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin J.

2013-01-01

The Mobile Thread Task Manager (MTTM) is being applied to parallelizing existing flight software to understand the benefits and to develop new techniques and architectural concepts for adapting software to multicore architectures. It allocates and load-balances tasks for a group of threads that migrate across processors to improve cache performance. In order to balance load across threads, the MTTM augments a basic map-reduce strategy to draw jobs from a global queue. In a multicore processor, memory may be "homed" to the cache of a specific processor and must be accessed from that processor. The MTTM architecture wraps access to data with thread management to move threads to the home processor for that data so that the computation follows the data in an attempt to avoid L2 cache misses. Cache homing is also handled by a memory manager that translates identifiers to processor IDs where the data will be homed (according to rules defined by the user). The user can also specify the number of threads and processors separately, which is important for tuning performance for different patterns of computation and memory access. MTTM efficiently processes tasks in parallel on a multiprocessor computer. It also provides an interface to make it easier to adapt existing software to a multiprocessor environment.
On the Efficacy of Source Code Optimizations for Cache-Based Systems

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Saphir, William C.

1998-01-01

Obtaining high performance without machine-specific tuning is an important goal of scientific application programmers. Since most scientific processing is done on commodity microprocessors with hierarchical memory systems, this goal of "portable performance" can be achieved if a common set of optimization principles is effective for all such systems. It is widely believed, or at least hoped, that portable performance can be realized. The rule of thumb for optimization on hierarchical memory systems is to maximize temporal and spatial locality of memory references by reusing data and minimizing memory access stride. We investigate the effects of a number of optimizations on the performance of three related kernels taken from a computational fluid dynamics application. Timing the kernels on a range of processors, we observe an inconsistent and often counterintuitive impact of the optimizations on performance. In particular, code variations that have a positive impact on one architecture can have a negative impact on another, and variations expected to be unimportant can produce large effects. Moreover, we find that cache miss rates - as reported by a cache simulation tool, and confirmed by hardware counters - only partially explain the results. By contrast, the compiler-generated assembly code provides more insight by revealing the importance of processor-specific instructions and of compiler maturity, both of which strongly, and sometimes unexpectedly, influence performance. We conclude that it is difficult to obtain performance portability on modern cache-based computers, and comment on the implications of this result.
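The rule of thumb quoted in this abstract (minimize memory access stride) can be made concrete with a toy cache model. The sketch below is an assumption-laden illustration, not the cache simulation tool used in the study: it models a direct-mapped cache, and the names `count_misses` and `traversal` as well as the default sizes are invented for the example.

```python
def count_misses(addresses, cache_bytes=1024, line_bytes=64):
    """Count misses for a byte-address stream in a direct-mapped cache."""
    num_lines = cache_bytes // line_bytes
    lines = [None] * num_lines  # memory block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        slot = block % num_lines
        if lines[slot] != block:
            lines[slot] = block
            misses += 1
    return misses


def traversal(n, elem_bytes=8, row_major=True):
    """Yield byte addresses for a full sweep over an n x n row-major array."""
    for a in range(n):
        for b in range(n):
            i, j = (a, b) if row_major else (b, a)
            yield (i * n + j) * elem_bytes


if __name__ == "__main__":
    n = 64
    print("unit stride:", count_misses(traversal(n, row_major=True)))
    print("stride n:   ", count_misses(traversal(n, row_major=False)))
```

With these sizes the unit-stride sweep misses once per 64-byte line while the column-order sweep misses on every access. As the abstract itself cautions, miss counts of this kind only partially explain measured run times on real processors.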
On the Efficacy of Source Code Optimizations for Cache-Based Systems

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Saphir, William C.; Saini, Subhash (Technical Monitor)

1998-01-01

Obtaining high performance without machine-specific tuning is an important goal of scientific application programmers. Since most scientific processing is done on commodity microprocessors with hierarchical memory systems, this goal of "portable performance" can be achieved if a common set of optimization principles is effective for all such systems. It is widely believed, or at least hoped, that portable performance can be realized. The rule of thumb for optimization on hierarchical memory systems is to maximize temporal and spatial locality of memory references by reusing data and minimizing memory access stride. We investigate the effects of a number of optimizations on the performance of three related kernels taken from a computational fluid dynamics application. Timing the kernels on a range of processors, we observe an inconsistent and often counterintuitive impact of the optimizations on performance. In particular, code variations that have a positive impact on one architecture can have a negative impact on another, and variations expected to be unimportant can produce large effects. Moreover, we find that cache miss rates, as reported by a cache simulation tool and confirmed by hardware counters, only partially explain the results. By contrast, the compiler-generated assembly code provides more insight by revealing the importance of processor-specific instructions and of compiler maturity, both of which strongly, and sometimes unexpectedly, influence performance. We conclude that it is difficult to obtain performance portability on modern cache-based computers, and comment on the implications of this result.
Tough times call for bigger brains

PubMed Central

Pravosudov, Vladimir V

2009-01-01

Memory is crucial for survival in many animals. Spatial memory in particular is important for food-caching species and may be influenced by selective pressures such as climate. The influence of climate on memory may be facilitated through the hippocampus (Hp), the part of the brain responsible in part for spatial memory. In a recent paper, we conducted the first large-scale test of the relationship between memory, the climate and the brain in a single food-caching species, the black-capped chickadee (Poecile atricapillus). We found that birds from more harsh northern climates had significantly larger hippocampal volumes and more neurons than those from more mild southern latitudes. This work suggests that environmental pressures are capable of influencing specific brain regions, which may result in enhanced memory, and hence survival, in harsh climates. This work gives us a better understanding of how the brain responds to different environments and how animals can adapt to their environment in general. PMID:19641741
EqualChance: Addressing Intra-set Write Variation to Increase Lifetime of Non-volatile Caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh; Vetter, Jeffrey S

To address the limitations of SRAM such as high-leakage and low-density, researchers have explored use of non-volatile memory (NVM) devices, such as ReRAM (resistive RAM) and STT-RAM (spin transfer torque RAM) for designing on-chip caches. A crucial limitation of NVMs, however, is that their write endurance is low and the large intra-set write variation introduced by existing cache management policies may further exacerbate this problem, thereby reducing the cache lifetime significantly. We present EqualChance, a technique to increase cache lifetime by reducing intra-set write variation. EqualChance works by periodically changing the physical cache-block location of a write-intensive data item within a set to achieve wear-leveling. Simulations using workloads from SPEC CPU2006 suite and HPC (high-performance computing) field show that EqualChance improves the cache lifetime by 4.29X. Also, its implementation overhead is small, and it incurs very small performance and energy loss.
The ontogeny of food-caching behaviour in New Zealand robins (Petroica longipes).

PubMed

Clark, Lisabertha L; Shaw, Rachael C

2018-06-01

Hoarding or caching behaviour is a widely-used paradigm for examining a range of cognitive processes in birds, such as social cognition and spatial memory. However, much is still unknown about how caching develops in young birds, especially in the wild. Studying the ontogeny of caching in the wild will help researchers to identify the mechanisms that shape this advantageous foraging strategy. We examined the ontogeny of food caching behaviour in a wild New Zealand passerine, the North Island robin (Petroica longipes). For 12 weeks following fledging, we observed 34 juveniles to examine the development of caching and cache retrieval. Additionally, we compared the caching behaviour of juveniles at 12 weeks post-fledging to 35 adult robins to determine whether juveniles had developed adult-like caching behaviour by this age. Juveniles began caching mealworms shortly after achieving foraging independency. Multivariate analyses revealed that caching rate increased and handling time decreased with increasing age. Juveniles spontaneously began retrieving caches as soon as they had begun to cache and their retrieval rates then remained constant throughout their ensuing development. Likewise, the number of sites used by juveniles did not change with age. Juvenile sex, caregiver sex and the duration of post-fledging parental care did not influence the development of caching, cache retrieval, the number of cache sites used and the time juveniles spent handling mealworms. At 12 weeks post-fledging, juveniles demonstrated levels of caching, cache retrieval and cache site usage that were comparable to adults. However, juvenile prey handling time was still longer than that of adults. The spontaneous emergence of cache retrieval and the consistency in the number of cache sites used throughout development suggest that these aspects of caching in North Island robins are likely to be innate, but that age and experience have an important role in the development of adult caching behaviours.
Copyright © 2018 Elsevier B.V. All rights reserved.
Cache placement, pilfering, and a recovery advantage in a seed-dispersing rodent: Could predation of scatter hoarders contribute to seedling establishment?

NASA Astrophysics Data System (ADS)

Steele, Michael A.; Bugdal, Melissa; Yuan, Amy; Bartlow, Andrew; Buzalewski, Jarrod; Lichti, Nathan; Swihart, Robert

2011-11-01

Scatter-hoarding mammals are thought to rely on spatial memory to relocate food caches. Yet, we know little about how long these granivores (primarily rodents) recall specific cache locations or whether individual hoarders have an advantage when recovering their own caches. Indeed, a few recent studies suggest that high rates of pilferage are common and that individual hoarders may not have a retriever's advantage. We tested this hypothesis in a high-density (>7 animals/ha) population of eastern gray squirrels (Sciurus carolinensis) by presenting individually marked animals (>20) with tagged acorns, mapping cache sites, and following the fate of seed caches. PIT tags allowed us to monitor individual seeds without disturbing cache sites. Acorns only remained in the caches for 12-119 h (0.5-5 d). However, when we live-trapped and removed some animals from the site immediately after they stored seeds (thus simulating predation), their seed caches remained intact for significantly longer periods (16-27 d). Cache duration corresponded roughly to the time at which squirrels were returned to the study area. These results suggest that squirrels have a retriever's advantage and may remember specific cache sites longer than previously thought. We further suggest that predation of scatter hoarders who store seeds for long periods and also possess a recovery advantage may be one important mechanism by which seed establishment is achieved.
An effective write policy for software coherence schemes

NASA Technical Reports Server (NTRS)

Chen, Yung-Chin; Veidenbaum, Alexander V.

1992-01-01

The authors study the write behavior and evaluate the performance of various write strategies and buffering techniques for a MIN-based multiprocessor system using the simple software coherence scheme. Hit ratios, memory latencies, total execution time, and total write traffic are used as the performance indices. The write-through write-allocate no-fetch cache using a write-back write buffer is shown to have a better performance than both write-through and write-back caches. This type of write buffer is effective in reducing the volume as well as bursts of write traffic. On average, the use of a write-back cache reduces by 60 percent the total write traffic generated by a write-through cache.
DSP code optimization based on cache

NASA Astrophysics Data System (ADS)

Xu, Chengfa; Li, Chengcheng; Tang, Bin

2013-03-01

A DSP program often runs less efficiently on the board than it does in software simulation during development, mainly because of improper use and incomplete understanding of the cache-based memory. This paper takes the TI TMS320C6455 DSP as an example, analyzes its two-level internal cache, and summarizes methods of code optimization. The processor can achieve its best performance when these code optimization methods are used. Finally, a specific algorithm application in radar signal processing is presented. Experimental results show that these optimizations are effective.
Distributed shared memory for roaming large volumes.

PubMed

Castanié, Laurent; Mion, Christophe; Cavin, Xavier; Lévy, Bruno

2006-01-01

We present a cluster-based volume rendering system for roaming very large volumes. This system allows moving a gigabyte-sized probe inside a total volume of several tens or hundreds of gigabytes in real-time. While the size of the probe is limited by the total amount of texture memory on the cluster, the size of the total data set has no theoretical limit. The cluster is used as a distributed graphics processing unit that aggregates both graphics power and graphics memory. A hardware-accelerated volume renderer runs in parallel on the cluster nodes and the final image compositing is implemented using a pipelined sort-last rendering algorithm. Meanwhile, volume bricking and volume paging allow efficient data caching. On each rendering node, a distributed hierarchical cache system implements a global software-based distributed shared memory on the cluster. In case of a cache miss, this system first checks page residency on the other cluster nodes instead of directly accessing local disks. Using two Gigabit Ethernet network interfaces per node, we accelerate data fetching by a factor of 4 compared to directly accessing local disks. The system also implements asynchronous disk access and texture loading, which makes it possible to overlap data loading, volume slicing and rendering for optimal volume roaming.
Expert Systems on Multiprocessor Architectures. Volume 2. Technical Reports

DTIC Science & Technology

1991-06-01

Report RC 12936 (#58037). IBM T. J. Watson Research Center. July 1987. Alan Jay Smith. Cache memories. Computing Surveys, 14(3):473-530...basic-shared is an instrument for a shared memory design. The components panels are processor-qload-scrolling-bar-panel, memory-qload-scrolling-bar-panel
DESTINY

DOE Office of Scientific and Technical Information (OSTI.GOV)

2015-03-10

DESTINY is a comprehensive tool for modeling 3D and 2D cache designs using SRAM, embedded DRAM (eDRAM), spin transfer torque RAM (STT-RAM), resistive RAM (ReRAM), and phase change RAM (PCM). In its purpose, it is similar to CACTI, CACTI-3DD or NVSim. DESTINY is very useful for performing design-space exploration across several dimensions, such as optimizing for a target (e.g. latency, area or energy-delay product) for a given memory technology, choosing the suitable memory technology or fabrication method (i.e. 2D vs. 3D) for a given optimization target, etc. DESTINY has been validated against several cache prototypes. DESTINY is expected to boost studies of next-generation memory architectures used in systems ranging from mobile devices to extreme-scale supercomputers.
GPU-Accelerated Forward and Back-Projections with Spatially Varying Kernels for 3D DIRECT TOF PET Reconstruction.

PubMed

Ha, S; Matej, S; Ispiryan, M; Mueller, K

2013-02-01

We describe a GPU-accelerated framework that efficiently models spatially (shift) variant system response kernels and performs forward- and back-projection operations with these kernels for the DIRECT (Direct Image Reconstruction for TOF) iterative reconstruction approach. Inherent challenges arise from the poor memory cache performance at non-axis aligned TOF directions. Focusing on the GPU memory access patterns, we utilize different kinds of GPU memory according to these patterns in order to maximize the memory cache performance. We also exploit the GPU instruction-level parallelism to efficiently hide long latencies from the memory operations. Our experiments indicate that our GPU implementation of the projection operators has slightly faster or approximately comparable time performance than FFT-based approaches using state-of-the-art FFTW routines. However, most importantly, our GPU framework can also efficiently handle any generic system response kernels, such as spatially symmetric and shift-variant as well as spatially asymmetric and shift-variant, both of which an FFT-based approach cannot cope with.
Distributed Saturation

NASA Technical Reports Server (NTRS)

Chung, Ming-Ying; Ciardo, Gianfranco; Siminiceanu, Radu I.

2007-01-01

The Saturation algorithm for symbolic state-space generation has been a recent breakthrough in the exhaustive verification of complex systems, in particular globally-asynchronous/locally-synchronous systems. The algorithm uses a very compact Multiway Decision Diagram (MDD) encoding for states and the fastest symbolic exploration algorithm to date. The distributed version of Saturation uses the overall memory available on a network of workstations (NOW) to efficiently spread the memory load during the highly irregular exploration. A crucial factor in limiting the memory consumption during the symbolic state-space generation is the ability to perform garbage collection to free up the memory occupied by dead nodes. However, garbage collection over a NOW requires a nontrivial communication overhead. In addition, operation cache policies become critical while analyzing large-scale systems using the symbolic approach. In this technical report, we develop a garbage collection scheme and several operation cache policies to help solve extremely complex systems. Experiments show that our schemes improve the performance of the original distributed implementation, SmArTNow, in terms of time and memory efficiency.
GPU-Accelerated Forward and Back-Projections With Spatially Varying Kernels for 3D DIRECT TOF PET Reconstruction

NASA Astrophysics Data System (ADS)

Ha, S.; Matej, S.; Ispiryan, M.; Mueller, K.

2013-02-01

We describe a GPU-accelerated framework that efficiently models spatially (shift) variant system response kernels and performs forward- and back-projection operations with these kernels for the DIRECT (Direct Image Reconstruction for TOF) iterative reconstruction approach. Inherent challenges arise from the poor memory cache performance at non-axis aligned TOF directions. Focusing on the GPU memory access patterns, we utilize different kinds of GPU memory according to these patterns in order to maximize the memory cache performance. We also exploit the GPU instruction-level parallelism to efficiently hide long latencies from the memory operations. Our experiments indicate that our GPU implementation of the projection operators has slightly faster or approximately comparable time performance than FFT-based approaches using state-of-the-art FFTW routines. However, most importantly, our GPU framework can also efficiently handle any generic system response kernels, such as spatially symmetric and shift-variant as well as spatially asymmetric and shift-variant, both of which an FFT-based approach cannot cope with.
Accelerating a Particle-in-Cell Simulation Using a Hybrid Counting Sort

NASA Astrophysics Data System (ADS)

Bowers, K. J.

2001-11-01

In this article, performance limitations of the particle advance in a particle-in-cell (PIC) simulation are discussed. It is shown that the memory subsystem and cache-thrashing severely limit the speed of such simulations. Methods to implement a PIC simulation under such conditions are explored. An algorithm based on a counting sort is developed which effectively eliminates PIC simulation cache thrashing. Sustained performance gains of 40 to 70 percent are measured on commodity workstations for a minimal 2d2v electrostatic PIC simulation. More complete simulations are expected to have even better results as larger simulations are usually even more memory subsystem limited.
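The idea of reordering particles so that neighbours in memory share a grid cell can be illustrated with a plain counting sort. This is only a minimal sketch, not the hybrid algorithm of the article: the function name `counting_sort_by_cell` and the flat list of integer cell indices are assumptions made for illustration.

```python
def counting_sort_by_cell(cells, num_cells):
    """Return a stable permutation that orders particles by cell index.

    cells[i] is the (hypothetical) integer cell index of particle i,
    with 0 <= cells[i] < num_cells.
    """
    # counts[c + 1] holds the number of particles in cell c.
    counts = [0] * (num_cells + 1)
    for c in cells:
        counts[c + 1] += 1
    # Prefix sum: counts[c] becomes the first output slot for cell c.
    for c in range(num_cells):
        counts[c + 1] += counts[c]
    # Scatter particle indices into their slots, preserving input order.
    order = [0] * len(cells)
    for i, c in enumerate(cells):
        order[counts[c]] = i
        counts[c] += 1
    return order


cells = [2, 0, 1, 2, 0]
order = counting_sort_by_cell(cells, 3)
sorted_cells = [cells[i] for i in order]
print(order)         # [1, 4, 2, 0, 3]
print(sorted_cells)  # [0, 0, 1, 2, 2]
```

Applying `order` to the particle arrays places particles of the same cell contiguously, so the field data for one cell is touched in a single burst rather than repeatedly evicted and reloaded.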

Performance of hashed cache data migration schemes on multicomputers

NASA Technical Reports Server (NTRS)

Hiranandani, Seema; Saltz, Joel; Mehrotra, Piyush; Berryman, Harry

1991-01-01

After conducting an examination of several data-migration mechanisms which permit an explicit and controlled mapping of data to memory, a set of schemes for storage and retrieval of off-processor array elements is experimentally evaluated and modeled. All schemes considered have their basis in the use of hash tables for efficient access of nonlocal data. The techniques in question are those of hashed cache, partial enumeration, and full enumeration; in these, nonlocal data are stored in hash tables, so that the operative difference lies in the amount of memory used by each scheme and in the retrieval mechanism used for nonlocal data.
Domain Wall Fermion Inverter on Pentium 4

NASA Astrophysics Data System (ADS)

Pochinsky, Andrew

2005-03-01

A highly optimized domain wall fermion inverter has been developed as part of the SciDAC lattice initiative. By designing the code to minimize memory bus traffic, it achieves high cache reuse and performance in excess of 2 GFlops for out of L2 cache problem sizes on a GigE cluster with 2.66 GHz Xeon processors. The code uses the SciDAC QMP communication library.
Massively parallel algorithms for trace-driven cache simulations

NASA Technical Reports Server (NTRS)

Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.

1991-01-01

Trace driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the $t$-th instant, reference $x_t$ is hashed into a set of cache locations, the contents of which are then compared with $x_t$. If at the $t$-th instant $x_t$ is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making $x_t$ present for the $(t+1)$-st instant. The problem of parallel simulation of a subtrace of $N$ references directed to a $C$-line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regardless of the set size $C$ runs in time $O(\log N)$ using $N$ processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in $O(C \log N)$ time using $N/\log N$ processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference based line replacement policies is considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length $N$ directed to a $C$-line set runs in $O(C \log N)$ time with high probability using $N$ processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
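For orientation, the miss-classification problem described above can be written down as a short sequential baseline. The sketch below is not one of the parallel algorithms of the report; it simply replays a trace against one LRU cache set, and the name `simulate_lru_set` and the string-valued references are assumptions chosen for illustration.

```python
from collections import OrderedDict


def simulate_lru_set(trace, capacity):
    """Replay a trace against one LRU cache set.

    Returns a list of booleans, True where the reference is a miss.
    """
    lines = OrderedDict()  # keys ordered from least to most recently used
    misses = []
    for ref in trace:
        if ref in lines:
            lines.move_to_end(ref)  # hit: mark as most recently used
            misses.append(False)
        else:
            if len(lines) >= capacity:
                lines.popitem(last=False)  # evict the least recently used line
            lines[ref] = True
            misses.append(True)
    return misses


trace = ["a", "b", "a", "c", "b", "d", "a"]
misses = simulate_lru_set(trace, capacity=2)
print(misses)       # [True, True, False, True, True, True, True]
print(sum(misses))  # 6
```

This baseline takes time proportional to the trace length on one processor; the parallel methods in the abstract compute the same miss flags for a subtrace in logarithmic time on many processors.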
NAS Applications and Advanced Algorithms

NASA Technical Reports Server (NTRS)

Bailey, David H.; Biswas, Rupak; VanDerWijngaart, Rob; Kutler, Paul (Technical Monitor)

1997-01-01

This paper examines the applications most commonly run on the supercomputers at the Numerical Aerospace Simulation (NAS) facility. It analyzes the extent to which such applications are fundamentally oriented to vector computers, and whether or not they can be efficiently implemented on hierarchical memory machines, such as systems with cache memories and highly parallel, distributed memory systems.
Assessment of TREM2 rs75932628 association with Alzheimer's disease in a population-based sample: the Cache County Study.

PubMed

Gonzalez Murcia, Josue D; Schmutz, Cameron; Munger, Caitlin; Perkes, Ammon; Gustin, Aaron; Peterson, Michael; Ebbert, Mark T W; Norton, Maria C; Tschanz, Joann T; Munger, Ronald G; Corcoran, Christopher D; Kauwe, John S K

2013-12-01

Recent studies have identified the rs75932628 (R47H) variant in TREM2 as an Alzheimer's disease risk factor with estimated odds ratio ranging from 2.9 to 5.1. The Cache County Memory Study is a large, population-based sample designed for the study of memory and aging. We genotyped R47H in 2974 samples (427 cases and 2540 control subjects) from the Cache County study using a custom TaqMan assay. We observed 7 heterozygous cases and 12 heterozygous control subjects with an odds ratio of 3.5 (95% confidence interval, 1.3-8.8; p = 0.0076). The minor allele frequency and population attributable fraction for R47H were 0.0029 and 0.004, respectively. This study replicates the association between R47H and Alzheimer's disease risk in a large, population-based sample, and estimates the population frequency and attributable risk of this rare variant. Copyright © 2013 Elsevier Inc. All rights reserved.
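The reported odds ratio can be roughly reproduced from the counts in the abstract. The sketch below computes a crude, unadjusted odds ratio; the published estimate may come from a model with covariates, so agreement to one decimal place is the most that should be expected. The function name `odds_ratio` is illustrative.

```python
def odds_ratio(carrier_cases, total_cases, carrier_controls, total_controls):
    """Crude odds ratio of carrying the variant, cases versus controls."""
    case_odds = carrier_cases / (total_cases - carrier_cases)
    control_odds = carrier_controls / (total_controls - carrier_controls)
    return case_odds / control_odds


# Counts from the abstract: 7 of 427 cases and 12 of 2540 controls are heterozygous.
or_r47h = odds_ratio(7, 427, 12, 2540)
print(round(or_r47h, 1))  # 3.5
```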
Magpies can use local cues to retrieve their food caches.

PubMed

Feenders, Gesa; Smulders, Tom V

2011-03-01

Much importance has been placed on the use of spatial cues by food-hoarding birds in the retrieval of their caches. In this study, we investigate whether food-hoarding birds can be trained to use local cues ("beacons") in their cache retrieval. We test magpies (Pica pica) in an active hoarding-retrieval paradigm, where local cues are always reliable, while spatial cues are not. Our results show that the birds use the local cues to retrieve their caches, even when occasionally contradicting spatial information is available. The design of our study does not allow us to test rigorously whether the birds prefer using local over spatial cues, nor to investigate the process through which they learn to use local cues. We furthermore provide evidence that magpies develop landmark preferences, which improve their retrieval accuracy. Our findings support the hypothesis that birds are flexible in their use of memory information, using a combination of the most reliable or salient information to retrieve their caches. © Springer-Verlag 2010
What makes specialized food-caching mountain chickadees successful city slickers?

PubMed

Kozlovsky, Dovid Y; Weissgerber, Emily A; Pravosudov, Vladimir V

2017-05-31

Anthropogenic environments are a dominant feature of the modern world; therefore, understanding which traits allow animals to succeed in these urban environments is especially important. Overall, generalist species are thought to be most successful in urban environments, with better general cognition and less neophobia as suggested critical traits. It is less clear, however, which traits would be favoured in urban environments in highly specialized species. Here, we compared highly specialized food-caching mountain chickadees living in an urban environment (Reno, NV, USA) with those living in their natural environment to investigate what makes this species successful in the city. Using a 'common garden' paradigm, we found that urban mountain chickadees tended to explore a novel environment faster and moved more frequently, were better at novel problem-solving, had better long-term spatial memory retention and had a larger telencephalon volume compared with forest chickadees. There were no significant differences between urban and forest chickadees in neophobia, food-caching rates, spatial memory acquisition, hippocampus volume, or the total number of hippocampal neurons. Our results partially support the idea that some traits associated with behavioural flexibility and innovation are associated with successful establishment in urban environments, but differences in long-term spatial memory retention suggest that even this trait specialized for food-caching may be advantageous. Our results highlight the importance of environmental context, species biology, and temporal aspects of invasion in understanding how urban environments are associated with behavioural and cognitive phenotypes and suggest that there is likely no one suite of traits that makes urban animals successful. © 2017 The Author(s).
Algorithms for Data Intensive Applications on Intelligent and Smart Memories

DTIC Science & Technology

2003-03-01

editors). Parallel Algorithms and Architectures. North Holland, 1986. [8] P. Diniz. USC ISI, Personal Communication, March, 2001. [9] M. Frigo, C. E ...hierarchy as well as the Translation Lookaside Buffer (TLB) affect the effectiveness of cache friendly optimizations. These penalties vary among...processors and cause large variations in the effectiveness of cache performance optimizations. The area of graph problems is fundamental in a wide variety of
Visual landmark-directed scatter-hoarding of Siberian chipmunks Tamias sibiricus.

PubMed

Zhang, Dongyuan; Li, Jia; Wang, Zhenyu; Yi, Xianfeng

2016-05-01

Spatial memory of cached food items plays an important role in cache recovery by scatter-hoarding animals. However, whether scatter-hoarding animals intentionally select cache sites with respect to visual landmarks in the environment and then rely on them to recover their cached seeds for later use has not been extensively explored. Furthermore, there is a lack of evidence on whether there are sex differences in visual landmark-based food-hoarding behaviors in small rodents even though male and female animals exhibit different spatial abilities. In the present study, we used a scatter-hoarding animal, the Siberian chipmunk, Tamias sibiricus, to explore these questions in semi-natural enclosures. Our results showed that T. sibiricus preferred to establish caches in the shallow pits labeled with visual landmarks (branches of Pinus sylvestris, leaves of Athyrium brevifrons and PVC tubes). In addition, visual landmarks of P. sylvestris facilitated cache recovery by T. sibiricus. We also found significant sex differences in visual landmark-based food-hoarding strategies in Siberian chipmunks. Male, rather than female, chipmunks tended to establish their caches with respect to the visual landmarks. Our studies show that T. sibiricus rely on visual landmarks to establish and recover their caches, and that sex differences exist in visual landmark-based food hoarding in Siberian chipmunks. © 2015 International Society of Zoological Sciences, Institute of Zoology/Chinese Academy of Sciences and John Wiley & Sons Australia, Ltd.
Efficacy of Code Optimization on Cache-Based Processors

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Saphir, William C.; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

In this paper a number of techniques for improving the cache performance of a representative piece of numerical software are presented. Target machines are popular processors from several vendors: MIPS R5000 (SGI Indy), MIPS R8000 (SGI PowerChallenge), MIPS R10000 (SGI Origin), DEC Alpha EV4 + EV5 (Cray T3D & T3E), IBM RS6000 (SP Wide-node), Intel PentiumPro (Ames' Whitney), Sun UltraSparc (NERSC's NOW). The optimizations all attempt to increase the locality of memory accesses, but they meet with rather varied and often counterintuitive success on the different computing platforms. We conclude that it may be genuinely impossible to obtain portable performance on the current generation of cache-based machines. At the least, it appears that the performance of modern commodity processors cannot be described with parameters defining the cache alone.
FPS-RAM: Fast Prefix Search RAM-Based Hardware for Forwarding Engine

NASA Astrophysics Data System (ADS)

Zaitsu, Kazuya; Yamamoto, Koji; Kuroda, Yasuto; Inoue, Kazunari; Ata, Shingo; Oka, Ikuo

Ternary content addressable memory (TCAM) is becoming very popular for designing high-throughput forwarding engines on routers. However, TCAM has potential problems in terms of hardware and power costs, which limits its ability to deploy large amounts of capacity in IP routers. In this paper, we propose new hardware architecture for fast forwarding engines, called fast prefix search RAM-based hardware (FPS-RAM). We designed FPS-RAM hardware with the intent of maintaining the same search performance and physical user interface as TCAM because our objective is to replace the TCAM in the market. Our RAM-based hardware architecture is completely different from that of TCAM and has dramatically reduced the costs and power consumption to 62% and 52%, respectively. We implemented FPS-RAM on an FPGA to examine its lookup operation.
Design of a memory-access controller with 3.71-times-enhanced energy efficiency for Internet-of-Things-oriented nonvolatile microcontroller unit

NASA Astrophysics Data System (ADS)

Natsui, Masanori; Hanyu, Takahiro

2018-04-01

In realizing a nonvolatile microcontroller unit (MCU) for sensor nodes in Internet-of-Things (IoT) applications, it is important to solve the data-transfer bottleneck between the central processing unit (CPU) and the nonvolatile memory constituting the MCU. As one circuit-oriented approach to solving this problem, we propose a memory access minimization technique for magnetoresistive-random-access-memory (MRAM)-embedded nonvolatile MCUs. In addition to multiplexing and prefetching of memory access, the proposed technique realizes efficient instruction fetch by eliminating redundant memory access while considering the code length of the instruction to be fetched and the transition of the memory address to be accessed. As a result, the performance of the MCU can be improved while relaxing the performance requirement for the embedded MRAM, and compact and low-power implementation can be performed as compared with the conventional cache-based one. Through the evaluation using a system consisting of a general purpose 32-bit CPU and embedded MRAM, it is demonstrated that the proposed technique increases the peak efficiency of the system up to 3.71 times, while a 2.29-fold area reduction is achieved compared with the cache-based one.
Examination of long-term visual memorization capacity in the Clark's nutcracker (Nucifraga columbiana).

PubMed

Qadri, Muhammad A J; Leonard, Kevin; Cook, Robert G; Kelly, Debbie M

2018-02-15

Clark's nutcrackers exhibit remarkable cache recovery behavior, remembering thousands of seed locations over the winter. No direct laboratory test of their visual memory capacity, however, has yet been performed. Here, two nutcrackers were tested in an operant procedure used to measure different species' visual memory capacities. The nutcrackers were incrementally tested with an ever-expanding pool of pictorial stimuli in a two-alternative discrimination task. Each picture was randomly assigned to either a right or a left choice response, forcing the nutcrackers to memorize each picture-response association. The nutcrackers' visual memorization capacity was estimated at a little over 500 pictures, and the testing suggested effects of primacy, recency, and memory decay over time. The size of this long-term visual memory was less than the approximately 800-picture capacity established for pigeons. These results support the hypothesis that nutcrackers' spatial memory is a specialized adaptation tied to their natural history of food-caching and recovery, and not to a larger long-term, general memory capacity. Furthermore, despite millennia of separate and divergent evolution, the mechanisms of visual information retention seem to reflect common memory systems of differing capacities across the different species tested in this design.
A VLSI VAX chip set

NASA Astrophysics Data System (ADS)

Johnson, W. N.; Herrick, W. V.; Grundmann, W. J.

1984-10-01

For the first time, VLSI technology is used to compress the full functionality and comparable performance of the VAX 11/780 super-minicomputer into a 1.2 M transistor microprocessor chip set. There was no subsetting of the 304 instruction set and the 17 data types, nor reduction in hardware support for the 4 Gbyte virtual memory management architecture. The chipset supports an integral 8 kbyte memory cache, a 13.3 Mbyte/s system bus, and sophisticated multiprocessing. High performance is achieved through microcode optimizations afforded by the large control store, tightly coupled address and data caches, the use of internal and external 32 bit datapaths, the extensive application of both microlevel and macrolevel pipelining, and the use of specialized hardware assists.
The Glass Computer

ERIC Educational Resources Information Center

Paesler, M. A.

2009-01-01

Digital computers use different kinds of memory, each of which is either volatile or nonvolatile. On most computers only the hard drive memory is nonvolatile, i.e., it retains all information stored on it when the power is off. When a computer is turned on, an operating system stored on the hard drive is loaded into the computer's memory cache and…
EqualWrites: Reducing Intra-set Write Variations for Enhancing Lifetime of Non-volatile Caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh; Vetter, Jeffrey S.

Driven by the trends of increasing core count and the bandwidth-wall problem, the size of last level caches (LLCs) has greatly increased, and hence researchers have explored non-volatile memories (NVMs), which provide high density and consume low leakage power. Since NVMs have low write endurance and the existing cache management policies are write variation-unaware, effective wear-leveling techniques are required for achieving reasonable cache lifetimes using NVMs. We present EqualWrites, a technique for mitigating intra-set write variation. Our technique works by recording the number of writes on a block and changing the cache-block location of a hot data item to redirect the future writes to a cold block to achieve wear-leveling. Simulation experiments have been performed using an x86-64 simulator and benchmarks from SPEC06 and the HPC (high-performance computing) field. The results show that for single, dual and quad-core system configurations, EqualWrites improves cache lifetime by 6.31X, 8.74X and 10.54X, respectively. In addition, its implementation overhead is very small and it provides larger improvement in lifetime than three other intra-set wear-leveling techniques and a cache replacement policy.
EqualWrites: Reducing Intra-set Write Variations for Enhancing Lifetime of Non-volatile Caches

DOE PAGES

Mittal, Sparsh; Vetter, Jeffrey S.

2015-01-29

Driven by the trends of increasing core count and the bandwidth-wall problem, the size of last level caches (LLCs) has greatly increased, and hence researchers have explored non-volatile memories (NVMs), which provide high density and consume low leakage power. Since NVMs have low write endurance and the existing cache management policies are write variation-unaware, effective wear-leveling techniques are required for achieving reasonable cache lifetimes using NVMs. We present EqualWrites, a technique for mitigating intra-set write variation. Our technique works by recording the number of writes on a block and changing the cache-block location of a hot data item to redirect the future writes to a cold block to achieve wear-leveling. Simulation experiments have been performed using an x86-64 simulator and benchmarks from SPEC06 and the HPC (high-performance computing) field. The results show that for single, dual and quad-core system configurations, EqualWrites improves cache lifetime by 6.31X, 8.74X and 10.54X, respectively. In addition, its implementation overhead is very small and it provides larger improvement in lifetime than three other intra-set wear-leveling techniques and a cache replacement policy.
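The write-redirection idea in the EqualWrites abstract above can be illustrated with a small sketch. This is not the authors' algorithm: the `CacheSet` class, the `threshold` parameter, the swap rule and the simplified miss handling are all assumptions made for illustration. The sketch keeps one write counter per physical way of a single cache set and, when a hot way has absorbed more than `threshold` writes beyond the coldest way, swaps the two data items so that later writes to the hot item land on the cold way.

```python
class CacheSet:
    """One set of a set-associative cache with a write counter per physical way."""

    def __init__(self, ways, threshold):
        self.tags = [None] * ways     # data item currently stored in each way
        self.writes = [0] * ways      # writes absorbed so far by each way
        self.threshold = threshold    # hypothetical allowed write imbalance

    def write(self, tag):
        if tag in self.tags:
            way = self.tags.index(tag)
        else:
            # Miss: simplified policy, fill (or evict) the least-written way.
            way = self.writes.index(min(self.writes))
            self.tags[way] = tag
        self.writes[way] += 1

        cold = self.writes.index(min(self.writes))
        if self.writes[way] - self.writes[cold] > self.threshold:
            # Swap the hot and cold items; the swap itself writes both ways.
            self.tags[way], self.tags[cold] = self.tags[cold], self.tags[way]
            self.writes[way] += 1
            self.writes[cold] += 1


def simulate(threshold, hot_writes=1000, ways=4):
    """Fill a set, then hammer one item; return the per-way write counts."""
    cache_set = CacheSet(ways, threshold)
    tags = [f"blk{i}" for i in range(ways)]
    for tag in tags:
        cache_set.write(tag)
    for _ in range(hot_writes):
        cache_set.write(tags[0])
    return cache_set.writes
```

With a very large threshold the swap never triggers and a single way absorbs almost every write; with a small threshold the writes are spread across the ways at the cost of the extra writes made by each swap.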
An evaluation of memory accuracy in food hoarding marsh tits Poecile palustris--how accurate are they compared to humans?

PubMed

Brodin, Anders; Urhan, A Utku

2013-07-01

Laboratory studies of scatter hoarding birds have become a model system for spatial memory studies. Considering that such birds are known to have a good spatial memory, recovery success in lab studies seems low. In parids (titmice and chickadees) it typically ranges between 25 and 60% if five seeds are cached in 50-128 available caching sites. Since these birds store many thousands of food items in nature in one autumn, one might expect that they should easily retrieve five seeds in a laboratory where they know the environment with its caching sites in detail. We designed a laboratory set up to be as similar as possible to previous studies and trained wild caught marsh tits Poecile palustris to store and retrieve in this set up. Our results agree closely with earlier studies: of the first ten looks, around 40% were correct when the birds had stored five seeds in 100 available sites, both 5 and 24 h after storing. The cumulative success curve suggests high success during the first 15 looks, after which it declines. Humans performed much better: in the first five looks most subjects were 100% correct. We discuss possible reasons why the birds were not doing better. Copyright © 2013 Elsevier B.V. All rights reserved.
Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors

NASA Technical Reports Server (NTRS)

Probst, David K.

1993-01-01

A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from reducing high performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
DARPA Status Report - November 1988

DTIC Science & Technology

1988-11-01

style used in the applications reference to that block was by processor j. where j It. We was influenced by it. MACH is a multiprocessor operating S call...it can be order they occurred. However, the exact time at which the treated specially in memory management, and so most of the reference was made is...on cache consistency performance, sophisti- peak can be explained as clinging references that occur when cated cache management schemes that take

Heap/stack guard pages using a wakeup unit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gooding, Thomas M; Satterfield, David L; Steinmacher-Burow, Burkhard

A method and system for providing a memory access check on a processor including the steps of detecting accesses to a memory device including level-1 cache using a wakeup unit. The method includes invalidating level-1 cache ranges corresponding to a guard page, and configuring a plurality of wakeup address compare (WAC) registers to allow access to selected WAC registers. The method selects one of the plurality of WAC registers, and sets up a WAC register related to the guard page. The method configures the wakeup unit to interrupt on access of the selected WAC register. The method detects access of the memory device using the wakeup unit when a guard page is violated. The method generates an interrupt to the core using the wakeup unit, and determines the source of the interrupt. The method detects the activated WAC registers assigned to the violated guard page, and initiates a response.
Reducing the stochasticity of crystal nucleation to enable subnanosecond memory writing

NASA Astrophysics Data System (ADS)

Rao, Feng; Ding, Keyuan; Zhou, Yuxing; Zheng, Yonghui; Xia, Mengjiao; Lv, Shilong; Song, Zhitang; Feng, Songlin; Ronneberger, Ider; Mazzarello, Riccardo; Zhang, Wei; Ma, Evan

2017-12-01

Operation speed is a key challenge in phase-change random-access memory (PCRAM) technology, especially for achieving subnanosecond high-speed cache memory. Commercialized PCRAM products are limited by the tens of nanoseconds writing speed, originating from the stochastic crystal nucleation during the crystallization of amorphous germanium antimony telluride (Ge2Sb2Te5). Here, we demonstrate an alloying strategy to speed up the crystallization kinetics. The scandium antimony telluride (Sc0.2Sb2Te3) compound that we designed allows a writing speed of only 700 picoseconds without preprogramming in a large conventional PCRAM device. This ultrafast crystallization stems from the reduced stochasticity of nucleation through geometrically matched and robust scandium telluride (ScTe) chemical bonds that stabilize crystal precursors in the amorphous state. Controlling nucleation through alloy design paves the way for the development of cache-type PCRAM technology to boost the working efficiency of computing systems.
Analysis of the Intel 386 and i486 microprocessors for the Space Station Freedom Data Management System

NASA Technical Reports Server (NTRS)

Liu, Yuan-Kwei

1991-01-01

The feasibility is analyzed of upgrading the Intel 386 microprocessor, which has been proposed as the baseline processor for the Space Station Freedom (SSF) Data Management System (DMS), to the more advanced i486 microprocessors. The items compared between the two processors include the instruction set architecture, power consumption, the MIL-STD-883C Class S (Space) qualification schedule, and performance. The advantages of the i486 over the 386 are (1) lower power consumption; and (2) higher floating point performance. The i486 on-chip cache does not have parity check or error detection and correction circuitry. The i486 with on-chip cache disabled, however, has lower integer performance than the 386 without cache, which is the current DMS design choice. Adding cache to the 386/387 DX memory hierarchy appears to be the most beneficial change to the current DMS design at this time.
Turtle: identifying frequent k-mers with cache-efficient algorithms.

PubMed

Roy, Rajat Shuvro; Bhattacharya, Debashish; Schliep, Alexander

2014-07-15

Counting the frequencies of k-mers in read libraries is often a first step in the analysis of high-throughput sequencing data. Infrequent k-mers are assumed to be a result of sequencing errors. The frequent k-mers constitute a reduced but error-free representation of the experiment, which can inform read error correction or serve as the input to de novo assembly methods. Ideally, the memory requirement for counting should be linear in the number of frequent k-mers and not in the, typically much larger, total number of k-mers in the read library. We present a novel method that balances time, space and accuracy requirements to efficiently extract frequent k-mers even for high-coverage libraries and large genomes such as human. Our method is designed to minimize cache misses in a cache-efficient manner by using a pattern-blocked Bloom filter to remove infrequent k-mers from consideration in combination with a novel sort-and-compact scheme, instead of a hash, for the actual counting. Although this increases theoretical complexity, the savings in cache misses reduce the empirical running times. A variant of the method can resort to a counting Bloom filter for even larger savings in memory at the expense of false-negative rates in addition to the false-positive rates common to all Bloom filter-based approaches. A comparison with the state-of-the-art shows reduced memory requirements and running times. The tools are freely available for download at http://bioinformatics.rutgers.edu/Software/Turtle and http://figshare.com/articles/Turtle/791582. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
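The two-stage scheme described in the Turtle abstract above (a Bloom filter that screens out k-mers seen only once, followed by sort-and-compact counting) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Turtle implementation: it uses a plain Bloom filter rather than the pattern-blocked one named in the abstract, and the names `BloomFilter`, `frequent_kmers`, `n_bits` and `n_hashes` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter (a stand-in for the pattern-blocked variant)."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item):
        """Insert item; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def frequent_kmers(reads, k, n_bits=1 << 16, n_hashes=3):
    """Return {k-mer: count} for k-mers that occur at least twice."""
    bloom = BloomFilter(n_bits, n_hashes)
    repeats = []                      # only second and later sightings are kept
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if bloom.add(kmer):
                repeats.append(kmer)

    # Sort-and-compact: equal k-mers become adjacent runs after sorting.
    repeats.sort()
    counts = {}
    start = 0
    while start < len(repeats):
        end = start
        while end < len(repeats) and repeats[end] == repeats[start]:
            end += 1
        # +1 accounts for the first sighting, which only set filter bits.
        counts[repeats[start]] = (end - start) + 1
        start = end
    return counts
```

For example, `frequent_kmers(["ACGTACGT", "CGTA"], 4)` keeps only `ACGT` and `CGTA`, each of which occurs twice. A Bloom filter false positive would let a k-mer through on its first sighting and inflate its count by one, which is the false-positive behaviour the abstract refers to.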
Architectural Techniques For Managing Non-volatile Caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh

As chip power dissipation becomes a critical challenge in scaling processor performance, computer architects are forced to fundamentally rethink the design of modern processors and hence, the chip-design industry is now at a major inflection point in its hardware roadmap. The high leakage power and low density of SRAM pose serious obstacles to its use for designing large on-chip caches and for this reason, researchers are exploring non-volatile memory (NVM) devices, such as spin torque transfer RAM, phase change RAM and resistive RAM. However, since NVMs are not strictly superior to SRAM, effective architectural techniques are required for making them a universal memory solution. This book discusses techniques for designing processor caches using NVM devices. It presents algorithms and architectures for improving their energy efficiency, performance and lifetime. It also provides both qualitative and quantitative evaluation to help the reader gain insights and motivate them to explore further. This book will be highly useful for beginners as well as veterans in computer architecture, chip designers, product managers and technical marketing professionals.
Compiler-directed cache management in multiprocessors

NASA Technical Reports Server (NTRS)

Cheong, Hoichi; Veidenbaum, Alexander V.

1990-01-01

The necessity of finding alternatives to hardware-based cache coherence strategies for large-scale multiprocessor systems is discussed. Three different software-based strategies sharing the same goals and general approach are presented. They consist of a simple invalidation approach, a fast selective invalidation scheme, and a version control scheme. The strategies are suitable for shared-memory multiprocessor systems with interconnection networks and a large number of processors. Results of trace-driven simulations conducted on numerical benchmark routines to compare the performance of the three schemes are presented.
Memory hierarchy using row-based compression

DOEpatents

Loh, Gabriel H.; O'Connor, James M.

2016-10-25

A system includes a first memory and a device coupleable to the first memory. The device includes a second memory to cache data from the first memory. The second memory includes a plurality of rows, each row including a corresponding set of compressed data blocks of non-uniform sizes and a corresponding set of tag blocks. Each tag block represents a corresponding compressed data block of the row. The device further includes decompression logic to decompress data blocks accessed from the second memory. The device further includes compression logic to compress data blocks to be stored in the second memory.
The Processing of Novel and Lexicalised Prefixed Words in Reading

ERIC Educational Resources Information Center

Pollatsek, Alexander; Slattery, Timothy J.; Juhasz, Barbara

2008-01-01

Two experiments compared how relatively long novel prefixed words (e.g., "overfarm") and existing prefixed words were processed in reading. The use of novel prefixed words allows one to examine the roles of whole-word access and decompositional processing in the processing of non-novel prefixed words. The two experiments found that,…
Modified stretched exponential model of computer system resources management limitations-The case of cache memory

NASA Astrophysics Data System (ADS)

Strzałka, Dominik; Dymora, Paweł; Mazurek, Mirosław

2018-02-01

In this paper we present some preliminary results in the field of computer systems management with relation to Tsallis thermostatistics and the ubiquitous problem of limited hardware resources. In the case of systems with non-deterministic behaviour, management of their resources is a key point that guarantees their acceptable performance and proper working. This is a very wide problem that poses many challenges in financial, transport, water and food, health, and other areas. We focus on computer systems, with attention paid to cache memory, and propose to use an analytical model that is able to connect non-extensive entropy formalism, long-range dependencies, management of system resources and queuing theory. The analytical results obtained are related to a practical experiment showing interesting and valuable results.
Locality-Conscious Lock-Free Linked Lists

NASA Astrophysics Data System (ADS)

Braginsky, Anastasia; Petrank, Erez

We extend state-of-the-art lock-free linked lists by building linked lists with special care for locality of traversals. These linked lists are built of sequences of entries that reside on consecutive chunks of memory. When traversing such lists, subsequent entries typically reside on the same chunk and are thus close to each other, e.g., in the same cache line or on the same virtual memory page. Such cache-conscious implementations of linked lists are frequently used in practice, but making them lock-free requires care. The basic component of this construction is a chunk of entries in the list that maintains a minimum and a maximum number of entries. This basic chunk component is an interesting tool on its own and may be used to build other lock-free data structures as well.
Prefixation of Simplex Pairs in Czech: An Analysis of Spatial Semantics, Distributive Verbs, and Procedural Meanings

ERIC Educational Resources Information Center

Hilchey, Christian Thomas

2014-01-01

This dissertation examines prefixation of simplex pairs. A simplex pair consists of an iterative imperfective and a semelfactive perfective verb. When prefixed, both of these verbs are perfective. The prefixed forms derived from semelfactives are labeled single act verbs, while the prefixed forms derived from iterative imperfective simplex verbs…
Efficient Aho-Corasick String Matching on Emerging Multicore Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tumeo, Antonino; Villa, Oreste; Secchi, Simone

String matching algorithms are critical to several scientific fields. Besides text processing and databases, emerging applications such as DNA protein sequence analysis, data mining, information security software, antivirus, and machine learning all exploit string matching algorithms [3]. All these applications usually process large quantities of textual data and require high performance and/or predictable execution times. Among all the string matching algorithms, one of the most studied, especially for text processing and security applications, is the Aho-Corasick algorithm. Aho-Corasick is an exact, multi-pattern string matching algorithm which performs the search in a time linearly proportional to the length of the input text, independently of the pattern set size. However, depending on the implementation, when the number of patterns increases, the memory occupation may rise drastically. In turn, this can lead to significant variability in the performance, due to the memory access times and the caching effects. This is a significant concern for many mission critical applications and modern high performance architectures. For example, security applications such as Network Intrusion Detection Systems (NIDS) must be able to scan network traffic against very large dictionaries in real time. Modern Ethernet links reach up to 10 Gbps, and malicious threats are already well over 1 million, and exponentially growing [28]. When performing the search, a NIDS should not slow down the network, or let network packets pass unchecked. Nevertheless, on the current state-of-the-art cache based processors, there may be a large performance variability when dealing with big dictionaries and inputs that have different frequencies of matching patterns. In particular, when few patterns are matched and they are all in the cache, the procedure is fast. Instead, when they are not in the cache, often because many patterns are matched and the caches are continuously thrashed, they must be retrieved from the system memory and the procedure is slowed down by the increased latency. Efficient implementations of string matching algorithms have been the focus of several works, targeting Field Programmable Gate Arrays [4, 25, 15, 5], highly multi-threaded solutions like the Cray XMT [34], multicore processors [19] or heterogeneous processors like the Cell Broadband Engine [35, 22]. Recently, several researchers have also started to investigate the use of Graphics Processing Units (GPUs) for string matching algorithms in security applications [20, 10, 32, 33]. Most of these approaches mainly focus on reaching high peak performance, or try to optimize the memory occupation, rather than looking at performance stability. However, hardware solutions support only small dictionary sizes due to lack of memory and are difficult to customize, while platforms such as the Cell/B.E. are very complex to program.
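For readers unfamiliar with the algorithm named in the abstract above, the following is a minimal textbook-style sketch of Aho-Corasick in Python. It is not the multicore implementation the chapter describes; the function names `build_automaton` and `find_all` are chosen here for illustration. The per-state dictionaries also hint at why memory occupation grows with the number of patterns: every distinct pattern prefix becomes a state.

```python
from collections import deque


def build_automaton(patterns):
    """Build the goto, failure and output tables for a set of patterns."""
    goto = [{}]    # goto[state][char] -> next state (a trie of the patterns)
    fail = [0]     # fail[state] -> state for the longest proper suffix match
    out = [[]]     # out[state] -> patterns that end at this state

    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Breadth-first traversal so shallower failure links are ready first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches
```

Searching the text `"ushers"` for `["he", "she", "his", "hers"]` reports `she` at index 1 and both `he` and `hers` at index 2. The scan makes a single pass over the text, which is the linear-time property the abstract mentions; the cache behaviour it discusses depends on how often the `goto` and `fail` lookups touch states that are not already cached.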
Memory for Multiple Cache Locations and Prey Quantities in a Food-Hoarding Songbird

PubMed Central

Armstrong, Nicola; Garland, Alexis; Burns, K. C.

2012-01-01

Most animals can discriminate between pairs of numbers that are each less than four without training. However, North Island robins (Petroica longipes), a food-hoarding songbird endemic to New Zealand, can discriminate between quantities of items as high as eight without training. Here we investigate whether robins are capable of other complex quantity discrimination tasks. We test whether their ability to discriminate between small quantities declines with (1) the number of cache sites containing prey rewards and (2) the length of time separating cache creation and retrieval (retention interval). Results showed that subjects generally performed above-chance expectations. They were equally able to discriminate between different combinations of prey quantities that were hidden from view in 2, 3, and 4 cache sites from between 1, 10, and 60 s. Overall results indicate that North Island robins can process complex quantity information involving more than two discrete quantities of items for up to 1 min long retention intervals without training. PMID:23293622
Memory for multiple cache locations and prey quantities in a food-hoarding songbird.

PubMed

Armstrong, Nicola; Garland, Alexis; Burns, K C

2012-01-01

Most animals can discriminate between pairs of numbers that are each less than four without training. However, North Island robins (Petroica longipes), a food-hoarding songbird endemic to New Zealand, can discriminate between quantities of items as high as eight without training. Here we investigate whether robins are capable of other complex quantity discrimination tasks. We test whether their ability to discriminate between small quantities declines with (1) the number of cache sites containing prey rewards and (2) the length of time separating cache creation and retrieval (retention interval). Results showed that subjects generally performed above-chance expectations. They were equally able to discriminate between different combinations of prey quantities that were hidden from view in 2, 3, and 4 cache sites from between 1, 10, and 60 s. Overall results indicate that North Island robins can process complex quantity information involving more than two discrete quantities of items for up to 1 min long retention intervals without training.
ASIC-based architecture for the real-time computation of 2D convolution with large kernel size

NASA Astrophysics Data System (ADS)

Shao, Rui; Zhong, Sheng; Yan, Luxin

2015-12-01

Bidimensional convolution is a low-level processing algorithm of interest in many areas, but its high computational cost constrains the size of the kernels, especially in real-time embedded systems. This paper presents a hardware architecture for the ASIC-based implementation of 2-D convolution with medium-to-large kernels. To improve the efficiency of on-chip storage resources and to reduce off-chip bandwidth, a data-reuse cache structure is proposed. Multi-block SPRAM is used to cross-cache images, and an on-chip ping-pong operation takes full advantage of data reuse in the convolution calculation; on this basis, a new ASIC data scheduling scheme and overall architecture are designed. Experimental results show that the structure can perform real-time convolution with templates of size 40 × 32, improves the utilization of on-chip memory bandwidth and on-chip memory resources, maximizes data throughput, and reduces the need for off-chip memory bandwidth.
The Role of Configurational Asymmetry in the Lexical Access of Prefixed Verbs: Evidence from French

ERIC Educational Resources Information Center

Tsapkini, Kyrana; Jarema, Gonia; Di Sciullo, Anna-Maria

2004-01-01

In this paper we investigated the effects of configurational asymmetry in prefixed verbs in French. We used a simple lexical decision paradigm to compare prefixed verbs with external and internal prefixes as specified in linguistic theory (Di Sciullo, 1997) where external prefixes do not change the aktionsart and the verb argument structure of the…
WriteShield: A Pseudo Thin Client for Prevention of Information Leakage

NASA Astrophysics Data System (ADS)

Kirihata, Yasuhiro; Sameshima, Yoshiki; Onoyama, Takashi; Komoda, Norihisa

While thin-client systems are spreading as an effective security method in enterprises and organizations, there is a new approach called the pseudo thin-client system. In this system, local disks of clients are write-protected and user data is forced to be saved on the central file server to achieve the same security effect as conventional thin-client systems. Since it takes a purely software-based, simple approach, it does not require hardware enhancement of the network and servers, which reduces the installation cost. However, there are several problems, such as no write control for external media, possible memory depletion, and lower security because of the exceptional write permission granted to system processes. In this paper, we propose WriteShield, a pseudo thin-client system which solves these issues. In this system, the local disks are write-protected with a volume filter driver, and a virtual cache mechanism extends the memory cache size for the write protection. This paper presents design and implementation details of WriteShield. In addition, we describe the security analysis and simulation evaluation of paging algorithms for the virtual cache mechanism and measure the disk I/O performance to verify its feasibility in an actual environment.
Evolution of magnetic disk subsystems

NASA Astrophysics Data System (ADS)

Kaneko, Satoru

1994-06-01

The higher recording density of magnetic disks realized today has brought larger storage capacity per unit and smaller form factors. If the required access performance per MB is constant, the performance of large subsystems has to be several times better. This article describes mainly the technology for improving the performance of the magnetic disk subsystems and the prospects of their future evolution. Also considered are 'crosscall pathing' which makes the data transfer channel more effective, 'disk cache' which improves performance coupling with solid state memory technology, and 'RAID' which improves the availability and integrity of disk subsystems by organizing multiple disk drives in a subsystem. As a result, it is concluded that since the performance of the subsystem is dominated by that of the disk cache, maximization of the performance of the disk cache subsystems is very important.
Checkpoint repair for high-performance out-of-order execution machines

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hwu, W.M.W.; Patt, Y.N.

Out-of-order execution and branch prediction are two mechanisms that can be used profitably in the design of supercomputers to increase performance. Proper exception handling and branch prediction miss handling in an out-of-order execution machine require some kind of repair mechanism which can restore the machine to a known previous state. In this paper the authors present a class of repair mechanisms using the concept of checkpointing. The authors derive several properties of checkpoint repair mechanisms. In addition, they provide algorithms for performing checkpoint repair that incur little overhead in time and modest cost in hardware, which also require no additional complexity or time for use with write-back cache memory systems than they do with write-through cache memory systems, contrary to statements made by previous researchers.

Improved cache performance in Monte Carlo transport calculations using energy banding

NASA Astrophysics Data System (ADS)

Siegel, A.; Smith, K.; Felker, K.; Romano, P.; Forget, B.; Beckman, P.

2014-04-01

We present an energy banding algorithm for Monte Carlo (MC) neutral particle transport simulations which depend on large cross section lookup tables. In MC codes, read-only cross section data tables are accessed frequently, exhibit poor locality, and are typically too large to fit in fast memory. Thus, performance is often limited by long latencies to RAM, or by off-node communication latencies when the data footprint is very large and must be decomposed on a distributed memory machine. The proposed energy banding algorithm allows maximal temporal reuse of data in band sizes that can flexibly accommodate different architectural features. The energy banding algorithm is general and has a number of benefits compared to the traditional approach. In the present analysis we explore its potential to achieve improvements in time-to-solution on modern cache-based architectures.
Evidence against observational spatial memory for cache locations of conspecifics in marsh tits Poecile palustris.

PubMed

Urhan, A Utku; Emilsson, Ellen; Brodin, Anders

2017-01-01

Many species in the family Paridae, such as marsh tits Poecile palustris, are large-scale scatter hoarders of food that make cryptic caches and disperse these in large year-round territories. The perhaps most well-known species in the family, the great tit Parus major, does not store food itself but is skilled in stealing caches from the other species. We have previously demonstrated that great tits are able to memorise positions of caches they have observed marsh tits make and later return and steal the food. As great tits are explorative in nature and unusually good learners, it is possible that such "memorisation of caches from a distance" is a unique ability of theirs. The other possibility is that this ability is general in the parid family. Here, we tested marsh tits in the same experimental set-up as where we previously have tested great tits. We allowed caged marsh tits to observe a caching conspecific in a specially designed indoor arena. After a retention interval of 1 or 24 h, we allowed the observer to enter the arena and search for the caches. The marsh tits showed no evidence of such observational memorization ability, and we believe that such ability is more useful for a non-hoarding species. Why should a marsh tit that memorises hundreds of their own caches in the field bother with the difficult task of memorising other individuals' caches? We argue that the close-up memorisation procedure that marsh tits use at their own caches may be a different type of observational learning than memorisation of caches made by others. For example, the latter must be done from a distance and hence may require the ability to adopt an allocentric perspective, i.e. the ability to visualise the cache from the hoarder's perspective. Members of the Paridae family are known to possess foraging techniques that are cognitively advanced. Previously, we have demonstrated that a non-hoarding parid species, the great tit P. major, is able to memorise positions of caches that they have observed marsh tits P. palustris make. However, it is unknown whether this cognitively advanced foraging strategy is unique to great tits or if it occurs also in other parids. Here, we demonstrated that "pilfering by observational memorization strategy" is not a general strategy in parids. We believe that such ability is important for a non-hoarding species such as the great tit and, most likely, birds owning many caches do not need this foraging strategy.
Memory management and compiler support for rapid recovery from failures in computer systems

NASA Technical Reports Server (NTRS)

Fuchs, W. K.

1991-01-01

This paper describes recent developments in the use of memory management and compiler technology to support rapid recovery from failures in computer systems. The techniques described include cache coherence protocols for user transparent checkpointing in multiprocessor systems, compiler-based checkpoint placement, compiler-based code modification for multiple instruction retry, and forward recovery in distributed systems utilizing optimistic execution.
Power reduction by power gating in differential pair type spin-transfer-torque magnetic random access memories for low-power nonvolatile cache memories

NASA Astrophysics Data System (ADS)

Ohsawa, Takashi; Ikeda, Shoji; Hanyu, Takahiro; Ohno, Hideo; Endoh, Tetsuo

2014-01-01

Array operation currents in spin-transfer-torque magnetic random access memories (STT-MRAMs) that use four differential pair type magnetic tunnel junction (MTJ)-based memory cells (4T2MTJ, two 6T2MTJs and 8T2MTJ) are simulated and compared with that in SRAM. With L3 cache applications in mind, it is assumed that the memories are composed of 32 Mbyte capacity to be accessed in 64 byte in parallel. All the STT-MRAMs except for the 8T2MTJ one are designed with 32 bit fine-grained power gating scheme applied to eliminate static currents in the memory cells that are not accessed. The 8T2MTJ STT-MRAM, the cell’s design concept being not suitable for the fine-grained power gating, loads and saves 32 Mbyte data in 64 Mbyte unit per 1 Mbit sub-array in 2 × 10^3 cycles. It is shown that the array operation current of the 4T2MTJ STT-MRAM is 70 mA averaged in 15 ns write cycles at Vdd = 0.9 V. This is the smallest among the STT-MRAMs, about half that of the low standby power (LSTP) SRAM, whose array operation current is totally dominated by the cells’ subthreshold leakage.
Programmable partitioning for high-performance coherence domains in a multiprocessor system

DOEpatents

Blumrich, Matthias A [Ridgefield, CT]; Salapura, Valentina [Chappaqua, NY]

2011-01-25

A multiprocessor computing system and a method of logically partitioning a multiprocessor computing system are disclosed. The multiprocessor computing system comprises a multitude of processing units, and a multitude of snoop units. Each of the processing units includes a local cache, and the snoop units are provided for supporting cache coherency in the multiprocessor system. Each of the snoop units is connected to a respective one of the processing units and to all of the other snoop units. The multiprocessor computing system further includes a partitioning system for using the snoop units to partition the multitude of processing units into a plurality of independent, memory-consistent, adjustable-size processing groups. Preferably, when the processor units are partitioned into these processing groups, the partitioning system also configures the snoop units to maintain cache coherency within each of said groups.
Recognizing Prefixes in Scientific Quantities

NASA Astrophysics Data System (ADS)

Sokolowski, Andrzej

2015-09-01

Although recognizing prefixes in physical quantities is inherent for practitioners, it might not be inherent for students, who do not use prefixes in their everyday life experiences. This deficiency surfaces in AP Physics exams. For example, readers of an AP Physics exam reported "a common mistake of incorrectly converting nanometers to meters." Similar students' mistakes were reported also by AP Chemistry readers "as in previous years, students still had difficulty converting kJ to J." While traditional teaching focuses on memorizing the symbols of prefixes, little attention is given to helping learners recognize a prefix in a given quantity. I noticed in my teaching practice that by making the processes of identifying prefixes more explicit, students make fewer mistakes on unit conversion. Thus, this paper presents an outline of a lesson that focuses on prefix recognition. It is designed for a first-year college physics class; however, its key points can be addressed to any group of physics students.
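The recognition step described in this abstract can be illustrated with a short sketch. It assumes a small, hand-picked table of SI prefixes and base units, and the helper names `split_prefix` and `to_base_unit` are hypothetical; it is not taken from the paper's lesson outline. The point is that the prefix is identified explicitly before any conversion is attempted, so "m" alone is read as the metre while the leading "m" in "mm" is read as milli.

```python
# Illustrative sketch only: a small subset of SI prefixes and base units.
# The names below are hypothetical and not from the cited paper.
PREFIX_EXPONENT = {"G": 9, "M": 6, "k": 3, "c": -2, "m": -3, "n": -9}
BASE_UNITS = {"m", "J", "g", "s"}


def split_prefix(unit):
    """Return (prefix, base_unit); prefix is '' when the unit has none."""
    if unit in BASE_UNITS:
        return "", unit
    prefix, base = unit[:1], unit[1:]
    if prefix in PREFIX_EXPONENT and base in BASE_UNITS:
        return prefix, base
    raise ValueError(f"cannot recognize a prefix in {unit!r}")


def to_base_unit(value, unit):
    """Convert a prefixed quantity to its base unit, e.g. nm to m."""
    prefix, base = split_prefix(unit)
    exponent = PREFIX_EXPONENT.get(prefix, 0)
    return value * 10.0 ** exponent, base


# The two conversions mentioned in the abstract: nanometers to meters
# and kilojoules to joules.
wavelength_m, _ = to_base_unit(532, "nm")
energy_j, _ = to_base_unit(2.5, "kJ")
```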
Errors made by animals in memory paradigms are not always due to failure of memory.

PubMed

Wilkie, D M; Willson, R J; Carr, J A

1999-01-01

It is commonly assumed that errors in animal memory paradigms such as delayed matching to sample, radial mazes, and food-cache recovery are due to failures in memory for information necessary to perform the task successfully. A body of research, reviewed here, suggests that this is not always the case: animals sometimes make errors despite apparently being able to remember the appropriate information. In this paper a case study of this phenomenon is described, along with a demonstration of a simple procedural modification that successfully reduced these non-memory errors, thereby producing a better measure of memory.
gpuSPHASE-A shared memory caching implementation for 2D SPH using CUDA

NASA Astrophysics Data System (ADS)

Winkler, Daniel; Meister, Michael; Rezavand, Massoud; Rauch, Wolfgang

2017-04-01

Smoothed particle hydrodynamics (SPH) is a meshless Lagrangian method that has been successfully applied to computational fluid dynamics (CFD), solid mechanics and many other multi-physics problems. Using the method to solve transport phenomena in process engineering requires the simulation of several days to weeks of physical time. Based on the high computational demand of CFD such simulations in 3D need a computation time of years so that a reduction to a 2D domain is inevitable. In this paper gpuSPHASE, a new open-source 2D SPH solver implementation for graphics devices, is developed. It is optimized for simulations that must be executed with thousands of frames per second to be computed in reasonable time. A novel caching algorithm for Compute Unified Device Architecture (CUDA) shared memory is proposed and implemented. The software is validated and the performance is evaluated for the well established dambreak test case.
Reducing the stochasticity of crystal nucleation to enable subnanosecond memory writing.

PubMed

Rao, Feng; Ding, Keyuan; Zhou, Yuxing; Zheng, Yonghui; Xia, Mengjiao; Lv, Shilong; Song, Zhitang; Feng, Songlin; Ronneberger, Ider; Mazzarello, Riccardo; Zhang, Wei; Ma, Evan

2017-12-15

Operation speed is a key challenge in phase-change random-access memory (PCRAM) technology, especially for achieving subnanosecond high-speed cache memory. Commercialized PCRAM products are limited by the tens of nanoseconds writing speed, originating from the stochastic crystal nucleation during the crystallization of amorphous germanium antimony telluride (Ge 2 Sb 2 Te 5 ). Here, we demonstrate an alloying strategy to speed up the crystallization kinetics. The scandium antimony telluride (Sc 0.2 Sb 2 Te 3 ) compound that we designed allows a writing speed of only 700 picoseconds without preprogramming in a large conventional PCRAM device. This ultrafast crystallization stems from the reduced stochasticity of nucleation through geometrically matched and robust scandium telluride (ScTe) chemical bonds that stabilize crystal precursors in the amorphous state. Controlling nucleation through alloy design paves the way for the development of cache-type PCRAM technology to boost the working efficiency of computing systems. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Dietary folate, vitamin B-12, vitamin B-6 and incident Alzheimer's disease: the cache county memory, health and aging study.

PubMed

Nelson, C; Wengreen, H J; Munger, R G; Corcoran, C D

2009-12-01

To examine associations between dietary and supplemental folate, vitamin B-12 and vitamin B-6 and incident Alzheimer's disease (AD) among elderly men and women. Data collected were from participants of the Cache County Memory, Health and Aging Study, a longitudinal study of 5092 men and women 65 years and older who were residents of Cache County, Utah in 1995. Multistage clinical assessment procedures were used to identify incident cases of AD. Dietary data were collected using a 142-item food frequency questionnaire. Cox Proportional Hazards (CPH) modeling was used to determine hazard ratios across quintiles of micronutrient intake. 202 participants were diagnosed with incident AD during follow-up (1995-2004). In multivariable CPH models that controlled for the effects of gender, age, education, and other covariates there were no observed differences in risk of AD or dementia by increasing quintiles of total intake of folate, vitamin B-12, or vitamin B-6. Similarly, there were no observed differences in risk of AD by regular use of either folate or B6 supplements. Dietary intake of B-vitamins from food and supplemental sources appears unrelated to incidence of dementia and AD. Further studies examining associations between dietary intakes of B-vitamins, biomarkers of B-vitamin status and cognitive endpoints are warranted.
Automated Cache Performance Analysis And Optimization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mohror, Kathryn

While there is no lack of performance counter tools for coarse-grained measurement of cache activity, there is a critical lack of tools for relating data layout to cache behavior to application performance. Generally, any nontrivial optimizations are either not done at all, or are done "by hand," requiring significant time and expertise. To the best of our knowledge no tool available to users measures the latency of memory reference instructions for particular addresses and makes this information available to users in an easy-to-use and intuitive way. In this project, we worked to enable the Open|SpeedShop performance analysis tool to gather memory reference latency information for specific instructions and memory addresses, and to gather and display this information in an easy-to-use and intuitive way to aid performance analysts in identifying problematic data structures in their codes. This tool was primarily designed for use in the supercomputer domain as well as grid, cluster, cloud-based parallel e-commerce, and engineering systems and middleware. Ultimately, we envision a tool to automate optimization of application cache layout and utilization in the Open|SpeedShop performance analysis tool. To commercialize this software, we worked to develop core capabilities for gathering enhanced memory usage performance data from applications and create and apply novel methods for automatic data structure layout optimizations, tailoring the overall approach to support existing supercomputer and cluster programming models and constraints. In this Phase I project, we focused on infrastructure necessary to gather performance data and present it in an intuitive way to users. With the advent of enhanced Precise Event-Based Sampling (PEBS) counters on recent Intel processor architectures and equivalent technology on AMD processors, we are now in a position to access memory reference information for particular addresses.
Prior to the introduction of PEBS counters, cache behavior could only be measured reliably in the aggregate across tens or hundreds of thousands of instructions. With the newest iteration of PEBS technology, cache events can be tied to a tuple of instruction pointer, target address (for both loads and stores), memory hierarchy, and observed latency. With this information we can now begin asking questions regarding the efficiency of not only regions of code, but how these regions interact with particular data structures and how these interactions evolve over time. In the short term, this information will be vital for performance analysts understanding and optimizing the behavior of their codes for the memory hierarchy. In the future, we can begin to ask how data layouts might be changed to improve performance and, for a particular application, what the theoretical optimal performance might be. The overall benefit to be produced by this effort was a commercial-quality, easy-to-use and scalable performance tool that will allow both beginner and experienced parallel programmers to automatically tune their applications for optimal cache usage. Effective use of such a tool can literally save weeks of performance tuning effort. Easy to use. With the proposed innovations, finding and fixing memory performance issues would be more automated and hide most to all of the performance engineer expertise "under the hood" of the Open|SpeedShop performance tool. One of the biggest public benefits from the proposed innovations is that it makes performance analysis more usable to a larger group of application developers. Intuitive reporting of results. The Open|SpeedShop performance analysis tool has a rich set of intuitive, yet detailed reports for presenting performance results to application developers. Our goal was to leverage this existing technology to present the results from our memory performance addition to Open|SpeedShop. Suitable for experts as well as novices.
Application performance is getting more difficult to measure as the hardware platforms that applications run on become more complicated. This makes life difficult for application developers, in that they need to know more about the hardware platform, including the memory system hierarchy, in order to understand the performance of their application. Some application developers are comfortable in that scenario, while others want to do their scientific research and not have to understand all the nuances in the hardware platform they are running their application on. Our proposed innovations were aimed to support both experts and novice performance analysts. Useful in many markets. The enhancement to Open|SpeedShop would appeal to a broader market space, as it will be useful in scientific, commercial, and cloud computing environments. Our goal was to use technology developed initially at the and Lawrence Livermore National Laboratory combined with the development and commercial software experience of the Argo Navis Technologies, LLC (ANT) to form a powerful combination to deliver these objectives.
Efficient implementation of parallel three-dimensional FFT on clusters of PCs

NASA Astrophysics Data System (ADS)

Takahashi, Daisuke

2003-05-01

In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of PCs. The three-dimensional FFT algorithm can be altered into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by utilizing the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We succeeded in obtaining performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
Morphological Decomposition in the Recognition of Prefixed and Suffixed Words: Evidence from Korean

ERIC Educational Resources Information Center

Kim, Say Young; Wang, Min; Taft, Marcus

2015-01-01

Korean has visually salient syllable units that are often mapped onto either prefixes or suffixes in derived words. In addition, prefixed and suffixed words may be processed differently given a left-to-right parsing procedure and the need to resolve morphemic ambiguity in prefixes in Korean. To test this hypothesis, four experiments using the…
Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sczyrba, Alex; Pratap, Abhishek; Canon, Shane

2011-03-22

Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor, based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
Performance Prediction Toolkit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chennupati, Gopinath; Santhi, Nanadakishore; Eidenbenz, Stephen

The Performance Prediction Toolkit (PPT) is a scalable co-design tool that contains the hardware and middle-ware models, which accept proxy applications as input in runtime prediction. PPT relies on Simian, a parallel discrete event simulation engine in Python or Lua, that uses the process concept, where each computing unit (host, node, core) is a Simian entity. Processes perform their task through message exchanges to remain active, sleep, wake-up, begin and end. The PPT hardware model of a compute core (such as a Haswell core) consists of a set of parameters, such as clock speed, memory hierarchy levels, their respective sizes, cache-lines, access times for different cache levels, average cycle counts of ALU operations, etc. These parameters are ideally read off a spec sheet or are learned using regression models trained on hardware counter (PAPI) data. The compute core model offers an API to the software model, a function called time_compute(), which takes as input a tasklist. A tasklist is an unordered set of ALU and other CPU-type operations (in particular virtual memory loads and stores). The PPT application model mimics the loop structure of the application and replaces the computational kernels with a call to the hardware model's time_compute() function giving tasklists as input that model the compute kernel. A PPT application model thus consists of tasklists representing kernels and the higher level loop structure that we like to think of as pseudo code. The key challenge for the hardware model's time_compute() function is to translate virtual memory accesses into actual cache hierarchy level hits and misses. PPT also contains another CPU core level hardware model, Analytical Memory Model (AMM). The AMM solves this challenge soundly, where our previous alternatives explicitly include the L1, L2, L3 hit-rates as inputs to the tasklists.
Explicit hit-rates inevitably only reflect the application modeler's best guess, perhaps informed by a few small test problems using hardware counters; also, hard-coded hit-rates make the hardware model insensitive to changes in cache sizes. Alternatively, we use reuse distance distributions in the tasklists. In general, reuse profiles require the application modeler to run a very expensive trace analysis on the real code that realistically can be done at best for small examples.
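To make the link between reuse distances and cache hits concrete, here is a minimal sketch. It is not PPT's Analytical Memory Model and does not use the PPT or Simian APIs; it only applies the textbook relation that, in a fully associative LRU cache holding `capacity` lines, an access hits exactly when its reuse distance (the number of distinct other addresses touched since the previous access to the same address) is smaller than `capacity`. The function names and the toy trace are assumptions made for illustration.

```python
# Illustrative sketch, not PPT code: reuse (stack) distances of an address
# trace and the hit rate they imply for a fully associative LRU cache.

def reuse_distances(trace):
    """Return one distance per access; None marks a first-time (cold) access."""
    stack = []      # addresses ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Distinct addresses touched since the last access to addr.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(addr)
    return distances


def lru_hit_rate(distances, capacity):
    """Fraction of accesses that hit in an LRU cache with `capacity` lines."""
    hits = sum(1 for d in distances if d is not None and d < capacity)
    return hits / len(distances)


trace = ["a", "b", "c", "a", "b", "d", "a"]   # toy trace, hypothetical
dists = reuse_distances(trace)
small_cache = lru_hit_rate(dists, 2)
larger_cache = lru_hit_rate(dists, 3)
```

Because the same list of distances can be re-evaluated for any capacity, a model built this way stays sensitive to cache size, which is the property the abstract says hard-coded hit-rates lack. The quadratic list scan is kept for readability and would be too slow for real traces.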
PROSPECTIVE STUDY OF READY-TO-EAT BREAKFAST CEREAL CONSUMPTION AND COGNITIVE DECLINE AMONG ELDERLY MEN AND WOMEN IN CACHE COUNTY, UTAH, STUDY ON MEMORY, HEALTH, AND AGING

PubMed Central

WENGREEN, H.; NELSON, C.; MUNGER, R.G.; CORCORAN, C.

2013-01-01

Objective: To examine associations between frequency of ready-to-eat-cereal (RTEC) consumption and cognitive function among elderly men and women of the Cache County Study on Memory and Aging in Utah. Design: A population-based prospective cohort study established in Cache County, Utah in 1995. Setting and Participants: 3831 men and women > 65 years of age who were living in Cache County, Utah in 1995. Measurement: Diet was assessed using a 142-item food frequency questionnaire at baseline. Cognitive function was assessed using an adapted version of the Modified Mini-Mental State examination (3MS) at baseline and three subsequent interviews over 11 years. RTEC consumption was defined as daily, weekly, or infrequent use. Results: In multivariable models, more frequent RTEC consumption was not associated with a cognitive benefit. Those consuming RTEC weekly but less than daily scored higher on their baseline 3MS than did those consuming RTEC more or less frequently (91.7, 90.6, 90.6, respectively; p-value <0.001). This association was maintained across 11 years of observation such that those consuming RTEC weekly but less than daily declined on average 3.96 points compared to an average 5.13 and 4.57 point decline for those consuming cereal more or less frequently (p-value = 0.0009). Conclusion: Those consuming RTEC at least daily had poorer cognitive performance at baseline and over 11 years of follow-up compared to those who consumed cereal more or less frequently. RTEC is a nutrient dense food, but should not replace the consumption of other healthy foods in the diets of elderly people. Associations between RTEC consumption, dietary patterns, and cognitive function deserve further study. PMID:21369668
Implementation Of The Configurable Fault Tolerant System Experiment On NPSAT 1

DTIC Science & Technology

2016-03-01

REPORT TYPE AND DATES COVERED Master’s thesis 4. TITLE AND SUBTITLE IMPLEMENTATION OF THE CONFIGURABLE FAULT TOLERANT SYSTEM EXPERIMENT ON NPSAT...open-source microprocessor without interlocked pipeline stages (MIPS) based processor softcore, a cached memory structure capable of accessing double...data rate type three and secure digital card memories, an interface to the main satellite bus, and XILINX’s soft error mitigation softcore. The
Data Movement Dominates: Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jacob, Bruce L.

Over the past three years in this project, what we have observed is that the primary reason for data movement in large-scale systems is that the per-node capacity is not large enough; i.e., one of the solutions to the data-movement problem (certainly not the only solution that is required, but a significant one nonetheless) is to increase per-node capacity so that inter-node traffic is reduced. This unfortunately is not as simple as it sounds. Today’s main memory systems for datacenters, enterprise computing systems, and supercomputers fail to provide high per-socket capacity [Dirik & Jacob 2009; Cooper-Balis et al. 2012], except at extremely high price points (factors of 10–100x the cost/bit of consumer main-memory systems) [Stokes 2008]. The reason is that our choice of technology for today’s main memory systems (i.e., DRAM, which we have used as a main-memory technology since the 1970s [Jacob et al. 2007]) can no longer keep up with our needs for density and price per bit. Main memory systems have always been built from the cheapest, densest, lowest-power memory technology available, and DRAM is no longer the cheapest, the densest, nor the lowest-power storage technology out there. It is now time for DRAM to go the way that SRAM went: move out of the way for a cheaper, slower, denser storage technology, and become a cache instead. This inflection point has happened before, in the context of SRAM yielding to DRAM. There was once a time that SRAM was the storage technology of choice for all main memories [Tomasulo 1967; Thornton 1970; Kidder 1981]. However, once DRAM hit volume production in the 1970s and 80s, it supplanted SRAM as a main memory technology because it was cheaper, and it was denser. It also happened to be lower power, but that was not the primary consideration of the day.
At the time, it was recognized that DRAM was much slower than SRAM, but it was only at the supercomputer level (for instance the Cray X-MP in the 1980s and its follow-on, the Cray Y-MP, in the 1990s) that one could afford to build ever-larger main memories out of SRAM. The reasoning for moving to DRAM was that an appropriately designed memory hierarchy, built of DRAM as main memory and SRAM as a cache, would approach the performance of SRAM, at the price-per-bit of DRAM [Mashey 1999]. Today it is quite clear that, were one to build an entire multi-gigabyte main memory out of SRAM instead of DRAM, one could improve the performance of almost any computer system by up to an order of magnitude, but this option is not even considered, because to build that system would be prohibitively expensive. It is now time to revisit the same design choice in the context of modern technologies and modern systems. For reasons both technical and economic, we can no longer afford to build ever-larger main memory systems out of DRAM. Flash memory, on the other hand, is significantly cheaper and denser than DRAM and therefore should take its place. While it is true that flash is significantly slower than DRAM, one can afford to build much larger main memories out of flash than out of DRAM, and we show that an appropriately designed memory hierarchy, built of flash as main memory and DRAM as a cache, will approach the performance of DRAM, at the price-per-bit of flash. In our studies as part of this project, we have investigated Non-Volatile Main Memory (NVMM), a new main-memory architecture for large-scale computing systems, one that is specifically designed to address the weaknesses described previously. In particular, it provides the following features: non-volatility: The bulk of the storage is comprised of NAND flash, and in this organization DRAM is used only as a cache, not as main memory.
Furthermore, the flash is journaled, which means that operations such as checkpoint/restore are already built into the system. 1+ terabytes of storage per socket: SSDs and DRAM DIMMs have roughly the same form factor (several square inches of PCB surface area), and terabyte SSDs are now commonplace. performance approaching that of DRAM: DRAM is used as a cache to the flash system. price-per-bit approaching that of NAND: Flash is currently well under $0.50 per gigabyte; DDR3 SDRAM is currently just over $10 per gigabyte [Newegg 2014]. Even today, one can build an easily affordable main memory system with a terabyte or more of NAND storage per CPU socket (which would be extremely expensive were one to use DRAM), and our cycle-accurate, full-system experiments show that this can be done at a performance point that lies within a factor of two of DRAM.
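The cost and performance argument in this abstract can be checked with a little arithmetic. The sketch below uses the two prices quoted above ($0.50 and $10 per gigabyte, treated as round figures) and a deliberately simple two-level model in which the average access time is a hit-rate-weighted mix of the cache latency and the backing-store latency. The latency ratio of 1000 between flash and DRAM is an assumed placeholder, not a figure from the report, and the function names are hypothetical.

```python
# Illustrative arithmetic only. Prices are the round figures quoted in the
# abstract; the flash/DRAM latency ratio is an assumed placeholder.

def cost_usd(capacity_gb, price_per_gb):
    """Purchase cost of a memory of the given capacity."""
    return capacity_gb * price_per_gb


def average_access_time(hit_rate, t_cache, t_backing):
    """Simple two-level model: hits cost t_cache, misses cost t_backing."""
    return hit_rate * t_cache + (1.0 - hit_rate) * t_backing


def min_hit_rate_for_slowdown(slowdown, latency_ratio):
    """Smallest hit rate h with h + (1 - h) * latency_ratio <= slowdown,
    with times normalized so that the cache latency is 1."""
    return (latency_ratio - slowdown) / (latency_ratio - 1.0)


flash_cost = cost_usd(1000, 0.50)   # one terabyte (1000 GB) of NAND flash
dram_cost = cost_usd(1000, 10.0)    # one terabyte of DDR3 SDRAM

# Hit rate the DRAM cache would need for the hierarchy to stay within a
# factor of two of DRAM latency, under the assumed ratio of 1000.
needed = min_hit_rate_for_slowdown(2.0, 1000.0)
check = average_access_time(needed, 1.0, 1000.0)
```

Under these assumptions a terabyte costs about $500 in flash against about $10,000 in DRAM, and the DRAM cache must serve roughly 99.9% of accesses to stay within the factor of two. The real figure depends on the workload and on latencies that this sketch does not attempt to model.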
Parallelization Issues and Particle-In Codes.

NASA Astrophysics Data System (ADS)

Elster, Anne Cathrine

1994-01-01

"Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field, show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks, have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies, becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache -line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. 
A consideration of the input data's effect on the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters, which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.
Text Processing and Formatting: Composure, Composition and Eros.

ERIC Educational Resources Information Center

Blair, John C., Jr.

1984-01-01

Review of computer software offering text editing/processing capabilities highlights work habits, elements of computer style and composition, buffers, the CRT, line- and screen-oriented text editors, video attributes, "swapping," "cache" memory, "disk emulators," text editing versus text processing, and UNIX operating…

Library API for Z-Order Memory Layout

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bethel, E. Wes

This library provides a simple-to-use API for implementing an alternative to the traditional row-major order in-memory layout, one based on a Morton-order space filling curve (SFC), specifically, a Z-order variant of the Morton order curve. The library enables programmers, after a simple initialization step, to convert a multidimensional array from row-major to Z-order layout, then use a single, generic API call to access data from any arbitrary (i,j,k) location within the array, whether it is stored in row-major or Z-order format. The motivation for using an SFC in-memory layout is improved spatial locality, which results in increased use of local high speed cache memory. The basic idea is that with row-major order layouts, a data access to some location that is nearby in index space is likely far away in physical memory, resulting in poor spatial locality and slow runtime. On the other hand, with an SFC-based layout, accesses that are nearby in index space are much more likely to also be nearby in physical memory, resulting in much better spatial locality and better runtime performance. Numerous studies over the years have shown that significant runtime performance gains are realized by using an SFC-based memory layout compared to a row-major layout, sometimes by as much as 50%, which result from the better use of the memory and cache hierarchy that comes with an SFC-based layout (see, for example, [Beth2012]). This library implementation is intended for use with codes that work with structured, array-based data in 2 or 3 dimensions. It is not appropriate for use with unstructured or point-based data.
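The index mapping behind a Z-order layout can be sketched in a few lines. This is not the library's API, which the abstract does not show; the function names are hypothetical, and the sketch covers only the 2D case for a square array whose side is a power of two, with indices below 65536. The Z-order offset of element (i, j) is obtained by interleaving the bits of i and j.

```python
# Illustrative sketch, not the library's API: 2D Z-order (Morton) indexing
# by bit interleaving. Assumes a square array with power-of-two side.

def part1by1(n):
    """Spread the low 16 bits of n so that a zero bit follows each bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_index_2d(i, j):
    """Z-order offset of (i, j): bits of j in even positions, i in odd."""
    return part1by1(j) | (part1by1(i) << 1)


def to_z_order(rows):
    """Copy a square list-of-rows array into a flat Z-order list."""
    n = len(rows)
    flat = [None] * (n * n)
    for i in range(n):
        for j in range(n):
            flat[morton_index_2d(i, j)] = rows[i][j]
    return flat


# A 4-by-4 array whose entries are their own row-major offsets.
grid = [[4 * i + j for j in range(4)] for i in range(4)]
z_flat = to_z_order(grid)
```

In the resulting list each 2-by-2 block of the grid occupies four consecutive slots, which is the spatial locality the abstract describes: vertical neighbours such as (0,0) and (1,0) sit two slots apart instead of a full row apart.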
14 CFR § 1274.212 - Document format and numbering.

Code of Federal Regulations, 2014 CFR

2014-01-01

... implemented, cooperative agreement numbering shall conform to NFS 1804.7102, except that a NCC prefix will be used in lieu of the NAS prefix. Along with the prefix NCC, a one or two digit Center Identification...
14 CFR 1274.212 - Document format and numbering.

Code of Federal Regulations, 2012 CFR

2012-01-01

... implemented, cooperative agreement numbering shall conform to NFS 1804.7102, except that a NCC prefix will be used in lieu of the NAS prefix. Along with the prefix NCC, a one or two digit Center Identification...
14 CFR 1274.212 - Document format and numbering.

Code of Federal Regulations, 2011 CFR

2011-01-01

... implemented, cooperative agreement numbering shall conform to NFS 1804.7102, except that a NCC prefix will be used in lieu of the NAS prefix. Along with the prefix NCC, a one or two digit Center Identification...
14 CFR 1274.212 - Document format and numbering.

Code of Federal Regulations, 2013 CFR

2013-01-01

... implemented, cooperative agreement numbering shall conform to NFS 1804.7102, except that a NCC prefix will be used in lieu of the NAS prefix. Along with the prefix NCC, a one or two digit Center Identification...
14 CFR 1274.212 - Document format and numbering.

Code of Federal Regulations, 2010 CFR

2010-01-01

... implemented, cooperative agreement numbering shall conform to NFS 1804.7102, except that a NCC prefix will be used in lieu of the NAS prefix. Along with the prefix NCC, a one or two digit Center Identification...
Constraints on Negative Prefixation in Polish Sign Language.

PubMed

Tomaszewski, Piotr

2015-01-01

The aim of this article is to describe a negative prefix, NEG-, in Polish Sign Language (PJM) which appears to be indigenous to the language. This is of interest given the relative rarity of prefixes in sign languages. Prefixed PJM signs were analyzed on the basis of both a corpus of texts signed by 15 deaf PJM users who are either native or near-native signers, and material including a specified range of prefixed signs as demonstrated by native signers in dictionary form (i.e. signs produced in isolation, not as part of phrases or sentences). In order to define the morphological rules behind prefixation on both the phonological and morphological levels, native PJM users were consulted for their expertise. The research results can enrich models for describing processes of grammaticalization in the context of the visual-gestural modality that forms the basis for sign language structure.
DIETARY FOLATE, VITAMIN B-12, VITAMIN B-6 AND INCIDENT ALZHEIMER’S DISEASE: THE CACHE COUNTY MEMORY, HEALTH, AND AGING STUDY

PubMed Central

NELSON, C.; WENGREEN, H.J.; MUNGER, R.G.; CORCORAN, C.D.

2013-01-01

Objective: To examine associations between dietary and supplemental folate, vitamin B-12 and vitamin B-6 and incident Alzheimer’s disease (AD) among elderly men and women. Design, Setting and Participants: Data collected were from participants of the Cache County Memory, Health and Aging Study, a longitudinal study of 5092 men and women 65 years and older who were residents of Cache County, Utah in 1995. Measurements: Multistage clinical assessment procedures were used to identify incident cases of AD. Dietary data were collected using a 142-item food frequency questionnaire. Cox Proportional Hazards (CPH) modeling was used to determine hazard ratios across quintiles of micronutrient intake. Results: 202 participants were diagnosed with incident AD during follow-up (1995–2004). In multivariable CPH models that controlled for the effects of gender, age, education, and other covariates there were no observed differences in risk of AD or dementia by increasing quintiles of total intake of folate, vitamin B-12, or vitamin B-6. Similarly, there were no observed differences in risk of AD by regular use of either folate or B6 supplements. Conclusion: Dietary intake of B-vitamins from food and supplemental sources appears unrelated to incidence of dementia and AD. Further studies examining associations between dietary intakes of B-vitamins, biomarkers of B-vitamin status and cognitive endpoints are warranted. PMID:19924351
Efficient Maintenance and Update of Nonbonded Lists in Macromolecular Simulations.

PubMed

Chowdhury, Rezaul; Beglov, Dmitri; Moghadasi, Mohammad; Paschalidis, Ioannis Ch; Vakili, Pirooz; Vajda, Sandor; Bajaj, Chandrajit; Kozakov, Dima

2014-10-14

Molecular mechanics and dynamics simulations use distance-based cutoff approximations for faster computation of pairwise van der Waals and electrostatic energy terms. These approximations traditionally use a precalculated and periodically updated list of interacting atom pairs, known as the "nonbonded neighborhood lists" or nblists, in order to reduce the overhead of finding atom pairs that are within the distance cutoff. The size of nblists grows linearly with the number of atoms in the system and superlinearly with the distance cutoff, and as a result, they require a significant amount of memory for large molecular systems. The high space usage leads to poor cache performance, which slows computation for large distance cutoffs. Also, the high cost of updates means that one cannot afford to keep the data structure always synchronized with the configuration of the molecules when efficiency is at stake. We propose a dynamic octree data structure for implicit maintenance of nblists using space linear in the number of atoms but independent of the distance cutoff. The list can be updated very efficiently as the coordinates of atoms change during the simulation. Unlike explicit nblists, a single octree works for all distance cutoffs. In addition, the octree is a cache-friendly data structure, and hence it is less prone to cache miss slowdowns on modern memory hierarchies than nblists. Octrees use almost 2 orders of magnitude less memory, which is crucial for simulation of large systems, and while they are comparable in performance to nblists when the distance cutoff is small, they outperform nblists for larger systems and large cutoffs. Our tests show that the octree implementation is approximately 1.5 times faster in practical use case scenarios as compared to nblists.
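The idea of answering cutoff queries from one spatial tree, rather than storing an explicit pair list per cutoff, can be sketched as follows. This is a minimal static octree under assumptions of my own (points inside the unit cube, a hypothetical `Octree` class and `neighbors_within` helper); it is not the dynamic data structure proposed in the paper and does not handle atom movement.

```python
class Octree:
    """Hypothetical minimal octree over 3-D points (static; illustration only)."""

    def __init__(self, points, indices, center, half, leaf_size=8):
        self.center, self.half = center, half
        self.children = []
        self.indices = indices  # point indices stored here if this node is a leaf
        if len(indices) > leaf_size and half > 1e-9:
            # Split the cube into octants and distribute the points among them.
            buckets = {}
            for idx in indices:
                p = points[idx]
                key = tuple(int(p[d] >= center[d]) for d in range(3))
                buckets.setdefault(key, []).append(idx)
            h = half / 2
            for key, idxs in buckets.items():
                c = tuple(center[d] + (h if key[d] else -h) for d in range(3))
                self.children.append(Octree(points, idxs, c, h, leaf_size))
            self.indices = []  # interior nodes hold no points

    def query(self, points, q, cutoff, out):
        """Append to `out` the indices of all points within `cutoff` of q."""
        # Prune: squared distance from q to this cube; skip the cube if too far.
        d2 = 0.0
        for d in range(3):
            gap = abs(q[d] - self.center[d]) - self.half
            if gap > 0:
                d2 += gap * gap
        if d2 > cutoff * cutoff:
            return
        for idx in self.indices:
            p = points[idx]
            if sum((p[d] - q[d]) ** 2 for d in range(3)) <= cutoff * cutoff:
                out.append(idx)
        for child in self.children:
            child.query(points, q, cutoff, out)


def neighbors_within(points, tree, i, cutoff):
    """Indices of points within `cutoff` of point i, excluding i itself."""
    out = []
    tree.query(points, points[i], cutoff, out)
    return sorted(j for j in out if j != i)
```

The same tree answers queries for any cutoff, and its size depends only on the number of points, which mirrors the space argument made in the abstract.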
A Cache Design to Exploit Structural Locality

DTIC Science & Technology

1991-12-01

memory and secondary storage. Main memory was used to store the instructions and data of an executing program, while secondary storage held programs ...efficiency of the CPU and faster turnaround of executing programs. In addition to the well known spatial and temporal aspects of locality, Hobart has...identified a third aspect, which he has called structural locality (9). This type of locality is defined as the tendency of an executing program to
Transfers and Enhancements of the Teleconferencing System and Support of the Special Operations Planning Aids

DTIC Science & Technology

1984-10-31

five colors, page forward, page back, erase, clear the page, store previously annotated material, and later retrieve it. From this developed a four...system to secure sites. These enhancements are discussed below. 2.1 Enhancements to the...and large cache memory of the Winchester drive allows the SGWS software to run much faster when doing file access or direct memory access (DMA) than
Proceedings: Sisal `93

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feo, J.T.

1993-10-01

This report contains papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernel of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architectures; Compiling technique based on dataflow analysis for functional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor; A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architecture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.
High-Assurance System Support through 3-D Integration

DTIC Science & Technology

2007-11-09

algorithms), tagging, and in selected systems, offensive mechanisms. For example, we can exploit the control plane to tag all traffic traveling...October 2005. [35] D. Page. Theoretical use of cache memory as a cryptanalytic side-channel. Technical Report CSTR-02-003, Department of Computer
Quantifying animal movement for caching foragers: the path identification index (PII) and cougars, Puma concolor.

PubMed

Ironside, Kirsten E; Mattson, David J; Theimer, Tad; Jansen, Brian; Holton, Brandon; Arundel, Terence; Peters, Michael; Sexton, Joseph O; Edwards, Thomas C

2017-01-01

Many studies of animal movement have focused on directed versus area-restricted movement, which rely on correlations between step-length and turn-angles and on stationarity through time to define behavioral states. Although these approaches might apply well to grazing in patchy landscapes, species that either feed for short periods on large, concentrated food sources or cache food exhibit movements that are difficult to model using the traditional metrics of turn-angle and step-length alone. We used GPS telemetry collected from a prey-caching predator, the cougar (Puma concolor, Linnaeus), to test whether combining metrics of site recursion, spatiotemporal clustering, speed, and turning into an index of movement using partial sums improves the ability to identify caching behavior. The index was used to identify changes in movement characteristics over time and segment paths into behavioral classes. The identification of behaviors from the Path Identification Index (PII) was evaluated using field investigations of cougar activities at GPS locations. We tested for statistical stationarity across behaviors for use of topographic view-sheds. Changes in the frequency and duration of PII were useful for identifying seasonal activities such as migration, gestation, and denning. The comparison of field investigations of cougar activities to behavioral PII classes resulted in an overall classification accuracy of 81%. Changes in behaviors were reflected in cougars' use of topographic view-sheds, resulting in statistical nonstationarity over time, and revealed important aspects of hunting behavior. Incorporating metrics of site recursion and spatiotemporal clustering revealed the temporal structure in movements of a caching forager. The movement index, PII, shows promise for identifying behaviors in species that frequently return to specific locations such as food caches, watering holes, or dens, and highlights the potential role memory and cognitive abilities play in determining animal movements.
Past Participle Formation in Specific Language Impairment

ERIC Educational Resources Information Center

Kauschke, Christina; Renner, Lena F.; Domahs, Ulrike

2017-01-01

Background: German participles are formed by a co-occurrence of prefixation and suffixation. While the acquisition of regular and irregular suffixation has been investigated exhaustively, it is still unclear how German children master the prosodically determined prefixation rule (prefix "ge-"). Findings reported in the literature are…
Some pitfalls in measuring memory in animals.

PubMed

Thorpe, Christina M; Jacova, Claudia; Wilkie, Donald M

2004-11-01

Because the presence or absence of memories in the brain cannot be directly observed, scientists must rely on indirect measures and use inferential reasoning to make statements about the status of memories. In humans, memories are often accessed through spoken or written language. In animals, memory is accessed through overt behaviours such as running down an arm in a maze, pressing a lever, or visiting a food cache site. Because memory is measured by these indirect methods, errors in the veracity of statements about memory can occur. In this brief paper, we identify three areas that may serve as pitfalls in reasoning about memory in animals: (1) the presence of 'silent associations', (2) intrusions of species-typical behaviours on memory tasks, and (3) improper mapping between human and animal memory tasks. There are undoubtedly other areas in which scientists should act cautiously when reasoning about the status of memory.
A Program for Teaching Russian Verb Prefixation.

ERIC Educational Resources Information Center

Schupbach, R. D.

1979-01-01

In this five- to ten-hour presentation, intermediate and advanced students of Russian learn how prefixation affects all types of motion in terms of displacement, transitivity, and perfectivity. The features of the prefix are detailed. Throughout, changes in government (subject, object, and prepositional complements) are explained in relation to…
Recognizing Prefixes in Scientific Quantities

ERIC Educational Resources Information Center

Sokolowski, Andrzej

2015-01-01

Although recognizing prefixes in physical quantities is inherent for practitioners, it might not be inherent for students, who do not use prefixes in their everyday life experiences. This deficiency surfaces in AP Physics exams. For example, readers of an AP Physics exam reported "a common mistake of incorrectly converting nanometers to…
Medical Terminology: Prefixes. Health Occupations Education Module.

ERIC Educational Resources Information Center

Temple Univ., Philadelphia, PA. Div. of Vocational Education.

This module on medical terminology (prefixes) is one of 17 modules designed for individualized instruction in health occupations education programs at both the secondary and postsecondary levels. This module consists of an introduction to prefixes, a list of resources needed, and three learning experiences. Each learning experience contains an…
Measuring the development of a common scientific lexicon in nanotechnology

NASA Astrophysics Data System (ADS)

Arora, Sanjay K.; Youtie, Jan; Carley, Stephen; Porter, Alan L.; Shapira, Philip

2014-01-01

Over the last two decades, nanotechnology has not only grown considerably but also evolved in its use of scientific terminology. This paper examines the growth in nano-prefixed terms in a corpus of nanotechnology scholarly publications over a 21-year time period. The percentage of publications using a nano-prefixed term has increased from <10 % in the early 1990s to nearly 80 % by 2010. A co-word analysis of nano-prefixed terms indicates that the network of these terms has moved from being densely organized around a few common nano-prefixed terms such as "nanostructure" in 2000 to becoming less dense and more differentiated in using additional nano-prefixed terms while continuing to coalesce around the common nano-prefixed terms by 2010. We further observe that the share of nanotechnology papers oriented toward biomedical and clinical medicine applications has risen from just over 5 % to more than 11 %. While these results cannot fully distinguish between the use of nano-prefixed terms in response to broader policy or societal influences, they do suggest that there are intellectual and scientific underpinnings to the growth of a collectively shared vocabulary. We consider whether our findings signify the maturation of a scientific field and the extent to which this denotes the emergence of a shared scientific understanding regarding nanotechnology.

Untangling elevation-related differences in the hippocampus in food-caching mountain chickadees: the effect of a uniform captive environment.

PubMed

Freas, C A; Bingman, K; Ladage, L D; Pravosudov, V V

2013-01-01

Variation in environmental conditions associated with differential selection on spatial memory has been hypothesized to result in evolutionary changes in the morphology of the hippocampus, a brain region involved in spatial memory. At the same time, it is well known that the morphology of the hippocampus might also be directly affected by environmental conditions. Understanding the role of environment-based plasticity is therefore critical when investigating potential adaptive evolutionary changes in the hippocampus associated with environmental variation. We previously demonstrated large elevation-related variation in hippocampus morphology in mountain chickadees over an extremely small spatial scale. We hypothesized that this variation is related to differential selection pressures associated with differences in winter climate severity along an elevation gradient, which make different demands on spatial memory used for food cache retrieval. Here, we tested whether such variation is experience based, generated by potential differences in the environment, by comparing the hippocampus morphology of chickadees from different elevations maintained in a uniform captive environment in a laboratory with those sampled directly from the wild. In addition, we compared hippocampal neuron soma size in chickadees sampled directly from the wild with those maintained in laboratory conditions with restricted and unrestricted spatial memory use via manipulation of food-caching experiences to test whether memory use can affect neuron soma size. There were significant elevation-related differences in hippocampus volume and the total number of hippocampal neurons, but not in neuron soma size, in captive birds. Captive environmental conditions were associated with a large reduction in hippocampus volume and neuron soma size, but not in the total number of neurons or in neuron soma size in other telencephalic regions. 
Restriction of memory use while in laboratory conditions produced no significant effects on hippocampal neuron soma size. Overall our results showed that captivity has a strong effect on hippocampus volume, which could be due, at least partly, to a reduction in neuron soma size specifically in the hippocampus, but it did not override elevation-related differences in hippocampus volume or in the total number of hippocampal neurons. These data are consistent with the idea of the adaptive nature of the elevation-related differences associated with selection on spatial memory, while at the same time demonstrating additional environment-based plasticity in hippocampus volume, but not in neuron numbers. Our results, however, cannot rule out that the differences between elevations might still be driven by some developmental or early posthatching conditions/experiences. © 2013 S. Karger AG, Basel.
A Case for Tamper-Resistant and Tamper-Evident Computer Systems

DTIC Science & Technology

2007-02-01

such as Kerberos is hard to apply [2] B. Gassend, G. Suh, D. Clarke, M. Dijk, and S. Devadas. Caches and Hash Trees for Efficient Memory Integrity...the block’s data from DRAM. For authentication, Merkle [14] G. Suh, D. Clarke, B. Gassend, M. van Dijk, and S. Devadas. Efficient Memory Integrity...wwi4serverwatch.com/news/article.php/ tion where a data block is encrypted or decrypted through an XOR 1399451, 2000. [11] B. Rogers, Y. Solihin
Fault Tolerant VLSI Design Assessments for Advanced Avionics Department

DTIC Science & Technology

1982-02-06

negative sense. Another facet of the literature review is to acquaint the researchers with the immense literature base for electronic technology applicable ...Report: Semiconductor Memories are Tested Over Data-Storage Application", Electronics, vol. 46, August 19. G. Luecke, J. P. Mize and W. N. Carr...Semiconductor Memories, Design and Application, New York, McGraw-Hill, 1973. 20. P. A. Lee, N. Ghani and K. Heron, "A Recovery Cache for the PDP-11" Digest
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

1999-01-01

As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
Decomposition of prefixed words in Russian.

PubMed

Kazanina, Nina

2011-11-01

I examined the nature of morphological decomposition in a series of masked-priming experiments with Russian prefixed nouns. In Experiments 1A and 1B, I tested 3 types of prime-target pairs in which the prime was a morphologically simple word, and a facilitation was found when the prime and the target were truly morphologically related (e.g., narost [outgrowth] and rost [growth] are morphologically related via the prefix na- [on]) or apparently morphologically related (e.g., priton [den] and ton [tone] seem to be morphologically related via the prefix pri- [toward], but this relation is false) but not when the relation was purely orthographic (e.g., kumir [idol] and mir [peace]; ku- is not an existing prefix of Russian). These results suggest that all orthographic forms that can be exhaustively parsed into a prefix and a stem are decomposed into (apparent) constituent morphemes during their retrieval from the lexicon. This early segmentation process is driven by morpho-orthographic but not by morphosemantic considerations and applies even for derived forms that are more frequent than their stem.
The impact of Moore's Law and loss of Dennard scaling: Are DSP SoCs an energy efficient alternative to x86 SoCs?

NASA Astrophysics Data System (ADS)

Johnsson, L.; Netzer, G.

2016-10-01

Moore's law, the doubling of transistors per unit area for each CMOS technology generation, is expected to continue throughout the decade, while Dennard voltage scaling resulting in constant power per unit area stopped about a decade ago. The semiconductor industry's response to the loss of Dennard scaling and the consequent challenges in managing power distribution and dissipation has been leveled-off clock rates, a die performance gain reduced from about a factor of 2.8 to 1.4 per technology generation, and multi-core processor dies with increased cache sizes. Increased cache sizes offer performance benefits for many applications as well as energy savings. Accessing data in cache is considerably more energy efficient than main memory accesses. Further, caches consume less power than a corresponding amount of functional logic. As feature sizes continue to be scaled down, an increasing fraction of the die must be “underutilized” or “dark” due to power constraints. With power being a prime design constraint, there is a concerted effort to find significantly more energy efficient chip architectures than those dominant in servers today, with chips potentially incorporating several types of cores to cover a range of applications, or different functions in an application, as is already common for the mobile processor market. Digital Signal Processors (DSPs), largely targeting the embedded and mobile processor markets, typically have been designed for a power consumption of 10% or less of a typical x86 CPU, yet with much more than 10% of the floating-point capability of the same technology generation x86 CPUs. Thus, DSPs could potentially offer an energy efficient alternative to x86 CPUs. Here we report an assessment of the Texas Instruments TMS320C6678 DSP with regard to its energy efficiency for two common HPC benchmarks: STREAM (memory system benchmark) and HPL (CPU benchmark).
Neuropsychological Performance in Advanced Age- Influences of Demographic Factors and Apolipoprotein E: Findings from the Cache County Memory Study

PubMed Central

Welsh-Bohmer, Kathleen A.; Østbye, Truls; Sanders, Linda; Pieper, Carl F.; Hayden, Kathleen M.; Tschanz, JoAnn T.; Norton, Maria C.

2009-01-01

The Cache County Study of Memory in Aging (CCMS) is an epidemiological study of Alzheimer’s disease (AD), mild cognitive disorders, and aging in a population of exceptionally long-lived individuals (7th to 11th decade). Observation of population members without dementia provides an opportunity for establishing the range of normal neurocognitive performance in a representative sample of the very old. We examined neurocognitive performance of the normal participants undergoing full clinical evaluations (n=507) and we tested the potential modifying effects of APOE genotype, a known genetic risk factor for the later development of AD. The results indicate that advanced age and low education are related to lower test scores across nearly all of the neurocognitive measures. Gender and APOE ε4 both had negligible and inconsistent influences, affecting only isolated measures of memory and expressive speech (in case of gender). The gender and APOE effects disappeared once age and education were controlled. The study of this exceptionally long-lived population provides useful normative information regarding the broad range of “normal” cognition seen in advanced age. Among elderly without dementia or other cognitive impairment, APOE does not appear to exert any major effects on cognition once other demographic influences are controlled. PMID:18609337
Neuropsychological performance in advanced age: influences of demographic factors and Apolipoprotein E: findings from the Cache County Memory Study.

PubMed

Welsh-Bohmer, Kathleen A; Ostbye, Truls; Sanders, Linda; Pieper, Carl F; Hayden, Kathleen M; Tschanz, JoAnn T; Norton, Maria C

2009-01-01

The Cache County Study of Memory in Aging (CCMS) is an epidemiological study of Alzheimer's disease (AD), mild cognitive disorders, and aging in a population of exceptionally long-lived individuals (7th to 11th decade). Observation of population members without dementia provides an opportunity for establishing the range of normal neurocognitive performance in a representative sample of the very old. We examined neurocognitive performance of the normal participants undergoing full clinical evaluations (n = 507) and we tested the potential modifying effects of apolipoprotein E (APOE) genotype, a known genetic risk factor for the later development of AD. The results indicate that advanced age and low education are related to lower test scores across nearly all of the neurocognitive measures. Gender and APOE epsilon4 both had negligible and inconsistent influences, affecting only isolated measures of memory and expressive speech (in case of gender). The gender and APOE effects disappeared once age and education were controlled. The study of this exceptionally long-lived population provides useful normative information regarding the broad range of "normal" cognition seen in advanced age. Among elderly without dementia or other cognitive impairment, APOE does not appear to exert any major effects on cognition once other demographic influences are controlled.
Differences in the Processing of Prefixes and Suffixes Revealed by a Letter-Search Task

ERIC Educational Resources Information Center

Beyersmann, Elisabeth; Ziegler, Johannes C.; Grainger, Jonathan

2015-01-01

A letter-search task was used to test the hypothesis that affixes are chunked during morphological processing and that such chunking might operate differently for prefixes and suffixes. Participants had to detect a letter target that was embedded either in a prefix or suffix (e.g., "R" in "propoint" or "filmure") or…
Advanced texture filtering: a versatile framework for reconstructing multi-dimensional image data on heterogeneous architectures

NASA Astrophysics Data System (ADS)

Zellmann, Stefan; Percan, Yvonne; Lang, Ulrich

2015-01-01

Reconstruction of 2-d image primitives or of 3-d volumetric primitives is one of the most common operations performed by the rendering components of modern visualization systems. Because this operation is often aided by GPUs, reconstruction is typically restricted to first-order interpolation. With the advent of in situ visualization, the assumption that rendering algorithms are in general executed on GPUs is however no longer adequate. We thus propose a framework that provides versatile texture filtering capabilities: up to third-order reconstruction using various types of cubic filtering and interpolation primitives; cache-optimized algorithms that integrate seamlessly with GPGPU rendering or with software rendering that was optimized for cache-friendly "Structure of Array" (SoA) access patterns; a memory management layer (MML) that gracefully hides the complexities of extra data copies necessary for memory access optimizations such as swizzling, for rendering on GPGPUs, or for reconstruction schemes that rely on pre-filtered data arrays. We prove the effectiveness of our software architecture by integrating it into and validating it using the open source direct volume rendering (DVR) software DeskVOX.
Epidemiology of cognitive aging and Alzheimer's disease: contributions of the cache county utah study of memory, health and aging.

PubMed

Hayden, Kathleen M; Welsh-Bohmer, Kathleen A

2012-01-01

Epidemiological studies of Alzheimer's disease (AD) provide insights into changing public health trends and their contribution to disease incidence. The current chapter considers how the population-based approach has contributed to our understanding of lifetime exposures that contribute to later disease risk and may act to modify onset of symptoms. We focus on the findings from a recent survey of an exceptionally long-lived population, the Cache County Utah Study of Memory, Health, and Aging. This study, confined to a single geographic population, has allowed estimation of the genetic and environmental influences on AD expression across the expected human lifespan of 95+ years. Given the emphasis of this text on the behavioral neurosciences of aging, we highlight within the current chapter the particular contributions of this population-based study to the neuropsychology of aging and AD. We also discuss hypotheses generated from this survey with respect to factors that may either accelerate or delay symptom onset in AD and the conditions that appear to be associated with successful cognitive aging.
Calculating Reuse Distance from Source Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Narayanan, Sri Hari Krishna; Hovland, Paul

The efficient use of a system is of paramount importance in high-performance computing. Applications need to be engineered for future systems even before the architecture of such a system is clearly known. Static performance analysis that generates performance bounds is one way to approach the task of understanding application behavior. Performance bounds provide an upper limit on the performance of an application on a given architecture. Predicting cache hierarchy behavior and accesses to main memory is a requirement for accurate performance bounds. This work presents our static reuse distance algorithm to generate reuse distance histograms. We then use these histograms to predict cache miss rates. Experimental results for kernels studied show that the approach is accurate.
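The link between a reuse distance histogram and a predicted miss rate can be sketched on a small address trace. Note the swap in technique: the work above derives reuse distances statically from source code, whereas this sketch measures them dynamically from a trace, and it assumes a fully associative LRU cache. The function names are hypothetical.

```python
from collections import Counter


def reuse_distances(trace):
    """For each access, count the distinct addresses touched since the previous
    access to the same address; None marks a first (cold) access."""
    stack = []  # LRU stack, most recently used address at the end
    dists = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            dists.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            dists.append(None)
        stack.append(addr)
    return dists


def predicted_miss_rate(trace, cache_lines):
    """An access hits in a fully associative LRU cache of `cache_lines` lines
    exactly when its reuse distance is smaller than `cache_lines`."""
    dists = reuse_distances(trace)
    misses = sum(1 for d in dists if d is None or d >= cache_lines)
    return misses / len(dists)


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "c"]
    histogram = Counter(d for d in reuse_distances(trace) if d is not None)
    print(dict(histogram))                # {2: 3}
    print(predicted_miss_rate(trace, 3))  # 0.5  (only the three cold misses)
    print(predicted_miss_rate(trace, 2))  # 1.0  (every reuse is too far away)
```

The list-based stack makes each access linear in the number of distinct addresses, which is acceptable for a short illustration but not for real traces.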
Locality Aware Concurrent Start for Stencil Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shrestha, Sunil; Gao, Guang R.; Manzano Franco, Joseph B.

Stencil computations are at the heart of many physical simulations used in scientific codes. Thus, there exists a plethora of optimization efforts for this family of computations. Among these techniques, tiling techniques that allow concurrent start have proven to be very efficient in providing better performance for these critical kernels. Nevertheless, with many-core designs being the norm, these optimization techniques might not be able to fully exploit locality (both spatial and temporal) on multiple levels of the memory hierarchy without compromising parallelism. It is no longer true that the machine can be seen as a homogeneous collection of nodes with caches, main memory and an interconnect network. New architectural designs exhibit complex grouping of nodes, cores, threads, caches and memory connected by an ever evolving network-on-chip design. These new designs may benefit greatly from carefully crafted schedules and groupings that encourage parallel actors (i.e. threads, cores or nodes) to be aware of the computational history of other actors in close proximity. In this paper, we provide an efficient tiling technique that allows hierarchical concurrent start for memory hierarchy aware tile groups. Each execution schedule and tile shape exploit the available parallelism, load balance and locality present in the given applications. We demonstrate our technique on the Intel Xeon Phi architecture with selected and representative stencil kernels. We show improvement ranging from 5.58% to 31.17% over existing state-of-the-art techniques.
Parallel performance investigations of an unstructured mesh Navier-Stokes solver

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

2000-01-01

A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
Quantifying animal movement for caching foragers: the path identification index (PII) and cougars, Puma concolor

USGS Publications Warehouse

Ironside, Kirsten E.; Mattson, David J.; Theimer, Tad; Jansen, Brian; Holton, Brandon; Arundel, Terry; Peters, Michael; Sexton, Joseph O.; Edwards, Thomas C.

2017-01-01

Relocation studies of animal movement have focused on directed versus area restricted movement, which rely on correlations between step-length and turn angles, along with a degree of stationarity through time to define behavioral states. Although these approaches may work well for grazing foraging strategies in a patchy landscape, species that do not spend a significant amount of time searching out and gathering small dispersed food items, but instead feed for short periods on large, concentrated sources or cache food, produce movements that may be difficult to analyze using turning and velocity alone. We use GPS telemetry collected from a prey-caching predator, the cougar (Puma concolor), to test whether adding movement metrics that capture site recursion to the more traditional velocity and turning improves the ability to identify behaviors. We evaluated our movement index’s ability to identify behaviors using field investigations. We further tested for statistical stationarity across behaviors for use of topographic view-sheds. We found little correlation between turn angle, velocity, tortuosity, and site fidelity and combined them into a movement index used to identify movement paths (temporally autocorrelated movements) related to fast directed movements (taxis), area restricted movements (search), and prey caching (foraging). Changes in the frequency and duration of these movements were helpful for identifying seasonal activities such as migration and denning in females. Comparison of field investigations of cougar activities to behavioral classes defined using the movement index found an overall classification accuracy of 81%. Changes in behaviors resulted in changes in how cougars used topographic view-sheds, showing statistical non-stationarity over time.
The movement index shows promise for identifying behaviors in species that frequently return to specific locations such as food caches, watering holes, or dens, and highlights the role memory and cognitive abilities may play in determining animal movements. With the addition of measures capturing site recursion the temporal structure in movements of a caching forager was revealed.
BIRDS AS A MODEL TO STUDY ADULT NEUROGENESIS: BRIDGING EVOLUTIONARY, COMPARATIVE AND NEUROETHOLOGICAL APPROACHES

PubMed Central

BARNEA, ANAT; PRAVOSUDOV, VLADIMIR

2011-01-01

During the last few decades evidence has demonstrated that adult neurogenesis is a well-preserved feature throughout the animal kingdom. In birds, ongoing neuronal addition occurs rather broadly, to a number of brain regions. This review describes adult avian neurogenesis and neuronal recruitment, discusses factors that regulate these processes, and touches upon the question of their genetic control. Several attributes make birds an extremely advantageous model to study neurogenesis. First, song learning exhibits seasonal variation that is associated with seasonal variation in neuronal turnover in some song control brain nuclei, which seems to be regulated via adult neurogenesis. Second, food-caching birds naturally use memory-dependent behavior in learning locations of thousands of food caches scattered over their home ranges. In comparison with other birds, food-caching species have relatively enlarged hippocampi with more neurons and intense neurogenesis, which appears to be related to spatial learning. Finally, migratory behavior and naturally occurring social systems in birds also provide opportunities to investigate neurogenesis. Such diversity of naturally-occurring memory-based behaviors, combined with the fact that birds can be studied both in the wild and in the laboratory, make them ideal for investigation of neural processes underlying learning. This can be done by using various approaches, from evolutionary and comparative to neuroethological and molecular. Finally, we connect the avian arena to a broader view by providing a brief comparative and evolutionary overview of adult neurogenesis and by discussing the possible functional role of the new neurons. We conclude by indicating future directions and possible medical applications. PMID:21929623
Integrating ecology, psychology and neurobiology within a food-hoarding paradigm

PubMed Central

Pravosudov, Vladimir V.; Smulders, Tom V.

2010-01-01

Many animals regularly hoard food for future use, which appears to be an important adaptation to a seasonally and/or unpredictably changing environment. This food-hoarding paradigm is an excellent example of a natural system that has broadly influenced both theoretical and empirical work in the field of biology. The food-hoarding paradigm has played a major role in the conceptual framework of numerous fields from ecology (e.g. plant–animal interactions) and evolution (e.g. the coevolution of caching, spatial memory and the hippocampus) to psychology (e.g. memory and cognition) and neurobiology (e.g. neurogenesis and the neurobiology of learning and memory). Many food-hoarding animals retrieve caches by using spatial memory. This memory-based behavioural system has the inherent advantage of being tractable for study in both the field and laboratory and has been shaped by natural selection, which produces variation with strong fitness consequences in a variety of taxa. Thus, food hoarding is an excellent model for a highly integrative approach to understanding numerous questions across a variety of disciplines. Recently, there has been a surge of interest in the complexity of animal cognition such as future planning and episodic-like-memory as well as in the relationship between memory, the environment and the brain. In addition, new breakthroughs in neurobiology have enhanced our ability to address the mechanisms underlying these behaviours. Consequently, the field is necessarily becoming more integrative by assessing behavioural questions in the context of natural ecological systems and by addressing mechanisms through neurobiology and psychology, but, importantly, within an evolutionary and ecological framework. In this issue, we aim to bring together a series of papers providing a modern synthesis of ecology, psychology, physiology and neurobiology and identifying new directions and developments in the use of food-hoarding animals as a model system. PMID:20156812
Reducing Router Forwarding Table Size Using Aggregation and Caching

ERIC Educational Resources Information Center

Liu, Yaoqing

2013-01-01

The fast growth of global routing table size has been causing concerns that the Forwarding Information Base (FIB) will not be able to fit in existing routers' expensive line-card memory, and upgrades will lead to a higher cost for network operators and customers. FIB Aggregation, a technique that merges multiple FIB entries into one, is probably…
Memory-efficient table look-up optimized algorithm for context-based adaptive variable length decoding in H.264/advanced video coding

NASA Astrophysics Data System (ADS)

Wang, Jianhua; Cheng, Lianglun; Wang, Tao; Peng, Xiaodong

2016-03-01

Table look-up operation plays a very important role during the decoding processing of context-based adaptive variable length decoding (CAVLD) in H.264/advanced video coding (AVC). However, frequent table look-up operations can result in heavy table memory access, and thus lead to high table power consumption. To reduce the heavy table memory access of current methods, and thereby the power consumption, a memory-efficient table look-up optimized algorithm is presented for CAVLD. The contribution of this paper is the introduction of index search technology to reduce memory access for table look-up, and thereby reduce table power consumption. Specifically, in our scheme, we use index search technology to reduce memory access by reducing the searching and matching operations for code_word, taking advantage of the internal relationship among length of zero in code_prefix, value of code_suffix and code_length, thus saving the power consumption of table look-up. The experimental results show that our proposed table look-up algorithm based on index search can lower memory access consumption by about 60% compared with table look-up by a sequential search scheme, and thus save considerable power consumption for CAVLD in H.264/AVC.
Past participle formation in specific language impairment.

PubMed

Kauschke, Christina; Renner, Lena F; Domahs, Ulrike

2017-03-01

German participles are formed by a co-occurrence of prefixation and suffixation. While the acquisition of regular and irregular suffixation has been investigated exhaustively, it is still unclear how German children master the prosodically determined prefixation rule (prefix ge-). Findings reported in the literature are inconsistent on this point. In particular, it is unclear whether participle formation is vulnerable in German children with specific language impairment (SLI). The aims were to compare children with and without SLI in their abilities to form German participles correctly, and to determine their relative sensitivities to the morphophonological regularities of prefixation. The performance of 14 German-speaking children with SLI (mean age = 7;5) in a participle formation task was compared with that of age-matched and younger typically developing controls. The materials included 60 regular verbs and 20 pseudo-verbs, half of them requiring the prefix ge-. Overall, children with SLI performed poorly compared with both groups of typically developing children. Children with SLI tended either to avoid participle markings or choose inappropriate affixes. However, while such children showed marked impairment at the morphological level, they were generally successful in applying the morphoprosodic rules governing prefixation. In contrast to earlier findings, the present results demonstrate that regular participle formation is problematic for German children with SLI. © 2016 Royal College of Speech and Language Therapists.

Efficiently Serving HDF5 Products via OPeNDAP

NASA Technical Reports Server (NTRS)

Yang, Kent

2017-01-01

Hyrax OPeNDAP services are widely used by the Earth Science data centers in NASA, NOAA and other organizations to serve end users. In this talk, we will present some key features added in the HDF5 Hyrax OPeNDAP handler that can help data centers to better serve the HDF5/netCDF-4 data products. Among these new features, we will focus on the following: (1) the DAP4 support; (2) the memory cache and the disk cache support that can reduce the service access time; (3) the enhancement that allows swath-like HDF5 products to be visualized by CF-client tools. We will also discuss in depth the role of the HDF5 handler in the recent study of the Hyrax service in the cloud environment.
Norms for CERAD Constructional Praxis Recall

PubMed Central

Fillenbaum, Gerda G.; Burchett, Bruce M.; Unverzagt, Frederick W.; Rexroth, Daniel F.; Welsh-Bohmer, Kathleen

2012-01-01

Recall of the 4-item constructional praxis measure was a later addition to the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) neuropsychological battery. Norms for this measure, based on cognitively intact African Americans age ≥70 (Indianapolis-Ibadan Dementia Project, N=372), European American participants age ≥66 (Cache County Study of Memory, Health and Aging, N=507), and European American CERAD clinic controls age ≥50 (N=182), are presented here. Performance varied by site: by sex, education and age (African Americans in Indianapolis); by education and age (Cache County European Americans); and only by age (CERAD European American controls). Performance declined with increased age, within age with less education, and was poorer for women. Means, standard deviations, and percentiles are presented separately for each sample. PMID:21992077
Hippocampal-prefrontal input supports spatial encoding in working memory.

PubMed

Spellman, Timothy; Rigotti, Mattia; Ahmari, Susanne E; Fusi, Stefano; Gogos, Joseph A; Gordon, Joshua A

2015-06-18

Spatial working memory, the caching of behaviourally relevant spatial cues on a timescale of seconds, is a fundamental constituent of cognition. Although the prefrontal cortex and hippocampus are known to contribute jointly to successful spatial working memory, the anatomical pathway and temporal window for the interaction of these structures critical to spatial working memory has not yet been established. Here we find that direct hippocampal-prefrontal afferents are critical for encoding, but not for maintenance or retrieval, of spatial cues in mice. These cues are represented by the activity of individual prefrontal units in a manner that is dependent on hippocampal input only during the cue-encoding phase of a spatial working memory task. Successful encoding of these cues appears to be mediated by gamma-frequency synchrony between the two structures. These findings indicate a critical role for the direct hippocampal-prefrontal afferent pathway in the continuous updating of task-related spatial information during spatial working memory.
In-memory interconnect protocol configuration registers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cheng, Kevin Y.; Roberts, David A.

Systems, apparatuses, and methods for moving the interconnect protocol configuration registers into the main memory space of a node. The region of memory used for storing the interconnect protocol configuration registers may also be made cacheable to reduce the latency of accesses to the interconnect protocol configuration registers. Interconnect protocol configuration registers which are used during a startup routine may be prefetched into the host's cache to make the startup routine more efficient. The interconnect protocol configuration registers for various interconnect protocols may include one or more of device capability tables, memory-side statistics (e.g., to support two-level memory data mapping decisions), advanced memory and interconnect features such as repair resources and routing tables, prefetching hints, error correcting code (ECC) bits, lists of device capabilities, set and store base address, capability, device ID, status, configuration, capabilities, and other settings.
Memory development in the second year: for events or locations?

PubMed

Russell, James; Thompson, Doreen

2003-04-01

We employed an object-placement/object-removal design, inspired by recent work on 'episodic-like' memory in scrub jays (Clayton, N. S., & Dickinson, A. (1998). Episodic-like memory during cache recovery by scrub jays. Nature, 395, 272-274), to examine the possibility that children in the second year of life have event-based memories. In one task, a successful search could have been due to the recall of an object-removal event. In the second task, a successful search could only have been caused by recall of where objects were located. Success was general in the oldest group of children (21-25 months), while performance was broadly similar on the two tasks. The parsimonious interpretation of this outcome is that the first task was performed by location memory, not by event memory. We place these data in the context of object permanence development.
The BlueGene/L supercomputer

NASA Astrophysics Data System (ADS)

Bhanota, Gyan; Chen, Dong; Gara, Alan; Vranas, Pavlos

2003-05-01

The architecture of the BlueGene/L massively parallel supercomputer is described. Each computing node consists of a single compute ASIC plus 256 MB of external memory. The compute ASIC integrates two 700 MHz PowerPC 440 integer CPU cores, two 2.8 Gflops floating point units, 4 MB of embedded DRAM as cache, a memory controller for external memory, six 1.4 Gbit/s bi-directional ports for a 3-dimensional torus network connection, three 2.8 Gbit/s bi-directional ports for connecting to a global tree network and a Gigabit Ethernet for I/O. 65,536 of such nodes are connected into a 3-d torus with a geometry of 32×32×64. The total peak performance of the system is 360 Teraflops and the total amount of memory is 16 TeraBytes.
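The system totals quoted above follow from the per-node figures. The short check below is a sketch under two assumptions not stated in the abstract: that binary units are used for memory (1 TB = 1024 × 1024 MB), and that the quoted 360 Teraflops is a rounded value.

```python
# Per-node figures taken from the abstract.
nodes = 32 * 32 * 64            # 3-d torus geometry
mem_per_node_mb = 256           # external memory per node, in MB
fpus_per_node = 2               # floating point units per compute ASIC
gflops_per_fpu = 2.8

# Assumption: binary units, so 1 TB = 1024 * 1024 MB.
total_mem_tb = nodes * mem_per_node_mb / (1024 * 1024)

# Peak performance in Teraflops; the abstract's 360 is presumably rounded.
peak_tflops = nodes * fpus_per_node * gflops_per_fpu / 1000

print(nodes, total_mem_tb, round(peak_tflops, 1))
```

The node count and memory total match the abstract exactly; the computed peak comes out slightly above the quoted 360 Teraflops.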
Parallel 3D-TLM algorithm for simulation of the Earth-ionosphere cavity

NASA Astrophysics Data System (ADS)

Toledo-Redondo, Sergio; Salinas, Alfonso; Morente-Molinera, Juan Antonio; Méndez, Antonio; Fornieles, Jesús; Portí, Jorge; Morente, Juan Antonio

2013-03-01

A parallel 3D algorithm for solving time-domain electromagnetic problems with arbitrary geometries is presented. The technique employed is the Transmission Line Modeling (TLM) method implemented in Shared Memory (SM) environments. The benchmarking performed reveals that the maximum speedup depends on the memory size of the problem as well as multiple hardware factors, like the disposition of CPUs, cache, or memory. A maximum speedup of 15 has been measured for the largest problem. In certain circumstances of low memory requirements, superlinear speedup is achieved using our algorithm. The algorithm is used to model the Earth-ionosphere cavity, thus enabling a study of the natural electromagnetic phenomena that occur in it. The algorithm allows complete 3D simulations of the cavity with a resolution of 10 km, within a reasonable timescale.
A simple GPU-accelerated two-dimensional MUSCL-Hancock solver for ideal magnetohydrodynamics

NASA Astrophysics Data System (ADS)

Bard, Christopher M.; Dorelli, John C.

2014-02-01

We describe our experience using NVIDIA's CUDA (Compute Unified Device Architecture) C programming environment to implement a two-dimensional second-order MUSCL-Hancock ideal magnetohydrodynamics (MHD) solver on a GTX 480 Graphics Processing Unit (GPU). Taking a simple approach in which the MHD variables are stored exclusively in the global memory of the GTX 480 and accessed in a cache-friendly manner (without further optimizing memory access by, for example, staging data in the GPU's faster shared memory), we achieved a maximum speed-up of ≈126 for a 1024² grid relative to the sequential C code running on a single Intel Nehalem (2.8 GHz) core. This speedup is consistent with simple estimates based on the known floating point performance, memory throughput and parallel processing capacity of the GTX 480.
High Performance Computing Multicast

DTIC Science & Technology

2012-02-01

responsiveness, first-tier applications often implement replicated in-memory key-value stores, using them to store state or to cache data from services...alternative that replicates data, combines agreement on update ordering with amnesia freedom, and supports both good scalability and fast response.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bender, Michael A.; Berry, Jonathan W.; Hammond, Simon D.

A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near memory is used as cache, but we believe that this will be inefficient.
An ultra-compact processor module based on the R3000

NASA Astrophysics Data System (ADS)

Mullenhoff, D. J.; Kaschmitter, J. L.; Lyke, J. C.; Forman, G. A.

1992-08-01

Viable high density packaging is of critical importance for future military systems, particularly space borne systems which require minimum weight and size and high mechanical integrity. A leading, emerging technology for high density packaging is multi-chip modules (MCM). During the 1980's, a number of different MCM technologies have emerged. In support of Strategic Defense Initiative Organization (SDIO) programs, Lawrence Livermore National Laboratory (LLNL) has developed, utilized, and evaluated several different MCM technologies. Prior LLNL efforts include modules developed in 1986, using hybrid wafer scale packaging, which are still operational in an Air Force satellite mission. More recent efforts have included very high density cache memory modules, developed using laser pantography. As part of the demonstration effort, LLNL and Phillips Laboratory began collaborating in 1990 in the Phase 3 Multi-Chip Module (MCM) technology demonstration project. The goal of this program was to demonstrate the feasibility of General Electric's (GE) High Density Interconnect (HDI) MCM technology. The design chosen for this demonstration was the processor core for a MIPS R3000 based reduced instruction set computer (RISC), which has been described previously. It consists of the R3000 microprocessor, R3010 floating point coprocessor and 128 Kbytes of cache memory.
Windowed multipole for cross section Doppler broadening

NASA Astrophysics Data System (ADS)

Josey, C.; Ducru, P.; Forget, B.; Smith, K.

2016-02-01

This paper presents an in-depth analysis on the accuracy and performance of the windowed multipole Doppler broadening method. The basic theory behind cross section data is described, along with the basic multipole formalism followed by the approximations leading to windowed multipole method and the algorithm used to efficiently evaluate Doppler broadened cross sections. The method is tested by simulating the BEAVRS benchmark with a windowed multipole library composed of 70 nuclides. Accuracy of the method is demonstrated on a single assembly case where total neutron production rates and 238U capture rates compare within 0.1% to ACE format files at the same temperature. With regards to performance, clock cycle counts and cache misses were measured for single temperature ACE table lookup and for windowed multipole. The windowed multipole method was found to require 39.6% more clock cycles to evaluate, translating to a 7.9% performance loss overall. However, the algorithm has significantly better last-level cache performance, with 3 fewer misses per evaluation, or a 65% reduction in last-level misses. This is due to the small memory footprint of the windowed multipole method and better memory access pattern of the algorithm.
Use of diuretics is associated with reduced risk of Alzheimer's disease: the Cache County Study.

PubMed

Chuang, Yi-Fang; Breitner, John C S; Chiu, Yen-Ling; Khachaturian, Ara; Hayden, Kathleen; Corcoran, Chris; Tschanz, JoAnn; Norton, Maria; Munger, Ron; Welsh-Bohmer, Kathleen; Zandi, Peter P

2014-11-01

Although the use of antihypertensive medications has been associated with reduced risk of Alzheimer's disease (AD), it remains unclear which class provides the most benefit. The Cache County Study of Memory Health and Aging is a prospective longitudinal cohort study of dementing illnesses among the elderly population of Cache County, Utah. Using waves I to IV data of the Cache County Study, 3417 participants had a mean of 7.1 years of follow-up. Time-varying use of antihypertensive medications including different classes of diuretics, angiotensin converting enzyme inhibitors, β-blockers, and calcium channel blockers was used to predict the incidence of AD using Cox proportional hazards analyses. During follow-up, 325 AD cases were ascertained with a total of 23,590 person-years. Use of any antihypertensive medication was associated with lower incidence of AD (adjusted hazard ratio [aHR], 0.77; 95% confidence interval [CI], 0.61-0.97). Among different classes of antihypertensive medications, thiazide (aHR, 0.7; 95% CI, 0.53-0.93), and potassium-sparing diuretics (aHR, 0.69; 95% CI, 0.48-0.99) were associated with the greatest reduction of AD risk. Thiazide and potassium-sparing diuretics were associated with decreased risk of AD. The inverse association of potassium-sparing diuretics confirms an earlier finding in this cohort, now with longer follow-up, and merits further investigation. Copyright © 2014 Elsevier Inc. All rights reserved.
Use of diuretics is associated with reduced risk of Alzheimer’s disease: the Cache County Study

PubMed Central

Chuang, Yi-Fang; Breitner, John C.S.; Chiu, Yen-Ling; Khachaturian, Ara; Hayden, Kathleen; Corcoran, Chris; Tschanz, JoAnn; Norton, Maria; Munger, Ron; Welsh-Bohmer, Kathleen; Zandi, Peter P.

2015-01-01

Although the use of antihypertensive medications has been associated with reduced risk of Alzheimer’s disease (AD), it remains unclear which class provides the most benefit. The Cache County Study of Memory Health and Aging is a prospective longitudinal cohort study of dementing illnesses among the elderly population of Cache County, Utah. Using waves I to IV data of the Cache County Study, 3417 participants had a mean of 7.1 years of follow-up. Time-varying use of antihypertensive medications including different classes of diuretics, angiotensin converting enzyme inhibitors, β-blockers, and calcium channel blockers was used to predict the incidence of AD using Cox proportional hazards analyses. During follow-up, 325 AD cases were ascertained with a total of 23,590 person-years. Use of any antihypertensive medication was associated with lower incidence of AD (adjusted hazard ratio [aHR], 0.77; 95% confidence interval [CI], 0.61–0.97). Among different classes of antihypertensive medications, thiazide (aHR, 0.7; 95% CI, 0.53–0.93), and potassium-sparing diuretics (aHR, 0.69; 95% CI, 0.48–0.99) were associated with the greatest reduction of AD risk. Thiazide and potassium-sparing diuretics were associated with decreased risk of AD. The inverse association of potassium-sparing diuretics confirms an earlier finding in this cohort, now with longer follow-up, and merits further investigation. PMID:24910391
High Performance Programming Using Explicit Shared Memory Model on Cray T3D

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Saini, Subhash; Grassi, Charles

1994-01-01

The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in explicit shared memory model on Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than that obtained by using explicit shared memory model. This degradation in performance is also seen on CM-5 where the performance of applications using native message-passing library CMMD on CM-5 is also about 4 to 5 times less than using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in explicit shared memory model are discussed. Comparative performance of NPB using explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.
Toward Large-Graph Comparison Measures to Understand Internet Topology Dynamics

DTIC Science & Technology

2013-09-01

continuously from randomly selected vantage points in these monitors to destination IP addresses. From each IPv4 /24 prefix on the Internet, a destination is...expected to be more similar. This was verified when the esd and vsd measures applied to this dataset gave a low reading 5 An IPv4 address is a 32-bit...integer value. /24 is the prefix of the IPv4 network starting at a given address, having 24 bits allocated for the network prefix. 6 This utility
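The footnote's description of an IPv4 address as a 32-bit integer with a 24-bit network prefix can be illustrated with Python's standard `ipaddress` module. This is only a sketch: the helper name `prefix24` and the sample address (taken from the 192.0.2.0/24 documentation range) are assumptions, not part of the report.

```python
import ipaddress

def prefix24(addr: str) -> ipaddress.IPv4Network:
    """Return the /24 network containing addr (hypothetical helper)."""
    # strict=False lets host bits be set; they are masked off.
    return ipaddress.ip_network(f"{addr}/24", strict=False)

addr = ipaddress.IPv4Address("192.0.2.77")
as_int = int(addr)              # the address as a 32-bit integer
net = prefix24("192.0.2.77")    # the enclosing /24 network

# Clearing the low 8 bits of the integer gives the same network address.
masked = (as_int >> 8) << 8

print(as_int, net, ipaddress.IPv4Address(masked))
```

A /24 keeps the top 24 bits as the network prefix, which leaves 8 host bits and therefore 256 addresses per prefix.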
The comparative metabolism of 2,6-dichlorothiobenzamide (prefix) and 2,6-dichlorobenzonitrile in the dog and rat

PubMed Central

Griffiths, M. H.; Moss, J. A.; Rose, J. A.; Hathway, D. E.

1966-01-01

1. A single oral dose of either [14C]Prefix or 2,6-dichlorobenzo[14C]nitrile to rats is almost entirely eliminated in 4 days: 84·8–100·5% of 14C from [14C]Prefix is excreted, 67·3–79·7% in the urine, and 85·8–97·2% of 14C from 2,6-dichlorobenzo-[14C]nitrile is excreted, 72·3–80·7% in the urine. Only 0·37±0·03% of the dose of [14C]Prefix and 0·25±0·03% of the dose of 2,6-dichlorobenzo[14C]nitrile are present in the carcass plus viscera after removal of the gut. Rats do not show sex differences in the pattern of elimination of the respective metabolites of the two herbicides. The rates of elimination of 14C from the two compounds in the 24hr. and 48hr. urines are not significantly different (P >0·05) from one another. 2. After oral administration to dogs, 85·9–106·1% of 14C from [14C]Prefix is excreted, 66·6–80·9% in the urine, and 86·8–92·5% of 14C from 2,6-dichlorobenzo[14C]nitrile is excreted, 60·0–70·1% in the urine. Dogs do not show sex differences in the pattern of eliminating the metabolites of either Prefix or 2,6-dichlorobenzonitrile. 3. Dogs and rats do not show species differences in the patterns of elimination of the two herbicides. 4. Prefix and 2,6-dichlorobenzonitrile are completely metabolized; unchanged Prefix and 2,6-dichlorobenzonitrile are absent from the urine and faeces, and from the carcasses when elimination is complete. In the hydrolysed urine of rats dosed with either [14C]Prefix or 2,6-dichlorobenzo[14C]nitrile, 2,6-dichloro-3-hydroxybenzonitrile accounts for approx. 42% of the 14C, a further 10–11% is accounted for by 2,6-dichlorobenzamide, 2,6-dichlorobenzoic acid, 2,6-dichloro-3- and -4-hydroxybenzoic acid and 2,6-dichloro-4-hydroxybenzonitrile collectively, and 25–30% by six polar constituents, of which two are sulphur-containing amino acids. 5. 
In the unhydrolysed urines of rats dosed with either [14C]Prefix or 2,6-dichlorobenzo[14C]nitrile, there are present free 2,6-dichloro-3- and -4-hydroxybenzonitrile, their glucuronide conjugates, ester glucuronides of the principal aromatic acids that are present in the hydrolysed urines, and two sulphur-containing metabolites analogous to mercapturic acids or premercapturic acids. 6. Prefix is thus extensively transformed into 2,6-dichlorobenzonitrile: R·CS·NH2→R·CN+H2S, where R=C6H3Cl2. However, the competitive reaction: R·CS·NH2+H2O→R·CO·NH2+H2S takes place to a very limited extent. PMID:5911525
Multiprocessor architectural study

NASA Technical Reports Server (NTRS)

Kosmala, A. L.; Stanten, S. F.; Vandever, W. H.

1972-01-01

An architectural design study was made of a multiprocessor computing system intended to meet functional and performance specifications appropriate to a manned space station application. Intermetrics' previous experience and accumulated knowledge of the multiprocessor field are used to generate a baseline philosophy for the design of a future SUMC multiprocessor. Interrupts are defined and the crucial questions of interrupt structure, such as processor selection and response time, are discussed. Memory hierarchy and performance are discussed extensively, with particular attention to the design approach that utilizes a cache memory associated with each processor. The ability of an individual processor to approach its theoretical maximum performance is then analyzed in terms of a hit ratio. Memory management is envisioned as a virtual memory system implemented through either segmentation or paging. Addressing is discussed in terms of the various register designs adopted by current computers and those of advanced design.
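The role of the hit ratio can be made concrete with the standard textbook model of a two-level memory, sketched below. This is an assumed model for illustration, not the analysis used in the study, and the timing values are hypothetical.

```python
def effective_access_time(hit_ratio: float, t_cache: float, t_main: float) -> float:
    """Average access time when a fraction `hit_ratio` of accesses hit the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie in [0, 1]")
    return hit_ratio * t_cache + (1.0 - hit_ratio) * t_main


def fraction_of_peak(hit_ratio: float, t_cache: float, t_main: float) -> float:
    """Fraction of the all-hits (theoretical maximum) memory performance achieved."""
    return t_cache / effective_access_time(hit_ratio, t_cache, t_main)


if __name__ == "__main__":
    # Hypothetical timings: cache access costs 1 unit, main memory costs 10 units.
    for h in (0.80, 0.90, 0.99):
        t_eff = effective_access_time(h, 1.0, 10.0)
        print(f"hit ratio {h:.2f}: t_eff = {t_eff:.2f}, "
              f"fraction of peak = {fraction_of_peak(h, 1.0, 10.0):.2f}")
```

Under these assumed timings, a 90 percent hit ratio still leaves the processor at roughly half of its all-hits performance, which is why small changes in hit ratio matter so much near 1.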
A Simple GPU-Accelerated Two-Dimensional MUSCL-Hancock Solver for Ideal Magnetohydrodynamics

NASA Technical Reports Server (NTRS)

Bard, Christopher; Dorelli, John C.

2013-01-01

We describe our experience using NVIDIA's CUDA (Compute Unified Device Architecture) C programming environment to implement a two-dimensional second-order MUSCL-Hancock ideal magnetohydrodynamics (MHD) solver on a GTX 480 Graphics Processing Unit (GPU). Taking a simple approach in which the MHD variables are stored exclusively in the global memory of the GTX 480 and accessed in a cache-friendly manner (without further optimizing memory access by, for example, staging data in the GPU's faster shared memory), we achieved a maximum speed-up of approximately 126 for a 1024 × 1024 grid relative to the sequential C code running on a single Intel Nehalem (2.8 GHz) core. This speedup is consistent with simple estimates based on the known floating point performance, memory throughput and parallel processing capacity of the GTX 480.
Multithreaded implicitly dealiased convolutions

NASA Astrophysics Data System (ADS)

Roberts, Malcolm; Bowman, John C.

2018-03-01

Implicit dealiasing is a method for computing in-place linear convolutions via fast Fourier transforms that decouples work memory from input data. It offers easier memory management and, for long one-dimensional input sequences, greater efficiency than conventional zero-padding. Furthermore, for convolutions of multidimensional data, the segregation of data and work buffers can be exploited to reduce memory usage and execution time significantly. This is accomplished by processing and discarding data as it is generated, allowing work memory to be reused, for greater data locality and performance. A multithreaded implementation of implicit dealiasing that accepts an arbitrary number of input and output vectors and a general multiplication operator is presented, along with an improved one-dimensional Hermitian convolution that avoids the loop dependency inherent in previous work. An alternate data format that can accommodate a Nyquist mode and enhance cache efficiency is also proposed.

Runtime support for parallelizing data mining algorithms

NASA Astrophysics Data System (ADS)

Jin, Ruoming; Agrawal, Gagan

2002-03-01

With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed, starting from a common specification of the algorithm.
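Two of the techniques named above can be contrasted in a minimal sketch, shown here on a simple frequency count. The interpretation is an assumption: full locking is taken to mean one lock per element of the shared reduction object, and full replication to mean one private copy per thread, merged at the end. The function names are hypothetical and this is not the authors' runtime interface.

```python
import threading


def count_full_locking(chunks, num_bins):
    """Shared reduction object; every element is guarded by its own lock."""
    counts = [0] * num_bins
    locks = [threading.Lock() for _ in range(num_bins)]

    def worker(chunk):
        for item in chunk:
            with locks[item]:          # serialize updates to this element only
                counts[item] += 1

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


def count_full_replication(chunks, num_bins):
    """Each thread updates a private replica; replicas are merged afterwards."""
    replicas = [[0] * num_bins for _ in chunks]

    def worker(index, chunk):
        local = replicas[index]        # no locking: nobody else touches it
        for item in chunk:
            local[item] += 1

    threads = [threading.Thread(target=worker, args=(i, c))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    merged = [0] * num_bins            # sequential merge (reduction) step
    for replica in replicas:
        for i, value in enumerate(replica):
            merged[i] += value
    return merged


if __name__ == "__main__":
    data = [[0, 1, 1], [2, 1, 0]]      # one chunk of item ids per thread
    print(count_full_locking(data, 3))
    print(count_full_replication(data, 3))
```

The trade-off the sketch exposes is memory versus synchronization: replication pays for one copy of the reduction object per thread plus a merge, while locking pays a lock acquisition on every update.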
Ultra-fast three terminal perpendicular spin-orbit torque MRAM (Presentation Recording)

NASA Astrophysics Data System (ADS)

Boulle, Olivier; Cubukcu, Murat; Hamelin, Claire; Lamard, Nathalie; Buda-Prejbeanu, Liliana; Mikuszeit, Nikolai; Garello, Kevin; Gambardella, Pietro; Langer, Juergen; Ocker, Berthold; Miron, Mihai; Gaudin, Gilles

2015-09-01

The discovery that a current flowing in a heavy metal can exert a torque on a neighboring ferromagnet has opened a new way to manipulate the magnetization at the nanoscale. This "spin orbit torque" (SOT) has been demonstrated in ultrathin magnetic multilayers with structural inversion asymmetry (SIA) and high spin orbit coupling, such as Pt/Co/AlOx multilayers. We have shown that this torque can lead to the magnetization switching of a perpendicularly magnetized nanomagnet by an in-plane current injection. The manipulation of magnetization by SOT has led to a novel concept of magnetic RAM, the SOT-MRAM, which combines non-volatility, high speed, reliability and large endurance. These features make the SOT-MRAM a good candidate to replace SRAM for non-volatile cache memory applications. We will present the proof of concept of a perpendicular SOT-MRAM cell composed of a Ta/FeCoB/MgO/FeCoB magnetic tunnel junction and demonstrate ultra-fast (down to 300 ps) deterministic bipolar magnetization switching. Macrospin and micromagnetic simulations including SOT cannot reproduce the experimental results, which suggests that additional physical mechanisms are at stake. Our results show that SOT-MRAM is fast, reliable and low power, which is promising for non-volatile cache memory applications. We will also discuss recent experiments on magnetization reversal in ultrathin Pt/Co/AlOx multilayers by very short (<200 ps) current pulses. We will show that in this material, the Dzyaloshinskii-Moriya interaction plays a key role in the reversal process.
Adaptive Backoff Synchronization Techniques

DTIC Science & Technology

1989-07-01

The Simple Code. Technical Report, Lawrence Livermore Laboratory, February 1978. [6] F. Darema-Rogers, D. A. George, V. A. Norton, and G. F. Pfister...Heights, November 1986. 20 [7] Daniel Gajski, David Kuck, Duncan Lawrie, and Ahmed Saleh. Cedar - A Large Scale Multiprocessor. In International...17] Janak H. Patel. Analysis of Multiprocessors with Private Cache Memories. IEEE Transactions on Computers, C-31(4):296-304, April 1982. [18] G
Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation

DOE PAGES

Bender, Michael A.; Berry, Jonathan W.; Hammond, Simon D.; ...

2017-01-03

A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near memory is used as cache, but we believe that this will be inefficient.
A performance model for GPUs with caches

DOE PAGES

Dao, Thanh Tuan; Kim, Jungwon; Seo, Sangmin; ...

2014-06-24

To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, accurate performance estimation models for modern GPUs do not exist. This paper presents two accurate models for modern GPUs: a sampling-based linear model, and a model based on machine-learning (ML) techniques which improves the accuracy of the linear model and is applicable to modern GPUs with and without caches. We first construct the sampling-based linear model to predict the runtime of an arbitrary OpenCL kernel. Based on an analysis of NVIDIA GPUs' scheduling policies we determine the earliest sampling points that allow an accurate estimation. The linear model cannot capture well the significant effects that memory coalescing or caching as implemented in modern GPUs have on performance. We therefore propose a model based on ML techniques that takes several compiler-generated statistics about the kernel as well as the GPU's hardware performance counters as additional inputs to obtain a more accurate runtime performance estimation for modern GPUs. We demonstrate the effectiveness and broad applicability of the model by applying it to three different NVIDIA GPU architectures and one AMD GPU architecture. On an extensive set of OpenCL benchmarks, on average, the proposed model estimates the runtime performance with less than 7 percent error for a second-generation GTX 280 with no on-chip caches and less than 5 percent for the Fermi-based GTX 580 with hardware caches. On the Kepler-based GTX 680, the linear model has an error of less than 10 percent. On an AMD GPU architecture, the Radeon HD 6970, the model estimates the runtime with an error rate of 8 percent. As a result, the proposed technique outperforms existing models by a factor of 5 to 6 in terms of accuracy.
Interference Lattice-based Loop Nest Tilings for Stencil Computations

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Frumkin, Michael

2000-01-01

A common method for improving performance of stencil operations on structured multi-dimensional discretization grids is loop tiling. Tile shapes and sizes are usually determined heuristically, based on the size of the primary data cache. We provide a lower bound on the number of cache misses that must be incurred by any tiling, and a close achievable bound using a particular tiling based on the grid interference lattice. The latter tiling is used to derive highly efficient loop orderings. The total number of cache misses of a code is the sum of (necessary) cold misses and misses caused by elements being dropped from the cache between successive loads (replacement misses). Maximizing temporal locality is equivalent to minimizing replacement misses. Temporal locality of loop nests implementing stencil operations is optimized by tilings that avoid data conflicts. We divide the loop nest iteration space into conflict-free tiles, derived from the cache miss equation. The tiling involves the definition of the grid interference lattice (an equivalence class of grid points whose images in main memory map to the same location in the cache) and the construction of a special basis for the lattice. Conflicts only occur on the boundaries of the tiles, unless the tiles are too thin. We show that the surface area of the tiles is bounded for grids of any dimensionality, and for caches of any associativity, provided the eccentricity of the fundamental parallelepiped (the tile spanned by the basis) of the lattice is bounded. Eccentricity is determined by two factors, aspect ratio and skewness. The aspect ratio of the parallelepiped can be bounded by appropriate array padding. The skewness can be bounded by the choice of a proper basis. Combining these two strategies ensures that pathologically thin tiles are avoided. They do not, however, minimize replacement misses per se. The reason is that tile visitation order influences the number of data conflicts on the tile boundaries. 
If two adjacent tiles are visited successively, there will be no replacement misses on the shared boundary. The iteration space may be covered with pencils larger than the size of the cache while avoiding data conflicts if the pencils are traversed by a scanning-face method. Replacement misses are incurred only on the boundaries of the pencils, and the number of misses is minimized by maximizing the volume of the scanning face, not the volume of the tile. We present an algorithm for constructing the most efficient scanning face for a given grid and stencil operator. In two dimensions it is based on a continued fraction algorithm. In three dimensions it follows Voronoi's successive minima algorithm. We show experimental results of using the scanning face, and compare with canonical loop orderings.
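The notion of an interference lattice can be illustrated with a minimal sketch. It assumes a direct-mapped cache that holds `cache_words` words with one word per cache line, and a two-dimensional grid stored with the first index varying fastest; both are simplifying assumptions made here, and the function names are hypothetical rather than taken from the report.

```python
def cache_location(i: int, j: int, n1: int, cache_words: int) -> int:
    """Cache slot of grid point (i, j) on an n1-by-n2 grid, first index fastest."""
    return (i + n1 * j) % cache_words


def interference_points(n1: int, n2: int, cache_words: int):
    """Grid points whose memory images collide with the origin in the cache."""
    return [(i, j)
            for j in range(n2)
            for i in range(n1)
            if cache_location(i, j, n1, cache_words) == 0]


def conflicts(p, q, n1: int, cache_words: int) -> bool:
    """True when two grid points map to the same cache slot."""
    return cache_location(*p, n1, cache_words) == cache_location(*q, n1, cache_words)


if __name__ == "__main__":
    # Hypothetical sizes: a 10-by-8 grid and a 16-word cache.
    print(interference_points(10, 8, 16))
    # → [(0, 0), (6, 1), (2, 3), (8, 4), (4, 6)]
    print(conflicts((1, 0), (7, 1), 10, 16))  # offsets differ by a lattice vector
```

Because the mapping is linear, two points conflict exactly when their difference is a lattice point, so a tile that contains no nonzero lattice vector between any pair of its points is conflict-free under these assumptions.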
Dynamic storage in resource-scarce browsing multimedia applications

NASA Astrophysics Data System (ADS)

Elenbaas, Herman; Dimitrova, Nevenka

1998-10-01

In the convergence of information and entertainment there is a conflict between the consumer's expectation of fast access to high quality multimedia content through narrow bandwidth channels and the size of this content. During the retrieval and presentation of multimedia content there are two problems that have to be solved: the limited bandwidth during transmission of the retrieved multimedia content and the limited memory for temporary caching. In this paper we propose an approach for latency optimization in information browsing applications. We propose a method for flattening hierarchically linked documents in a manner convenient for network transport over slow channels to minimize browsing latency. Flattening of the hierarchy involves linearization, compression and bundling of the document nodes. After the transfer, the compressed hierarchy is stored on a local device where it can be partly unbundled to fit the caching limits at the local site while giving the user access to the content.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Janjusic, Tommy; Kartsaklis, Christos

Application analysis is facilitated through a number of program profiling tools. The tools vary in their complexity, ease of deployment, design, and profiling detail. Specifically, understanding, analyzing, and optimizing is of particular importance for scientific applications where minor changes in code paths and data-structure layout can have profound effects. Understanding how intricate data-structures are accessed and how a given memory system responds is a complex task. In this paper we describe a trace profiling tool, Glprof, specifically aimed to lessen the burden of the programmer to pin-point heavily involved data-structures during an application's run-time, and understand data-structure run-time usage. Moreover, we showcase the tool's modularity using additional cache simulation components. We elaborate on the tool's design and features. Finally we demonstrate the application of our tool in the context of Spec benchmarks using the Glprof profiler and two concurrently running cache simulators, PPC440 and AMD Interlagos.
Dementia diagnoses from clinical and neuropsychological data compared: the Cache County study.

PubMed

Tschanz, J T; Welsh-Bohmer, K A; Skoog, I; West, N; Norton, M C; Wyse, B W; Nickles, R; Breitner, J C

2000-03-28

To validate a neuropsychological algorithm for dementia diagnosis. We developed a neuropsychological algorithm in a sample of 1,023 elderly residents of Cache County, UT. We compared algorithmic and clinical dementia diagnoses both based on DSM-III-R criteria. The algorithm diagnosed dementia when there was impairment in memory and at least one other cognitive domain. We also tested a variant of the algorithm that incorporated functional measures that were based on structured informant reports. Of 1,023 participants, 87% could be classified by the basic algorithm, 94% when functional measures were considered. There was good concordance between basic psychometric and clinical diagnoses (79% agreement, kappa = 0.57). This improved after incorporating functional measures (90% agreement, kappa = 0.76). Neuropsychological algorithms may reasonably classify individuals on dementia status across a range of severity levels and ages and may provide a useful adjunct to clinical diagnoses in population studies.
49 CFR 821.3 - Description of docket numbering system.

Code of Federal Regulations, 2010 CFR

2010-10-01

... handled by the Board will receive a letter prefix. These letter prefixes reflect the case type: “SE” for... of civil penalties; “NA” for cases in which a petition for review or appeal is not accepted because...
14 CFR 1260.15 - Format and numbering.

Code of Federal Regulations, 2013 CFR

2013-01-01

... Fiscal Year 2004 would be NNC04AA01H and NNC04AA02H. (7) The Catalog of Federal Domestic Assistance (CFDA... will be applied as follows: (1) Agency prefix. NASA's agency prefix shall be represented by the...
Systems and methods for rapid processing and storage of data

DOEpatents

Stalzer, Mark A.

2017-01-24

Systems and methods of building massively parallel computing systems using low power computing complexes in accordance with embodiments of the invention are disclosed. A massively parallel computing system in accordance with one embodiment of the invention includes at least one Solid State Blade configured to communicate via a high performance network fabric. In addition, each Solid State Blade includes a processor configured to communicate with a plurality of low power computing complexes interconnected by a router, and each low power computing complex includes at least one general processing core, an accelerator, an I/O interface, and cache memory and is configured to communicate with non-volatile solid state memory.
Adaptive Backoff Synchronization Techniques

DTIC Science & Technology

1989-06-01

The Simple Code. Technical Report, Lawrence Livermore Laboratory, February 1978. [6] F. Darema-Rogers, D. A. George, V. A. Norton, and G. F. Pfister...Heights, November 1986. 20 [7] Daniel Gajski, David Kuck, Duncan Lawrie, and Ahmed Saleh. Cedar - A Large Scale Multiprocessor. In International Conference...17] Janak H. Patel. Analysis of Multiprocessors with Private Cache Memories. IEEE Transactions on Computers, C-31(4):296-304, April 1982. [18] G
What-where-when memory in magpies (Pica pica).

PubMed

Zinkivskay, Ann; Nazir, Farrah; Smulders, Tom V

2009-01-01

Some animals have been shown to be able to remember which type of food they hoarded or encountered in which location and how long ago (what-where-when memory). In this study, we test whether magpies (Pica pica) also show evidence of remembering these different aspects of a past episode. Magpies hid red- and blue-dyed pellets of scrambled eggs in a large tray containing wood shavings. They were allowed to make as many caches as they wanted. The birds were then returned either the same day or the next day to retrieve the pellets. If they returned the same day, one colour of pellets was replaced with wooden beads of similar size and colour, while if they returned the next day this would happen to the other colour. Over just a few trials, the birds learned to only search for the food pellets, and ignore the beads, of the appropriate colour for the given retention interval. A probe trial in which all items were removed showed that the birds persisted in searching for the pellets and not the beads. This shows that magpies can remember which food item they hoarded where, and when, even if the food items only differ from each other in their colour and are dispersed throughout a continuous caching substrate.
Virtual data

NASA Astrophysics Data System (ADS)

Bjorklund, E.

1994-12-01

In the 1970s, when computers were memory limited, operating system designers created the concept of "virtual memory", which gave users the ability to address more memory than physically existed. In the 1990s, many large control systems have the potential of becoming data limited. We propose that many of the principles behind virtual memory systems (working sets, locality, caching and clustering) can also be applied to data-limited systems, creating, in effect, "virtual data systems". At the Los Alamos National Laboratory's Clinton P. Anderson Meson Physics Facility (LAMPF), we have applied these principles to a moderately sized (10 000 data points) data acquisition and control system. To test the principles, we measured the system's performance during tune-up, production, and maintenance periods. In this paper, we present a general discussion of the principles of a virtual data system along with some discussion of our own implementation and the results of our performance measurements.
Radiation-Hardened Solid-State Drive

NASA Technical Reports Server (NTRS)

Sheldon, Douglas J.

2010-01-01

A method is provided for a radiation-hardened (rad-hard) solid-state drive for space mission memory applications by combining rad-hard and commercial off-the-shelf (COTS) non-volatile memories (NVMs) into a hybrid architecture. The architecture is controlled by a rad-hard ASIC (application specific integrated circuit) or an FPGA (field programmable gate array). Specific error handling and data management protocols are developed for use in a rad-hard environment. The rad-hard memories are smaller in overall memory density, but are used to control and manage radiation-induced errors in the main, and much larger density, non-rad-hard COTS memory devices. Small amounts of rad-hard memory are used as error buffers and temporary caches for radiation-induced errors in the large COTS memories. The rad-hard ASIC/FPGA implements a variety of error-handling protocols to manage these radiation-induced errors. The large COTS memory is triplicated for protection, and CRC-based counters are calculated for sub-areas in each COTS NVM array. These counters are stored in the rad-hard non-volatile memory. Through monitoring, rewriting, regeneration, triplication, and long-term storage, radiation-induced errors in the large NV memory are managed. The rad-hard ASIC/FPGA also interfaces with the external computer buses.
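The combination of triplication and CRC checks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol of the reported design: the CRC of each block is assumed to be held in trusted (rad-hard) storage, `zlib.crc32` stands in for whatever CRC the hardware uses, and the function names are hypothetical.

```python
import zlib


def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise majority of three equally sized copies of a block."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("copies must have the same length")
    # A bit is set in the result when it is set in at least two copies.
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


def scrub(copies, trusted_crc: int) -> bytes:
    """Recover a block from three copies using a CRC kept in trusted memory."""
    for copy in copies:
        if zlib.crc32(copy) == trusted_crc:
            return copy                      # an intact copy exists: use it
    voted = majority_vote(*copies)
    if zlib.crc32(voted) == trusted_crc:
        return voted                         # errors hit different bits
    raise ValueError("uncorrectable block")


def flip_bit(block: bytes, byte_index: int, bit: int) -> bytes:
    """Simulate a single radiation-induced bit flip."""
    damaged = bytearray(block)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


if __name__ == "__main__":
    data = b"telemetry frame"
    crc = zlib.crc32(data)                   # stored in the rad-hard memory
    copies = [flip_bit(data, 0, 3), flip_bit(data, 5, 1), flip_bit(data, 9, 7)]
    print(scrub(copies, crc) == data)        # all three copies damaged, still recovered
```

A recovered block would then be rewritten to all three copies, which corresponds to the "rewriting" and "regeneration" steps the record lists.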
Clark’s Nutcrackers (Nucifraga columbiana) Flexibly Adapt Caching Behavior to a Cooperative Context

PubMed Central

Clary, Dawson; Kelly, Debbie M.

2016-01-01

Corvids recognize when their caches are at risk of being stolen by others and have developed strategies to protect these caches from pilferage. For instance, Clark’s nutcrackers will suppress the number of caches they make if being observed by a potential thief. However, cache protection has most often been studied using competitive contexts, so it is unclear whether corvids can adjust their caching in beneficial ways to accommodate non-competitive situations. Therefore, we examined whether Clark’s nutcrackers, a non-social corvid, would flexibly adapt their caching behaviors to a cooperative context. To do so, birds were given a caching task during which caches made by one individual were reciprocally exchanged for the caches of a partner bird over repeated trials. In this scenario, if caching behaviors can be flexibly deployed, then the birds should recognize the cooperative nature of the task and maintain or increase caching levels over time. However, if cache protection strategies are applied independent of social context and simply in response to cache theft, then cache suppression should occur. In the current experiment, we found that the birds maintained caching throughout the experiment. We report that males increased caching in response to a manipulation in which caches were artificially added, suggesting the birds could adapt to the cooperative nature of the task. Additionally, we show that caching decisions were not solely due to motivational factors, instead showing an additional influence attributed to the behavior of the partner bird. PMID:27826273
Memory, mental time travel and The Moustachio Quartet

PubMed Central

Wilkins, Clive

2017-01-01

Mental time travel allows us to revisit our memories and imagine future scenarios, and this is why memories are not only about the past, but they are also prospective. These episodic memories are not a fixed store of what happened, however, they are reassessed each time they are revisited and depend on the sequence in which events unfold. In this paper, we shall explore the complex relationships between memory and human experience, including through a series of novels ‘The Moustachio Quartet’ that can be read in any order. To do so, we shall integrate evidences from science and the arts to explore the subjective nature of memory and mental time travel, and argue that it has evolved primarily for prospection as opposed to retrospection. Furthermore, we shall question the notion that mental time travel is a uniquely human construct, and argue that some of the best evidence for the evolution of mental time travel comes from our distantly related cousins, the corvids, that cache food for the future and rely on long-lasting and highly accurate memories of what, where and when they stored their stashes of food. PMID:28479980
Memory, mental time travel and The Moustachio Quartet.

PubMed

Clayton, Nicola; Wilkins, Clive

2017-06-06

Mental time travel allows us to revisit our memories and imagine future scenarios, and this is why memories are not only about the past, but they are also prospective. These episodic memories are not a fixed store of what happened, however, they are reassessed each time they are revisited and depend on the sequence in which events unfold. In this paper, we shall explore the complex relationships between memory and human experience, including through a series of novels 'The Moustachio Quartet' that can be read in any order. To do so, we shall integrate evidences from science and the arts to explore the subjective nature of memory and mental time travel, and argue that it has evolved primarily for prospection as opposed to retrospection. Furthermore, we shall question the notion that mental time travel is a uniquely human construct, and argue that some of the best evidence for the evolution of mental time travel comes from our distantly related cousins, the corvids, that cache food for the future and rely on long-lasting and highly accurate memories of what, where and when they stored their stashes of food.
ASA-FTL: An adaptive separation aware flash translation layer for solid state drives

DOE PAGES

Xie, Wei; Chen, Yong; Roth, Philip C

2016-11-03

Here, the flash-memory based Solid State Drive (SSD) presents a promising storage solution for increasingly critical data-intensive applications due to its low latency (high throughput), high bandwidth, and low power consumption. Within an SSD, its Flash Translation Layer (FTL) is responsible for exposing the SSD’s flash memory storage to the computer system as a simple block device. The FTL design is one of the dominant factors determining an SSD’s lifespan and performance. To reduce the garbage collection overhead and deliver better performance, we propose a new, low-cost, adaptive separation-aware flash translation layer (ASA-FTL) that combines sampling, data clustering and selective caching of recency information to accurately identify and separate hot/cold data while incurring minimal overhead. We use sampling for light-weight identification of separation criteria, and our dedicated selective caching mechanism is designed to save the limited RAM resource in contemporary SSDs. Using simulations of ASA-FTL with both real-world and synthetic workloads, we have shown that our proposed approach reduces the garbage collection overhead by up to 28% and the overall response time by 15% compared to one of the most advanced existing FTLs. We find that data clustering using a small sample size provides a significant performance benefit while incurring only a very small computation and memory cost. In addition, our evaluation shows that ASA-FTL is able to adapt to changes in the access pattern of workloads, which is a major advantage compared to existing fixed data separation methods.

Blackcomb: Hardware-Software Co-design for Non-Volatile Memory in Exascale Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schreiber, Robert

Summary of technical results of Blackcomb. Memory Devices: We explored various memory technologies (STTRAM, PCRAM, FeRAM, and ReRAM). The progress can be classified into three categories, below. Modeling and Tool Releases: Various modeling tools have been developed over the last decade to help in the design of SRAM or DRAM-based memory hierarchies. To explore new design opportunities that NVM technologies can bring to designers, we have developed similar high-level models for NVM, including PCRAMsim [Dong 2009], NVSim [Dong 2012], and NVMain [Poremba 2012]. NVSim is a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim has been successfully validated against industrial NVM prototypes, and it is expected to help boost architecture-level NVM-related studies. NVMain, on the other hand, is a cycle-accurate main memory simulator designed to simulate emerging nonvolatile memories at the architectural level. We have released these models as open source tools and provided continuous support for them. We also proposed PS3-RAM, which is a fast, portable and scalable statistical STT-RAM reliability analysis model [Wen 2012]. Design Space Exploration and Optimization: With the support of these models, we explore different device/circuit optimization techniques. For example, in [Niu 2012a] we studied the power reduction technique for the application of an ECC scheme in ReRAM designs and proposed to use ECC code to relax the BER (Bit Error Rate) requirement of a single memory to improve the write energy consumption and latency for both 1T1R and cross-point ReRAM designs. In [Xu 2011], we proposed a methodology to design STT-RAM for different optimization goals such as read performance, write performance and write energy by leveraging the trade-off between write current and write time of MTJ. 
We also studied the tradeoffs in building a reliable crosspoint ReRAM array [Niu 2012b]. We have conducted an in-depth analysis of the circuit and system level design implications of multi-layer cross-point Resistive RAM (MLCReRAM) from performance, power and reliability perspectives [Xu 2013]. The objective of this study is to understand the design trade-offs of this technology with respect to the MLC Phase Change Memory (MLCPCM). Our MLC ReRAM design at the circuit and system levels indicates that different resistance allocation schemes, programming strategies, peripheral designs, and material selections profoundly affect the area, latency, power, and reliability of MLC ReRAM. Based on this analysis, we conduct two case studies: first we compare MLC ReRAM design against MLC phase-change memory (PCM) and multi-layer cross-point ReRAM design, and point out why multi-level ReRAM is appealing; second we further explore the design space for MLC ReRAM. Architecture and Application: We explored hybrid checkpointing using phase-change memory for future exascale systems [Dong 2011] and showed that the use of nonvolatile memory for local checkpointing significantly increases the number of faults covered by local checkpoints and reduces the probability of a global failure in the middle of a global checkpoint to less than 1%. We also proposed a technique called i2WAP to mitigate the write variations in NVM-based last-level cache for the improvement of the NVM lifetime [Wang 2013]. Our wear leveling technique attempts to work around the limitations of write endurance by arranging data access so that write operations can be distributed evenly across all the storage cells. During our intensive research on fault-tolerant NVM design, we found that ECC cannot effectively tolerate hard errors from limited write endurance and process imperfection.
Therefore, we devised a novel Point and Discard (PAD) architecture in [ 2012] as a hard-error-tolerant architecture for ReRAM-based Last Level Caches. PAD improves the lifetime of ReRAM caches by 1.6X-440X under different process variations without performance overhead in the system's early life. We have investigated the applicability of NVM for persistent memory design [Zhao 2013]. New byte addressable NVM enables fast persistent memory that allows in-memory persistent data objects to be updated with much higher throughput. Despite the significant improvement, the performance of these designs is only 50% of the native system with no persistence support, due to the logging or copy-on-write mechanisms used to update the persistent memory. A challenge in this approach is therefore how to efficiently enable atomic, consistent, and durable updates to ensure data persistence that survives application and/or system failures. We have designed a persistent memory system, called Kiln, that can provide performance close to that of the native system. The Kiln design adopts a non-volatile cache and a non-volatile main memory for constructing a multi-versioned durable memory system, enabling atomic updates without logging or copy-on-write. Our evaluation shows that the proposed Kiln mechanism can achieve up to a 2X performance improvement over NVRAM-based persistent memory employing write-ahead logging. In addition, our design has numerous practical advantages: a simple and intuitive abstract interface, microarchitecture-level optimizations, fast recovery from failures, and no redundant writes to slow non-volatile storage media. The work was published in MICRO 2013 and received a Best Paper Honorable Mention Award.
A Tamper-Resistant Programming Language System

DTIC Science & Technology

2006-06-02

www.cs.ucsb.edu/~vigna/listpub.html). [15] Gassend, B., D. Clarke, M. van Dijk, S. Devadas, and E. Suh, “Caches and Merkle Trees for Efficient Memory...CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT 18. NUMBER OF PAGES 23 19a. NAME OF RESPONSIBLE PERSON a. REPORT unclassified b. ABSTRACT...winhec/papers03.mspx). [3] Barak, B., O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang, “On the (Im)possibility of Obfuscating
Ensuring a C2 Level of Trust and Interoperability in a Networked Windows NT Environment

DTIC Science & Technology

1996-09-01

addition, it should be noted that the device drivers, microkernel, memory manager, and Hardware Abstraction Layer are all hardware dependent. a. The...Executive The executive is further divided into three conceptual layers which are referred to as the Hardware Abstraction Layer (HAL), the Microkernel, and...Subsystem Executive Subsystems Manager I/O Manager Cache Manager File Systems Microkernel Device Driver Hardware Abstraction Layer F HARDWARE Figure 3
Effects of experience and social context on prospective caching strategies by scrub jays.

PubMed

Emery, N J; Clayton, N S

2001-11-22

Social life has costs associated with competition for resources such as food. Food storing may reduce this competition as the food can be collected quickly and hidden elsewhere; however, it is a risky strategy because caches can be pilfered by others. Scrub jays (Aphelocoma coerulescens) remember 'what', 'where' and 'when' they cached. Like other corvids, they remember where conspecifics have cached, pilfering them when given the opportunity, but may also adjust their own caching strategies to minimize potential pilfering. To test this, jays were allowed to cache either in private (when the other bird's view was obscured) or while a conspecific was watching, and then recover their caches in private. Here we show that jays with prior experience of pilfering another bird's caches subsequently re-cached food in new cache sites during recovery trials, but only when they had been observed caching. Jays without pilfering experience did not, even though they had observed other jays caching. Our results suggest that jays relate information about their previous experience as a pilferer to the possibility of future stealing by another bird, and modify their caching strategy accordingly.
Cache-Cache Comparison for Supporting Meaningful Learning

ERIC Educational Resources Information Center

Wang, Jingyun; Fujino, Seiji

2015-01-01

The paper presents a meaningful discovery learning environment called "cache-cache comparison" for a personalized learning support system. The processing of seeking hidden relations or concepts in "cache-cache comparison" is intended to encourage learners to actively locate new knowledge in their knowledge framework and check…
Optimizing Maintenance of Constraint-Based Database Caches

NASA Astrophysics Data System (ADS)

Klein, Joachim; Braun, Susanne

Caching data reduces user-perceived latency and often enhances availability in case of server crashes or network failures. DB caching aims at local processing of declarative queries in a DBMS-managed cache close to the application. Query evaluation must produce the same results as if done at the remote database backend, which implies that all data records needed to process such a query must be present and controlled by the cache, i.e., to achieve “predicate-specific” loading and unloading of such record sets. Hence, cache maintenance must be based on cache constraints such that “predicate completeness” of the caching units currently present can be guaranteed at any point in time. We explore how cache groups can be maintained to provide the data currently needed. Moreover, we design and optimize loading and unloading algorithms for sets of records keeping the caching units complete, before we empirically identify the costs involved in cache maintenance.
Current desires of conspecific observers affect cache-protection strategies in California scrub-jays and Eurasian jays.

PubMed

Ostojić, Ljerka; Legg, Edward W; Brecht, Katharina F; Lange, Florian; Deininger, Chantal; Mendl, Michael; Clayton, Nicola S

2017-01-23

Many corvid species accurately remember the locations where they have seen others cache food, allowing them to pilfer these caches efficiently once the cachers have left the scene [1]. To protect their caches, corvids employ a suite of different cache-protection strategies that limit the observers' visual or acoustic access to the cache site [2,3]. In cases where an observer's sensory access cannot be reduced it has been suggested that cachers might be able to minimise the risk of pilfering if they avoid caching food the observer is most motivated to pilfer [4]. In the wild, corvids have been reported to pilfer others' caches as soon as possible after the caching event [5], such that the cacher might benefit from adjusting its caching behaviour according to the observer's current desire. In the current study, observers pilfered according to their current desire: they preferentially pilfered food that they were not sated on. Cachers adjusted their caching behaviour accordingly: they protected their caches by selectively caching food that observers were not motivated to pilfer. The same cache-protection behaviour was found when cachers could not see on which food the observers were sated. Thus, the cachers' ability to respond to the observer's desire might have been driven by the observer's behaviour at the time of caching. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
Multi-petascale highly efficient parallel supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.

A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. The node design includes a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and a multiversioning cache that improves soft error rate at the same time and supports DMA functionality allowing for parallel processing message-passing.
The Effects of Cache Modification on Food Caching and Retrieval Behavior by Rats

ERIC Educational Resources Information Center

McKenzie, T.L.B.; Bird, L.R.; Roberts, W.A.

2005-01-01

Rats cached pieces of cheese on four different arms of an eight-arm radial maze. On a retrieval test given 45 min later, rats learned to return to arms where food was cached before arms where food had not been cached. Tests were then performed in which cache sites on one side of the maze were always modified (pilfered or degraded), but cache sites…
Checkpointing in speculative versioning caches

DOEpatents

Eichenberger, Alexandre E; Gara, Alan; Gschwind, Michael K; Ohmacht, Martin

2013-08-27

Mechanisms for generating checkpoints in a speculative versioning cache of a data processing system are provided. The mechanisms execute code within the data processing system, wherein the code accesses cache lines in the speculative versioning cache. The mechanisms further determine whether a first condition occurs indicating a need to generate a checkpoint in the speculative versioning cache. The checkpoint is a speculative cache line which is made non-speculative in response to a second condition occurring that requires a roll-back of changes to a cache line corresponding to the speculative cache line. The mechanisms also generate the checkpoint in the speculative versioning cache in response to a determination that the first condition has occurred.
Looking for episodic memory in animals and young children: prospects for a new minimalism.

PubMed

Clayton, Nicola S; Russell, James

2009-09-01

Because animals and young children cannot be interrogated about their experiences it is difficult to conduct research into their episodic memories. The approach to this issue adopted by Clayton and Dickinson [Clayton, N. S., & Dickinson, A. (1998). Episodic-like memory during cache recovery by scrub jays. Nature, 395, 272-274] was to take a conceptually minimalist definition of episodic memory, in terms of integrating information about what was done where and when [Tulving, E. (1972). Episodic and semantic memory. In E. Tulving, & W. Donaldson (Eds.), Organisation of memory (pp. 381-403). New York: Academic Press], and to refer to such memories as 'episodic-like'. Some claim, however, that because animals supposedly lack the conceptual abilities necessary for episodic recall one should properly call these memories 'semantic'. We address this debate with a novel approach to episodic memory, which is minimalist insofar as it focuses on the non-conceptual content of a re-experienced situation. It rests on Kantian assumptions about the necessary 'perspectival' features of any objective experience or re-experience. We show how adopting this perspectival approach can render an episodic interpretation of the animal data more plausible and can also reveal patterns in the mosaic of developmental evidence for episodic memory in humans.
The future of memory

NASA Astrophysics Data System (ADS)

Marinella, M.

In the not too distant future, the traditional memory and storage hierarchy may be replaced by a single Storage Class Memory (SCM) device integrated on or near the logic processor. Traditional magnetic hard drives, NAND flash, DRAM, and higher level caches (L2 and up) will be replaced with a single high performance memory device. The Storage Class Memory paradigm will require high speed (< 100 ns read/write), excellent endurance (> 10^12), nonvolatility (retention > 10 years), and low switching energies (< 10 pJ per switch). The International Technology Roadmap for Semiconductors (ITRS) has recently evaluated several potential candidate SCM technologies, including Resistive (or Redox) RAM, Spin Torque Transfer RAM (STT-MRAM), and phase change memory (PCM). All of these devices show potential well beyond that of current flash technologies and research efforts are underway to improve the endurance, write speeds, and scalabilities to be on par with DRAM. This progress has interesting implications for space electronics: each of these emerging device technologies shows excellent resistance to the types of radiation typically found in space applications. Commercially developed, high density storage class memory-based systems may include a memory that is physically radiation hard, and suitable for space applications without major shielding efforts. This paper reviews the Storage Class Memory concept, emerging memory devices, and possible applicability to radiation hardened electronics for space.
A two-level cache for distributed information retrieval in search engines.

PubMed

Zhang, Weizhe; He, Hui; Ye, Jianwei

2013-01-01

To improve the performance of distributed information retrieval in search engines, we propose a two-level cache structure based on the queries of the users' logs. We extract the highest rank queries of users from the static cache, in which the queries are the most popular. We adopt the dynamic cache as an auxiliary to optimize the distribution of the cache data. We propose a distribution strategy of the cache data. The experiments prove that the hit rate, the efficiency, and the time consumption of the two-level cache have advantages compared with other structures of cache.
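The abstract above does not give the cache's internals, so the following is only a minimal sketch of one way a static-plus-dynamic query cache could be arranged. The class name `TwoLevelQueryCache`, the `fetch` callback, and the choice of LRU eviction for the dynamic level are all assumptions made for illustration, not the authors' design.

```python
from collections import OrderedDict


class TwoLevelQueryCache:
    """Hypothetical two-level cache: a fixed static level plus an LRU dynamic level."""

    def __init__(self, static_entries, dynamic_capacity):
        # Static level: results for the most popular queries taken from the logs.
        self.static = dict(static_entries)
        # Dynamic level: LRU eviction is an assumption, not stated in the abstract.
        self.dynamic = OrderedDict()
        self.dynamic_capacity = dynamic_capacity
        self.hits = 0
        self.misses = 0

    def get(self, query, fetch):
        """Return the cached result for query, calling fetch(query) on a miss."""
        if query in self.static:
            self.hits += 1
            return self.static[query]
        if query in self.dynamic:
            self.hits += 1
            self.dynamic.move_to_end(query)  # mark as most recently used
            return self.dynamic[query]
        self.misses += 1
        result = fetch(query)
        self.dynamic[query] = result
        if len(self.dynamic) > self.dynamic_capacity:
            self.dynamic.popitem(last=False)  # evict the least recently used entry
        return result
```

Under these assumptions, the hit rate the abstract refers to would be `hits / (hits + misses)` after replaying a query log through `get`.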
A Two-Level Cache for Distributed Information Retrieval in Search Engines

PubMed Central

Zhang, Weizhe; He, Hui; Ye, Jianwei

2013-01-01

To improve the performance of distributed information retrieval in search engines, we propose a two-level cache structure based on the queries of the users' logs. We extract the highest rank queries of users from the static cache, in which the queries are the most popular. We adopt the dynamic cache as an auxiliary to optimize the distribution of the cache data. We propose a distribution strategy of the cache data. The experiments prove that the hit rate, the efficiency, and the time consumption of the two-level cache have advantages compared with other structures of cache. PMID:24363621
Way-Scaling to Reduce Power of Cache with Delay Variation

NASA Astrophysics Data System (ADS)

Goudarzi, Maziar; Matsumura, Tadayuki; Ishihara, Tohru

The share of leakage in cache power consumption increases with technology scaling. Choosing a higher threshold voltage (Vth) and/or gate-oxide thickness (Tox) for cache transistors improves leakage, but impacts cell delay. We show that due to uncorrelated random within-die delay variation, only some (not all) of the cells actually violate the cache delay after the above change. We propose to add a spare cache way to replace delay-violating cache-lines separately in each cache-set. By SPICE and gate-level simulations in a commercial 90nm process, we show that choosing higher Vth, Tox and adding one spare way to a 4-way 16KB cache reduces leakage power by 42%, which depending on the share of leakage in total cache power, gives up to 22.59% and 41.37% reduction of total energy respectively in L1 instruction- and L2 unified-cache with a negligible delay penalty, but without sacrificing cache capacity or timing-yield.
Cache Coherence Protocols for Large-Scale Multiprocessors

DTIC Science & Technology

1990-09-01

and is compared with the other protocols for large-scale machines. In later analysis, this coherence method is designated by the acronym OCPD, which...private read misses 2 6 6 (OCPD) private write misses 2 6 6 Table 4.2: Transaction Types and Costs. the performance of the memory system. These...methodologies. Figure 4-2 shows the processor utilizations of the Weather program, with special code in the dynamic post-mortem sched- 94 OCPD DlrINB
Design Trade-off Between Performance and Fault-Tolerance of Space Onboard Computers

NASA Astrophysics Data System (ADS)

Gorbunov, M. S.; Antonov, A. A.

2017-01-01

It is well known that there is a trade-off between performance and power consumption in onboard computers. The fault-tolerance is another important factor affecting performance, chip area and power consumption. Involving special SRAM cells and error-correcting codes is often too expensive with relation to the performance needed. We discuss the possibility of finding the optimal solutions for modern onboard computer for scientific apparatus focusing on multi-level cache memory design.
FPGA-Based, Self-Checking, Fault-Tolerant Computers

NASA Technical Reports Server (NTRS)

Some, Raphael; Rennels, David

2004-01-01

A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and high-speed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant-computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors.
It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.
Vocabulary: A Major Factor in Reading Comprehension.

ERIC Educational Resources Information Center

Stotsky, Sandra L.

The major purpose of this research was to show how specific theoretical principles and criteria could be integrated and supported by empirical data to provide a rationale for more systematic introduction of vocabulary in middle-grade reading-instruction material. Research was limited to the teaching of prefixes and the use of prefixed words. A…
Vowel Harmony in Palestinian Arabic: A Metrical Perspective.

ERIC Educational Resources Information Center

Abu-Salim, I. M.

1987-01-01

The autosegmental rule of vowel harmony (VH) in Palestinian Arabic is shown to be constrained simultaneously by metrical and segmental boundaries. The indicative prefix bi- is no longer an exception to VH if a structure is assumed that disallows the prefix from sharing a foot with the stem, consequently blocking VH. (Author/LMO)

Common Medical Abbreviations and Terminology: Modularized Instruction for Nurses.

ERIC Educational Resources Information Center

Moseley, James L.

A learning module to introduce nurses to the main medical abbreviations and often-used prefixes and suffixes is presented. Learning objectives of the module are: to provide the definitions of often-used suffixes and prefixes, and to identify definitions of medical abbreviations. The following materials are presented: a pretest consisting of 30…
48 CFR 246.710-70 - Warranty attachment.

Code of Federal Regulations, 2011 CFR

2011-10-01

... Enterprise Identifier Code Type 0-9—GS1 Company Prefix. D—CAGE. LB—ATIS-0322000. LH—EHIBCC. RH—HIBCC. UN—DUNS... Guarantor Enterprise Identifier Code Type 0-9—GS1 Company Prefix. D—CAGE. LB—ATIS-0322000. LH—EHIBCC. RH... returns Name ** Address line 1 ** Address line 2 ** City/county ** State/province ** Postal code...
Algorithmic and user study of an autocompletion algorithm on a large medical vocabulary.

PubMed

Sevenster, Merlijn; van Ommering, Rob; Qian, Yuechen

2012-02-01

Autocompletion supports human-computer interaction in software applications that let users enter textual data. We will be inspired by the use case in which medical professionals enter ontology concepts, catering to the ongoing demand for structured and standardized data in medicine. The goal is to give an algorithmic analysis of one particular autocompletion algorithm, called the multi-prefix matching algorithm, which suggests terms whose words' prefixes contain all words in the string typed by the user, e.g., in this sense, opt ner me matches optic nerve meningioma. Second, we aim to investigate how well it supports users entering concepts from a large and comprehensive medical vocabulary (SNOMED CT). We give a concise description of the multi-prefix algorithm, and sketch how it can be optimized to meet the required response time. Performance will be compared to a baseline algorithm, which gives suggestions that extend the string typed by the user to the right, e.g. optic nerve m gives optic nerve meningioma, but opt ner me does not. We conduct a user experiment in which 12 participants are invited to complete 40 SNOMED CT terms with the baseline algorithm and another set of 40 SNOMED CT terms with the multi-prefix algorithm. Our results show that users need significantly fewer keystrokes when supported by the multi-prefix algorithm than when supported by the baseline algorithm. The proposed algorithm is a competitive candidate for searching and retrieving terms from a large medical ontology. Copyright © 2011 Elsevier Inc. All rights reserved.
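The matching rule described in the abstract can be illustrated with a short, unoptimized sketch. The function names below are hypothetical, matching is assumed to be case-insensitive and whitespace-tokenized, and the sketch does not require each typed token to match a distinct word; the paper's indexed, optimized version is not reproduced here.

```python
def multi_prefix_match(typed, term):
    """True if every typed token is a prefix of some word in the term."""
    tokens = typed.lower().split()
    words = term.lower().split()
    return all(any(word.startswith(token) for word in words) for token in tokens)


def baseline_match(typed, term):
    """True if the term extends the typed string to the right."""
    return term.lower().startswith(typed.lower())


def suggest(typed, vocabulary, matcher=multi_prefix_match):
    """Return the vocabulary terms accepted by the chosen matcher."""
    return [term for term in vocabulary if matcher(typed, term)]
```

With the abstract's own example, `multi_prefix_match("opt ner me", "optic nerve meningioma")` holds, while `baseline_match` accepts only inputs such as `"optic nerve m"`.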
Zebra finches are able to learn affixation-like patterns.

PubMed

Chen, Jiani; Jansen, Naomi; ten Cate, Carel

2016-01-01

Adding an affix to transform a word is common across the world languages, with the edges of words more likely to carry out such a function. However, detecting affixation patterns is also observed in learning tasks outside the domain of language, suggesting that the underlying mechanism from which affixation patterns have arisen may not be language or even human specific. We addressed whether a songbird, the zebra finch, is able to discriminate between, and generalize, affixation-like patterns. Zebra finches were trained and tested in a Go/Nogo paradigm to discriminate artificial song element sequences resembling prefixed and suffixed 'words.' The 'stems' of the 'words,' consisted of different combinations of a triplet of song elements, to which a fourth element was added as either a 'prefix' or a 'suffix.' After training, the birds were tested with novel stems, consisting of either rearranged familiar element types or novel element types. The birds were able to generalize the affixation patterns to novel stems with both familiar and novel element types. Hence, the discrimination resulting from the training was not based on memorization of individual stimuli, but on a shared property among Go or Nogo stimuli, i.e., affixation patterns. Remarkably, birds trained with suffixation as Go pattern showed clear evidence of using both prefix and suffix, while those trained with the prefix as the Go stimulus used primarily the prefix. This finding illustrates that an asymmetry in attending to different affixations is not restricted to human languages.
New Algorithms and Lower Bounds for Sequential-Access Data Compression

NASA Astrophysics Data System (ADS)

Gagie, Travis

2009-02-01

This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows us passes and memory both polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds.
Parallelization of an Object-Oriented Unstructured Aeroacoustics Solver

NASA Technical Reports Server (NTRS)

Baggag, Abdelkader; Atkins, Harold; Oezturan, Can; Keyes, David

1999-01-01

A computational aeroacoustics code based on the discontinuous Galerkin method is ported to several parallel platforms using MPI. The discontinuous Galerkin method is a compact high-order method that retains its accuracy and robustness on non-smooth unstructured meshes. In its semi-discrete form, the discontinuous Galerkin method can be combined with explicit time marching methods making it well suited to time accurate computations. The compact nature of the discontinuous Galerkin method also makes it well suited for distributed memory parallel platforms. The original serial code was written using an object-oriented approach and was previously optimized for cache-based machines. The port to parallel platforms was achieved simply by treating partition boundaries as a type of boundary condition. Code modifications were minimal because boundary conditions were abstractions in the original program. Scalability results are presented for the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. Slightly superlinear speedup is achieved on a fixed-size problem on the Origin, due to cache effects.
Importance of balanced architectures in the design of high-performance imaging systems

NASA Astrophysics Data System (ADS)

Sgro, Joseph A.; Stanton, Paul C.

1999-03-01

Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high-performance computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high performance general-purpose microprocessor technology have produced an abundance of low cost components suitable for use in high-performance computing systems. A common pitfall in the design of high performance imaging systems, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The symptom characteristic of this problem is failure of the performance of the system to scale as more processors are added. The problem becomes exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high performance external memory interfaces makes it practical to design high performance imaging systems with balanced computational and memory bandwidth. Real-world examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
GPU color space conversion

NASA Astrophysics Data System (ADS)

Chase, Patrick; Vondran, Gary

2011-01-01

Tetrahedral interpolation is commonly used to implement continuous color space conversions from sparse 3D and 4D lookup tables. We investigate the implementation and optimization of tetrahedral interpolation algorithms for GPUs, and compare to the best known CPU implementations as well as to a well known GPU-based trilinear implementation. We show that a $500 NVIDIA GTX-580 GPU is 3x faster than a $1000 Intel Core i7 980X CPU for 3D interpolation, and 9x faster for 4D interpolation. Performance-relevant GPU attributes are explored including thread scheduling, local memory characteristics, global memory hierarchy, and cache behaviors. We consider existing tetrahedral interpolation algorithms and tune based on the structure and branching capabilities of current GPUs. Global memory performance is improved by reordering and expanding the lookup table to ensure optimal access behaviors. Per multiprocessor local memory is exploited to implement optimally coalesced global memory accesses, and local memory addressing is optimized to minimize bank conflicts. We explore the impacts of lookup table density upon computation and memory access costs. Also presented are CPU-based 3D and 4D interpolators, using SSE vector operations that are faster than any previously published solution.
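For readers unfamiliar with the technique named above, here is a small scalar sketch of tetrahedral interpolation inside one cell of a 3D lookup table. It is a plain CPU illustration written for this listing, not the GPU or SSE implementations the abstract describes; the function name and the `corners[i][j][k]` layout are assumptions.

```python
def tetrahedral_interpolate(corners, fx, fy, fz):
    """Interpolate inside one lookup-table cell.

    corners[i][j][k] holds the table value at cell corner (i, j, k), with
    i, j, k in {0, 1}; fx, fy, fz are the fractional offsets in [0, 1].
    """
    fracs = [fx, fy, fz]
    # Ordering the axes by decreasing offset selects which of the six
    # tetrahedra of the cube contains the point.
    order = sorted(range(3), key=lambda axis: fracs[axis], reverse=True)
    weights = [
        1.0 - fracs[order[0]],
        fracs[order[0]] - fracs[order[1]],
        fracs[order[1]] - fracs[order[2]],
        fracs[order[2]],
    ]
    vertex = [0, 0, 0]
    result = weights[0] * corners[0][0][0]
    # Walk from corner (0,0,0) to (1,1,1), stepping one axis at a time.
    for step, axis in enumerate(order, start=1):
        vertex[axis] = 1
        i, j, k = vertex
        result += weights[step] * corners[i][j][k]
    return result
```

Only four of the eight cell corners are read per lookup, compared with eight for trilinear interpolation, which is one reason the method is attractive when memory access dominates the cost.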
Seedling Establishment of Coast Live Oak in Relation to Seed Caching by Jays

Treesearch

Joe R. McBride; Ed Norberg; Sheauchi Cheng; Ahmad Mossadegh

1991-01-01

The purpose of this study was to simulate the caching of acorns by jays and rodents to see if less costly procedures could be developed for the establishment of coast live oak (Quercus agrifolia). Four treatments [(1) random - single acorn cache, (2) regular - single acorn cache, (3) regular - 5 acorn cache, (4) regular - 10 acorn cache] were planted...
A trace-driven analysis of name and attribute caching in a distributed system

NASA Technical Reports Server (NTRS)

Shirriff, Ken W.; Ousterhout, John K.

1992-01-01

This paper presents the results of simulating file name and attribute caching on client machines in a distributed file system. The simulation used trace data gathered on a network of about 40 workstations. Caching was found to be advantageous: a cache on each client containing just 10 directories had a 91 percent hit rate on name lookups. Entry-based name caches (holding individual directory entries) had poorer performance for several reasons, resulting in a maximum hit rate of about 83 percent. File attribute caching obtained a 90 percent hit rate with a cache on each machine of the attributes for 30 files. The simulations show that maintaining cache consistency between machines is not a significant problem; only 1 in 400 name component lookups required invalidation of a remotely cached entry. Process migration to remote machines had little effect on caching. Caching was less successful in heavily shared and modified directories such as /tmp, but there weren't enough references to /tmp overall to affect the results significantly. We estimate that adding name and attribute caching to the Sprite operating system could reduce server load by 36 percent and the number of network packets by 30 percent.
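The kind of trace-driven measurement described in this abstract can be sketched in a few lines. The example below replays a list of directory references through a fixed-size cache and reports the hit rate. The LRU replacement policy, the function name, and the toy trace are assumptions for illustration; the abstract does not state which policy the study used.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity):
    """Replay a reference trace through an LRU cache and return the hit rate.

    trace    -- sequence of keys (here, hypothetical directory names)
    capacity -- maximum number of entries the cache may hold
    """
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(trace) if trace else 0.0


# Toy trace of directory lookups (made up for this example).
trace = ["/usr/bin", "/home/a", "/usr/bin", "/tmp", "/home/a", "/usr/bin"]
```

Running the same trace at several capacities gives a hit-rate curve of the sort the study reports for 10-directory and 30-file caches.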
California scrub-jays reduce visual cues available to potential pilferers by matching food colour to caching substrate.

PubMed

Kelley, Laura A; Clayton, Nicola S

2017-07-01

Some animals hide food to consume later; however, these caches are susceptible to theft by conspecifics and heterospecifics. Caching animals can use protective strategies to minimize sensory cues available to potential pilferers, such as caching in shaded areas and in quiet substrate. Background matching (where object patterning matches the visual background) is commonly seen in prey animals to reduce conspicuousness, and caching animals may also use this tactic to hide caches, for example, by hiding coloured food in a similar coloured substrate. We tested whether California scrub-jays (Aphelocoma californica) camouflage their food in this way by offering them caching substrates that either matched or did not match the colour of food available for caching. We also determined whether this caching behaviour was sensitive to social context by allowing the birds to cache when a conspecific potential pilferer could be both heard and seen (acoustic and visual cues present), or unseen (acoustic cues only). When caching events could be both heard and seen by a potential pilferer, birds cached randomly in matching and non-matching substrates. However, they preferentially hid food in the substrate that matched the food colour when only acoustic cues were present. This is a novel cache protection strategy that also appears to be sensitive to social context. We conclude that studies of cache protection strategies should consider the perceptual capabilities of the cacher and potential pilferers. © 2017 The Author(s).
Center for Technology for Advanced Scientific Component Software (TASCS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Govindaraju, Madhusudhan

Advanced Scientific Computing Research Computer Science FY 2010 Report. Center for Technology for Advanced Scientific Component Software: Distributed CCA. State University of New York, Binghamton, NY, 13902. Summary: The overall objective of Binghamton's involvement is to work on enhancements of the CCA environment, motivated by the applications and research initiatives discussed in the proposal. This year we are working on re-focusing our design and development efforts to develop proof-of-concept implementations that have the potential to significantly impact scientific components. We worked on developing parallel implementations for non-hydrostatic code and worked on a model coupling interface for biogeochemical computations coded in MATLAB. We also worked on the design and implementation of modules that will be required for the emerging MapReduce model to be effective for scientific applications. Finally, we focused on optimizing the processing of scientific datasets on multi-core processors. Research Details: We worked on the following research projects, which we are applying to CCA-based scientific applications. 1. Non-Hydrostatic Hydrodynamics: Non-hydrostatic hydrodynamics are significantly more accurate at modeling internal waves that may be important in lake ecosystems. Non-hydrostatic codes, however, are significantly more computationally expensive, often prohibitively so. We have worked with Chin Wu at the University of Wisconsin to parallelize non-hydrostatic code. We have obtained a maximum speedup of about 26 times. Although this is significant progress, we hope to improve the performance further, such that it becomes a practical alternative to hydrostatic codes. 2. Model-coupling for water-based ecosystems: Answering pressing questions about water resources requires that physical models (hydrodynamics) be coupled with biological and chemical models. Most hydrodynamics codes are written in Fortran, however, while most ecologists work in MATLAB. 
This disconnect creates a great barrier. To address this, we are working on a model coupling interface that will allow biogeochemical computations written in MATLAB to couple with Fortran codes. This will greatly improve the productivity of ecosystem scientists. 3. Low overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications: Since its inception, MapReduce has frequently been associated with Hadoop and large-scale datasets. Its deployment at Amazon in the cloud, and its applications at Yahoo! for large-scale distributed document indexing and database building, among other tasks, have thrust MapReduce to the forefront of the data processing application domain. The applicability of the paradigm, however, extends far beyond its use with data-intensive applications and disk-based systems, and can also be brought to bear in processing small but CPU-intensive distributed applications. MapReduce, however, carries its own burdens. Through experiments using Hadoop in the context of diverse applications, we uncovered latencies and delay conditions potentially inhibiting the expected performance of a parallel execution in CPU-intensive applications. Furthermore, as it currently stands, MapReduce is favored for data-centric applications, and as such tends to be solely applied to disk-based applications. The paradigm falls short in bringing its novelty to diskless systems dedicated to in-memory applications, and to compute-intensive programs that process much smaller data but require intensive computations. In this project, we focused both on the performance of processing large-scale hierarchical data in distributed scientific applications, and on the processing of smaller but demanding input sizes primarily used in diskless and memory-resident I/O systems. 
We designed LEMO-MR [1], an optimized implementation of MapReduce that is low overhead, elastic, configurable for in-memory applications, and offers on-demand fault tolerance, for both on-disk and in-memory applications. We conducted experiments to identify not only the necessary components of this model, but also trade-offs and factors to be considered. We have initial results to show the efficacy of our implementation in terms of potential speedup that can be achieved for representative data sets used by cloud applications. We have quantified the performance gains exhibited by our MapReduce implementation over Apache Hadoop in a compute-intensive environment. 4. Cache Performance Optimization for Processing XML and HDF-based Application Data on Multi-core Processors: It is important to design and develop scientific middleware libraries to harness the opportunities presented by emerging multi-core processors. Implementations of scientific middleware and applications that do not adapt to the programming paradigm when executing on emerging processors can severely impact the overall performance. In this project, we focused on the utilization of the L2 cache, which is a critical shared resource on chip multiprocessors (CMP). The access pattern of the shared L2 cache, which is dependent on how the application schedules and assigns processing work to each thread, can either enhance or hurt the ability to hide memory latency on a multi-core processor. Therefore, while processing scientific datasets such as HDF5, it is essential to conduct fine-grained analysis of cache utilization, to inform scheduling decisions in multi-threaded programming. In this project, using the TAU toolkit for performance feedback from dual- and quad-core machines, we conducted performance analysis and made recommendations on how processing threads can be scheduled on multi-core nodes to enhance the performance of a class of scientific applications that requires processing of HDF5 data. 
In particular, we quantified the gains associated with the use of the adaptations we have made to the Cache-Affinity and Balanced-Set scheduling algorithms to improve L2 cache performance, and hence the overall application execution time [2]. References: 1. Zacharia Fadika, Madhusudhan Govindaraju, "MapReduce Implementation for Memory-Based and Processing Intensive Applications", accepted in 2nd IEEE International Conference on Cloud Computing Technology and Science, Indianapolis, USA, Nov 30 - Dec 3, 2010. 2. Rajdeep Bhowmik, Madhusudhan Govindaraju, "Cache Performance Optimization for Processing XML-based Application Data on Multi-core Processors", in proceedings of The 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 17-20, 2010, Melbourne, Victoria, Australia. Contact Information: Madhusudhan Govindaraju, Binghamton University, State University of New York (SUNY), mgovinda@cs.binghamton.edu, Phone: 607-777-4904
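For readers unfamiliar with the paradigm, the map, shuffle, and reduce phases that this report refers to can be sketched entirely in memory. The code below is a generic single-process illustration and does not represent LEMO-MR or Hadoop; the function names and the word-count job are assumptions chosen for the example.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a MapReduce job over an in-memory list of records.

    mapper(record)       -> iterable of (key, value) pairs
    reducer(key, values) -> single reduced value for that key
    """
    groups = defaultdict(list)
    for record in records:                  # map phase
        for key, value in mapper(record):
            groups[key].append(value)       # shuffle: group values by key
    return {key: reducer(key, values)       # reduce phase
            for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def sum_reducer(key, values):
    """Add up the counts collected for one word."""
    return sum(values)


counts = map_reduce(["cache hit", "cache miss"], word_mapper, sum_reducer)
```

Here no intermediate data touches disk, which is the setting the report describes as poorly served by disk-oriented implementations.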
JuxtaView - A tool for interactive visualization of large imagery on scalable tiled displays

USGS Publications Warehouse

Krishnaprasad, N.K.; Vishwanath, V.; Venkataraman, S.; Rao, A.G.; Renambot, L.; Leigh, J.; Johnson, A.E.; Davis, B.

2004-01-01

JuxtaView is a cluster-based application for viewing ultra-high-resolution images on scalable tiled displays. We present in JuxtaView a new parallel computing and distributed memory approach for out-of-core montage visualization, using LambdaRAM, a software-based network-level cache system. The ultimate goal of JuxtaView is to enable a user to interactively roam through potentially terabytes of distributed, spatially referenced image data such as those from electron microscopes, satellites and aerial photographs. In working towards this goal, we describe our first prototype implemented over a local area network, where the image is distributed using LambdaRAM, on the memory of all nodes of a PC cluster driving a tiled display wall. Aggressive pre-fetching schemes employed by LambdaRAM help to reduce latency involved in remote memory access. We compare LambdaRAM with a more traditional memory-mapped file approach for out-of-core visualization. © 2004 IEEE.
The effect of patterning options on embedded memory cells in logic technologies at iN10 and iN7

NASA Astrophysics Data System (ADS)

Appeltans, Raf; Weckx, Pieter; Raghavan, Praveen; Kim, Ryoung-Han; Kar, Gouri Sankar; Furnémont, Arnaud; Van der Perre, Liesbet; Dehaene, Wim

2017-03-01

Static Random Access Memory (SRAM) cells are used together with logic standard cells as the benchmark to develop the process flow for new logic technologies. In order to achieve successful integration of Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM) as area-efficient higher-level embedded cache, it also needs to be included as a benchmark. The simple cell structure of STT-MRAM brings extra patterning challenges to achieve high density. The two memory types are compared in terms of minimum area and critical design rules in both the iN10 and iN7 nodes, with an extra focus on patterning options in iN7. Both the use of Self-Aligned Quadruple Patterning (SAQP) mandrel and spacer engineering, as well as multi-level vias, are explored. These patterning options result in large area gains for the STT-MRAM cell and moreover determine which cell variant is the smallest.
Mobility-Aware Caching and Computation Offloading in 5G Ultra-Dense Cellular Networks

PubMed Central

Chen, Min; Hao, Yixue; Qiu, Meikang; Song, Jeungeun; Wu, Di; Humar, Iztok

2016-01-01

Recent trends show that Internet traffic is increasingly dominated by content, which is accompanied by the exponential growth of traffic. To cope with this phenomenon, network caching is introduced to utilize the storage capacity of diverse network devices. In this paper, we first summarize four basic caching placement strategies, i.e., local caching, Device-to-Device (D2D) caching, Small cell Base Station (SBS) caching and Macrocell Base Station (MBS) caching. However, studies show that so far, much of the research has ignored the impact of user mobility. Therefore, taking the effect of user mobility into consideration, we propose a joint mobility-aware caching and SBS density placement scheme (MS caching). In addition, differences and relationships between caching and computation offloading are discussed. We present a design of a hybrid computation offloading and support it with experimental results, which demonstrate improved performance in terms of energy cost. Finally, we discuss the design of an incentive mechanism by considering network dynamics, differentiated user’s quality of experience (QoE) and the heterogeneity of mobile terminals in terms of caching and computing capabilities. PMID:27347975
Mobility-Aware Caching and Computation Offloading in 5G Ultra-Dense Cellular Networks.

PubMed

Chen, Min; Hao, Yixue; Qiu, Meikang; Song, Jeungeun; Wu, Di; Humar, Iztok

2016-06-25

Recent trends show that Internet traffic is increasingly dominated by content, which is accompanied by the exponential growth of traffic. To cope with this phenomenon, network caching is introduced to utilize the storage capacity of diverse network devices. In this paper, we first summarize four basic caching placement strategies, i.e., local caching, Device-to-Device (D2D) caching, Small cell Base Station (SBS) caching and Macrocell Base Station (MBS) caching. However, studies show that so far, much of the research has ignored the impact of user mobility. Therefore, taking the effect of user mobility into consideration, we propose a joint mobility-aware caching and SBS density placement scheme (MS caching). In addition, differences and relationships between caching and computation offloading are discussed. We present a design of a hybrid computation offloading and support it with experimental results, which demonstrate improved performance in terms of energy cost. Finally, we discuss the design of an incentive mechanism by considering network dynamics, differentiated user's quality of experience (QoE) and the heterogeneity of mobile terminals in terms of caching and computing capabilities.
Do Current Basal Series Use Clear Explanations and Correct Exemplars in Teaching Prefixes?

ERIC Educational Resources Information Center

Volpe, Myra Elaine

A study (replicating a similar 1977 study by S. Stotsky), examined whether current basal series teach prefixion clearly. Teacher's guides, student texts, and workbooks of nine popular basal reader series were examined to ascertain whether they offered a clear definition of the term "prefix" and whether that definition was reinforced by…
Decomposing Slavic Aspect: The Role of Aspectual Morphology in Polish and Other Slavic Languages

ERIC Educational Resources Information Center

Lazorczyk, Agnieszka Agata

2010-01-01

This dissertation considers the problem of the semantic function of verbal aspectual morphology in Polish and other Slavic languages in the framework of generative syntax and semantics. Three kinds of such morphology are examined: (i) prefixes attaching directly to the root, (ii) "secondary imperfective" suffixes, and (iii) three prefixes that…
Prefix Identification in the Reading of Dutch Bisyllabic Words

ERIC Educational Resources Information Center

Verhoeven, Ludo; Schreuder, Robert; Haarman, Vera

2006-01-01

Two experiments were conducted in order to explore the role of prefix identification in the reading of Dutch bisyllabic words. Although Dutch orthography is highly regular, several deviations from a one-to-one correspondence exist. A case in point is the grapheme E which can represent the vowels epsilon, e and oe in polysyllabic words. In…
Medical Terminology: Using Some Common Prefixes, Suffixes, and Roots. Health Occupations Education Module.

ERIC Educational Resources Information Center

Temple Univ., Philadelphia, PA. Div. of Vocational Education.

This module on medical terminology (using common prefixes, suffixes, and root words) is one of 17 modules designed for individualized instruction in health occupations education programs at both the secondary and postsecondary levels. This module consists of an introduction to the module topic, a list of resources needed, and three learning…

Enhancing a Web Crawler with Arabic Search Capability

DTIC Science & Technology

2010-09-01

...libraries (prefixes dictionary, stems dictionary and suffixes dictionary). If all the word elements (prefix, stem, suffix) are found in their...stemmer improved over 90% in average precision from raw retrieval. The authors concluded that stemming is very effective on Arabic IR. For monolingual
Behavior-aware cache hierarchy optimization for low-power multi-core embedded systems

NASA Astrophysics Data System (ADS)

Zhao, Huatao; Luo, Xiao; Zhu, Chen; Watanabe, Takahiro; Zhu, Tianbo

2017-07-01

In modern embedded systems, the increasing number of cores requires efficient cache hierarchies to ensure data throughput, but such cache hierarchies are restricted by their tumid size and interference accesses, which lead to both performance degradation and wasted energy. In this paper, we first propose a behavior-aware cache hierarchy (BACH) which can optimally allocate the multi-level cache resources to many cores and greatly improve the efficiency of the cache hierarchy, resulting in low energy consumption. The BACH takes full advantage of the explored application behaviors and runtime cache resource demands as the cache allocation bases, so that we can optimally configure the cache hierarchy to meet the runtime demand. The BACH was implemented on the GEM5 simulator. The experimental results show that the energy consumption of a three-level cache hierarchy can be reduced by 5.29% up to 27.94% compared with other key approaches, while the performance of the multi-core system even improves slightly when hardware overhead is counted.
Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry

1998-01-01

This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
Fusion PIC code performance analysis on the Cori KNL system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Koskela, Tuomas S.; Deslippe, Jack; Friesen, Brian

We study the attainable performance of Particle-In-Cell codes on the Cori KNL system by analyzing a miniature particle push application based on the fusion PIC code XGC1. We start from the most basic building blocks of a PIC code and build up the complexity to identify the kernels that cost the most in performance and focus optimization efforts there. Particle push kernels operate at high AI and are not likely to be memory bandwidth or even cache bandwidth bound on KNL. Therefore, we see only minor benefits from the high bandwidth memory available on KNL, and achieving good vectorization is shown to be the most beneficial optimization path with theoretical yield of up to 8x speedup on KNL. In practice we are able to obtain up to a 4x gain from vectorization due to limitations set by the data layout and memory latency.
Memory-efficient dynamic programming backtrace and pairwise local sequence alignment.

PubMed

Newberg, Lee A

2008-08-15

A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward-backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache) existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis. Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10,000. Sample C++-code for optimal backtrace is available in the Supplementary Materials. Supplementary data is available at Bioinformatics online.
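The store-some-stages-and-recompute idea that this abstract attributes to existing approaches can be sketched as follows. The example uses a simple minimum-cost path through a grid (each step moves down one row and at most one column sideways) and keeps only every `interval`-th row of the dynamic programming table. This is the plain fixed-interval scheme, not the optimal checkpointing strategy the paper presents; the problem, function names, and grid are assumptions for illustration.

```python
def next_row(prev, costs):
    """One DP stage: cheapest path cost ending in each column of this row."""
    width = len(costs)
    return [costs[c] + min(prev[max(c - 1, 0):min(c + 2, width)])
            for c in range(width)]


def checkpointed_backtrace(grid, interval):
    """Return (total cost, column per row) while storing only checkpoint rows."""
    # Forward pass: keep every `interval`-th DP row, discard the rest.
    checkpoints = {0: list(grid[0])}
    row = checkpoints[0]
    for r in range(1, len(grid)):
        row = next_row(row, grid[r])
        if r % interval == 0:
            checkpoints[r] = row

    col = min(range(len(row)), key=row.__getitem__)
    total = row[col]
    path = [col]

    # Backward pass: rebuild one block of rows at a time from its checkpoint.
    end = len(grid) - 1
    while end > 0:
        start = ((end - 1) // interval) * interval  # checkpoint below `end`
        rows = [checkpoints[start]]
        for r in range(start + 1, end):
            rows.append(next_row(rows[-1], grid[r]))
        for r in range(end - 1, start - 1, -1):
            prev = rows[r - start]
            lo, hi = max(col - 1, 0), min(col + 2, len(prev))
            col = min(range(lo, hi), key=prev.__getitem__)
            path.append(col)
        end = start
    path.reverse()
    return total, path


grid = [[3, 1, 4], [1, 5, 9], [2, 6, 5], [3, 5, 8], [9, 7, 9]]
```

With an interval near the square root of the number of rows, both the checkpoint store and the recomputed block hold roughly that many rows, at the price of computing each row about twice.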
Caching Servers for ATLAS

NASA Astrophysics Data System (ADS)

Gardner, R. W.; Hanushevsky, A.; Vukotic, I.; Yang, W.

2017-10-01

As many LHC Tier-3 and some Tier-2 centers look toward streamlining operations, they are considering autonomously managed storage elements as part of the solution. These storage elements are essentially file caching servers. They can operate as whole file or data block level caches. Several implementations exist. In this paper we explore using XRootD caching servers that can operate in either mode. They can also operate autonomously (i.e. demand driven), be centrally managed (i.e. a Rucio managed cache), or operate in both modes. We explore the pros and cons of various configurations as well as practical requirements for caching to be effective. While we focus on XRootD caches, the analysis should apply to other kinds of caches as well.
The effect of code expanding optimizations on instruction cache design

NASA Technical Reports Server (NTRS)

Chen, William Y.; Chang, Pohua P.; Conte, Thomas M.; Hwu, Wen-Mei W.

1991-01-01

It is shown that code expanding optimizations have strong and non-intuitive implications on instruction cache design. Three types of code expanding optimizations are studied: instruction placement, function inline expansion, and superscalar optimizations. Overall, instruction placement reduces the miss ratio of small caches. Function inline expansion improves the performance for small cache sizes, but degrades the performance of medium caches. Superscalar optimizations increase the cache size required for a given miss ratio. On the other hand, they also increase the sequentiality of instruction access so that a simple load-forward scheme effectively cancels the negative effects. Overall, it is shown that with load forwarding, the three types of code expanding optimizations jointly improve the performance of small caches and have little effect on large caches.
Contrastive Analysis between English and Indonesian Prefixes and Suffixes (A Narrative Text Analysis of Legends in Perspective of Morphology)

ERIC Educational Resources Information Center

Pauzan

2016-01-01

This research deals with finding the similarities and differences, and describing the types of the English and Indonesian prefixes and suffixes for the narrative text of Legends. In this research, writer used descriptive qualitative research and contrastive methodology to find out the valid data. After investigating the data, writer found some…
Organizing the pantry: cache management improves quality of overwinter food stores in a montane mammal

USGS Publications Warehouse

Jakopak, Rhiannon P.; Hall, L. Embere; Chalfoun, Anna D.

2017-01-01

Many mammals create food stores to enhance overwinter survival in seasonal environments. Strategic arrangement of food within caches may facilitate the physical integrity of the cache or improve access to high-quality food to ensure that cached resources meet future nutritional demands. We used the American pika (Ochotona princeps), a food-caching lagomorph, to evaluate variation in haypile (cache) structure (i.e., horizontal layering by plant functional group) in Wyoming, United States. Fifty-five percent of 62 haypiles contained at least 2 discrete layers of vegetation. Adults and juveniles layered haypiles in similar proportions. The probability of layering increased with haypile volume, but not haypile number per individual or nearby forage diversity. Vegetation cached in layered haypiles was also higher in nitrogen compared to vegetation in unlayered piles. We found that American pikas frequently structured their food caches, structured caches were larger, and the cached vegetation in structured piles was of higher nutritional quality. Improving access to stable, high-quality vegetation in haypiles, a critical overwinter food resource, may allow individuals to better persist amidst harsh conditions.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh; Zhang, Zhao; Vetter, Jeffrey S

Recent trends of CMOS scaling and use of large last level caches (LLCs) have led to significant increase in the leakage energy consumption of LLCs and hence, managing their energy consumption has become extremely important in modern processor design. The conventional cache energy saving techniques require offline profiling or provide only coarse granularity of cache allocation. We present FlexiWay, a cache energy saving technique which uses dynamic cache reconfiguration. FlexiWay logically divides the cache sets into multiple (e.g. 16) modules and dynamically turns off suitable and possibly different number of cache ways in each module. FlexiWay has very small implementation overhead and it provides fine-grain cache allocation even with caches of typical associativity, e.g. an 8-way cache. Microarchitectural simulations have been performed using an x86-64 simulator and workloads from SPEC2006 suite. Also, FlexiWay has been compared with two conventional energy saving techniques. The results show that FlexiWay provides largest energy saving and incurs only small loss in performance. For single, dual and quad core systems, the average energy saving using FlexiWay are 26.2%, 25.7% and 22.4%, respectively.
Software Coherence in Multiprocessor Memory Systems. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Bolosky, William Joseph

1993-01-01

Processors are becoming faster and multiprocessor memory interconnection systems are not keeping up. Therefore, it is necessary to have threads and the memory they access as near one another as possible. Typically, this involves putting memory or caches with the processors, which gives rise to the problem of coherence: if one processor writes an address, any other processor reading that address must see the new value. This coherence can be maintained by the hardware or with software intervention. Systems of both types have been built in the past; the hardware-based systems tended to outperform the software ones. However, the ratio of processor to interconnect speed is now so high that the extra overhead of the software systems may no longer be significant. This issue is explored both by implementing a software maintained system and by introducing and using the technique of offline optimal analysis of memory reference traces. It finds that in properly built systems, software maintained coherence can perform comparably to or even better than hardware maintained coherence. The architectural features necessary for efficient software coherence to be profitable include a small page size, a fast trap mechanism, and the ability to execute instructions while remote memory references are outstanding.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Millar, A. P.; Baranova, T.; Behrmann, G.

For over a decade, dCache has been synonymous with large-capacity, fault-tolerant storage using commodity hardware that supports seamless data migration to and from tape. In this paper we provide some recent news of changes within dCache and the community surrounding it. We describe the flexible nature of dCache that allows both externally developed enhancements to dCache facilities and the adoption of new technologies. Finally, we present information about avenues the dCache team is exploring for possible future improvements in dCache.
A Distributed Cache Update Deployment Strategy in CDN

NASA Astrophysics Data System (ADS)

E, Xinhua; Zhu, Binjie

2018-04-01

The CDN management system distributes content objects to the edge of the Internet so that users can access them from a nearby location. Cache strategy is an important problem in network content distribution. A cache strategy was designed in which content diffuses effectively within the cache group, so that more content is stored in the cache and the group hit rate improves.
NAFFS: network attached flash file system for cloud storage on portable consumer electronics

NASA Astrophysics Data System (ADS)

Han, Lin; Huang, Hao; Xie, Changsheng

Cloud storage technology has become a research hotspot in recent years, but existing cloud storage services are mainly designed for data storage needs with a stable, high-speed Internet connection. Mobile Internet connections are often unstable and relatively slow. These inherent features of the mobile Internet limit the use of cloud storage in portable consumer electronics. The Network Attached Flash File System (NAFFS) presents the idea of using the portable device's built-in NAND flash memory as the front-end cache of a virtualized cloud storage device. Modern portable devices with an Internet connection have more than 1 GB of built-in NAND flash, which is quite enough for daily data storage. The data transfer rate of a NAND flash device is much higher than that of mobile Internet connections [1], and its non-volatility makes it very suitable as the cache device for Internet cloud storage on portable devices, which often have an unstable power supply and an intermittent Internet connection. In the present work, NAFFS is evaluated with several benchmarks, and its performance is compared with traditional network attached file systems, such as NFS. Our evaluation results indicate that NAFFS achieves an average access speed of 3.38 MB/s, which is about 3 times faster than directly accessing cloud storage over a mobile Internet connection, and offers a more stable interface than directly using a cloud storage API. Unstable Internet connections and sudden power-off conditions are tolerated, and no data in the cache is lost in such situations.
Smart caching based on mobile agent of power WebGIS platform.

PubMed

Wang, Xiaohui; Wu, Kehe; Chen, Fei

2013-01-01

Power information construction is developing in an intensive, platform-based, distributed direction with the expansion of the power grid and the improvement of information technology. To meet this trend, power WebGIS was designed and developed. In this paper, we first discuss the architecture and functionality of power WebGIS, and then we study caching technology in detail, covering the dynamic display cache model, the caching structure based on mobile agents, and the cache data model. We designed experiments with different data capacities to compare the performance of WebGIS with the proposed caching model against traditional WebGIS. The experimental results showed that, in the same hardware environment, the response time of WebGIS both with and without the caching model increased as data capacity grew, while the larger the data, the greater the performance improvement of WebGIS with the proposed caching model.
Population-based analysis of Alzheimer's disease risk alleles implicates genetic interactions.

PubMed

Ebbert, Mark T W; Ridge, Perry G; Wilson, Andrew R; Sharp, Aaron R; Bailey, Matthew; Norton, Maria C; Tschanz, JoAnn T; Munger, Ronald G; Corcoran, Christopher D; Kauwe, John S K

2014-05-01

Reported odds ratios and population attributable fractions (PAF) for late-onset Alzheimer's disease (LOAD) risk loci (BIN1, ABCA7, CR1, MS4A4E, CD2AP, PICALM, MS4A6A, CD33, and CLU) come from clinically ascertained samples. Little is known about the combined PAF for these LOAD risk alleles and the utility of these combined markers for case-control prediction. Here we evaluate these loci in a large population-based sample to estimate PAF and explore the effects of additive and nonadditive interactions on LOAD status prediction performance. 2419 samples from the Cache County Memory Study were genotyped for APOE and nine LOAD risk loci from AlzGene.org. We used logistic regression and receiver operator characteristic analysis to assess the LOAD status prediction performance of these loci using additive and nonadditive models and compared odds ratios and PAFs between AlzGene.org and Cache County. Odds ratios were comparable between Cache County and AlzGene.org when identical single nucleotide polymorphisms were genotyped. PAFs from AlzGene.org ranged from 2.25% to 37%; those from Cache County ranged from .05% to 20%. Including non-APOE alleles significantly improved LOAD status prediction performance (area under the curve = .80) over APOE alone (area under the curve = .78) when not constrained to an additive relationship (p < .03). We identified potential allelic interactions (p values uncorrected): CD33-MS4A4E (synergy factor = 5.31; p < .003) and CLU-MS4A4E (synergy factor = 3.81; p < .016). Although nonadditive interactions between loci significantly improve diagnostic ability, the improvement does not reach the desired sensitivity or specificity for clinical use. Nevertheless, these results suggest that understanding gene-gene interactions may be important in resolving Alzheimer's disease etiology. Copyright © 2014 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Determinants of seed removal distance by scatter-hoarding rodents in deciduous forests.

PubMed

Moore, Jeffrey E; McEuen, Amy B; Swihart, Robert K; Contreras, Thomas A; Steele, Michael A

2007-10-01

Scatter-hoarding rodents should space food caches to maximize cache recovery rate (to minimize loss to pilferers) relative to the energetic cost of carrying food items greater distances. Optimization models of cache spacing make two predictions. First, spacing of caches should be greater for food items with greater energy content. Second, the mean distance between caches should increase with food abundance. However, the latter prediction fails to account for the effect of food abundance on the behavior of potential pilferers or on the ability of caching individuals to acquire food by means other than recovering their own caches. When considering these factors, shorter cache distances may be predicted in conditions of higher food abundance. We predicted that seed caching distances would be greater for food items of higher energy content and during lower ambient food abundance and that the effect of seed type on cache distance variation would be lower during higher food abundance. We recorded distances moved for 8636 seeds of five seed types at 15 locations in three forested sites in Pennsylvania, USA, and 29 forest fragments in Indiana, U.S.A., across five different years. Seed production was poor in three years and high in two years. Consistent with previous studies, seeds with greater energy content were moved farther than less profitable food items. Seeds were dispersed less far in seed-rich years than in seed-poor years, contrary to predictions of conventional models. Interactions were important, with seed type effects more evident in seed-poor years. These results suggest that, when food is superabundant, optimal cache distances are more strongly determined by minimizing energy cost of caching than by minimizing pilfering rates and that cache loss rates may be more strongly density-dependent in times of low seed abundance.
Learning about Science Graphs and Word Games. Superific Science Book V. A Good Apple Science Activity Book for Grades 5-8+.

ERIC Educational Resources Information Center

Conway, Lorraine

This packet of student materials contains a variety of worksheet activities dealing with science graphs and science word games. These reproducible materials deal with: (1) bar graphs; (2) line graphs; (3) circle graphs; (4) pictographs; (5) histograms; (6) artgraphs; (7) designing your own graphs; (8) medical prefixes; (9) color prefixes; (10)…
Entropy-Based Bounds On Redundancies Of Huffman Codes

NASA Technical Reports Server (NTRS)

Smyth, Padhraic J.

1992-01-01

Report presents extension of theory of redundancy of binary prefix code of Huffman type which includes derivation of variety of bounds expressed in terms of entropy of source and size of alphabet. Recent developments yielded bounds on redundancy of Huffman code in terms of probabilities of various components in source alphabet. In practice, redundancies of optimal prefix codes often closer to 0 than to 1.
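As a rough illustration of the quantity the report bounds, the sketch below builds a binary Huffman code for a small source and computes its redundancy, taken here as the average codeword length minus the source entropy in bits. The function names and the example probabilities are illustrative assumptions, not taken from the report.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return the codeword length a binary Huffman code assigns to each symbol."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (probability, tie-breaker, indices of symbols in this subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def redundancy(probs):
    """Average Huffman codeword length minus source entropy (bits per symbol)."""
    lengths = huffman_lengths(probs)
    avg_len = sum(p * l for p, l in zip(probs, lengths))
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return avg_len - entropy
```

For a dyadic source such as (0.5, 0.25, 0.25) the redundancy is zero, while a skewed two-symbol source such as (0.9, 0.1) gives a value between 0 and 1.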
Symbolic Dynamics and Grammatical Complexity

NASA Astrophysics Data System (ADS)

Hao, Bai-Lin; Zheng, Wei-Mou

The following sections are included: * Formal Languages and Their Complexity * Formal Language * Chomsky Hierarchy of Grammatical Complexity * The L-System * Regular Language and Finite Automaton * Finite Automaton * Regular Language * Stefan Matrix as Transfer Function for Automaton * Beyond Regular Languages * Feigenbaum and Generalized Feigenbaum Limiting Sets * Even and Odd Fibonacci Sequences * Odd Maximal Primitive Prefixes and Kneading Map * Even Maximal Primitive Prefixes and Distinct Excluded Blocks * Summary of Results

A Survey of Architectural Techniques For Improving Cache Power Efficiency

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh

Modern processors are using increasingly larger sized on-chip caches. Also, with each CMOS technology generation, there has been a significant increase in their leakage energy consumption. For this reason, cache power management has become a crucial research issue in modern processor design. To address this challenge and also meet the goals of sustainable computing, researchers have proposed several techniques for improving energy efficiency of cache architectures. This paper surveys recent architectural techniques for improving cache power efficiency and also presents a classification of these techniques based on their characteristics. For providing an application perspective, this paper also reviews several real-world processor chips that employ cache energy saving techniques. The aim of this survey is to enable engineers and researchers to get insights into the techniques for improving cache power efficiency and motivate them to invent novel solutions for enabling low-power operation of caches.
Smart Caching Based on Mobile Agent of Power WebGIS Platform

PubMed Central

Wang, Xiaohui; Wu, Kehe; Chen, Fei

2013-01-01

Power information construction is developing in an intensive, platform-based, distributed direction with the expansion of the power grid and the improvement of information technology. To meet this trend, power WebGIS was designed and developed. In this paper, we first discuss the architecture and functionality of power WebGIS, and then we study caching technology in detail, covering the dynamic display cache model, the caching structure based on mobile agents, and the cache data model. We designed experiments with different data capacities to compare the performance of WebGIS with the proposed caching model against traditional WebGIS. The experimental results showed that, in the same hardware environment, the response time of WebGIS both with and without the caching model increased as data capacity grew, while the larger the data, the greater the performance improvement of WebGIS with the proposed caching model. PMID:24288504
An Effective Cache Algorithm for Heterogeneous Storage Systems

PubMed Central

Li, Yong; Feng, Dan

2013-01-01

Modern storage environment is commonly composed of heterogeneous storage devices. However, traditional cache algorithms exhibit performance degradation in heterogeneous storage systems because they were not designed to work with the diverse performance characteristics. In this paper, we present a new cache algorithm called HCM for heterogeneous storage systems. The HCM algorithm partitions the cache among the disks and adopts an effective scheme to balance the work across the disks. Furthermore, it applies benefit-cost analysis to choose the best allocation of cache block to improve the performance. Conducting simulations with a variety of traces and a wide range of cache size, our experiments show that HCM significantly outperforms the existing state-of-the-art storage-aware cache algorithms. PMID:24453890
Cache Scheme Based on Pre-Fetch Operation in ICN

PubMed Central

Duan, Jie; Wang, Xiong; Xu, Shizhong; Liu, Yuanni; Xu, Chuan; Zhao, Guofeng

2016-01-01

Much recent research focuses on ICN (Information-Centric Networking), in which named content, rather than the end host, becomes the first-class citizen. In ICN, named content can be further divided into many small chunks, and chunk-based communication has merits over content-based communication. The universal in-network cache is one of the fundamental infrastructures for ICN. In this work, a chunk-level cache mechanism based on pre-fetch operation is proposed. The main idea is that routers with a cache store should pre-fetch and cache the next chunks that may be accessed in the near future, according to received requests and cache policy, in order to reduce the users' perceived latency. Two pre-fetch driven modes are presented to answer when and how to pre-fetch. LRU (Least Recently Used) is employed for cache replacement. Simulation results show that the average user-perceived latency and hop count can be decreased by employing this cache mechanism based on pre-fetch operation. Furthermore, we also demonstrate that the results are influenced by many factors, such as the cache capacity, Zipf parameters and pre-fetch window size. PMID:27362478
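The idea in this abstract can be sketched as an LRU chunk cache that, on every request, also fetches the next few chunks of the same content. This is a minimal sketch under stated assumptions: the class name, the `fetch` callable and the fixed `window` parameter are hypothetical, and the paper's two pre-fetch driven modes are not modelled.

```python
from collections import OrderedDict

class PrefetchingChunkCache:
    """LRU chunk cache that also fetches the next `window` chunks on each request."""

    def __init__(self, capacity, window, fetch):
        self.capacity = capacity
        self.window = window
        self.fetch = fetch            # hypothetical callable: (name, index) -> chunk data
        self.store = OrderedDict()    # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def _insert(self, key, data):
        self.store[key] = data
        self.store.move_to_end(key)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict the least recently used chunk

    def request(self, name, index):
        key = (name, index)
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)      # refresh recency on a hit
            data = self.store[key]
        else:
            self.misses += 1
            data = self.fetch(name, index)
            self._insert(key, data)
        # Pre-fetch the chunks expected to be requested next.
        for nxt in range(index + 1, index + 1 + self.window):
            nkey = (name, nxt)
            if nkey not in self.store:
                self._insert(nkey, self.fetch(name, nxt))
        return data
```

With sequential requests, only the first chunk misses; later chunks are already cached by the time they are asked for, which is the latency reduction the abstract describes. The window size trades extra upstream traffic and cache pollution against hit rate.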
Phonology without universal grammar

PubMed Central

Archangeli, Diana; Pulleyblank, Douglas

2015-01-01

The question of identifying the properties of language that are specific human linguistic abilities, i.e., Universal Grammar, lies at the center of linguistic research. This paper argues for a largely Emergent Grammar in phonology, taking as the starting point that memory, categorization, attention to frequency, and the creation of symbolic systems are all nonlinguistic characteristics of the human mind. The articulation patterns of American English rhotics illustrate categorization and systems; the distribution of vowels in Bantu vowel harmony uses frequencies of particular sequences to argue against Universal Grammar and in favor of Emergent Grammar; prefix allomorphy in Esimbi illustrates the Emergent symbolic system integrating phonological and morphological generalizations. The Esimbi case has been treated as an example of phonological opacity in a Universal Grammar account; the Emergent analysis resolves the pattern without opacity concerns. PMID:26388791
Clomp

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gylenhaal, J.; Bronevetsky, G.

2007-05-25

CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading (like NUMA memory layouts, memory contention, cache effects, etc.) in order to influence future system design. Current best-in-class implementations of OpenMP have overheads at least ten times larger than is required by many of our applications for effective use of OpenMP. This benchmark shows the significant negative performance impact of these relatively large overheads and of other thread effects. The CLOMP benchmark is highly configurable to allow a variety of problem sizes and threading effects to be studied, and it carefully checks its results to catch many common threading errors. This benchmark is expected to be included as part of the Sequoia Benchmark suite for the Sequoia procurement.
The Linked Neighbour List (LNL) method for fast off-lattice Monte Carlo simulations of fluids

NASA Astrophysics Data System (ADS)

Mazzeo, M. D.; Ricci, M.; Zannoni, C.

2010-03-01

We present a new algorithm, called linked neighbour list (LNL), useful to substantially speed up off-lattice Monte Carlo simulations of fluids by avoiding the computation of the molecular energy before every attempted move. We introduce a few variants of the LNL method targeted to minimise memory footprint or augment memory coherence and cache utilisation. Additionally, we present a few algorithms which drastically accelerate neighbour finding. We test our methods on the simulation of a dense off-lattice Gay-Berne fluid subjected to periodic boundary conditions observing a speedup factor of about 2.5 with respect to a well-coded implementation based on a conventional link-cell. We provide several implementation details of the different key data structures and algorithms used in this work.
Roofline Analysis in the Intel® Advisor to Deliver Optimized Performance for applications on Intel® Xeon Phi™ Processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Koskela, Tuomas S.; Lobet, Mathieu; Deslippe, Jack

In this session we show, in two case studies, how the roofline feature of Intel Advisor has been utilized to optimize the performance of kernels of the XGC1 and PICSAR codes in preparation for Intel Knights Landing architecture. The impact of the implemented optimizations and the benefits of using the automatic roofline feature of Intel Advisor to study performance of large applications will be presented. This demonstrates an effective optimization strategy that has enabled these science applications to achieve up to 4.6 times speed-up and prepare for future exascale architectures. # Goal/Relevance of Session The roofline model [1,2] is a powerful tool for analyzing the performance of applications with respect to the theoretical peak achievable on a given computer architecture. It allows one to graphically represent the performance of an application in terms of operational intensity, i.e. the ratio of flops performed and bytes moved from memory in order to guide optimization efforts. Given the scale and complexity of modern science applications, it can often be a tedious task for the user to perform the analysis on the level of functions or loops to identify where performance gains can be made. With new Intel tools, it is now possible to automate this task, as well as base the estimates of peak performance on measurements rather than vendor specifications. The goal of this session is to demonstrate how the roofline feature of Intel Advisor can be used to balance memory vs. computation related optimization efforts and effectively identify performance bottlenecks. A series of typical optimization techniques: cache blocking, structure refactoring, data alignment, and vectorization illustrated by the kernel cases will be addressed. # Description of the codes ## XGC1 The XGC1 code [3] is a magnetic fusion Particle-In-Cell code that uses an unstructured mesh for its Poisson solver that allows it to accurately resolve the edge plasma of a magnetic fusion device.
After recent optimizations to its collision kernel [4], most of the computing time is spent in the electron push (pushe) kernel, where these optimization efforts have been focused. The kernel code scaled well with MPI+OpenMP but had almost no automatic compiler vectorization, in part due to indirect memory addresses and in part due to low trip counts of low-level loops that would be candidates for vectorization. Particle blocking and sorting have been implemented to increase trip counts of low-level loops and improve memory locality, and OpenMP directives have been added to vectorize compute-intensive loops that were identified by Advisor. The optimizations have improved the performance of the pushe kernel 2x on Haswell processors and 1.7x on KNL. The KNL node-for-node performance has been brought to within 30% of a NERSC Cori phase I Haswell node and we expect to bridge this gap by reducing the memory footprint of compute intensive routines to improve cache reuse. ## PICSAR is a Fortran/Python high-performance Particle-In-Cell library targeting at MIC architectures first designed to be coupled with the PIC code WARP for the simulation of laser-matter interaction and particle accelerators. PICSAR also contains a FORTRAN stand-alone kernel for performance studies and benchmarks. A MPI domain decomposition is used between NUMA domains and a tile decomposition (cache-blocking) handled by OpenMP has been added for shared-memory parallelism and better cache management. The so-called current deposition and field gathering steps that compose the PIC time loop constitute major hotspots that have been rewritten to enable more efficient vectorization. Particle communications between tiles and MPI domain has been merged and parallelized. All considered, these improvements provide speedups of 3.1 for order 1 and 4.6 for order 3 interpolation shape factors on KNL configured in SNC4 quadrant flat mode. 
Performance is similar between a node of Cori Phase 1 and KNL at order 1 and better on KNL by a factor 1.6 at order 3 with the considered test case (homogeneous thermal plasma).
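The roofline model mentioned in this abstract bounds attainable performance by the smaller of the compute peak and the memory bandwidth multiplied by operational intensity (flops per byte moved). The sketch below illustrates that bound only; the function names and all numbers in the usage note are hypothetical and are not measurements from the session or output of Intel Advisor.

```python
def operational_intensity(flops, bytes_moved):
    """Flops performed per byte moved from memory."""
    return flops / bytes_moved

def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: limited by either compute peak or memory traffic."""
    return min(peak_gflops, bandwidth_gbs * intensity)

def limiting_resource(peak_gflops, bandwidth_gbs, intensity):
    """Report which roof a kernel of the given intensity sits under."""
    ridge = peak_gflops / bandwidth_gbs   # intensity at which the two roofs meet
    return "memory" if intensity < ridge else "compute"
```

For example, a kernel performing 2 flops per 8 bytes moved has an intensity of 0.25 flop/byte; on a hypothetical machine with a 1000 GFLOP/s peak and 100 GB/s of bandwidth it is bounded at 25 GFLOP/s and is memory-bound, which suggests cache blocking or data-layout changes before vectorization.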
List based prefetch

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boyle, Peter; Christ, Norman; Gara, Alan

A list prefetch engine improves a performance of a parallel computing system. The list prefetch engine receives a current cache miss address. The list prefetch engine evaluates whether the current cache miss address is valid. If the current cache miss address is valid, the list prefetch engine compares the current cache miss address and a list address. A list address represents an address in a list. A list describes an arbitrary sequence of prior cache miss addresses. The prefetch engine prefetches data according to the list, if there is a match between the current cache miss address and the list address.
List based prefetch

DOEpatents

Boyle, Peter [Edinburgh, GB]; Christ, Norman [Irvington, NY]; Gara, Alan [Yorktown Heights, NY]; Kim, Changhoan [San Jose, CA]; Mawhinney, Robert [New York, NY]; Ohmacht, Martin [Yorktown Heights, NY]; Sugavanam, Krishnan [Yorktown Heights, NY]

2012-08-28

A list prefetch engine improves a performance of a parallel computing system. The list prefetch engine receives a current cache miss address. The list prefetch engine evaluates whether the current cache miss address is valid. If the current cache miss address is valid, the list prefetch engine compares the current cache miss address and a list address. A list address represents an address in a list. A list describes an arbitrary sequence of prior cache miss addresses. The prefetch engine prefetches data according to the list, if there is a match between the current cache miss address and the list address.
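The matching step described in this abstract can be sketched as follows. This is a simplified software model under stated assumptions, not the patented hardware design: the class name, the `depth` parameter, the validity check and the single-pointer comparison are all hypothetical choices made for illustration.

```python
class ListPrefetcher:
    """Replays a recorded sequence of cache miss addresses to drive prefetching."""

    def __init__(self, miss_list, depth, is_valid=lambda addr: addr >= 0):
        self.miss_list = list(miss_list)  # recorded sequence of prior miss addresses
        self.depth = depth                # hypothetical: list entries to prefetch ahead
        self.is_valid = is_valid          # hypothetical validity check
        self.pos = 0                      # index of the list address to compare against

    def on_miss(self, addr):
        """Return the addresses to prefetch in response to a cache miss at `addr`."""
        if not self.is_valid(addr):
            return []
        if self.pos >= len(self.miss_list) or addr != self.miss_list[self.pos]:
            return []                     # no match: nothing is prefetched
        self.pos += 1                     # match: advance along the list
        return self.miss_list[self.pos:self.pos + self.depth]
```

A miss that matches the current list address returns the next `depth` recorded addresses as prefetch candidates; a miss that does not match, or that fails the validity check, returns nothing and leaves the list position unchanged.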
Analysis of DNS Cache Effects on Query Distribution

PubMed Central

2013-01-01

This paper studies the DNS cache effects that occur on query distribution at the CN top-level domain (TLD) server. We first filter out the malformed DNS queries to purify the log data pollution according to six categories. A model for DNS resolution, more specifically DNS caching, is presented. We demonstrate the presence and magnitude of DNS cache effects and the cache sharing effects on the request distribution through analytic model and simulation. CN TLD log data results are provided and analyzed based on the cache model. The approximate TTL distribution for domain name is inferred quantificationally. PMID:24396313
Analysis of DNS cache effects on query distribution.

PubMed

Wang, Zheng

2013-01-01

This paper studies the DNS cache effects that occur on query distribution at the CN top-level domain (TLD) server. We first filter out the malformed DNS queries to purify the log data pollution according to six categories. A model for DNS resolution, more specifically DNS caching, is presented. We demonstrate the presence and magnitude of DNS cache effects and the cache sharing effects on the request distribution through analytic model and simulation. CN TLD log data results are provided and analyzed based on the cache model. The approximate TTL distribution for domain name is inferred quantificationally.
Corvids Outperform Pigeons and Primates in Learning a Basic Concept.

PubMed

Wright, Anthony A; Magnotti, John F; Katz, Jeffrey S; Leonard, Kevin; Vernouillet, Alizée; Kelly, Debbie M

2017-04-01

Corvids (birds of the family Corvidae) display intelligent behavior previously ascribed only to primates, but such feats are not directly comparable across species. To make direct species comparisons, we used a same/different task in the laboratory to assess abstract-concept learning in black-billed magpies (Pica hudsonia). Concept learning was tested with novel pictures after training. Concept learning improved with training-set size, and test accuracy eventually matched training accuracy (full concept learning) with a 128-picture set; this magpie performance was equivalent to that of Clark's nutcrackers (a species of corvid) and monkeys (rhesus, capuchin) and better than that of pigeons. Even with an initial 8-item picture set, both corvid species showed partial concept learning, outperforming both monkeys and pigeons. Similar corvid performance refutes the hypothesis that nutcrackers' prolific cache-location memory accounts for their superior concept learning, because magpies rely less on caching. That corvids with "primitive" neural architectures evolved to equal primates in full concept learning and even to outperform them on the initial 8-item picture test is a testament to the shared (convergent) survival importance of abstract-concept learning.
Unraveling Network-induced Memory Contention: Deeper Insights with Machine Learning

DOE PAGES

Groves, Taylor Liles; Grant, Ryan; Gonzales, Aaron; ...

2017-11-21

Remote Direct Memory Access (RDMA) is expected to be an integral communication mechanism for future exascale systems enabling asynchronous data transfers, so that applications may fully utilize CPU resources while simultaneously sharing data amongst remote nodes. We examine Network-induced Memory Contention (NiMC) on Infiniband networks. We expose the interactions between RDMA, main-memory and cache, when applications and out-of-band services compete for memory resources. We then explore NiMC's resulting impact on application-level performance. For a range of hardware technologies and HPC workloads, we quantify NiMC and show that NiMC's impact grows with scale, resulting in up to 3X performance degradation at scales as small as 8K processes even in applications that previously have been shown to be performance resilient in the presence of noise. In addition, this work examines the problem of predicting NiMC's impact on applications by leveraging machine learning and easily accessible performance counters. This approach provides additional insights about the root cause of NiMC and facilitates dynamic selection of potential solutions. Finally, we evaluated three potential techniques to reduce NiMC's impact, namely hardware offloading, core reservation and network throttling.
Unraveling Network-induced Memory Contention: Deeper Insights with Machine Learning

DOE Office of Scientific and Technical Information (OSTI.GOV)

Groves, Taylor Liles; Grant, Ryan; Gonzales, Aaron

Remote Direct Memory Access (RDMA) is expected to be an integral communication mechanism for future exascale systems enabling asynchronous data transfers, so that applications may fully utilize CPU resources while simultaneously sharing data amongst remote nodes. We examine Network-induced Memory Contention (NiMC) on Infiniband networks. We expose the interactions between RDMA, main-memory and cache, when applications and out-of-band services compete for memory resources. We then explore NiMC's resulting impact on application-level performance. For a range of hardware technologies and HPC workloads, we quantify NiMC and show that NiMC's impact grows with scale, resulting in up to 3X performance degradation at scales as small as 8K processes even in applications that previously have been shown to be performance resilient in the presence of noise. In addition, this work examines the problem of predicting NiMC's impact on applications by leveraging machine learning and easily accessible performance counters. This approach provides additional insights about the root cause of NiMC and facilitates dynamic selection of potential solutions. Finally, we evaluated three potential techniques to reduce NiMC's impact, namely hardware offloading, core reservation and network throttling.
Multiprocessor system with multiple concurrent modes of execution

DOEpatents

Ahn, Daniel; Ceze, Luis H; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin

2013-12-31

A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.
Multiprocessor system with multiple concurrent modes of execution

DOEpatents

Ahn, Daniel; Ceze, Luis H.; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin

2016-11-22

A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.
Cache Energy Optimization Techniques For Modern Processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh

2013-01-01

Modern multicore processors are employing large last-level caches, for example Intel's E7-8800 processor uses 24MB L3 cache. Further, with each CMOS technology generation, leakage energy has been dramatically increasing and hence, leakage energy is expected to become a major source of energy dissipation, especially in last-level caches (LLCs). The conventional schemes of cache energy saving either aim at saving dynamic energy or are based on properties specific to first-level caches, and thus these schemes have limited utility for last-level caches. Further, several other techniques require offline profiling or per-application tuning and hence are not suitable for product systems. In this book, we present novel cache leakage energy saving schemes for single-core and multicore systems; desktop, QoS, real-time and server systems. Also, we present cache energy saving techniques for caches designed with both conventional SRAM devices and emerging non-volatile devices such as STT-RAM (spin-torque transfer RAM). We present software-controlled, hardware-assisted techniques which use dynamic cache reconfiguration to configure the cache to the most energy efficient configuration while keeping the performance loss bounded. To profile and test a large number of potential configurations, we utilize low-overhead, micro-architecture components, which can be easily integrated into modern processor chips. We adopt a system-wide approach to save energy to ensure that cache reconfiguration does not increase energy consumption of other components of the processor. We have compared our techniques with state-of-the-art techniques and have found that our techniques outperform them in terms of energy efficiency and other relevant metrics. The techniques presented in this book have important applications in improving energy-efficiency of higher-end embedded, desktop, QoS, real-time, server processors and multitasking systems.
This book is intended to be a valuable guide for both newcomers and veterans in the field of cache power management. It will help graduate students, CAD tool developers and designers in understanding the need of energy efficiency in modern computing systems. Further, it will be useful for researchers in gaining insights into algorithms and techniques for micro-architectural and system-level energy optimization using dynamic cache reconfiguration. We sincerely believe that the "food for thought" presented in this book will inspire the readers to develop even better ideas for designing "green" processors of tomorrow.
A Refreshable, On-line Cache for HST Data Retrieval

NASA Astrophysics Data System (ADS)

Fraquelli, Dorothy A.; Ellis, Tracy A.; Ridgaway, Michael; DPAS Team

2016-01-01

We discuss upgrades to the HST Data Processing System, with an emphasis on the changes Hubble Space Telescope (HST) Archive users will experience. In particular, data are now held on-line (in a cache) removing the need to reprocess the data every time they are requested from the Archive. OTFR (on the fly reprocessing) has been replaced by a reprocessing system, which runs in the background. Data in the cache are automatically placed in the reprocessing queue when updated calibration reference files are received or when an improved calibration algorithm is installed. Data in the on-line cache are expected to be the most up to date version. These changes were phased in throughout 2015 for all active instruments. The on-line cache was populated instrument by instrument over the course of 2015. As data were placed in the cache, the flag that triggers OTFR was reset so that OTFR no longer runs on these data. "Hybrid" requests to the Archive are handled transparently, with data not yet in the cache provided via OTFR and the remaining data provided from the cache. Users do not need to make separate requests. Users of the MAST Portal will be able to download data from the cache immediately. For data not in the cache, the Portal will send the user to the standard "Retrieval Options Page," allowing the user to direct the Archive to process and deliver the data. The classic MAST Search and Retrieval interface has the same look and feel as previously. Minor changes, unrelated to the cache, have been made to the format of the Retrieval Options Page.
Index of NASA prefixed forms

NASA Technical Reports Server (NTRS)

1992-01-01

This Handbook sets forth information for the guidance of all users of the NASA Forms Management Program System. It is issued in accordance with the Federal Information Resources Management Regulation (FIRMR), Subpart 201-9.1. This Handbook sets forth an alpha-functional index of NASA-prefixed forms by title, identifying number, and unit of issue. The automated processing two-letter code (NF) has been substituted for the spelling out of the NASA form-prefix preceding the form number. To indicate a description in lieu of a distinct title, the entire reference under the Form Title/Description column has been enclosed in parentheses. A list of current forms, shown by number and page, is included for cross-reference and to preclude the ordering of those forms which have been deleted from the system. This Handbook will be updated, as appropriate. NHB 1420.2H dated July 1986, is cancelled.

Sorting permutations by prefix and suffix rearrangements.

PubMed

Lintzmayer, Carla Negri; Fertin, Guillaume; Dias, Zanoni

2017-02-01

Some interesting combinatorial problems have been motivated by genome rearrangements, which are mutations that affect large portions of a genome. When we represent genomes as permutations, the goal is to transform a given permutation into the identity permutation with the minimum number of rearrangements. When they affect segments from the beginning (respectively end) of the permutation, they are called prefix (respectively suffix) rearrangements. This paper presents results for rearrangement problems that involve prefix and suffix versions of reversals and transpositions considering unsigned and signed permutations. We give 2-approximation and ([Formula: see text])-approximation algorithms for these problems, where [Formula: see text] is a constant divided by the number of breakpoints (pairs of consecutive elements that should not be consecutive in the identity permutation) in the input permutation. We also give bounds for the diameters concerning these problems and provide ways of improving the practical results of our algorithms.
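To make the notion of a prefix rearrangement concrete, the following minimal Python sketch sorts an unsigned permutation using only prefix reversals and counts breakpoints. It uses the simple "bring the largest unsorted element to the front, then flip it into place" strategy, which is not the approximation algorithm of the paper. The function names are hypothetical, and the breakpoint convention (extending the permutation with n+1 at the end only) is an assumption made for illustration.

```python
def prefix_reversal(perm, k):
    """Return a copy of perm with its first k elements reversed."""
    return perm[:k][::-1] + perm[k:]


def count_breakpoints(perm):
    """Count adjacent pairs whose values are not consecutive (unsigned case).

    Assumption: the permutation is extended with n+1 at the end only,
    since a prefix operation always moves the first element.
    """
    ext = perm + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def sort_by_prefix_reversals(perm):
    """Sort a permutation of 1..n using prefix reversals only.

    Returns the sorted list and the prefix lengths that were reversed.
    Uses at most 2 * (n - 1) reversals; it makes no optimality claim.
    """
    perm = list(perm)
    flips = []
    for size in range(len(perm), 1, -1):
        pos = perm.index(size)
        if pos == size - 1:
            continue  # element is already in its final position
        if pos != 0:
            # Bring the element to the front of the permutation.
            perm = prefix_reversal(perm, pos + 1)
            flips.append(pos + 1)
        # Flip the element from the front into its final position.
        perm = prefix_reversal(perm, size)
        flips.append(size)
    return perm, flips


if __name__ == "__main__":
    start = [3, 1, 4, 2]
    print("breakpoints before:", count_breakpoints(start))
    result, flips = sort_by_prefix_reversals(start)
    print("sorted:", result, "prefix lengths reversed:", flips)
```

The breakpoint count gives a rough measure of how far a permutation is from the identity, which is the quantity the abstract's approximation factor depends on.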
An Analysis of Instruction-Cached SIMD Computer Architecture

DTIC Science & Technology

1993-12-01

ASSEMBLE SIMULATE SCHEDULE VERIFY...Cache to Place Blocks ... 4.5.4 Step 4: Schedule Cache Blocks ... 4.5.5 Step 5: Store Cache Blocks...B.4 Scheduler ... B.4.1 Basic Block Definition
CUDA Optimization Strategies for Compute- and Memory-Bound Neuroimaging Algorithms

PubMed Central

Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W.

2011-01-01

As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance is optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. PMID:21159404
CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms.

PubMed

Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W

2012-06-01

As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance is optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak

1999-01-01

The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
Pushing Memory Bandwidth Limitations Through Efficient Implementations of Block-Krylov Space Solvers on GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clark, M. A.; Strelchenko, Alexei; Vaquero, Alejandro

Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. Using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.
A study of the relationship between the performance and dependability of a fault-tolerant computer

NASA Technical Reports Server (NTRS)

Goswami, Kumar K.

1994-01-01

This thesis studies the relationship by creating a tool (FTAPE) that integrates a high stress workload generator with fault injection and by using the tool to evaluate system performance under error conditions. The workloads are composed of processes which are formed from atomic components that represent CPU, memory, and I/O activity. The fault injector is software-implemented and is capable of injecting faults into any memory-addressable location, including special registers and caches. This tool has been used to study a Tandem Integrity S2 Computer. Workloads with varying numbers of processes and varying compositions of CPU, memory, and I/O activity are first characterized in terms of performance. Then faults are injected into these workloads. The results show that as the number of concurrent processes increases, the mean fault latency initially increases due to increased contention for the CPU. However, for even higher numbers of processes (greater than 3 processes), the mean latency decreases because long latency faults are paged out before they can be activated.
Testing episodic memory in animals: a new approach.

PubMed

Griffiths, D P; Clayton, N S

2001-08-01

Episodic memory involves the encoding and storage of memories concerned with unique personal experiences and their subsequent recall, and it has long been the subject of intensive investigation in humans. According to Tulving's classical definition, episodic memory "receives and stores information about temporally dated episodes or events and temporal-spatial relations among these events." Thus, episodic memory provides information about the 'what' and 'when' of events ('temporally dated experiences') and about 'where' they happened ('temporal-spatial relations'). The storage and subsequent recall of this episodic information was thought to be beyond the memory capabilities of nonhuman animals. Although there are many laboratory procedures for investigating memory for discrete past episodes, until recently there were no studies that fully satisfied the criteria of Tulving's definition: they can all be explained in much simpler terms than episodic memory. However, current studies of memory for cache sites in food-storing jays provide an ethologically valid model for testing episodic-like memory in animals, thereby bridging the gap between human and animal studies of memory. There is now a pressing need to adapt these experimental tests of episodic memory for other animals. Given the potential power of transgenic and knock-out procedures for investigating the genetic and molecular bases of learning and memory in laboratory rodents, not to mention the wealth of knowledge about the neuroanatomy and neurophysiology of the rodent hippocampus (a brain area heavily implicated in episodic memory), an obvious next step is to develop a rodent model of episodic-like memory based on the food-storing bird paradigm. The development of a rodent model system could make an important contribution to our understanding of the neural, molecular, and behavioral mechanisms of mammalian episodic memory.
Effects of simulated mountain lion caching on decomposition of ungulate carcasses

USGS Publications Warehouse

Bischoff-Mattson, Z.; Mattson, D.

2009-01-01

Caching of animal remains is common among carnivorous species of all sizes, yet the effects of caching on larger prey are unstudied. We conducted a summer field experiment designed to test the effects of simulated mountain lion (Puma concolor) caching on mass loss, relative temperature, and odor dissemination of 9 prey-like carcasses. We deployed all but one of the carcasses in pairs, with one of each pair exposed and the other shaded and shallowly buried (cached). Caching substantially reduced wastage during dry and hot (drought) but not wet and cool (monsoon) periods, and it also reduced temperature and discernable odor to some degree during both seasons. These results are consistent with the hypotheses that caching serves to both reduce competition from arthropods and microbes and reduce odds of detection by larger vertebrates such as bears (Ursus spp.), wolves (Canis lupus), or other lions.
A Survey Of Techniques for Managing and Leveraging Caches in GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh

2014-09-01

Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as unique architecture of GPU, rise of CPU–GPU heterogeneous computing, etc., demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges of cache management in GPUs. The aim of this paper is to provide the readers insights into cache management techniques for GPUs and motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.
Cache coherency without line exclusivity in MP systems having store-in caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pomerene, J.H.; Puzak, T.R.; Rechtschaffen, R.N.

1983-11-01

By modifying the function of the storage control unit, a multiprocessor (MP) system having store-in caches is enabled to operate with the same versatility as an MP system having store-through caches, thereby eliminating the requirement for line exclusivity and greatly reducing the occurrence of cross-interrogates.
Nature as a treasure map! Teaching geoscience with the help of earth caches?!

NASA Astrophysics Data System (ADS)

Zecha, Stefanie; Schiller, Thomas

2015-04-01

This presentation looks at how earth caches influence the learning process in the field of geoscience in non-formal education. The development of mobile technologies that use Global Positioning System (GPS) data to pinpoint geographical location, together with the evolving Web 2.0 supporting the creation and consumption of content, suggests a potential for collaborative informal learning linked to location. With the help of the GIS in smartphones, learners can go directly into nature, search for information on their smartphones, and learn something about nature. Earth caches, which are organized and supervised geocaches with special information about physical geography highlights, are a very good opportunity for this. Interested people can inform themselves about aspects of geoscience through earth caches. The main question of this presentation is how these caches are created in relation to learning processes. As it is not possible to analyze all existing earth caches, the focus was on Bavaria and a certain feature of earth caches. At the end, the authors show limits and potentials for the use of earth caches and give some remarks for the future.
Explicit Content Caching at Mobile Edge Networks with Cross-Layer Sensing

PubMed Central

Chen, Lingyu; Su, Youxing; Luo, Wenbin; Hong, Xuemin; Shi, Jianghong

2018-01-01

The deployment density and computational power of small base stations (BSs) are expected to increase significantly in the next generation mobile communication networks. These BSs form the mobile edge network, which is a pervasive and distributed infrastructure that can empower a variety of edge/fog computing applications. This paper proposes a novel edge-computing application called explicit caching, which stores selective contents at BSs and exposes such contents to local users for interactive browsing and download. We formulate the explicit caching problem as a joint content recommendation, caching, and delivery problem, which aims to maximize the expected user quality-of-experience (QoE) with varying degrees of cross-layer sensing capability. Optimal and effective heuristic algorithms are presented to solve the problem. The theoretical performance bounds of the explicit caching system are derived in simplified scenarios. The impacts of cache storage space, BS backhaul capacity, cross-layer information, and user mobility on the system performance are simulated and discussed in realistic scenarios. Results suggest that, compared with conventional implicit caching schemes, explicit caching can better exploit the mobile edge network infrastructure for personalized content dissemination. PMID:29565313
Explicit Content Caching at Mobile Edge Networks with Cross-Layer Sensing.

PubMed

Chen, Lingyu; Su, Youxing; Luo, Wenbin; Hong, Xuemin; Shi, Jianghong

2018-03-22

The deployment density and computational power of small base stations (BSs) are expected to increase significantly in the next generation mobile communication networks. These BSs form the mobile edge network, which is a pervasive and distributed infrastructure that can empower a variety of edge/fog computing applications. This paper proposes a novel edge-computing application called explicit caching, which stores selective contents at BSs and exposes such contents to local users for interactive browsing and download. We formulate the explicit caching problem as a joint content recommendation, caching, and delivery problem, which aims to maximize the expected user quality-of-experience (QoE) with varying degrees of cross-layer sensing capability. Optimal and effective heuristic algorithms are presented to solve the problem. The theoretical performance bounds of the explicit caching system are derived in simplified scenarios. The impacts of cache storage space, BS backhaul capacity, cross-layer information, and user mobility on the system performance are simulated and discussed in realistic scenarios. Results suggest that, compared with conventional implicit caching schemes, explicit caching can better exploit the mobile edge network infrastructure for personalized content dissemination.
Value-Based Caching in Information-Centric Wireless Body Area Networks

PubMed Central

Al-Turjman, Fadi M.; Imran, Muhammad; Vasilakos, Athanasios V.

2017-01-01

We propose a resilient cache replacement approach based on a Value of sensed Information (VoI) policy. To resolve and fetch content when the origin is not available due to isolated in-network nodes (fragmentation) and harsh operational conditions, we exploit a content caching approach. Our approach depends on four functional parameters in sensory Wireless Body Area Networks (WBANs). These four parameters are: age of data based on periodic request, popularity of on-demand requests, communication interference cost, and the duration for which the sensor node is required to operate in active mode to capture the sensed readings. These parameters are considered together to assign a value to the cached data to retain the most valuable information in the cache for prolonged time periods. The higher the value, the longer the duration for which the data will be retained in the cache. This caching strategy provides significant availability for most valuable and difficult to retrieve data in the WBANs. Extensive simulations are performed to compare the proposed scheme against other significant caching schemes in the literature while varying critical aspects in WBANs (e.g., data popularity, cache size, publisher load, connectivity-degree, and severe probabilities of node failures). These simulation results indicate that the proposed VoI-based approach is a valid tool for the retrieval of cached content in disruptive and challenging scenarios, such as the one experienced in WBANs, since it allows the retrieval of content for a long period even while experiencing severe in-network node failures. PMID:28106817
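The value-based replacement idea in this abstract can be illustrated with a small Python sketch. The abstract names the four parameters but not how they are combined, so the weighted sum, the equal weights, the normalization to [0, 1], and the direction in which each parameter contributes are all assumptions made here for illustration; the names `CachedItem`, `value`, and `insert` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedItem:
    name: str
    age: float           # time since the data was sensed, normalized to [0, 1]
    popularity: float    # share of on-demand requests, in [0, 1]
    interference: float  # communication interference cost to re-fetch, in [0, 1]
    sensing_time: float  # active-mode duration needed to re-sense, in [0, 1]


# Hypothetical equal weights; the paper's actual weighting is not given here.
WEIGHTS = (0.25, 0.25, 0.25, 0.25)


def value(item, weights=WEIGHTS):
    """Assumed scoring: fresher, more popular, and costlier-to-recover data
    receives a higher value and is therefore retained longer."""
    w_age, w_pop, w_int, w_sense = weights
    return (w_age * (1.0 - item.age)
            + w_pop * item.popularity
            + w_int * item.interference
            + w_sense * item.sensing_time)


def insert(cache, item, capacity):
    """Add item to cache (a list); if capacity is exceeded, evict the
    lowest-value entry and return it, otherwise return None."""
    cache.append(item)
    if len(cache) <= capacity:
        return None
    victim = min(cache, key=value)
    cache.remove(victim)
    return victim


if __name__ == "__main__":
    cache = []
    insert(cache, CachedItem("heart_rate", 0.1, 0.9, 0.5, 0.5), capacity=2)
    insert(cache, CachedItem("old_log", 0.9, 0.1, 0.1, 0.1), capacity=2)
    evicted = insert(cache, CachedItem("temperature", 0.5, 0.5, 0.5, 0.5), capacity=2)
    print("evicted:", evicted.name)
    print("retained:", [item.name for item in cache])
```

Evicting the minimum-value entry is one simple way to realize "the higher the value, the longer the data is retained"; other policies (for example, value-dependent time-to-live) would fit the description equally well.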
Identification of VaD and AD prodromes: the Cache County Study.

PubMed

Hayden, K M; Warren, L H; Pieper, C F; Østbye, T; Tschanz, J T; Norton, M C; Breitner, J C S; Welsh-Bohmer, K A

2005-07-01

It is unclear whether vascular dementia (VaD) has a cognitive prodrome, akin to the mild cognitive impairment (MCI) prodrome to Alzheimer's dementia (AD). To evaluate whether VaD has a cognitive prodrome, and if it can be differentiated from prodromal AD, we examined neuropsychological test performance of participants in a nested case-control study within a population-based cohort aged 65 or older. Participants (n = 485) were identified from the Cache County Study, a large population-based study of aging and dementia. After an average of 3 years of follow-up, a total of 62 incident dementia cases were identified (14 VaD, 48 AD). We identified a number of neuropsychological tests (executive and memory) that discriminated between diagnosed VaD and AD cases. Multivariate analyses sought to differentiate between these same groups 3 years before clinical diagnosis. The Consortium to Establish a Registry for Alzheimer's Disease Word List Recognition Test correct recognition of foils (mean difference, 1.25; 95% confidence interval [CI], 0.42 to 2.07; p < 0.01), Logical Memory I (mean difference, 7.16; 95% CI, 0.78 to 13.55, p < 0.05), Logical Memory II delayed recall (mean difference, 8.67; 95% CI, 1.59 to 15.74, p < 0.05), and percent savings (mean difference, 51.07; 95% CI, 32.58 to 69.56, p < 0.0001) differentiated VaD from AD cases after adjustment for age, sex, education, and dementia severity. Three years before dementia diagnosis, word list recognition ("no" responses mean difference, 1.40; 95% CI, 0.64 to 2.17; p < 0.001, and "yes" responses mean difference, -1.14; 95% CI, -2.14 to -0.13; p < 0.03) discriminated between prodromal VaD and AD. These results suggest that VaD has a prodromal syndrome, the cognitive features of which are distinguishable from the cognitive prodrome of AD.
On the Feasibility of Prefetching and Caching for Online TV Services: A Measurement Study on Hulu

NASA Astrophysics Data System (ADS)

Krishnappa, Dilip Kumar; Khemmarat, Samamon; Gao, Lixin; Zink, Michael

Lately, researchers have been looking at ways to reduce the delay on video playback through mechanisms like prefetching and caching for Video-on-Demand (VoD) services. The usage of prefetching and caching also has the potential to reduce the amount of network bandwidth usage, as most popular requests are served from a local cache rather than the server containing the original content. In this paper, we investigate the advantages of having such a prefetching and caching scheme for a free hosting service of professionally created video (movies and TV shows) named "hulu". We look into the advantages of using a prefetching scheme where the most popular videos of the week, as provided by the hulu website, are prefetched and compare this approach with a conventional LRU caching scheme with limited storage space and a combined scheme of prefetching and caching. Results from our measurement and analysis show that employing a basic caching scheme at the proxy yields a hit ratio of up to 77.69%, but requires storage of about 236GB. Further analysis shows that a prefetching scheme where the top-100 popular videos of the week are downloaded to the proxy yields a hit ratio of 44% with a storage requirement of 10GB. An LRU caching scheme with a storage limitation of 20GB can achieve a hit ratio of 55% but downloads 4713 videos to achieve such a high hit ratio, compared to 100 videos in the prefetching scheme, whereas a scheme with both prefetching and caching with the same storage yields a hit ratio of 59% with a download requirement of 4439 videos. We find that employing a scheme of prefetching along with caching, with a trade-off on the storage, will yield a better hit ratio and bandwidth saving than individual caching or prefetching schemes.
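The three schemes compared in this abstract (LRU caching, prefetching of a popular set, and a combination of both) can be sketched in Python as follows. This is a simplified illustration rather than the authors' measurement setup: capacity is counted in number of videos instead of gigabytes, the request trace is a toy example, and all function names are hypothetical.

```python
from collections import OrderedDict


def lru_hit_ratio(requests, capacity):
    """Hit ratio of an LRU cache holding at most `capacity` videos."""
    cache = OrderedDict()
    hits = 0
    for video in requests:
        if video in cache:
            hits += 1
            cache.move_to_end(video)       # mark as most recently used
        else:
            cache[video] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(requests) if requests else 0.0


def prefetch_hit_ratio(requests, prefetched):
    """Hit ratio when only a fixed, prefetched set of videos is stored."""
    stored = set(prefetched)
    hits = sum(1 for video in requests if video in stored)
    return hits / len(requests) if requests else 0.0


def combined_hit_ratio(requests, prefetched, lru_capacity):
    """Prefetched videos are pinned; all other videos share an LRU cache."""
    pinned = set(prefetched)
    cache = OrderedDict()
    hits = 0
    for video in requests:
        if video in pinned:
            hits += 1
        elif video in cache:
            hits += 1
            cache.move_to_end(video)
        else:
            cache[video] = True
            if len(cache) > lru_capacity:
                cache.popitem(last=False)
    return hits / len(requests) if requests else 0.0


if __name__ == "__main__":
    trace = ["a", "b", "a", "c", "b", "a", "d", "a"]  # toy request trace
    print("LRU (2 videos):     ", lru_hit_ratio(trace, 2))
    print("prefetch {'a'}:     ", prefetch_hit_ratio(trace, {"a"}))
    print("prefetch + LRU (1): ", combined_hit_ratio(trace, {"a"}, 1))
```

On a real trace, the number of cache misses in the LRU functions would also give the number of videos downloaded to the proxy, which is the bandwidth cost the abstract contrasts with the fixed 100-video prefetch.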
CoNNeCT Baseband Processor Module

NASA Technical Reports Server (NTRS)

Yamamoto, Clifford K; Jedrey, Thomas C.; Gutrich, Daniel G.; Goodpasture, Richard L.

2011-01-01

A document describes the CoNNeCT Baseband Processor Module (BPM) based on an updated processor, memory technology, and field-programmable gate arrays (FPGAs). The BPM was developed from a requirement to provide sufficient computing power and memory storage to conduct experiments for a Software Defined Radio (SDR) to be implemented. The flight SDR uses the AT697 SPARC processor with on-chip data and instruction cache. The non-volatile memory has been increased from a 20-Mbit EEPROM (electrically erasable programmable read only memory) to a 4-Gbit Flash, managed by the RTAX2000 Housekeeper, allowing more programs and FPGA bit-files to be stored. The volatile memory has been increased from a 20-Mbit SRAM (static random access memory) to a 1.25-Gbit SDRAM (synchronous dynamic random access memory), providing additional memory space for more complex operating systems and programs to be executed on the SPARC. All memory is EDAC (error detection and correction) protected, while the SPARC processor implements fault protection via TMR (triple modular redundancy) architecture. Further capability over prior BPM designs includes the addition of a second FPGA to implement features beyond the resources of a single FPGA. Both FPGAs are implemented with Xilinx Virtex-II and are interconnected by a 96-bit bus to facilitate data exchange. Dedicated 1.25-Gbit SDRAMs are wired to each Xilinx FPGA to accommodate high rate data buffering for SDR applications as well as independent SpaceWire interfaces. The RTAX2000 manages scrub and configuration of each Xilinx.
A brief metacognition questionnaire for the elderly: comparison with cognitive performance and informant ratings: the Cache County Study.

PubMed

Buckley, Trevor; Norton, Maria C; Deberard, M Scott; Welsh-Bohmer, Kathleen A; Tschanz, JoAnn T

2010-07-01

To examine the utility of a brief metacognition questionnaire by examining its association with objective cognitive testing and informant ratings. We hypothesized that the association between self-ratings of change and both outcomes would be greater among individuals without dementia than among those with dementia. Participants were 535 persons without dementia and 152 with dementia from the Cache County Memory Study who had completed a metacognition questionnaire and two administrations of the Modified Mini-Mental State Exam (3MS), and who had data on the Informant Questionnaire of Cognitive Decline in the Elderly (IQCODE). Cronbach's alpha was calculated as a measure of internal consistency of the metacognition questionnaire. Multiple regression was used to examine the relationship between metacognition and 3MS change. Logistic regression was used to examine the relationship between metacognition and IQCODE ratings (no change vs. worse). Cronbach's alpha was 0.75. Among individuals without dementia, metacognition significantly predicted 3MS change (p = .027) and IQCODE ratings (OR = 4.0, 95% CI = 1.2-13.8, p = .029), suggesting consistency among measures. For those with dementia, there was a weak, inverse relationship between 3MS change and metacognition (r = -0.16, p = .056). IQCODE ratings were not significantly associated with metacognition (p = .729). Degree of dementia severity did not modify the relationship between metacognition and either outcome (p > .05). We demonstrated adequate internal consistency and evidence for validity of a brief metacognition questionnaire. The questionnaire may provide a useful adjunct to memory and functional assessments for assessing anosognosia in elderly populations. (c) 2009 John Wiley & Sons, Ltd.
Structural Analysis via Generalized Interactive Graphics - STAGING. Volume III. System Manual.

DTIC Science & Technology

1979-09-01

DISTRIBUTION UNLIMITED 17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report) 18. SUPPLEMENTARY NOTES 19. KEY WORDS (Continue on...prefixMENUDRIVER and the menu data base itself is cataloged as prefixMENU. The maintenance of the STAGING Material Property Data Base (MPDB...Property Data Base System, and conversion routines as described in Section 1.2 through 1.6. If any difficulties arise due to differences in...

Efficient multitasking of Choleski matrix factorization on CRAY supercomputers

NASA Technical Reports Server (NTRS)

Overman, Andrea L.; Poole, Eugene L.

1991-01-01

A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
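As a rough illustration of why band structure helps a Choleski (Cholesky) factorization, the Python sketch below factors a symmetric positive definite matrix while touching only entries within a fixed half-bandwidth of the diagonal. It is a simplification under stated assumptions: the paper uses a variable-band storage scheme and vectorized, multitasked code, whereas this sketch assumes a single fixed bandwidth, keeps the matrix in dense list-of-lists form, and runs serially. The function name `banded_cholesky` is hypothetical.

```python
import math


def banded_cholesky(a, bandwidth):
    """Return lower-triangular L with A = L * L^T.

    Assumes A is symmetric positive definite and a[i][j] == 0 whenever
    |i - j| > bandwidth. Only entries inside the band are read or computed,
    so the work is O(n * bandwidth^2) rather than O(n^3).
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract squares of in-band entries of row j.
        lo = max(0, j - bandwidth)
        s = a[j][j] - sum(l[j][k] ** 2 for k in range(lo, j))
        l[j][j] = math.sqrt(s)
        # Sub-diagonal entries of column j that lie inside the band.
        for i in range(j + 1, min(n, j + bandwidth + 1)):
            lo_i = max(0, i - bandwidth)
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(lo_i, j))
            l[i][j] = s / l[j][j]
    return l


if __name__ == "__main__":
    # Small tridiagonal test matrix (half-bandwidth 1).
    a = [[4.0, 2.0, 0.0],
         [2.0, 5.0, 2.0],
         [0.0, 2.0, 5.0]]
    for row in banded_cholesky(a, 1):
        print(row)
```

A production band solver would store only the in-band entries contiguously, which is what allows the inner loops to stream through fast local memory as the abstract describes.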
44 CFR 208.24 - Purchase and maintenance of items not listed on Equipment Cache List.

Code of Federal Regulations, 2011 CFR

2011-10-01

... items not listed on Equipment Cache List. 208.24 Section 208.24 Emergency Management and Assistance... of items not listed on Equipment Cache List. (a) Requests for purchase or maintenance of equipment and supplies not appearing on the Equipment Cache List, or that exceed the number specified in the...
Memory Benchmarks for SMP-Based High Performance Parallel Computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoo, A B; de Supinski, B; Mueller, F

2001-11-20

As the speed gap between CPU and main memory continues to grow, memory accesses increasingly dominate the performance of many applications. The problem is particularly acute for symmetric multiprocessor (SMP) systems, where the shared memory may be accessed concurrently by a group of threads running on separate CPUs. Unfortunately, several key issues governing memory system performance in current systems are not well understood. Complex interactions between the levels of the memory hierarchy, buses or switches, DRAM back-ends, system software, and application access patterns can make it difficult to pinpoint bottlenecks and determine appropriate optimizations, and the situation is even more complex for SMP systems. To partially address this problem, we formulated a set of multi-threaded microbenchmarks for characterizing and measuring the performance of the underlying memory system in SMP-based high-performance computers. We report our use of these microbenchmarks on two important SMP-based machines. This paper has four primary contributions. First, we introduce a microbenchmark suite to systematically assess and compare the performance of different levels in SMP memory hierarchies. Second, we present a new tool based on hardware performance monitors to determine a wide array of memory system characteristics, such as cache sizes, quickly and easily; by using this tool, memory performance studies can be targeted to the full spectrum of performance regimes with many fewer data points than is otherwise required. Third, we present experimental results indicating that the performance of applications with large memory footprints remains largely constrained by memory. Fourth, we demonstrate that thread-level parallelism further degrades memory performance, even for the latest SMPs with hardware prefetching and switch-based memory interconnects.
Forest resources of the Wasatch-Cache National Forest

Treesearch

Renee A. O'Brien; Jesse Pope

1997-01-01

The 1,215,219 acres in the Wasatch-Cache National Forest encompass 863,906 acres of forest land, made up of 90 percent (776,239 acres) "timberland" and 10 percent (87,667 acres) "woodland." The other 351,313 acres of the Wasatch-Cache are nonforest or water (fig. 1). This report discusses forest land only. In the Wasatch-Cache, 26 percent...
A Morphometric Assessment of the Intended Function of Cached Clovis Points

PubMed Central

Buchanan, Briggs; Kilby, J. David; Huckell, Bruce B.; O'Brien, Michael J.; Collard, Mark

2012-01-01

A number of functions have been proposed for cached Clovis points. The least complicated hypothesis is that they were intended to arm hunting weapons. It has also been argued that they were produced for use in rituals or in connection with costly signaling displays. Lastly, it has been suggested that some cached Clovis points may have been used as saws. Here we report a study in which we morphometrically compared Clovis points from caches with Clovis points recovered from kill and camp sites to test two predictions of the hypothesis that cached Clovis points were intended to arm hunting weapons: 1) cached points should be the same shape as, but generally larger than, points from kill/camp sites, and 2) cached points and points from kill/camp sites should follow the same allometric trajectory. The results of the analyses are consistent with both predictions and therefore support the hypothesis. A follow-up review of the fit between the results of the analyses and the predictions of the other hypotheses indicates that the analyses support only the hunting equipment hypothesis. We conclude from this that cached Clovis points were likely produced with the intention of using them to arm hunting weapons. PMID:22348012
WATCHMAN: A Data Warehouse Intelligent Cache Manager

NASA Technical Reports Server (NTRS)

Scheuermann, Peter; Shim, Junho; Vingralek, Radek

1996-01-01

Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response time. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of an intelligent cache manager for sets retrieved by queries, called WATCHMAN, which is particularly well suited for data warehousing environments. Our cache manager employs two novel, complementary algorithms for cache replacement and for cache admission. WATCHMAN aims at minimizing query response time, and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and the execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.
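The profit-driven replacement and admission described in this abstract can be sketched roughly as follows. The abstract only says that the metric considers reference rate, size, and execution cost; the specific form used here (rate times cost divided by size), the admission rule, and all names such as `RetrievedSet` and `admit` are illustrative assumptions, not the published algorithm.

```python
from dataclasses import dataclass


@dataclass
class RetrievedSet:
    query_id: str
    ref_rate: float   # average references per unit time
    size: int         # bytes occupied in the cache
    exec_cost: float  # cost of re-executing the query

    @property
    def profit(self) -> float:
        # Assumed form: expected cost saved per byte of cache space.
        return self.ref_rate * self.exec_cost / self.size


def choose_victims(cached, needed_bytes):
    """Pick lowest-profit sets until at least needed_bytes would be freed."""
    victims, freed = [], 0
    for rs in sorted(cached, key=lambda r: r.profit):
        if freed >= needed_bytes:
            break
        victims.append(rs)
        freed += rs.size
    return victims


def admit(candidate, cached, free_bytes):
    """Return (admitted, victims). Whole retrieved sets are evicted, not pages."""
    if candidate.size <= free_bytes:
        return True, []
    victims = choose_victims(cached, candidate.size - free_bytes)
    freed = sum(v.size for v in victims)
    if free_bytes + freed < candidate.size:
        return False, []  # cannot make enough room
    # Assumed admission test: candidate must beat the victims' aggregate profit.
    victims_profit = sum(v.ref_rate * v.exec_cost for v in victims) / freed
    if candidate.profit > victims_profit:
        return True, victims
    return False, []
```

Comparing against the aggregate profit of the would-be victims keeps a large, rarely referenced result from displacing several small, frequently reused ones, which is the kind of decision plain LRU cannot make.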
Novel dynamic caching for hierarchically distributed video-on-demand systems

NASA Astrophysics Data System (ADS)

Ogo, Kenta; Matsuda, Chikashi; Nishimura, Kazutoshi

1998-02-01

It is difficult to simultaneously serve the millions of video streams that will be needed in the age of 'Mega-Media' networks by using only one high-performance server. To distribute the service load, caching servers should be located near users. However, in previously proposed caching mechanisms, the grade of service depends on whether the data is already cached at a caching server. To make the caching servers transparent to the users, the ability to randomly access the large volume of data stored in the central server should be supported, and the operational functions of the provided service should not be narrowly restricted. We propose a mechanism for constructing a video-stream-caching server that is transparent to the users and that will always support all special playback functions for all available programs and all contents with a latency of only 1 or 2 seconds. This mechanism uses a variable-sized-quantum-segment-caching technique derived from an analysis of the historical usage log data generated by a line-on-demand-type service experiment and based on the basic techniques used by a time-slot-based multiple-stream video-on-demand server.
Department of Defense Dictionary Of Military and Associated Terms

DTIC Science & Technology

2010-12-31

designated United States Naval Ships and use the prefix “USNS” with the ship name and the letter “T” as a prefix to the ship classification (e.g., T- AKR ...T- AKR ). See also Military Sealift Command; United States Naval Ship. (JP 3-02.2) gradient — The rate of inclination to horizontal expressed as a ...Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a
The Effect of Knowledge Management Systems on Organizational Performance. Do Soldier and Unit Counterinsurgency Knowledge and Performance Improve Following ’Push’ or ’Adaptive-Push’ Training

DTIC Science & Technology

2009-01-01

theory ...................24 Table 3.1 Five techniques of pretest - posttest analysis (from Bonate, 2000) ....................34 Table 3.2 Main Effects...Davenport, 1998). In a pretest / posttest design study by Schlomer, Anderson, and Shaw (1997) no statistically significant differences in KR outcomes were... Pretest / posttest design overview ____________ 54 Question numbers ranged from 1 to 55. The prefix a represents a pretest question and the prefix b
FloCon 2008 Proceedings

DTIC Science & Technology

2008-01-01

anomalous traffic of the node ÷ total anomalous traffic – Make parent nodes by merging child node information. prefix/length coverage/collateral...A T A U P L O A D F L O W S E N S O R 5. DATA DOWNLOAD FLOW SENSOR 1. RECON SNORT: KICKASS_PORN DRAGON: PORN HARDCORE SOURCEDEST SOURCE SOURCE...traffic • Make parent nodes by merging child node information. prefix/length coverage/collateral 0.0.0.0/0 100/100 depth=0 non divided 0.0.0.0/1 50/30
Multicast for savings in cache-based video distribution

NASA Astrophysics Data System (ADS)

Griwodz, Carsten; Zink, Michael; Liepert, Michael; On, Giwon; Steinmetz, Ralf

1999-12-01

Internet video-on-demand (VoD) today streams videos directly from server to clients, because re-distribution is not established yet. Intranet solutions exist but are typically managed centrally. Caching may overcome these management needs; however, existing web caching strategies are not applicable because they work under different conditions. We propose movie distribution by means of caching, and study its feasibility from the service providers' point of view. We combine our reliable multicast protocol for caching hierarchies, LCRTP, with our enhancement to the patching technique for bandwidth-friendly True VoD, which does not depend on network resource guarantees.
HPC Profiling with the Sun Studio™ Performance Tools

NASA Astrophysics Data System (ADS)

Itzkowitz, Marty; Maruyama, Yukon

In this paper, we describe how to use the Sun Studio Performance Tools to understand the nature and causes of application performance problems. We first explore CPU and memory performance problems for single-threaded applications, giving some simple examples. Then, we discuss multi-threaded performance issues, such as locking and false-sharing of cache lines, in each case showing how the tools can help. We go on to describe OpenMP applications and the support for them in the performance tools. Then we discuss MPI applications, and the techniques used to profile them. Finally, we present our conclusions.
Study of cache performance in distributed environment for data processing

NASA Astrophysics Data System (ADS)

Makatun, Dzmitry; Lauret, Jérôme; Šumbera, Michal

2014-06-01

Processing data in a distributed environment has found applications in many fields of science (Nuclear and Particle Physics (NPP), astronomy, and biology, to name only a few). Efficiently transferring data between sites is an essential part of such processing. The implementation of caching strategies in data transfer software and tools, such as the Reasoner for Intelligent File Transfer (RIFT) being developed in the STAR collaboration, can significantly decrease network load and waiting time by reusing the knowledge of data provenance as well as data placed in the transfer cache to further expand the availability of sources for files and data-sets. Although a great variety of caching algorithms is known, a study is needed to evaluate which one can deliver the best performance in data access under realistic demand patterns. Records of access to the complete data-sets of NPP experiments were analyzed and used as input for computer simulations. A series of simulations was performed to estimate the possible cache hits and cache hits per byte for known caching algorithms. The simulations were done for caches of different sizes within the interval 0.001-90% of the complete data-set and for a low-watermark within 0-90%. Records of data access were taken from several experiments and within different time intervals in order to validate the results. In this paper, we discuss the different data caching strategies, from canonical algorithms to hybrid cache strategies, present the results of our simulations for the diverse algorithms, and identify the best algorithm in the context of physics data analysis in NPP. While the results of those studies have been implemented in RIFT, they can also be used when setting up a cache in any other computational work-flow (cloud processing, for example) or when managing data storage with partial replicas of the entire data-set.
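A trace-driven simulation of the kind described in this abstract could look roughly like the sketch below, which replays an access log through an LRU cache and counts hits and hit bytes. The trace format, the function name `simulate_lru`, and the interpretation of the low-watermark (evict down to that fraction of capacity whenever the cache overflows) are assumptions made for illustration; the abstract does not specify them.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity, low_mark=0.9):
    """Replay (file_id, size_bytes) accesses; return (hits, hit_bytes)."""
    cache = OrderedDict()  # file_id -> size, least recently used first
    used = hits = hit_bytes = 0
    for file_id, size in trace:
        if file_id in cache:
            cache.move_to_end(file_id)  # mark as most recently used
            hits += 1
            hit_bytes += size
            continue
        if size > capacity:
            continue  # file can never fit; count as a miss and skip caching
        if used + size > capacity:
            # Assumed clean-up rule: evict down to the low watermark,
            # and in any case far enough for the new file to fit.
            target = min(low_mark * capacity, capacity - size)
            while cache and used > target:
                _, old_size = cache.popitem(last=False)
                used -= old_size
        cache[file_id] = size
        used += size
    return hits, hit_bytes
```

Swapping the eviction order (for example by frequency or by size) while keeping the same replay loop is one way to compare algorithms on identical traces.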
Prolonging the arctic pulse: long-term exploitation of cached eggs by arctic foxes when lemmings are scarce.

PubMed

Samelius, Gustaf; Alisauskas, Ray T; Hobson, Keith A; Larivière, Serge

2007-09-01

1. Many ecosystems are characterized by pulses of dramatically higher than normal levels of foods (pulsed resources) to which animals often respond by caching foods for future use. However, the extent to which animals use cached foods and how this varies in relation to fluctuations in other foods is poorly understood in most animals. 2. Arctic foxes Alopex lagopus (L.) cache thousands of eggs annually at large goose colonies where eggs are often superabundant during the nesting period by geese. We estimated the contribution of cached eggs to arctic fox diets in spring and autumn, when geese were not present in the study area, by comparing stable isotope ratios (delta(13)C and delta(15)N) of fox tissues with those of their foods using a multisource mixing model in Program IsoSource. 3. The contribution of cached eggs to arctic fox diets was inversely related to collared lemming Dicrostonyx groenlandicus (Traill) abundance; the contribution of cached eggs to overall fox diets increased from < 28% in years when collared lemmings were abundant to 30-74% in years when collared lemmings were scarce. 4. Further, arctic foxes used cached eggs well into the following spring (almost 1 year after eggs were acquired) - a pattern that differs from that of carnivores generally storing foods for only a few days before consumption. 5. This study showed that long-term use of eggs that were cached when geese were superabundant at the colony in summer varied with fluctuations in collared lemming abundance (a key component in arctic fox diets throughout most of their range) and suggests that cached eggs functioned as a buffer when collared lemmings were scarce.
A GPU-Accelerated Approach for Feature Tracking in Time-Varying Imagery Datasets.

PubMed

Peng, Chao; Sahani, Sandip; Rushing, John

2017-10-01

We propose a novel parallel connected component labeling (CCL) algorithm along with efficient out-of-core data management to detect and track feature regions of large time-varying imagery datasets. Our approach contributes to the big data field with parallel algorithms tailored for GPU architectures. We remove the data dependency between frames and achieve pixel-level parallelism. Due to the large size, the entire dataset cannot fit into cached memory. Frames have to be streamed through the memory hierarchy (disk to CPU main memory and then to GPU memory), partitioned, and processed as batches, where each batch is small enough to fit into the GPU. To reconnect the feature regions that are separated due to data partitioning, we present a novel batch merging algorithm to extract the region connection information across multiple batches in a parallel fashion. The information is organized in a memory-efficient structure and supports fast indexing on the GPU. Our experiment uses a commodity workstation equipped with a single GPU. The results show that our approach can efficiently process a weather dataset composed of terabytes of time-varying radar images. The advantages of our approach are demonstrated by comparing to the performance of an efficient CPU cluster implementation which is being used by the weather scientists.
Using ecology to guide the study of cognitive and neural mechanisms of different aspects of spatial memory in food-hoarding animals.

PubMed

Smulders, Tom V; Gould, Kristy L; Leaver, Lisa A

2010-03-27

Understanding the survival value of behaviour does not tell us how the mechanisms that control this behaviour work. Nevertheless, understanding survival value can guide the study of these mechanisms. In this paper, we apply this principle to understanding the cognitive mechanisms that support cache retrieval in scatter-hoarding animals. We believe it is too simplistic to predict that all scatter-hoarding animals will outperform non-hoarding animals on all tests of spatial memory. Instead, we argue that we should look at the detailed ecology and natural history of each species. This understanding of natural history then allows us to make predictions about which aspects of spatial memory should be better in which species. We use the natural hoarding behaviour of the three best-studied groups of scatter-hoarding animals to make predictions about three aspects of their spatial memory: duration, capacity and spatial resolution, and we test these predictions against the existing literature. Having laid out how ecology and natural history can be used to predict detailed cognitive abilities, we then suggest using this approach to guide the study of the neural basis of these abilities. We believe that this complementary approach will reveal aspects of memory processing that would otherwise be difficult to discover.
Effective Padding of Multi-Dimensional Arrays to Avoid Cache Conflict Misses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hong, Changwan; Bao, Wenlei; Cohen, Albert

Caches are used to significantly improve performance. Even with high degrees of set-associativity, the number of accessed data elements mapping to the same set in a cache can easily exceed the degree of associativity, causing conflict misses and lowered performance, even if the working set is much smaller than cache capacity. Array padding (increasing the size of array dimensions) is a well-known optimization technique that can reduce conflict misses. In this paper, we develop the first algorithms for optimal padding of arrays for a set-associative cache for arbitrary tile sizes. In addition, we develop the first solution to padding for nested tiles and multi-level caches. The techniques are implemented in the PAdvisor tool. Experimental results with multiple benchmarks demonstrate significant performance improvement from use of PAdvisor for padding.
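The conflict-miss problem that padding addresses can be illustrated with a small address-mapping calculation. This is only a sketch of the general idea, not the PAdvisor algorithm; the cache geometry (64-byte lines, 64 sets), the 8-byte element size, and the function name `sets_touched` are assumed for the example.

```python
def sets_touched(rows, row_elems, elem_bytes=8, line_bytes=64, num_sets=64):
    """Distinct cache sets hit when walking down column 0 of a row-major array."""
    row_bytes = row_elems * elem_bytes
    return {(r * row_bytes // line_bytes) % num_sets for r in range(rows)}


# Row length of 512 doubles = 4096 bytes, a multiple of line_bytes * num_sets.
unpadded = sets_touched(rows=64, row_elems=512)
# Pad each row by 8 doubles (one cache line) to shift successive rows.
padded = sets_touched(rows=64, row_elems=512 + 8)
print(len(unpadded), len(padded))
```

Under these assumed parameters the unpadded column walk lands in a single set (so 64 rows compete for one set's associativity), while the padded layout spreads the same walk over all 64 sets.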
Advantages of masting in European beech: timing of granivore satiation and benefits of seed caching support the predator dispersal hypothesis.

PubMed

Zwolak, Rafał; Bogdziewicz, Michał; Wróbel, Aleksandra; Crone, Elizabeth E

2016-03-01

The predator satiation and predator dispersal hypotheses provide alternative explanations for masting. Both assume satiation of seed-eating vertebrates. They differ in whether satiation occurs before or after seed removal and caching by granivores (predator satiation and predator dispersal, respectively). This difference is largely unrecognized, but it is demographically important because cached seeds are dispersed and often have a microsite advantage over nondispersed seeds. We conducted rodent exclosure experiments in two mast and two nonmast years to test predictions of the predator dispersal hypothesis in our study system of yellow-necked mice (Apodemus flavicollis) and European beech (Fagus sylvatica). Specifically, we tested whether the fraction of seeds removed from the forest floor is similar during mast and nonmast years (i.e., lack of satiation before seed caching), whether masting decreases the removal of cached seeds (i.e., satiation after seed storage), and whether seed caching increases the probability of seedling emergence. We found that masting did not result in satiation at the seed removal stage. However, masting decreased the removal of cached seeds, and seed caching dramatically increased the probability of seedling emergence relative to noncached seeds. European beech thus benefits from masting through the satiation of scatterhoarders that occurs only after seeds are removed and cached. Although these findings do not exclude other evolutionary advantages of beech masting, they indicate that fitness benefits of masting extend beyond the most commonly considered advantages of predator satiation and increased pollination efficiency.
Different Scalable Implementations of Collision and Streaming for Optimal Computational Performance of Lattice Boltzmann Simulations

NASA Astrophysics Data System (ADS)

Geneva, Nicholas; Wang, Lian-Ping

2015-11-01

In the past 25 years, the mesoscopic lattice Boltzmann method (LBM) has become an increasingly popular approach to simulate incompressible flows including turbulent flows. While LBM solves more solution variables compared to the conventional CFD approach based on the macroscopic Navier-Stokes equation, it also offers opportunities for more efficient parallelization. In this talk we will describe several different algorithms that have been developed over the past 10 plus years, which can be used to represent the two core steps of LBM, collision and streaming, more effectively than standard approaches. The application of these algorithms spans LBM simulations ranging from basic channel to particle laden flows. We will cover the essential detail on the implementation of each algorithm for simple 2D flows, to the challenges one faces when using a given algorithm for more complex simulations. The key is to explore the best use of data structure and cache memory. Two basic data structures will be discussed and the importance of effective data storage to maximize a CPU's cache will be addressed. The performance of a 3D turbulent channel flow simulation using these different algorithms and data structures will be compared along with important hardware related issues.
An adverse event screening tool based on routinely collected hospital-acquired diagnoses.

PubMed

Brand, Caroline; Tropea, Joanne; Gorelik, Alexandra; Jolley, Damien; Scott, Ian; Sundararajan, Vijaya

2012-06-01

The aim was to develop an electronic adverse event (AE) screening tool applicable to acute care hospital episodes for patients admitted with chronic heart failure (CHF) and pneumonia. Consensus building using a modified Delphi method and descriptive analysis of hospital discharge data. Consultant physicians in general medicine (n = 38). In-hospital acquired (C-prefix) diagnoses associated with CHF and pneumonia admissions to 230 hospitals in Victoria, Australia, were extracted from the Victorian Admitted Episodes Data Set between July 2004 and June 2007. A 9-point rating scale was used to prioritize diagnoses acquired during hospitalization (routinely coded as a 'C-prefix' diagnosis to distinguish from diagnoses present on admission) for inclusion within an AE screening tool. Diagnoses rated a group median score between 7 and 9 by the physician panel were included. Selection of C-prefix diagnoses with a group median rating of 7-9 in a screening tool, and the level of physician agreement, as assessed using the Interpercentile Range Adjusted for Symmetry. Of 697 initial C-prefix diagnoses, there were high levels of agreement to include 113 (16.2%) in the AE screening tool. Using these selected diagnoses, a potential AE was flagged in 14% of all admissions for the two index conditions. Intra-rater reliability for each clinician ranged from kappa 0.482 to 1.0. A high level of physician agreement was obtained in selecting in-hospital diagnoses for inclusion in an AE screening tool based on routinely collected data. These results support further tool validation.

Parallel design of JPEG-LS encoder on graphics processing units

NASA Astrophysics Data System (ADS)

Duan, Hao; Fang, Yong; Huang, Bormin

2012-01-01

With recent technical advances in graphics processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels and the run-length coding has to be performed in a sequential way. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on an NVIDIA GPU using the compute unified device architecture (CUDA) programming technology. We use the block parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance with 26.3x speedup over its original CPU code.
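The parallel prefix sum mentioned in this abstract can be illustrated with the step structure of a Hillis-Steele scan. The abstract does not say which scan variant the authors used, so this choice is an assumption, and the Python below only emulates sequentially what a GPU would do with one thread per element.

```python
def inclusive_scan(values):
    """Inclusive prefix sum using the Hillis-Steele doubling pattern."""
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        # Every output element depends only on the previous pass, so on a GPU
        # each index i could be computed by its own thread in this step.
        data = [
            data[i] + data[i - stride] if i >= stride else data[i]
            for i in range(n)
        ]
        stride *= 2
    return data


print(inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [3, 4, 11, 11, 15, 16, 22, 25]
```

The loop runs about log2(n) passes instead of n dependent additions, which is why a scan is a common way to compute output offsets for variable-length encoded blocks in parallel.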
Considering User's Access Pattern in Multimedia File Systems

NASA Astrophysics Data System (ADS)

Cho, KyoungWoon; Ryu, YeonSeung; Won, Youjip; Koh, Kern

2002-12-01

Legacy buffer cache management schemes for multimedia servers are grounded on the assumption that the application accesses the multimedia file sequentially. However, the user access pattern may not be sequential in some circumstances, for example in a distance learning application, where the user may exploit the VCR-like functions (rewind and play) of the system and repeatedly access particular segments of video in the middle of sequential playback. Such looping references can cause significant performance degradation in interval-based caching algorithms. An appropriate buffer cache management scheme is therefore required to deliver desirable performance even under workloads that exhibit looping reference behavior. We propose the Adaptive Buffer cache Management (ABM) scheme, which intelligently adapts to the file access characteristics. For each opened file, ABM applies either LRU replacement or interval-based caching depending on the Looping Reference Indicator, which indicates how strongly temporally localized the access pattern is. According to our experiment, ABM exhibits a better buffer cache miss ratio than interval-based caching or LRU, especially when the workload exhibits not only sequential but also looping reference properties.
Design issues and caching strategies for CD-ROM-based multimedia storage

NASA Astrophysics Data System (ADS)

Shastri, Vijnan; Rajaraman, V.; Jamadagni, H. S.; Venkat-Rangan, P.; Sampath-Kumar, Srihari

1996-03-01

CD-ROMs have proliferated as a distribution medium for desktop machines for a large variety of multimedia applications (targeted for a single-user environment) like encyclopedias, magazines and games. With CD-ROM capacities up to 3 GB being available in the near future, they will form an integral part of Video on Demand (VoD) servers to store full-length movies and multimedia. In the first section of this paper we look at issues related to the single-user desktop environment. Since these multimedia applications are highly interactive in nature, we take a pragmatic approach, and have made a detailed study of the multimedia application behavior in terms of the I/O request patterns generated to the CD-ROM subsystem by tracing these patterns. We discuss prefetch buffer design and seek time characteristics in the context of the analysis of these traces. We also propose an adaptive main-memory hosted cache that receives caching hints from the application to reduce the latency when the user moves from one node of the hyper graph to another. In the second section we look at the use of CD-ROM in a VoD server and discuss the problem of scheduling multiple request streams and buffer management in this scenario. We adapt the C-SCAN (Circular SCAN) algorithm to suit the CD-ROM drive characteristics and prove that it is optimal in terms of buffer size management. We provide computationally inexpensive relations by which this algorithm can be implemented. We then propose an admission control algorithm which admits new request streams without disrupting the continuity of playback of the previous request streams. The algorithm also supports operations such as fast forward and replay. Finally, we discuss the problem of optimal placement of MPEG streams on CD-ROMs in the third section.
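For readers unfamiliar with C-SCAN, the ordering rule can be sketched in a few lines. This is the textbook form of the algorithm, not the CD-ROM-specific adaptation the abstract refers to; the function name `cscan_order` and the use of plain integers as block positions are assumptions for illustration.

```python
def cscan_order(head, requests):
    """Service order under C-SCAN: sweep upward from head, then wrap around."""
    ahead = sorted(r for r in requests if r >= head)   # served on this sweep
    behind = sorted(r for r in requests if r < head)   # served after the wrap
    return ahead + behind


print(cscan_order(50, [10, 95, 60, 20, 70]))  # [60, 70, 95, 10, 20]
```

Because the head always sweeps in one direction, every stream waits at most one full sweep between services, which is what makes the buffer needed per stream straightforward to bound.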
Ten dimensions of health and their relationships with overall self-reported health and survival in a predominately religiously active elderly population: the Cache County Memory Study.

PubMed

Østbye, Truls; Krause, Katrina M; Norton, Maria C; Tschanz, JoAnn; Sanders, Linda; Hayden, Kathleen; Pieper, Carl; Welsh-Bohmer, Kathleen A

2006-02-01

To document the extent of healthy aging along 10 different dimensions in a population known for its longevity. A cohort study with baseline measures of overall self-reported health and health along 10 specific dimensions; analyses investigated the 10 dimensions as predictors of self-reported health and 10-year mortality. Cache County, Utah, which is among the areas with the highest conditional life expectancy at age 65 in the United States. Inhabitants of Cache County aged 65 and older (January 1, 1995). Self-reported overall health and 10 specific dimensions of healthy aging: independent living, vision, hearing, activities of daily living, instrumental activities of daily living, absence of physical illness, cognition, healthy mood, social support and participation, and religious participation and spirituality. This elderly population was healthy overall. With few exceptions, 80% to 90% of persons aged 65 to 75 were healthy according to each measure used. Prevalence of excellent and good self-reported health decreased with age, to approximately 60% in those aged 85 and older. Even in the oldest old, the majority of respondents were independent in activities of daily living. Although vision, hearing, and mood were significant predictors of overall self-reported health in the final models, age, sex, and cognition were significant only in the final survival models. This population has a high prevalence of most factors representing healthy aging. The predictors of overall self-reported health are distinct from the predictors of survival in this age group and, being potentially modifiable, are amenable to clinical and public health efforts.
A Survey of Techniques for Modeling and Improving Reliability of Computing Systems

DOE PAGES

Mittal, Sparsh; Vetter, Jeffrey S.

2015-04-24

Recent trends of aggressive technology scaling have greatly exacerbated the occurrences and impact of faults in computing systems. This has made 'reliability' a first-order design constraint. To address the challenges of reliability, several techniques have been proposed. In this study, we provide a survey of architectural techniques for improving resilience of computing systems. We especially focus on techniques proposed for microarchitectural components, such as processor registers, functional units, cache and main memory. In addition, we discuss techniques proposed for non-volatile memory, GPUs and 3D-stacked processors. To underscore the similarities and differences of the techniques, we classify them based on their key characteristics. We also review the metrics proposed to quantify vulnerability of processor structures. Finally, we believe that this survey will help researchers, system-architects and processor designers in gaining insights into the techniques for improving reliability of computing systems.
Generating Performance Models for Irregular Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Friese, Ryan D.; Tallent, Nathan R.; Vishnu, Abhinav

2017-05-30

Many applications have irregular behavior --- non-uniform input data, input-dependent solvers, irregular memory accesses, unbiased branches --- that cannot be captured using today's automated performance modeling techniques. We describe new hierarchical critical path analyses for the Palm model generation tool. To create a model's structure, we capture tasks along representative MPI critical paths. We create a histogram of critical tasks with parameterized task arguments and instance counts. To model each task, we identify hot instruction-level sub-paths and model each sub-path based on data flow, instruction scheduling, and data locality. We describe application models that generate accurate predictions for strong scaling when varying CPU speed, cache speed, memory speed, and architecture. We present results for the Sweep3D neutron transport benchmark; Page Rank on multiple graphs; Support Vector Machine with pruning; and PFLOTRAN's reactive flow/transport solver with domain-induced load imbalance.
Methodology for fast detection of false sharing in threaded scientific codes

DOEpatents

Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang

2014-11-25

A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
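The final check described in this abstract, whether different portions of one cache line are touched by different threads, can be sketched as a simple predicate over two recorded accesses. The access tuple layout, the 64-byte line size, and the name `is_false_sharing` are hypothetical; the patent abstract does not give the actual data structures or criteria.

```python
def is_false_sharing(access_a, access_b, line_bytes=64):
    """Each access is (thread_id, byte_address, is_write)."""
    (thread_a, addr_a, write_a), (thread_b, addr_b, write_b) = access_a, access_b
    same_line = addr_a // line_bytes == addr_b // line_bytes
    different_data = addr_a != addr_b  # simplification: ignores access widths
    # Assumed criterion: two threads, one line, distinct locations, and at
    # least one write (reads alone do not invalidate the other core's copy).
    return (
        thread_a != thread_b
        and same_line
        and different_data
        and (write_a or write_b)
    )


print(is_false_sharing((0, 0x1000, True), (1, 0x1008, False)))  # True
print(is_false_sharing((0, 0x1000, True), (1, 0x1040, True)))   # False
```

In the second call the addresses are 64 bytes apart and therefore fall in different lines, so the two threads do not interfere under this model.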
Nutritional deficits during early development affect hippocampal structure and spatial memory later in life.

PubMed

Pravosudov, Vladimir V; Lavenex, Pierre; Omanska, Alicja

2005-10-01

Development rates vary among individuals, often as a result of direct competition for food. Survival of young might depend on their learning abilities, but it remains unclear whether learning abilities are affected by nutrition during development. The authors demonstrated that compared with controls, 1-year-old Western scrub jays (Aphelocoma californica) that experienced nutritional deficits during early posthatching development had smaller hippocampi with fewer neurons and performed worse in a cache recovery task and in a spatial version of an associative learning task. In contrast, performance of nutritionally deprived birds was similar to that of controls in 2 color versions of an associative learning task. These findings suggest that nutritional deficits during early development have long-term consequences for hippocampal structure and spatial memory, which, in turn, are likely to have a strong impact on animals' future fitness.
Hardware/software codesign for embedded RISC core

NASA Astrophysics Data System (ADS)

Liu, Peng

2001-12-01

This paper describes a hardware/software codesign method for the extensible embedded RISC core VIRGO, which is based on the MIPS-I instruction set architecture. VIRGO is described in the Verilog hardware description language; it has a five-stage pipeline with a shared 32-bit cache/memory interface and is controlled by a distributed control scheme. Every pipeline stage has one small controller, which controls the status of that stage and the cooperation among the pipeline stages. Because the description uses a high-level language and the structure is distributed, the VIRGO core is highly extensible and can meet the requirements of the application. Looking at the high-definition television MPEG2 MPHL decoder chip, we constructed a hardware/software codesign virtual prototyping machine that can be used to study the VIRGO core instruction set architecture, system-on-chip memory size requirements, system-on-chip software, etc. We can also evaluate the system-on-chip design and the RISC instruction set on the virtual prototyping machine platform.
A novel cache mechanism

NASA Technical Reports Server (NTRS)

Gunawardena, J. A.

1992-01-01

This cache mechanism is transparent but does not contain associative circuits. It does not rely on locality of reference of instructions or data. No redundant instructions or data are encached. Items in the cache are accessed without address arithmetic. A cache miss is detected by the simplest test; compare two bits. These features would result in faster access, higher hit rate, reduced chip area, and less power dissipation in comparison with associative systems of similar size.
Load Balancing in Distributed Web Caching: A Novel Clustering Approach

NASA Astrophysics Data System (ADS)

Tiwari, R.; Kumar, K.; Khan, G.

2010-11-01

The World Wide Web suffers from scaling and reliability problems due to overloaded and congested proxy servers. Caching at local proxy servers helps, but cannot satisfy more than a third to half of requests; more requests are still sent to original remote origin servers. In this paper we have developed an algorithm for Distributed Web Cache, which incorporates cooperation among proxy servers of one cluster. This algorithm uses Distributed Web Cache concepts along with static hierarchies with geographical based clusters of level one proxy server with dynamic mechanism of proxy server during the congestion of one cluster. Congestion and scalability problems are dealt with by the clustering concept used in our approach. This results in a higher cache hit ratio, with lower latency for requested pages. This algorithm also guarantees data consistency between the original server objects and the proxy cache objects.
Version pressure feedback mechanisms for speculative versioning caches

DOEpatents

Eichenberger, Alexandre E.; Gara, Alan; O'Brien, Kathryn M.; Ohmacht, Martin; Zhuang, Xiaotong

2013-03-12

Mechanisms are provided for controlling version pressure on a speculative versioning cache. Raw version pressure data is collected based on one or more threads accessing cache lines of the speculative versioning cache. One or more statistical measures of version pressure are generated based on the collected raw version pressure data. A determination is made as to whether one or more modifications to an operation of a data processing system are to be performed based on the one or more statistical measures of version pressure, the one or more modifications affecting version pressure exerted on the speculative versioning cache. An operation of the data processing system is modified based on the one or more determined modifications, in response to a determination that one or more modifications to the operation of the data processing system are to be performed, to affect the version pressure exerted on the speculative versioning cache.
Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow

NASA Technical Reports Server (NTRS)

Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry

2005-01-01

Development of technologies for exploration of the solar system has revived an interest in computational simulation of chemically reacting flows since planetary probe vehicles exhibit non-equilibrium phenomena during the atmospheric entry of a planet or a moon as well as the reentry to the Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases computational work by an order-of-magnitude compared to perfect gas flow partly because of the increased complexity of equations to solve. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity that includes an SGI Origin system. Both the new and existing machines are based on cache coherent non-uniform memory access architecture. Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented into both perfect and real gas flow codes including Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as SGI system. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in INS3D-LU code that has been one of the base codes for NAS Parallel benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way of parallelization is to schedule processors like a pipeline using software. Both hyperplane and pipeline methods have been implemented using openMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.
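The hyperplane idea in this abstract can be sketched as follows, assuming a lower sweep in which point (i, j, k) depends only on (i-1, j, k), (i, j-1, k) and (i, j, k-1). All points with the same i + j + k are then mutually independent and could be handed to different threads. The function names and the toy update rule are illustrative assumptions, not the RGAS or INS3D-LU code.

```python
def hyperplanes(ni, nj, nk):
    """Group grid indices (i, j, k) by the hyperplane i + j + k = constant."""
    planes = [[] for _ in range(ni + nj + nk - 2)]
    for i in range(ni):
        for j in range(nj):
            for k in range(nk):
                planes[i + j + k].append((i, j, k))
    return planes


def lower_sweep(points_in_order):
    """Toy lower sweep: each value depends on its three lower neighbours.

    Neighbours outside the grid contribute 0. The update rule is a stand-in
    for the real relaxation step, chosen only to expose ordering errors.
    """
    u = {}
    for (i, j, k) in points_in_order:
        u[(i, j, k)] = (
            1
            + u.get((i - 1, j, k), 0)
            + u.get((i, j - 1, k), 0)
            + u.get((i, j, k - 1), 0)
        )
    return u
```

Visiting the planes in increasing order, with any order inside a plane, gives the same result as the sequential lexicographic sweep, which is what makes partitioning each plane across processors safe.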
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers

DOE PAGES

Wang, Bei; Ethier, Stephane; Tang, William; ...

2017-06-29

The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Bei; Ethier, Stephane; Tang, William

The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
Mitochondrial genomic variation associated with higher mitochondrial copy number: the Cache County Study on Memory Health and Aging.

PubMed

Ridge, Perry G; Maxwell, Taylor J; Foutz, Spencer J; Bailey, Matthew H; Corcoran, Christopher D; Tschanz, JoAnn T; Norton, Maria C; Munger, Ronald G; O'Brien, Elizabeth; Kerber, Richard A; Cawthon, Richard M; Kauwe, John S K

2014-01-01

The mitochondria are essential organelles and are the location of cellular respiration, which is responsible for the majority of ATP production. Each cell contains multiple mitochondria, and each mitochondrion contains multiple copies of its own circular genome. The ratio of mitochondrial genomes to nuclear genomes is referred to as mitochondrial copy number. Decreases in mitochondrial copy number are known to occur in many tissues as people age, and in certain diseases. The regulation of mitochondrial copy number by nuclear genes has been studied extensively. While mitochondrial variation has been associated with longevity and some of the diseases known to have reduced mitochondrial copy number, the role that the mitochondrial genome itself has in regulating mitochondrial copy number remains poorly understood. We analyzed the complete mitochondrial genomes from 1007 individuals randomly selected from the Cache County Study on Memory Health and Aging utilizing the inferred evolutionary history of the mitochondrial haplotypes present in our dataset to identify sequence variation and mitochondrial haplotypes associated with changes in mitochondrial copy number. Three variants belonging to mitochondrial haplogroups U5A1 and T2 were significantly associated with higher mitochondrial copy number in our dataset. We identified three variants associated with higher mitochondrial copy number and suggest several hypotheses for how these variants influence mitochondrial copy number by interacting with known regulators of mitochondrial copy number. Our results are the first to report sequence variation in the mitochondrial genome that causes changes in mitochondrial copy number. The identification of these variants that increase mtDNA copy number has important implications in understanding the pathological processes that underlie these phenotypes.
An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lyakh, Dmitry I.

An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). A particular accent is made on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on NVidia Tesla K20X GPU with respect to the naïve scattering algorithm (no memory access optimization). Furthermore, the tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).
Conditions Database for the Belle II Experiment

NASA Astrophysics Data System (ADS)

Wood, L.; Elsethagen, T.; Schram, M.; Stephan, E.

2017-10-01

The Belle II experiment at KEK is preparing for first collisions in 2017. Processing the large amounts of data that will be produced will require conditions data to be readily available to systems worldwide in a fast and efficient manner that is straightforward for both the user and maintainer. The Belle II conditions database was designed with a straightforward goal: make it as easily maintainable as possible. To this end, HEP-specific software tools were avoided as much as possible and industry standard tools used instead. HTTP REST services were selected as the application interface, which provide a high-level interface to users through the use of standard libraries such as curl. The application interface itself is written in Java and runs in an embedded Payara-Micro Java EE application server. Scalability at the application interface is provided by use of Hazelcast, an open source In-Memory Data Grid (IMDG) providing distributed in-memory computing and supporting the creation and clustering of new application interface instances as demand increases. The IMDG provides fast and efficient access to conditions data via in-memory caching.
An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

NASA Astrophysics Data System (ADS)

Lyakh, Dmitry I.

2015-04-01

An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). A particular accent is made on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on NVidia Tesla K20X GPU with respect to the naïve scattering algorithm (no memory access optimization). The tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).
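As a rough illustration of what optimizing cache utilization means for a transpose, the sketch below tiles a two-dimensional transpose so that each tile of the source and destination is touched together. It is a simplified stand-in, not the TAL-SH algorithm, which handles tensors of arbitrary dimensionality; the function name and the tile size are assumptions, and in pure Python the loop shows only the access pattern, not a real speedup.

```python
def blocked_transpose(matrix, tile=4):
    """Transpose a list-of-lists matrix tile by tile.

    Working on small square tiles keeps both the rows being read and the
    rows being written within a limited working set, which is the idea
    behind cache blocking in compiled implementations.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):          # top edge of the current tile
        for c0 in range(0, cols, tile):      # left edge of the current tile
            for r in range(r0, min(r0 + tile, rows)):
                for c in range(c0, min(c0 + tile, cols)):
                    out[c][r] = matrix[r][c]
    return out
```

The naive alternative walks one whole row of the source while striding across every row of the destination, which is the scattering pattern the abstract compares against.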

Applications Performance on NAS Intel Paragon XP/S - 15#

NASA Technical Reports Server (NTRS)

Saini, Subhash; Simon, Horst D.; Copper, D. M. (Technical Monitor)

1994-01-01

The Numerical Aerodynamic Simulation (NAS) Systems Division received an Intel Touchstone Sigma prototype model Paragon XP/S-15 in February 1993. The i860 XP microprocessor, with an integrated floating point unit and operating in dual-instruction mode, gives a peak performance of 75 million floating point operations per second (MFLOPS) for 64-bit floating point arithmetic. It is used in the Paragon XP/S-15, which has been installed at NAS, NASA Ames Research Center. The NAS Paragon has 208 nodes and its peak performance is 15.6 GFLOPS. Here, we report on early experience using the Paragon XP/S-15. We have tested its performance using both kernels and applications of interest to NAS. We have measured the performance of BLAS 1, 2 and 3, both assembly-coded and Fortran-coded, on the NAS Paragon XP/S-15. Furthermore, we have investigated the performance of a single-node one-dimensional FFT, a distributed two-dimensional FFT and a distributed three-dimensional FFT. Finally, we measured the performance of the NAS Parallel Benchmarks (NPB) on the Paragon and compared it with the performance obtained on other highly parallel machines, such as the CM-5, CRAY T3D, IBM SP1, etc. In particular, we investigated the following issues, which can strongly affect the performance of the Paragon: a. Impact of the operating system: Intel currently uses as a default the operating system OSF/1 AD from the Open Software Foundation. The paging of Open Software Foundation (OSF) server at 22 MB to make more memory available for the application degrades the performance. We found that when the limit of 26 MB per node out of 32 MB available is reached, the application is paged out of main memory using virtual memory. When the application starts paging, the performance is considerably reduced. We found that dynamic memory allocation can help application performance under certain circumstances. b. Impact of data cache on the i860/XP: We measured the performance of the BLAS, both assembly-coded and Fortran-coded. We found that the measured performance of assembly-coded BLAS is much less than what memory bandwidth limitation would predict. The influence of data cache on different sizes of vectors is also investigated using one-dimensional FFTs. c. Impact of processor layout: There are several different ways processors can be laid out within the two-dimensional grid of processors on the Paragon. We have used the FFT example to investigate performance differences based on processor layout.
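The machine's quoted peak figure follows directly from the per-node peak and the node count given in this abstract, as the short check below shows. The function name is an assumption made for the example.

```python
NODES = 208                 # node count stated in the abstract
PEAK_MFLOPS_PER_NODE = 75   # per-node 64-bit peak stated in the abstract


def peak_gflops(nodes, mflops_per_node):
    """Aggregate peak in GFLOPS: nodes times per-node MFLOPS, divided by 1000."""
    return nodes * mflops_per_node / 1000


# 208 nodes * 75 MFLOPS = 15600 MFLOPS = 15.6 GFLOPS
```

This is a theoretical ceiling only; the abstract itself reports that paging and data-cache effects keep measured performance well below it.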
The Development of Caching and Object Permanence in Western Scrub-Jays (Aphelocoma californica): Which Emerges First?

PubMed Central

Salwiczek, Lucie H.; Schlinger, Barney; Emery, Nathan J.; Clayton, Nicola S.

2010-01-01

Recent studies on the food-caching behavior of corvids have revealed complex physical and social skills, yet little is known about the ontogeny of food caching in relation to the development of cognitive capacities. Piagetian object permanence is the understanding that objects continue to exist even when they are no longer visible. Here, the authors focus on Piagetian Stages 3 and 4, because they are hallmarks in the cognitive development of both young children and animals. Our aim is to determine in a food-caching corvid, the Western scrub-jay, whether (1) Piagetian Stage 4 competence and tentative caching (i.e., hiding an item invisibly and retrieving it without delay), emerge concomitantly or consecutively; (2) whether experiencing the reappearance of hidden objects enhances the timing of the appearance of object permanence; and (3) discuss how the development of object permanence is related to behavioral development and sensorimotor intelligence. Our findings suggest that object permanence Stage 4 emerges before tentative caching, and independent of environmental influences, but that once the birds have developed simple object-permanence, then social learning might advance the interval after which tentative caching commences. PMID:19685971
The development of caching and object permanence in Western scrub-jays (Aphelocoma californica): which emerges first?

PubMed

Salwiczek, Lucie H; Emery, Nathan J; Schlinger, Barney; Clayton, Nicola S

2009-08-01

Recent studies on the food-caching behavior of corvids have revealed complex physical and social skills, yet little is known about the ontogeny of food caching in relation to the development of cognitive capacities. Piagetian object permanence is the understanding that objects continue to exist even when they are no longer visible. Here, the authors focus on Piagetian Stages 3 and 4, because they are hallmarks in the cognitive development of both young children and animals. Our aim is to determine in a food-caching corvid, the Western scrub-jay, whether (1) Piagetian Stage 4 competence and tentative caching (i.e., hiding an item invisibly and retrieving it without delay), emerge concomitantly or consecutively; (2) whether experiencing the reappearance of hidden objects enhances the timing of the appearance of object permanence; and (3) discuss how the development of object permanence is related to behavioral development and sensorimotor intelligence. Our findings suggest that object permanence Stage 4 emerges before tentative caching, and independent of environmental influences, but that once the birds have developed simple object-permanence, then social learning might advance the interval after which tentative caching commences. Copyright 2009 APA, all rights reserved.
76 FR 26981 - Proposed Flood Elevation Determinations

Federal Register 2010, 2011, 2012, 2013, 2014

2011-05-10

... table provided here represents the flooding sources, location of referenced elevations, effective and.... Specifically, it addresses the following flooding sources: Cache Creek, Cache Creek Left Bank Overflow, and... ``Unincorporated Areas of Yolo County, California'' addressed the flooding source Cache Creek Settling Basin. That...
Xrootd in dCache - design and experiences

NASA Astrophysics Data System (ADS)

Behrmann, Gerd; Ozerov, Dmitry; Zangerl, Thomas

2011-12-01

dCache is a well-established distributed storage solution used in both high energy physics computing and other disciplines. An overview of the implementation of the xrootd data access protocol within dCache is presented. The performance of various access mechanisms is studied and compared, and it is concluded that our implementation is as performant as other protocols. This makes dCache a compelling alternative to the Scalla software suite implementation of xrootd, with added value from broad protocol support, including the IETF-approved NFS 4.1 protocol.
k(+)-buffer: An Efficient, Memory-Friendly and Dynamic k-buffer Framework.

PubMed

Vasilakis, Andreas-Alexandros; Papaioannou, Georgios; Fudos, Ioannis

2015-06-01

Depth-sorted fragment determination is fundamental for a host of image-based techniques that simulate complex rendering effects. It is also a challenging task in terms of the time and space required when rasterizing scenes with high depth complexity. When low graphics memory requirements are of utmost importance, the k-buffer can objectively be considered the preferred framework, since it ensures the correct depth order on a subset of all generated fragments. Although various alternatives have been introduced to partially or completely alleviate the noticeable quality artifacts produced by the initial k-buffer algorithm at the expense of increased memory or reduced performance, appropriate tools to automatically and dynamically compute the most suitable value of k are still missing. To this end, we introduce k(+)-buffer, a fast framework that accurately simulates the behavior of k-buffer in a single rendering pass. Two memory-bounded data structures, (i) the max-array and (ii) the max-heap, are developed on the GPU to concurrently maintain the k-foremost fragments per pixel by exploring pixel synchronization and fragment culling. Memory-friendly strategies are further introduced to dynamically (a) lessen the wasteful memory allocation of individual pixels with low depth complexity frequencies, (b) minimize the allocated size of k-buffer according to different application goals and hardware limitations via a straightforward depth histogram analysis and (c) manage local GPU cache with a fixed-memory depth-sorting mechanism. Finally, an extensive experimental evaluation is provided demonstrating the advantages of our work over all prior k-buffer variants in terms of memory usage, performance cost and image quality.
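The max-heap structure mentioned above can be illustrated on the CPU for a single pixel: keep at most k fragments, with the farthest kept fragment at the heap root so an incoming fragment can be culled with one comparison. This is a sketch under assumptions (the `(depth, payload)` fragment format and the name `k_foremost` are invented here), not the authors' GPU implementation, which also needs per-pixel synchronization.

```python
import heapq


def k_foremost(fragments, k):
    """Return the k nearest fragments of one pixel, sorted front to back.

    `fragments` is an iterable of (depth, payload) pairs. Python's heapq is a
    min-heap, so depths are stored negated to obtain max-heap behaviour.
    """
    if k <= 0:
        return []
    heap = []  # entries are (-depth, arrival_index, payload)
    for index, (depth, payload) in enumerate(fragments):
        if len(heap) < k:
            heapq.heappush(heap, (-depth, index, payload))
        elif depth < -heap[0][0]:
            # Nearer than the farthest kept fragment: replace the root.
            heapq.heapreplace(heap, (-depth, index, payload))
        # Otherwise the fragment is culled without touching the heap.
    kept = [(-neg_depth, payload) for neg_depth, _, payload in heap]
    return sorted(kept, key=lambda fragment: fragment[0])
```

Memory stays bounded by k entries per pixel regardless of how many fragments arrive, which is the property the abstract emphasizes.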
DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Yao; Balaprakash, Prasanna; Meng, Jiayuan

We present Raexplore, a performance modeling framework for architecture exploration. Raexplore enables rapid, automated, and systematic search of architecture design space by combining hardware counter-based performance characterization and analytical performance modeling. We demonstrate Raexplore for two recent manycore processors, the IBM Blue Gene/Q compute chip and Intel Xeon Phi, targeting a set of scientific applications. Our framework is able to capture complex interactions between architectural components including instruction pipeline, cache, and memory, and to achieve a 3–22% error for same-architecture and cross-architecture performance predictions. Furthermore, we apply our framework to assess the two processors, and discover and evaluate a list of architectural scaling options for future processor designs.
Analysing the performance of personal computers based on Intel microprocessors for sequence aligning bioinformatics applications.

PubMed

Nair, Pradeep S; John, Eugene B

2007-01-01

Aligning specific sequences against a very large number of other sequences is a central aspect of bioinformatics. With the widespread availability of personal computers in biology laboratories, sequence alignment is now often performed locally. This makes it necessary to analyse the performance of personal computers for sequence aligning bioinformatics benchmarks. In this paper, we analyse the performance of a personal computer for the popular BLAST and FASTA sequence alignment suites. Results indicate that these benchmarks have a large number of recurring operations and use memory operations extensively. It seems that the performance can be improved with a bigger L1-cache.
A random rule model of surface growth

NASA Astrophysics Data System (ADS)

Mello, Bernardo A.

2015-02-01

Stochastic models of surface growth are usually based on randomly choosing a substrate site to perform iterative steps, as in the etching model, Mello et al. (2001) [5]. In this paper I modify the etching model to perform sequential, instead of random, substrate scan. The randomicity is introduced not in the site selection but in the choice of the rule to be followed in each site. The change positively affects the study of dynamic and asymptotic properties, by reducing the finite size effect and the short-time anomaly and by increasing the saturation time. It also has computational benefits: better use of the cache memory and the possibility of parallel implementation.
Local rollback for fault-tolerance in parallel computing systems

DOEpatents

Blumrich, Matthias A [Yorktown Heights, NY; Chen, Dong [Yorktown Heights, NY; Gara, Alan [Yorktown Heights, NY; Giampapa, Mark E [Yorktown Heights, NY; Heidelberger, Philip [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Steinmacher-Burow, Burkhard [Boeblingen, DE; Sugavanam, Krishnan [Yorktown Heights, NY

2012-01-24

A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.
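The control flow in this abstract can be summarized in a small sketch: run the interval, and restart it only when an error was seen and no unrecoverable condition occurred. The callback signature, the restart cap, and the returned status strings are assumptions made for illustration; the patent abstract does not specify what happens in the unrecoverable case.

```python
def run_with_local_rollback(run_interval, max_restarts=3):
    """Run one rollback interval, restarting it after recoverable errors.

    `run_interval` is a hypothetical callable that executes the interval's
    instructions and returns (error_detected, unrecoverable_condition).
    Returns (status, restarts_used).
    """
    for restarts in range(max_restarts + 1):
        error_detected, unrecoverable = run_interval()
        if not error_detected:
            return ("completed", restarts)
        if unrecoverable:
            # Local rollback cannot help; hand off to some other mechanism.
            return ("escalate", restarts)
        # Error without an unrecoverable condition: loop to rerun the interval.
    return ("gave_up", max_restarts)
```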
Locality in Search Engine Queries and Its Implications for Caching

DTIC Science & Technology

2001-05-01

in the question of whether caching might be effective for search engines as well. They study two real search engine traces by examining query...locality and its implications for caching. The two search engines studied are Vivisimo and Excite. Their trace analysis results show that queries have
Predictive Caching Using the TDAG Algorithm

NASA Technical Reports Server (NTRS)

Laird, Philip; Saul, Ronald

1992-01-01

We describe how the TDAG algorithm for learning to predict symbol sequences can be used to design a predictive cache store. A model of a two-level mass storage system is developed and used to calculate the performance of the cache under various conditions. Experimental simulations provide good confirmation of the model.
Mammal caching of oak acorns in a red pine and a mixed oak stand

Treesearch

E.R. Thorn; W.M. Tzilkowski

1991-01-01

Small mammal caching of oak (Quercus spp.) acorns in adjacent red pine (Pinus resinosa) and mixed-oak stands was investigated at The Penn State Experimental Forest, Huntingdon Co., Pennsylvania. Gray squirrels (Sciurus carolinensis) and mice (Peromyscus spp.) were the most common acorn-caching...
Evaluating the effect of online data compression on the disk cache of a mass storage system

NASA Technical Reports Server (NTRS)

Pentakalos, Odysseas I.; Yesha, Yelena

1994-01-01

A trace driven simulation of the disk cache of a mass storage system was used to evaluate the effect of an online compression algorithm on various performance measures. Traces from the system at NASA's Center for Computational Sciences were used to run the simulation and disk cache hit ratios, number of files and bytes migrating to tertiary storage were measured. The measurements were performed for both an LRU and a size based migration algorithm. In addition to seeing the effect of online data compression on the disk cache performance measure, the simulation provided insight into the characteristics of the interactive references, suggesting that hint based prefetching algorithms are the only alternative for any future improvements to the disk cache hit ratio.
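A trace-driven LRU simulation of the kind described here can be sketched briefly. The trace format, the byte-capacity model, and the way compression is applied (dividing each file's stored size by a fixed ratio) are assumptions for illustration and are not taken from the authors' simulator.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_bytes, compression_ratio=1.0):
    """Replay (file_id, size_bytes) references through an LRU disk cache.

    Returns (hit_ratio, evictions), where each eviction stands for a file
    migrating to tertiary storage.
    """
    cache = OrderedDict()  # file_id -> stored size, least recently used first
    used = hits = total = evictions = 0
    for file_id, size in trace:
        total += 1
        if file_id in cache:
            hits += 1
            cache.move_to_end(file_id)  # mark as most recently used
            continue
        stored = size / compression_ratio
        if stored > capacity_bytes:
            continue  # too large to cache at all; bypasses the cache
        while used + stored > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
            evictions += 1
        cache[file_id] = stored
        used += stored
    hit_ratio = hits / total if total else 0.0
    return hit_ratio, evictions
```

Running the same trace with `compression_ratio=2.0` shows the qualitative effect studied in the abstract: more files fit, so the hit ratio rises and fewer files migrate.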
Population substructure in Cache County, Utah: the Cache County study

PubMed Central

2014-01-01

Background Population stratification is a key concern for genetic association analyses. In addition, extreme homogeneity of ethnic origins of a population can make it difficult to interpret how genetic associations in that population may translate into other populations. Here we have evaluated the genetic substructure of samples from the Cache County study relative to the HapMap Reference populations and data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Results Our findings show that the Cache County study is similar in ethnic diversity to the self-reported "Whites" in the ADNI sample and less homogenous than the HapMap CEU population. Conclusions We conclude that the Cache County study is genetically representative of the general European American population in the USA and is an appropriate population for conducting broadly applicable genetic studies. PMID:25078123
The storage system of PCM based on random access file system

NASA Astrophysics Data System (ADS)

Han, Wenbing; Chen, Xiaogang; Zhou, Mi; Li, Shunfen; Li, Gezi; Song, Zhitang

2016-10-01

Emerging memory technologies such as Phase change memory (PCM) tend to offer fast, random access to persistent storage with better scalability. It's a hot topic of academic and industrial research to establish PCM in storage hierarchy to narrow the performance gap. However, the existing file systems do not perform well with the emerging PCM storage, which access storage medium via a slow, block-based interface. In this paper, we propose a novel file system, RAFS, to bring about good performance of PCM, which is built in the embedded platform. We attach PCM chips to the memory bus and build RAFS on the physical address space. In the proposed file system, we simplify traditional system architecture to eliminate block-related operations and layers. Furthermore, we adopt memory mapping and bypassed page cache to reduce copy overhead between the process address space and storage device. XIP mechanisms are also supported in RAFS. To the best of our knowledge, we are among the first to implement file system on real PCM chips. We have analyzed and evaluated its performance with IOZONE benchmark tools. Our experimental results show that the RAFS on PCM outperforms Ext4fs on SDRAM with small record lengths. Based on DRAM, RAFS is significantly faster than Ext4fs by 18% to 250%.
Fast maximum intensity projections of large medical data sets by exploiting hierarchical memory architectures.

PubMed

Kiefer, Gundolf; Lehmann, Helko; Weese, Jürgen

2006-04-01

Maximum intensity projections (MIPs) are an important visualization technique for angiographic data sets. Efficient data inspection requires frame rates of at least five frames per second at preserved image quality. Despite the advances in computer technology, this task remains a challenge. On the one hand, the sizes of computed tomography and magnetic resonance images are increasing rapidly. On the other hand, rendering algorithms do not automatically benefit from the advances in processor technology, especially for large data sets. This is due to the faster evolving processing power and the slower evolving memory access speed, which is bridged by hierarchical cache memory architectures. In this paper, we investigate memory access optimization methods and use them for generating MIPs on general-purpose central processing units (CPUs) and graphics processing units (GPUs), respectively. These methods can work on any level of the memory hierarchy, and we show that properly combined methods can optimize memory access on multiple levels of the hierarchy at the same time. We present performance measurements to compare different algorithm variants and illustrate the influence of the respective techniques. On current hardware, the efficient handling of the memory hierarchy for CPUs improves the rendering performance by a factor of 3 to 4. On GPUs, we observed that the effect is even larger, especially for large data sets. The methods can easily be adjusted to different hardware specifics, although their impact can vary considerably. They can also be used for other rendering techniques than MIPs, and their use for more general image processing task could be investigated in the future.
Improving Internet Archive Service through Proxy Cache.

ERIC Educational Resources Information Center

Yu, Hsiang-Fu; Chen, Yi-Ming; Wang, Shih-Yong; Tseng, Li-Ming

2003-01-01

Discusses file transfer protocol (FTP) servers for downloading archives (files with particular file extensions), and the change to HTTP (Hypertext transfer protocol) with increased Web use. Topics include the Archie server; proxy cache servers; and how to improve the hit rate of archives by a combination of caching and better searching mechanisms.…
Winter prey caching by northern hawk owls in Minnesota

Treesearch

Richard R. Schaefer; D. Craig Rudolph; Jesse F. Fagan

2007-01-01

Northern Hawk Owls (Surnia ulula) have been reported to cache prey during the breeding season for later consumption, but detailed reports of prey caching during the non-breeding season are comparatively rare. We provided prey to four individual Northern Hawk Owls in wintering areas in northeastern Minnesota during 2001 and 2005 and observed their...
Visits, Hits, Caching and Counting on the World Wide Web: Old Wine in New Bottles?

ERIC Educational Resources Information Center

Berthon, Pierre; Pitt, Leyland; Prendergast, Gerard

1997-01-01

Although web browser caching speeds up retrieval, reduces network traffic, and decreases the load on servers and browser's computers, an unintended consequence for marketing research is that Web servers undercount hits. This article explores counting problems, caching, proxy servers, trawler software and presents a series of correction factors…

A measurement-based study of concurrency in a multiprocessor

NASA Technical Reports Server (NTRS)

Mcguire, Patrick John

1987-01-01

A systematic measurement-based methodology for characterizing the amount of concurrency present in a workload, and the effect of concurrency on system performance indices such as cache miss rate and bus activity, is developed. Hardware and software instrumentation of an Alliant FX/8 was used to obtain data from a real workload environment. Results show that 35% of the workload is concurrent, with the concurrent periods typically using all available processors. Measurements of periods of change in concurrency show uneven usage of processors during these times. Other system measures, including cache miss rate and processor bus activity, are analyzed with respect to the concurrency measures. Probability of a cache miss is seen to increase with concurrency. The change in cache miss rate is much more sensitive to the fraction of concurrent code in the workload than the number of processors active during concurrency. Regression models are developed to quantify the relationships between cache miss rate, bus activity, and the concurrency measures. The model for cache miss rate predicts an increase in the median miss rate value as much as 300% for a 100% increase in concurrency in the workload.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Learn, Mark Walter

Sandia National Laboratories is currently developing new processing and data communication architectures for use in future satellite payloads. These architectures will leverage the flexibility and performance of state-of-the-art static-random-access-memory-based Field Programmable Gate Arrays (FPGAs). One such FPGA is the radiation-hardened version of the Virtex-5 being developed by Xilinx. However, not all features of this FPGA are being radiation-hardened by design and could still be susceptible to on-orbit upsets. One such feature is the embedded hard-core PPC440 processor. Since this processor is implemented in the FPGA as a hard-core, traditional mitigation approaches such as Triple Modular Redundancy (TMR) are not available to improve the processor's on-orbit reliability. The goal of this work is to investigate techniques that can help mitigate the embedded hard-core PPC440 processor within the Virtex-5 FPGA other than TMR. Implementing various mitigation schemes reliably within the PPC440 offers a powerful reconfigurable computing resource to these node-based processing architectures. This document summarizes the work done on the cache mitigation scheme for the embedded hard-core PPC440 processor within the Virtex-5 FPGAs, and describes in detail the design of the cache mitigation scheme and the testing conducted at the radiation effects facility on the Texas A&M campus.
Mountain chickadees from different elevations sing different songs: acoustic adaptation, temporal drift or signal of local adaptation?

PubMed Central

Branch, Carrie L.; Pravosudov, Vladimir V.

2015-01-01

Song in songbirds is widely thought to function in mate choice and male–male competition. Song is also phenotypically plastic and typically learned from local adults; therefore, it varies across geographical space and can serve as a cue for an individual's location of origin, with females commonly preferring males from their respective location. Geographical variation in song dialect may reflect acoustic adaptation to different environments and/or serve as a signal of local adaptation. In montane environments, environmental differences can occur over an elevation gradient, favouring local adaptations across small spatial scales. We tested whether food caching mountain chickadees, known to exhibit elevation-related differences in food caching intensity, spatial memory and the hippocampus, also sing different dialects despite continuous distribution and close proximity. Male songs were collected from high and low elevations at two different mountains (separated by 35 km) to test whether song differs between elevations and/or between adjacent populations at each mountain. Song structure varied significantly between high and low elevation adjacent populations from the same mountain and between populations from different mountains at the same elevations, despite a continuous distribution across each mountain slope. These results suggest that elevation-related differences in song structure in chickadees might serve as a signal for local adaptation. PMID:26064641
Antioxidant intake and cognitive function of elderly men and women: the Cache County Study.

PubMed

Wengreen, H J; Munger, R G; Corcoran, C D; Zandi, P; Hayden, K M; Fotuhi, M; Skoog, I; Norton, M C; Tschanz, J; Breitner, J C S; Welsh-Bohmer, K A

2007-01-01

We prospectively examined associations between intakes of antioxidants (vitamins C, vitamin E, and carotene) and cognitive function and decline among elderly men and women of the Cache County Study on Memory and Aging in Utah. In 1995, 3831 residents 65 years of age or older completed a baseline survey that included a food frequency questionnaire and cognitive assessment. Cognitive function was assessed using an adapted version of the Modified Mini-Mental State examination (3MS) at baseline and at three subsequent follow-up interviews spanning approximately 7 years. Multivariable-mixed models were used to estimate antioxidant nutrient effects on average 3MS score over time. Increasing quartiles of vitamin C intake alone and combined with vitamin E were associated with higher baseline average 3MS scores (p-trend = 0.013 and 0.02 respectively); this association appeared stronger for food sources compared to supplement or food and supplement sources combined. Study participants with lower levels of intake of vitamin C, vitamin E and carotene had a greater acceleration of the rate of 3MS decline over time compared to those with higher levels of intake. High antioxidant intake from food and supplement sources of vitamin C, vitamin E, and carotene may delay cognitive decline in the elderly.
A weak electric field-assisted ultrafast electrical switching dynamics in In3SbTe2 phase-change memory devices

NASA Astrophysics Data System (ADS)

Pandey, Shivendra Kumar; Manivannan, Anbarasu

2017-07-01

Prefixing a weak electric field (incubation) might enhance the crystallization speed via pre-structural ordering and thereby achieving faster programming of phase change memory (PCM) devices. We employed a weak electric field, equivalent to a constant small voltage (that is incubation voltage, Vi of 0.3 V) to the applied voltage pulse, VA (main pulse) for a systematic understanding of voltage-dependent rapid threshold switching characteristics and crystallization (set) process of In3SbTe2 (IST) PCM devices. Our experimental results on incubation-assisted switching elucidate strikingly one order faster threshold switching, with an extremely small delay time, td of 300 ps, as compared with no incubation voltage (Vi = 0 V) for the same VA. Also, the voltage dependent characteristics of incubation-assisted switching dynamics confirm that the initiation of threshold switching occurs at a lower voltage of 0.82 times of VA. Furthermore, we demonstrate an incubation assisted ultrafast set process of IST device for a low VA of 1.7 V (~18% lesser compared to without incubation) within a short pulse-width of 1.5 ns (full width half maximum, FWHM). These findings of ultrafast switching, yet low power set process would immensely be helpful towards designing high speed PCM devices with low power operation.
dCache on Steroids - Delegated Storage Solutions

DOE PAGES

Mkrtchyan, Tigran; Adeyemi, F.; Ashish, A.; ...

2017-11-23

For over a decade, dCache.org has delivered a robust software used at more than 80 Universities and research institutes around the world, allowing these sites to provide reliable storage services for the WLCG experiments as well as many other scientific communities. The flexible architecture of dCache allows running it in a wide variety of configurations and platforms - from a SoC based all-in-one Raspberry-Pi up to hundreds of nodes in a multipetabyte installation. Due to lack of managed storage at the time, dCache implemented data placement, replication and data integrity directly. Today, many alternatives are available: S3, GlusterFS, CEPH and others. While such solutions position themselves as scalable storage systems, they cannot be used by many scientific communities out of the box. The absence of community-accepted authentication and authorization mechanisms, the use of product specific protocols and the lack of namespace are some of the reasons that prevent wide-scale adoption of these alternatives. Most of these limitations are already solved by dCache. By delegating low-level storage management functionality to the above-mentioned new systems and providing the missing layer through dCache, we provide a solution which combines the benefits of both worlds - industry standard storage building blocks with the access protocols and authentication required by scientific communities. In this paper, we focus on CEPH, a popular software for clustered storage that supports file, block and object interfaces. CEPH is often used in modern computing centers, for example as a backend to OpenStack services. We will show prototypes of dCache running with a CEPH backend and discuss the benefits and limitations of such an approach. As a result, we will also outline the roadmap for supporting ‘delegated storage’ within the dCache releases.
dCache on Steroids - Delegated Storage Solutions

NASA Astrophysics Data System (ADS)

Mkrtchyan, T.; Adeyemi, F.; Ashish, A.; Behrmann, G.; Fuhrmann, P.; Litvintsev, D.; Millar, P.; Rossi, A.; Sahakyan, M.; Starek, J.

2017-10-01

For over a decade, dCache.org has delivered a robust software used at more than 80 Universities and research institutes around the world, allowing these sites to provide reliable storage services for the WLCG experiments as well as many other scientific communities. The flexible architecture of dCache allows running it in a wide variety of configurations and platforms - from a SoC based all-in-one Raspberry-Pi up to hundreds of nodes in a multipetabyte installation. Due to lack of managed storage at the time, dCache implemented data placement, replication and data integrity directly. Today, many alternatives are available: S3, GlusterFS, CEPH and others. While such solutions position themselves as scalable storage systems, they cannot be used by many scientific communities out of the box. The absence of community-accepted authentication and authorization mechanisms, the use of product specific protocols and the lack of namespace are some of the reasons that prevent wide-scale adoption of these alternatives. Most of these limitations are already solved by dCache. By delegating low-level storage management functionality to the above-mentioned new systems and providing the missing layer through dCache, we provide a solution which combines the benefits of both worlds - industry standard storage building blocks with the access protocols and authentication required by scientific communities. In this paper, we focus on CEPH, a popular software for clustered storage that supports file, block and object interfaces. CEPH is often used in modern computing centers, for example as a backend to OpenStack services. We will show prototypes of dCache running with a CEPH backend and discuss the benefits and limitations of such an approach. We will also outline the roadmap for supporting ‘delegated storage’ within the dCache releases.
dCache on Steroids - Delegated Storage Solutions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mkrtchyan, Tigran; Adeyemi, F.; Ashish, A.

For over a decade, dCache.org has delivered a robust software used at more than 80 Universities and research institutes around the world, allowing these sites to provide reliable storage services for the WLCG experiments as well as many other scientific communities. The flexible architecture of dCache allows running it in a wide variety of configurations and platforms - from a SoC based all-in-one Raspberry-Pi up to hundreds of nodes in a multipetabyte installation. Due to lack of managed storage at the time, dCache implemented data placement, replication and data integrity directly. Today, many alternatives are available: S3, GlusterFS, CEPH and others. While such solutions position themselves as scalable storage systems, they cannot be used by many scientific communities out of the box. The absence of community-accepted authentication and authorization mechanisms, the use of product specific protocols and the lack of namespace are some of the reasons that prevent wide-scale adoption of these alternatives. Most of these limitations are already solved by dCache. By delegating low-level storage management functionality to the above-mentioned new systems and providing the missing layer through dCache, we provide a solution which combines the benefits of both worlds - industry standard storage building blocks with the access protocols and authentication required by scientific communities. In this paper, we focus on CEPH, a popular software for clustered storage that supports file, block and object interfaces. CEPH is often used in modern computing centers, for example as a backend to OpenStack services. We will show prototypes of dCache running with a CEPH backend and discuss the benefits and limitations of such an approach. As a result, we will also outline the roadmap for supporting ‘delegated storage’ within the dCache releases.
Efficient Cache use for Stencil Operations on Structured Discretization Grids

NASA Technical Reports Server (NTRS)

Frumkin, Michael; VanderWijngaart, Rob F.

2001-01-01

We derive tight bounds on the cache misses for evaluation of explicit stencil operators on structured grids. Our lower bound is based on the isoperimetrical property of the discrete octahedron. Our upper bound is based on a good surface to volume ratio of a parallelepiped spanned by a reduced basis of the interference lattice of a grid. Measurements show that our algorithm typically reduces the number of cache misses by a factor of three, relative to a compiler optimized code. We show that stencil calculations on grids whose interference lattice have a short vector feature abnormally high numbers of cache misses. We call such grids unfavorable and suggest to avoid these in computations by appropriate padding. By direct measurements on a MIPS R10000 processor we show a good correlation between abnormally high numbers of cache misses and unfavorable three-dimensional grids.
Willingness to pay for a 4% chlorhexidine (7.1% chlorhexidine digluconate) product for umbilical cord care in rural Bangladesh: a contingency valuation study

PubMed Central

2013-01-01

Background Recent trials in Bangladesh, Nepal, and Pakistan have shown that chlorhexidine is an effective antiseptic for umbilical cord care compared to existing community-based cord care practices. Because of the aggregate reduction in neonatal mortality in these trials, interest is high in introducing a 7.1% chlorhexidine digluconate liquid or gel that delivers 4% chlorhexidine for umbilical cord care in Bangladesh and elsewhere. Methods In 2010, we conducted a household survey applying a contingent valuation method with 1717 eligible couples (pregnant women or women with a first child younger than 6 months old, and their husbands) in the rural subdistricts of Abhoynagar and Mirsarai in Bangladesh to assess their willingness to pay for three types of umbilical cord care products at different price points. Each respondent was asked about willingness to pay prefixed prices for any one of three 7.1% chlorhexidine digluconate products: 1) a single-dose liquid, 2) a multi-dose liquid, or 3) a gel formulation. Each also reported the maximum price they were independently willing to pay for their selected product. We compared participant willingness-to-pay responses to the prefixed prices with their independently reported maximum prices for each type of the product separately. The comparison identified to what extent the respondents’ positive responses to the prefixed prices matched their independently reported maximum prices. Results This cross matching revealed that willingness to pay the prefixed prices was 41% for the single-dose liquid, 33% for the multi-dose liquid, and 31% for the gel formulation. Although the majority of the respondents were unwilling to pay the prefixed prices, all were willing to pay some amount and reported they could borrow money if necessary. Subsequent analysis of responses to the multi-dose liquid showed borrowing money would not be required if the unit price was Bangladeshi taka 15–25. 
Conclusions A unit price of Bangladeshi taka 15–25 (US$0.21–0.35) for multi-dose 7.1% chlorhexidine digluconate liquid would be affordable to the primary target population in Bangladesh. Although a large market demand could be generated if the product were available at this price point, subsidization may be required to achieve optimal coverage, especially among poorer families. PMID:24139384
Willingness to pay for a 4% chlorhexidine (7.1% chlorhexidine digluconate) product for umbilical cord care in rural Bangladesh: a contingency valuation study.

PubMed

Coffey, Patricia S; Metzler, Mutsumi; Islam, Ziaul; Koehlmoos, Tracey P

2013-10-18

Recent trials in Bangladesh, Nepal, and Pakistan have shown that chlorhexidine is an effective antiseptic for umbilical cord care compared to existing community-based cord care practices. Because of the aggregate reduction in neonatal mortality in these trials, interest is high in introducing a 7.1% chlorhexidine digluconate liquid or gel that delivers 4% chlorhexidine for umbilical cord care in Bangladesh and elsewhere. In 2010, we conducted a household survey applying a contingent valuation method with 1717 eligible couples (pregnant women or women with a first child younger than 6 months old, and their husbands) in the rural subdistricts of Abhoynagar and Mirsarai in Bangladesh to assess their willingness to pay for three types of umbilical cord care products at different price points. Each respondent was asked about willingness to pay prefixed prices for any one of three 7.1% chlorhexidine digluconate products: 1) a single-dose liquid, 2) a multi-dose liquid, or 3) a gel formulation. Each also reported the maximum price they were independently willing to pay for their selected product. We compared participant willingness-to-pay responses to the prefixed prices with their independently reported maximum prices for each type of the product separately. The comparison identified to what extent the respondents' positive responses to the prefixed prices matched their independently reported maximum prices. This cross matching revealed that willingness to pay the prefixed prices was 41% for the single-dose liquid, 33% for the multi-dose liquid, and 31% for the gel formulation. Although the majority of the respondents were unwilling to pay the prefixed prices, all were willing to pay some amount and reported they could borrow money if necessary. Subsequent analysis of responses to the multi-dose liquid showed borrowing money would not be required if the unit price was Bangladeshi taka 15-25. 
A unit price of Bangladeshi taka 15-25 (US$0.21-0.35) for multi-dose 7.1% chlorhexidine digluconate liquid would be affordable to the primary target population in Bangladesh. Although a large market demand could be generated if the product were available at this price point, subsidization may be required to achieve optimal coverage, especially among poorer families.
An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

DOE PAGES

Lyakh, Dmitry I.

2015-01-05

An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). A particular accent is made on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on NVidia Tesla K20X GPU with respect to the naive scattering algorithm (no memory access optimization). Furthermore, the tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).
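The cache-utilization idea in this abstract can be illustrated with the simplest related case, a tiled transpose of a two-dimensional row-major array. The sketch below is not the algorithm from the paper or from TAL-SH, which handle general multidimensional tensors; the function name `transpose_blocked` and the `block` parameter are assumptions for this example. Pure Python will not show a real cache speedup, so the code is meant only to show the loop structure: working tile by tile keeps both the reads and the strided writes within a small region of memory.

```python
def transpose_blocked(a, rows, cols, block=32):
    """Transpose a row-major rows x cols matrix held in a flat list.

    The outer two loops walk over block x block tiles; the inner two loops
    copy one tile, so each tile's source and destination stay close in memory.
    """
    out = [0] * (rows * cols)
    for i0 in range(0, rows, block):
        for j0 in range(0, cols, block):
            for i in range(i0, min(i0 + block, rows)):
                for j in range(j0, min(j0 + block, cols)):
                    # Element (i, j) of the input becomes (j, i) of the output.
                    out[j * rows + i] = a[i * cols + j]
    return out
```

In a compiled language the tile size would be chosen so that one source tile and one destination tile fit in cache together; the untiled double loop corresponds to the scattering baseline the abstract compares against.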
Rendering of 3D-wavelet-compressed concentric mosaic scenery with progressive inverse wavelet synthesis (PIWS)

NASA Astrophysics Data System (ADS)

Wu, Yunnan; Luo, Lin; Li, Jin; Zhang, Ya-Qin

2000-05-01

The concentric mosaics offer a quick solution to the construction and navigation of a virtual environment. To reduce the vast data amount of the concentric mosaics, a compression scheme based on 3D wavelet transform has been proposed in a previous paper. In this work, we investigate the efficient implementation of the renderer. It is preferable not to expand the compressed bitstream as a whole, so that the memory consumption of the renderer can be reduced. Instead, only the data necessary to render the current view are accessed and decoded. The progressive inverse wavelet synthesis (PIWS) algorithm is proposed to provide the random data access and to reduce the calculation for the data access requests to a minimum. A mixed cache is used in PIWS, where the entropy decoded wavelet coefficient, intermediate result of lifting and fully synthesized pixel are all stored at the same memory unit because of the in-place calculation property of the lifting implementation. PIWS operates with a finite state machine, where each memory unit is attached with a state to indicate what type of content is currently stored. The computational saving achieved by PIWS is demonstrated with extensive experiment results.
Killing and caching of an adult White-tailed deer, Odocoileus virginianus, by a single Gray Wolf, Canis lupus

USGS Publications Warehouse

Nelson, Michael E.

2011-01-01

A single Gray Wolf (Canis lupus) killed an adult male White-tailed Deer (Odocoileus virginianus) and cached the intact carcass in 76 cm of snow. The carcass was revisited and entirely consumed between four and seven days later. This is the first recorded observation of a Gray Wolf caching an entire adult deer.
A search game model of the scatter hoarder's problem

PubMed Central

Alpern, Steve; Fokkink, Robbert; Lidbetter, Thomas; Clayton, Nicola S.

2012-01-01

Scatter hoarders are animals (e.g. squirrels) who cache food (nuts) over a number of sites for later collection. A certain minimum amount of food must be recovered, possibly after pilfering by another animal, in order to survive the winter. An optimal caching strategy is one that maximizes the survival probability, given worst case behaviour of the pilferer. We modify certain ‘accumulation games’ studied by Kikuta & Ruckle (2000 J. Optim. Theory Appl.) and Kikuta & Ruckle (2001 Naval Res. Logist.), which modelled the problem of optimal diversification of resources against catastrophic loss, to include the depth at which the food is hidden at each caching site. Optimal caching strategies can then be determined as equilibria in a new ‘caching game’. We show how the distribution of food over sites and the site-depths of the optimal caching varies with the animal's survival requirements and the amount of pilfering. We show that in some cases, ‘decoy nuts’ are required to be placed above other nuts that are buried further down at the same site. Methods from the field of search games are used. Some empirically observed behaviour can be shown to be optimal in our model. PMID:22012971
Image matrix processor for fast multi-dimensional computations

DOEpatents

Roberson, George P.; Skeate, Michael F.

1996-01-01

An apparatus for multi-dimensional computation which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination.
Joshua tree (Yucca brevifolia) seeds are dispersed by seed-caching rodents

USGS Publications Warehouse

Vander Wall, S.B.; Esque, T.; Haines, D.; Garnett, M.; Waitman, B.A.

2006-01-01

Joshua tree (Yucca brevifolia) is a distinctive and charismatic plant of the Mojave Desert. Although floral biology and seed production of Joshua tree and other yuccas are well understood, the fate of Joshua tree seeds has never been studied. We tested the hypothesis that Joshua tree seeds are dispersed by seed-caching rodents. We radioactively labelled Joshua tree seeds and followed their fates at five source plants in Potosi Wash, Clark County, Nevada, USA. Rodents made a mean of 30.6 caches, usually within 30 m of the base of source plants. Caches contained a mean of 5.2 seeds buried 3-30 mm deep. A variety of rodent species appears to have prepared the caches. Three of the 836 Joshua tree seeds (0.4%) cached germinated the following spring. Seed germination using rodent exclosures was nearly 15%. More than 82% of seeds in open plots were removed by granivores, and neither microsite nor supplemental water significantly affected germination. Joshua tree produces seeds in indehiscent pods or capsules, which rodents dismantle to harvest seeds. Because there is no other known means of seed dispersal, it is possible that the Joshua tree-rodent seed dispersal interaction is an obligate mutualism for the plant.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Shoopman, J. D.

This report documents Livermore Computing (LC) activities in support of ASC L2 milestone 5589: Modernization and Expansion of LLNL Archive Disk Cache, due March 31, 2016. The full text of the milestone is included in Attachment 1. The description of the milestone is: Configuration of archival disk cache systems will be modernized to reduce fragmentation, and new, higher capacity disk subsystems will be deployed. This will enhance archival disk cache capability for ASC archive users, enabling files written to the archives to remain resident on disk for many (6–12) months, regardless of file size. The milestone was completed in three phases. On August 26, 2015 subsystems with 6 PB of disk cache were deployed for production use in LLNL’s unclassified HPSS environment. Following that, on September 23, 2015 subsystems with 9 PB of disk cache were deployed for production use in LLNL’s classified HPSS environment. On January 31, 2016, the milestone was fully satisfied when the legacy Data Direct Networks (DDN) archive disk cache subsystems were fully retired from production use in both LLNL’s unclassified and classified HPSS environments, and only the newly deployed systems were in use.
Minimizing Cache Misses Using Minimum-Surface Bodies

NASA Technical Reports Server (NTRS)

Frumkin, Michael; VanderWijngaart, Rob; Biegel, Bryan (Technical Monitor)

2002-01-01

A number of known techniques for improving cache performance in scientific computations involve the reordering of the iteration space. Some of these reorderings can be considered as coverings of the iteration space with the sets having good surface-to-volume ratio. Use of such sets reduces the number of cache misses in computations of local operators having the iteration space as a domain. First, we derive lower bounds which any algorithm must suffer while computing a local operator on a grid. Then we explore coverings of iteration spaces represented by structured and unstructured grids which allow us to approach these lower bounds. For structured grids we introduce a covering by successive minima tiles of the interference lattice of the grid. We show that the covering has low surface-to-volume ratio and present a computer experiment showing actual reduction of the cache misses achieved by using these tiles. For planar unstructured grids we show existence of a covering which reduces the number of cache misses to the level of structured grids. On the other hand, we present a triangulation of a 3-dimensional cube such that any local operator on the corresponding grid has significantly larger number of cache misses than a similar operator on a structured grid.
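The surface-to-volume argument in this abstract can be illustrated with a rough count. The sketch below is not the authors' successive-minima tiling; it assumes a hypothetical box-shaped tile and a stencil of radius 1 on a 3D structured grid, and compares how many grid points a tile must load with how many it actually updates. The function name `halo_overhead` is an illustrative choice.

```python
def halo_overhead(tile, radius=1):
    """Return (points loaded, points updated, overhead ratio) for one tile.

    Assumption: the tile is a box of shape (a, b, c) and the local operator
    reaches `radius` points in every direction, so the loaded region is the
    tile plus a surrounding halo.
    """
    a, b, c = tile
    updated = a * b * c
    loaded = (a + 2 * radius) * (b + 2 * radius) * (c + 2 * radius)
    return loaded, updated, (loaded - updated) / updated


# Two tiles with the same volume (512 points) but very different surfaces.
cube = halo_overhead((8, 8, 8))
slab = halo_overhead((512, 1, 1))

print("cube:", cube)
print("slab:", slab)
```

Under these assumptions the compact tile loads far fewer halo points per updated point than the elongated one, which is the intuition behind preferring coverings with a good surface-to-volume ratio.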
Is bigger always better? A critical appraisal of the use of volumetric analysis in the study of the hippocampus.

PubMed

Roth, Timothy C; Brodin, Anders; Smulders, Tom V; LaDage, Lara D; Pravosudov, Vladimir V

2010-03-27

A well-developed spatial memory is important for many animals, but appears especially important for scatter-hoarding species. Consequently, the scatter-hoarding system provides an excellent paradigm in which to study the integrative aspects of memory use within an ecological and evolutionary framework. One of the main tenets of this paradigm is that selection for enhanced spatial memory for cache locations should specialize the brain areas involved in memory. One such brain area is the hippocampus (Hp). Many studies have examined this adaptive specialization hypothesis, typically relating spatial memory to Hp volume. However, it is unclear how the volume of the Hp is related to its function for spatial memory. Thus, the goal of this article is to evaluate volume as a main measurement of the degree of morphological and physiological adaptation of the Hp as it relates to memory. We will briefly review the evidence for the specialization of memory in food-hoarding animals and discuss the philosophy behind volume as the main currency. We will then examine the problems associated with this approach, attempting to understand the advantages and limitations of using volume and discuss alternatives that might yield more specific hypotheses. Overall, there is strong evidence that the Hp is involved in the specialization of spatial memory in scatter-hoarding animals. However, volume may be only a coarse proxy for more relevant and subtle changes in the structure of the brain underlying changes in behaviour. To better understand the nature of this brain/memory relationship, we suggest focusing on more specific and relevant features of the Hp, such as the number or size of neurons, variation in connectivity depending on dendritic and axonal arborization and the number of synapses. These should generate more specific hypotheses derived from a solid theoretical background and should provide a better understanding of both neural mechanisms of memory and their evolution.

Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors

NASA Astrophysics Data System (ADS)

Sourouri, Mohammed; Birger Raknes, Espen

2017-04-01

In advanced imaging methods like reverse-time migration (RTM) and full waveform inversion (FWI) the elastic wave equation (EWE) is numerically solved many times to create the seismic image or the elastic parameter model update. Thus, it is essential to optimize the solution time for solving the EWE as this will have a major impact on the total computational cost in running RTM or FWI. From a computational point of view applications implementing EWEs are associated with two major challenges. The first challenge is the amount of memory-bound computations involved, while the second challenge is the execution of such computations over very large datasets. So far, multi-core processors have not been able to tackle these two challenges, which eventually led to the adoption of accelerators such as Graphics Processing Units (GPUs). Compared to conventional CPUs, GPUs are densely populated with many floating-point units and fast memory, a type of architecture that has proven to map well to many scientific computations. Despite its architectural advantages, full-scale adoption of accelerators has yet to materialize. First, accelerators require a significant programming effort imposed by programming models such as CUDA or OpenCL. Second, accelerators come with a limited amount of memory and also require explicit data transfers between the CPU and the accelerator over the slow PCI bus. The second generation of the Xeon Phi processor, based on the Knights Landing (KNL) architecture, promises the computational capabilities of an accelerator but requires the same programming effort as traditional multi-core processors. The high computational performance is realized through many integrated cores (the number of cores, tiles and memory varies with the model) organized in tiles that are connected via a 2D mesh based interconnect. In contrast to accelerators, KNL is a self-hosted system, meaning explicit data transfers over the PCI bus are no longer required.
However, like most accelerators, KNL sports a memory subsystem consisting of low-level caches and 16 GB of high-bandwidth MCDRAM memory. For capacity computing, up to 400 GB of conventional DDR4 memory is provided. Such a strict hierarchical memory layout means that data locality is imperative if the true potential of this product is to be harnessed. In this work, we study a series of optimizations specifically targeting KNL for our EWE based application to reduce the time-to-solution for the following 3D model sizes in grid points: 128³, 256³ and 512³. We compare the results with an optimized version for multi-core CPUs running on a dual-socket Xeon E5 2680v3 system using OpenMP. Our initial naive implementation on the KNL is roughly 20% faster than the multi-core version, but by using only one thread per core and careful memory placement using the memkind library, we could achieve higher speedups. Additionally, by using the MCDRAM as cache for problem sizes that are smaller than 16 GB further performance improvements were unlocked. Depending on the problem size, our overall results indicate that the KNL based system is approximately 2.2x faster than the 24-core Xeon E5 2680v3 system, with only modest changes to the code.
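To make the 16 GB threshold concrete, the following back-of-the-envelope sketch estimates the memory footprint of a cubic grid. The number of wavefield and material arrays (12) and the single-precision storage are assumptions made purely for illustration; the abstract does not state them, and the names `footprint_bytes` and `MCDRAM_BYTES` are hypothetical.

```python
BYTES_PER_VALUE = 4          # assumption: single-precision floats
MCDRAM_BYTES = 16 * 2**30    # 16 GB of MCDRAM, taken here as 16 GiB


def footprint_bytes(n, fields, bytes_per_value=BYTES_PER_VALUE):
    """Memory needed for `fields` arrays on an n x n x n grid."""
    return n**3 * fields * bytes_per_value


# Assumed field count (e.g. velocities, stresses, material parameters).
ASSUMED_FIELDS = 12

for n in (128, 256, 512):
    size = footprint_bytes(n, ASSUMED_FIELDS)
    fits = size < MCDRAM_BYTES
    print(f"{n}^3 grid: {size / 2**30:.3f} GiB, fits in MCDRAM: {fits}")
```

With these assumed numbers even the largest of the three grids stays below the MCDRAM capacity; a different field count, halo padding, or double precision would change the result.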
Automatic Data Traffic Control on DSM Architecture

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry; Kwak, Dochan (Technical Monitor)

2000-01-01

We study data traffic on distributed shared memory machines and conclude that data placement and grouping improve performance of scientific codes. We present several methods that users can employ to improve data traffic in their code. We report on the implementation of a tool that detects the code fragments causing data congestion and advises the user on improving data routing in these fragments. The capabilities of the tool include deduction of data alignment and affinity from the source code; detection of the code constructs having abnormally high cache or TLB misses; and generation of data placement constructs. We demonstrate the capabilities of the tool on experiments with NAS parallel benchmarks and with a simple computational fluid dynamics application, ARC3D.
Beyond core count: a look at new mainstream computing platforms for HEP workloads

NASA Astrophysics Data System (ADS)

Szostek, P.; Nowak, A.; Bitzes, G.; Valsan, L.; Jarp, S.; Dotti, A.

2014-06-01

As Moore's Law continues to deliver more and more transistors, the mainstream processor industry is preparing to expand its investments in areas other than simple core count. These new interests include deep integration of on-chip components, advanced vector units, memory, cache and interconnect technologies. We examine these moving trends with parallelized and vectorized High Energy Physics workloads in mind. In particular, we report on practical experience resulting from experiments with scalable HEP benchmarks on the Intel "Ivy Bridge-EP" and "Haswell" processor families. In addition, we examine the benefits of the new "Haswell" microarchitecture and its impact on multiple facets of HEP software. Finally, we report on the power efficiency of new systems.
A highly efficient 3D level-set grain growth algorithm tailored for ccNUMA architecture

NASA Astrophysics Data System (ADS)

Mießen, C.; Velinov, N.; Gottstein, G.; Barrales-Mora, L. A.

2017-12-01

A highly efficient simulation model for 2D and 3D grain growth was developed based on the level-set method. The model introduces modern computational concepts to achieve excellent performance on parallel computer architectures. Strong scalability was measured on cache-coherent non-uniform memory access (ccNUMA) architectures. To achieve this, the proposed approach considers the application of local level-set functions at the grain level. Ideal and non-ideal grain growth was simulated in 3D with the objective to study the evolution of statistical representative volume elements in polycrystals. In addition, microstructure evolution in an anisotropic magnetic material affected by an external magnetic field was simulated.
Engineering the CernVM-Filesystem as a High Bandwidth Distributed Filesystem for Auxiliary Physics Data

NASA Astrophysics Data System (ADS)

Dykstra, D.; Bockelman, B.; Blomer, J.; Herner, K.; Levshina, T.; Slyz, M.

2015-12-01

A common use pattern in the computing models of particle physics experiments is running many distributed applications that read from a shared set of data files. We refer to this data as auxiliary data, to distinguish it from (a) event data from the detector (which tends to be different for every job), and (b) conditions data about the detector (which tends to be the same for each job in a batch of jobs). Conditions data also tends to be relatively small per job, whereas both event data and auxiliary data are larger per job. Unlike event data, auxiliary data comes from a limited working set of shared files. Since there is spatial locality of the auxiliary data access, the use case appears to be identical to that of the CernVM-Filesystem (CVMFS). However, we show that distributing auxiliary data through CVMFS causes the existing CVMFS infrastructure to perform poorly. We utilize a CVMFS client feature called "alien cache" to cache data on existing local high-bandwidth data servers that were engineered for storing event data. This cache is shared between the worker nodes at a site and replaces caching CVMFS files on both the worker node local disks and on the site's local squids. We have tested this alien cache with the dCache NFSv4.1 interface, Lustre, and the Hadoop Distributed File System (HDFS) FUSE interface, and measured performance. In addition, we use high-bandwidth data servers at central sites to perform the CVMFS Stratum 1 function instead of the low-bandwidth web servers deployed for the CVMFS software distribution function. We have tested this using the dCache HTTP interface. As a result, we have a design for an end-to-end high-bandwidth distributed caching read-only filesystem, using existing client software already widely deployed to grid worker nodes and existing file servers already widely installed at grid sites.
Files are published in a central place and are soon available on demand throughout the grid and cached locally on the site with a convenient POSIX interface. This paper discusses the details of the architecture and reports performance measurements.
Engineering the CernVM-Filesystem as a High Bandwidth Distributed Filesystem for Auxiliary Physics Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dykstra, D.; Bockelman, B.; Blomer, J.

A common use pattern in the computing models of particle physics experiments is running many distributed applications that read from a shared set of data files. We refer to this data as auxiliary data, to distinguish it from (a) event data from the detector (which tends to be different for every job), and (b) conditions data about the detector (which tends to be the same for each job in a batch of jobs). Conditions data also tends to be relatively small per job, whereas both event data and auxiliary data are larger per job. Unlike event data, auxiliary data comes from a limited working set of shared files. Since there is spatial locality of the auxiliary data access, the use case appears to be identical to that of the CernVM-Filesystem (CVMFS). However, we show that distributing auxiliary data through CVMFS causes the existing CVMFS infrastructure to perform poorly. We utilize a CVMFS client feature called 'alien cache' to cache data on existing local high-bandwidth data servers that were engineered for storing event data. This cache is shared between the worker nodes at a site and replaces caching CVMFS files on both the worker node local disks and on the site's local squids. We have tested this alien cache with the dCache NFSv4.1 interface, Lustre, and the Hadoop Distributed File System (HDFS) FUSE interface, and measured performance. In addition, we use high-bandwidth data servers at central sites to perform the CVMFS Stratum 1 function instead of the low-bandwidth web servers deployed for the CVMFS software distribution function. We have tested this using the dCache HTTP interface. As a result, we have a design for an end-to-end high-bandwidth distributed caching read-only filesystem, using existing client software already widely deployed to grid worker nodes and existing file servers already widely installed at grid sites.
Files are published in a central place and are soon available on demand throughout the grid and cached locally on the site with a convenient POSIX interface. This paper discusses the details of the architecture and reports performance measurements.
Replication Strategy for Spatiotemporal Data Based on Distributed Caching System

PubMed Central

Xiong, Lian; Tao, Yang; Xu, Juan; Zhao, Lun

2018-01-01

The replica strategy in distributed cache can effectively reduce user access delay and improve system performance. However, developing a replica strategy suitable for varied application scenarios is still quite challenging, owing to differences in user access behavior and preferences. In this paper, a replication strategy for spatiotemporal data (RSSD) based on a distributed caching system is proposed. By taking advantage of the spatiotemporal locality and correlation of user access, RSSD mines high popularity and associated files from historical user access information, and then generates replicas and selects appropriate cache node for placement. Experimental results show that the RSSD algorithm is simple and efficient, and succeeds in significantly reducing user access delay. PMID:29342897
Using Minimum-Surface Bodies for Iteration Space Partitioning

NASA Technical Reports Server (NTRS)

Frumkin, Michael; VanderWijngaart, Rob F.; Biegel, Bryan (Technical Monitor)

2001-01-01

A number of known techniques for improving cache performance in scientific computations involve the reordering of the iteration space. Some of these reorderings can be considered as coverings of the iteration space with the sets having good surface-to-volume ratio. Use of such sets reduces the number of cache misses in computations of local operators having the iteration space as a domain. We study coverings of iteration spaces represented by structured and unstructured grids. For structured grids we introduce a covering based on successive minima tiles of the interference lattice of the grid. We show that the covering has good surface-to-volume ratio and present a computer experiment showing actual reduction of the cache misses achieved by using these tiles. For unstructured grids no cache efficient covering can be guaranteed. We present a triangulation of a 3-dimensional cube such that any local operator on the corresponding grid has significantly larger number of cache misses than a similar operator on a structured grid.
An IPv6 routing lookup algorithm using weight-balanced tree based on prefix value for virtual router

NASA Astrophysics Data System (ADS)

Chen, Lingjiang; Zhou, Shuguang; Zhang, Qiaoduo; Li, Fenghua

2016-10-01

Virtual routers enable the coexistence of different networks on the same physical facility and have lately attracted a great deal of attention from researchers. As the number of IPv6 addresses is rapidly increasing in virtual routers, designing an efficient IPv6 routing lookup algorithm is of great importance. In this paper, we present an IPv6 lookup algorithm called weight-balanced tree (WBT). WBT merges the Forwarding Information Bases (FIBs) of virtual routers into one spanning tree and compresses the space cost. The average and worst-case time complexities of WBT's lookup and update processes are both O(log N), and its space complexity is O(cN), where N is the size of the routing table and c is a constant. Experiments show that WBT reduces Static Random Access Memory (SRAM) cost by more than 80% in comparison with separation schemes. WBT also achieves the lowest average search depth compared with other homogeneous algorithms.
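For readers unfamiliar with what a routing lookup has to compute, the sketch below shows plain longest-prefix matching over a small IPv6 table. It is not the WBT algorithm from this abstract: it swaps in a simple per-prefix-length hash-table scheme, and the table contents, next-hop labels, and function names (`build_fib`, `lookup`) are hypothetical.

```python
import ipaddress


def build_fib(routes):
    """Group routes by prefix length: {prefix_len: {network_as_int: next_hop}}."""
    fib = {}
    for prefix, next_hop in routes:
        net = ipaddress.IPv6Network(prefix)
        fib.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop
    return fib


def lookup(fib, address):
    """Return the next hop of the longest matching prefix, or None."""
    addr = int(ipaddress.IPv6Address(address))
    # Try the most specific prefix lengths first.
    for plen in sorted(fib, reverse=True):
        mask = ((1 << plen) - 1) << (128 - plen)
        hop = fib[plen].get(addr & mask)
        if hop is not None:
            return hop
    return None


# Hypothetical forwarding table using documentation-range prefixes.
fib = build_fib([
    ("::/0", "default"),
    ("2001:db8::/32", "A"),
    ("2001:db8:1::/48", "B"),
])

print(lookup(fib, "2001:db8:1::5"))   # matches the /48
print(lookup(fib, "2001:db8:2::1"))   # falls back to the /32
print(lookup(fib, "2001:dead::1"))    # only the default route matches
```

This scheme probes one table per distinct prefix length, so its cost grows with the number of lengths in use; a tree-based structure such as the one the abstract describes aims instead for a lookup depth that is logarithmic in the table size.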
A hybrid magnetic/complementary metal oxide semiconductor three-context memory bit cell for non-volatile circuit design

NASA Astrophysics Data System (ADS)

Jovanović, B.; Brum, R. M.; Torres, L.

2014-04-01

After decades of continued scaling to the beat of Moore's law, it now appears that conventional silicon based devices are approaching their physical limits. In today's deep-submicron nodes, a number of short-channel and quantum effects are emerging that affect the manufacturing process as well as the functionality of microelectronic systems-on-chip. Spintronics devices that exploit both the intrinsic spin of the electron and its associated magnetic moment, in addition to its fundamental electronic charge, are promising solutions to circumvent these scaling threats. Being compatible with CMOS technology, such devices offer a promising synergy of radiation immunity, infinite endurance, non-volatility, increased density, etc. In this paper, we present a hybrid (magnetic/CMOS) cell that is able to store and process data both electrically and magnetically. The cell is based on perpendicular spin-transfer torque magnetic tunnel junctions (STT-MTJs) and is suitable for use in magnetic random access memories and reprogrammable computing (non-volatile registers, processor cache memories, magnetic field-programmable gate arrays, etc). To demonstrate the potential of our hybrid cell, we physically implemented a small hybrid memory block using 45 nm × 45 nm round MTJs for the magnetic part and 28 nm fully depleted silicon on insulator (FD-SOI) technology for the CMOS part. We also report the cell's measured performance in terms of area, robustness, read/write speed and energy consumption.
Image matrix processor for fast multi-dimensional computations

DOEpatents

Roberson, G.P.; Skeate, M.F.

1996-10-15

An apparatus for multi-dimensional computation is disclosed which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination. 10 figs.
A Novel Two-Tier Cooperative Caching Mechanism for the Optimization of Multi-Attribute Periodic Queries in Wireless Sensor Networks

PubMed Central

Zhou, ZhangBing; Zhao, Deng; Shu, Lei; Tsang, Kim-Fung

2015-01-01

Wireless sensor networks, serving as an important interface between physical environments and computational systems, have been used extensively for supporting domain applications, where multiple-attribute sensory data are queried from the network continuously and periodically. Usually, certain sensory data may not vary significantly within a certain time duration for certain applications. In this setting, sensory data gathered at a certain time slot can be used for answering concurrent queries and may be reused for answering forthcoming queries when the variation of these data is within a certain threshold. To exploit this, a popularity-based cooperative caching mechanism is proposed in this article, where the popularity of sensory data is calculated according to the queries issued in recent time slots. This popularity reflects the likelihood that sensory data will be of interest to forthcoming queries. Generally, sensory data with the highest popularity are cached at the sink node, while sensory data that may not be of interest to forthcoming queries are cached in the head nodes of divided grid cells. Leveraging these cooperatively cached sensory data, queries are answered by composing these two-tier cached data. Experimental evaluation shows that this approach can reduce the network communication cost significantly and increase the network capability. PMID:26131665
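The popularity-based placement described above can be sketched roughly as follows. This is a minimal illustration, not the authors' mechanism: it assumes popularity is simply the number of recent queries that mention an attribute, and that a fixed number of the most popular attributes go to the sink while the rest go to grid-cell head nodes. The names `popularity`, `place`, `window`, and `sink_capacity` are hypothetical.

```python
from collections import Counter


def popularity(query_slots, window):
    """Count how often each attribute was queried in the last `window` slots.

    `query_slots` is a list of time slots; each slot is a list of queries,
    and each query is a set of attribute names.
    """
    counts = Counter()
    for slot in query_slots[-window:]:
        for query in slot:
            counts.update(query)
    return counts


def place(counts, sink_capacity):
    """Split attributes into (cached at sink, cached at head nodes)."""
    ranked = [attr for attr, _ in counts.most_common()]
    return ranked[:sink_capacity], ranked[sink_capacity:]


# Hypothetical query history over three time slots.
slots = [
    [{"temp", "humidity"}],
    [{"temp"}],
    [{"temp", "light"}, {"humidity"}],
]

counts = popularity(slots, window=3)
sink, heads = place(counts, sink_capacity=1)
print("sink tier:", sink)
print("head-node tier:", heads)
```

A real deployment would also need the data-variation threshold mentioned in the abstract to decide when a cached reading may be reused; that part is omitted here.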
Respiratory hospital admissions associated with PM10 pollution in Utah, Salt Lake, and Cache Valleys

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pope, C.A., III

This study assessed the association between respiratory hospital admissions and PM10 pollution in Utah, Salt Lake, and Cache valleys during April 1985 through March 1989. Utah and Salt Lake valleys had high levels of PM10 pollution that violated both the annual and 24-h standards issued by the Environmental Protection Agency (EPA). Much lower PM10 levels occurred in the Cache Valley. Utah Valley experienced the intermittent operation of its primary source of PM10 pollution: an integrated steel mill. Bronchitis and asthma admissions for preschool-age children were approximately twice as frequent in Utah Valley when the steel mill was operating versus when it was not. Similar differences were not observed in Salt Lake or Cache valleys. Even though Cache Valley had higher smoking rates and lower temperatures in winter than did Utah Valley, per capita bronchitis and asthma admissions for all ages were approximately twice as high in Utah Valley. During the period when the steel mill was closed, differences in per capita admissions between Utah and Cache valleys narrowed considerably. Regression analysis also demonstrated a statistical association between respiratory hospital admissions and PM10 pollution. The results suggest that PM10 pollution plays a role in the incidence and severity of respiratory disease.
Ecosystem services from keystone species: diversionary seeding and seed-caching desert rodents can enhance Indian ricegrass seedling establishment

USGS Publications Warehouse

Longland, William; Ostoja, Steven M.

2013-01-01

Seeds of Indian ricegrass (Achnatherum hymenoides), a native bunchgrass common to sandy soils on arid western rangelands, are naturally dispersed by seed-caching rodent species, particularly Dipodomys spp. (kangaroo rats). These animals cache large quantities of seeds when mature seeds are available on or beneath plants and recover most of their caches for consumption during the remainder of the year. Unrecovered seeds in caches account for the vast majority of Indian ricegrass seedling recruitment. We applied three different densities of white millet (Panicum miliaceum) seeds as “diversionary foods” to plots at three Great Basin study sites in an attempt to reduce rodents' over-winter cache recovery so that more Indian ricegrass seeds would remain in soil seedbanks and potentially establish new seedlings. One year after diversionary seed application, a moderate level of Indian ricegrass seedling recruitment occurred at two of our study sites in western Nevada, although there was no recruitment at the third site in eastern California. At both Nevada sites, the number of Indian ricegrass seedlings sampled along transects was significantly greater on all plots treated with diversionary seeds than on non-seeded control plots. However, the density of diversionary seeds applied to plots had a marginally non-significant effect on seedling recruitment, and it was not correlated with recruitment patterns among plots. Results suggest that application of a diversionary seed type that is preferred by seed-caching rodents provides a promising passive restoration strategy for target plant species that are dispersed by these rodents.
Mitochondrial genomic analysis of late onset Alzheimer's disease reveals protective haplogroups H6A1A/H6A1B: the Cache County Study on Memory in Aging.

PubMed

Ridge, Perry G; Maxwell, Taylor J; Corcoran, Christopher D; Norton, Maria C; Tschanz, Joann T; O'Brien, Elizabeth; Kerber, Richard A; Cawthon, Richard M; Munger, Ronald G; Kauwe, John S K

2012-01-01

Alzheimer's disease (AD) is the most common cause of dementia and AD risk clusters within families. Part of the familial aggregation of AD is accounted for by excess maternal vs. paternal inheritance, a pattern consistent with mitochondrial inheritance. The role of specific mitochondrial DNA (mtDNA) variants and haplogroups in AD risk is uncertain. We determined the complete mitochondrial genome sequence of 1007 participants in the Cache County Study on Memory in Aging, a population-based prospective cohort study of dementia in northern Utah. AD diagnoses were made with a multi-stage protocol that included clinical examination and review by a panel of clinical experts. We used TreeScanning, a statistically robust approach based on haplotype networks, to analyze the mtDNA sequence data. Participants with major mitochondrial haplotypes H6A1A and H6A1B showed a reduced risk of AD (p=0.017, corrected for multiple comparisons). The protective haplotypes were defined by three variants: m.3915G>A, m.4727A>G, and m.9380G>A. These three variants characterize two different major haplogroups. Together m.4727A>G and m.9380G>A define H6A1, and it has been suggested m.3915G>A defines H6A. Additional variants differentiate H6A1A and H6A1B; however, none of these variants had a significant relationship with AD case-control status. Our findings provide evidence of a reduced risk of AD for individuals with mtDNA haplotypes H6A1A and H6A1B. These findings are the results of the largest study to date with complete mtDNA genome sequence data, yet the functional significance of the associated haplotypes remains unknown and replication in other studies is necessary.
Mitochondrial Genomic Analysis of Late Onset Alzheimer’s Disease Reveals Protective Haplogroups H6A1A/H6A1B: The Cache County Study on Memory in Aging

PubMed Central

Ridge, Perry G.; Maxwell, Taylor J.; Corcoran, Christopher D.; Norton, Maria C.; Tschanz, JoAnn T.; O’Brien, Elizabeth; Kerber, Richard A.; Cawthon, Richard M.; Munger, Ronald G.; Kauwe, John S. K.

2012-01-01

Background: Alzheimer’s disease (AD) is the most common cause of dementia and AD risk clusters within families. Part of the familial aggregation of AD is accounted for by excess maternal vs. paternal inheritance, a pattern consistent with mitochondrial inheritance. The role of specific mitochondrial DNA (mtDNA) variants and haplogroups in AD risk is uncertain. Methodology/Principal Findings: We determined the complete mitochondrial genome sequence of 1007 participants in the Cache County Study on Memory in Aging, a population-based prospective cohort study of dementia in northern Utah. AD diagnoses were made with a multi-stage protocol that included clinical examination and review by a panel of clinical experts. We used TreeScanning, a statistically robust approach based on haplotype networks, to analyze the mtDNA sequence data. Participants with major mitochondrial haplotypes H6A1A and H6A1B showed a reduced risk of AD (p = 0.017, corrected for multiple comparisons). The protective haplotypes were defined by three variants: m.3915G>A, m.4727A>G, and m.9380G>A. These three variants characterize two different major haplogroups. Together m.4727A>G and m.9380G>A define H6A1, and it has been suggested m.3915G>A defines H6A. Additional variants differentiate H6A1A and H6A1B; however, none of these variants had a significant relationship with AD case-control status. Conclusions/Significance: Our findings provide evidence of a reduced risk of AD for individuals with mtDNA haplotypes H6A1A and H6A1B. These findings are the results of the largest study to date with complete mtDNA genome sequence data, yet the functional significance of the associated haplotypes remains unknown and replication in other studies is necessary. PMID:23028804
Neuropsychiatric symptoms as risk factors for progression from CIND to dementia: the Cache County Study.

PubMed

Peters, M E; Rosenberg, P B; Steinberg, M; Norton, M C; Welsh-Bohmer, K A; Hayden, K M; Breitner, J; Tschanz, J T; Lyketsos, C G

2013-11-01

To examine the association of neuropsychiatric symptom (NPS) severity with risk of transition to all-cause dementia, Alzheimer disease (AD), and vascular dementia (VaD). Survival analysis of time to dementia, AD, or VaD onset. Population-based study. 230 participants diagnosed with cognitive impairment, no dementia (CIND) from the Cache County Study of Memory Health and Aging were followed for a mean of 3.3 years. The Neuropsychiatric Inventory (NPI) was used to quantify the presence, frequency, and severity of NPS. Chi-squared statistics, t-tests, and Cox proportional hazard ratios were used to assess associations. The conversion rate from CIND to all-cause dementia was 12% per year, with risk factors including an APOE ε4 allele, lower Mini-Mental State Examination, lower 3MS, and higher CDR sum-of-boxes. The presence of at least one NPS was a risk factor for all-cause dementia, as was the presence of NPS with mild severity. Nighttime behaviors were a risk factor for all-cause dementia and of AD, whereas hallucinations were a risk factor for VaD. These data confirm that NPS are risk factors for conversion from CIND to dementia. Of special interest is that even NPS of mild severity are a risk for all-cause dementia or AD. Copyright © 2013 American Association for Geriatric Psychiatry. Published by Elsevier Inc. All rights reserved.
Neuropsychiatric symptoms as risk factors for progression from CIND to dementia: The Cache County Study

PubMed Central

Peters, ME; Rosenberg, PB; Steinberg, M; Norton, MC; Welsh-Bohmer, KA; Hayden, KM; Breitner, J; Tschanz, JT; Lyketsos, CG

2012-01-01

Objectives To examine the association of neuropsychiatric symptom (NPS) severity with risk of transition to all-cause dementia, Alzheimer's disease (AD), and vascular dementia (VaD). Design Survival analysis of time to dementia, AD, or VaD onset. Setting Population-based study. Participants 230 participants diagnosed with cognitive impairment, no dementia (CIND) from the Cache County Study of Memory Health and Aging were followed for a mean of 3.3 years. Measurements The Neuropsychiatric Inventory (NPI) was used to quantify the presence, frequency, and severity of NPS. Chi-square statistics, t-tests, and Cox proportional hazard ratios were used to assess associations. Results The conversion rate from CIND to all-cause dementia was 12% per year, with risk factors including an APOE ε4 allele, lower MMSE, lower 3MS, and higher CDR sum-of-boxes. The presence of at least one NPS was a risk factor for all-cause dementia, as was the presence of NPS with mild severity. Nighttime behaviors were a risk factor for all-cause dementia and of AD, while hallucinations were a risk factor for VaD. Conclusions These data confirm that NPS are risk factors for conversion from CIND to dementia. Of special interest is that even NPS of mild severity are a risk for all-cause dementia or AD. PMID:23567370
Security Enhancement Using Cache Based Reauthentication in WiMAX Based E-Learning System

PubMed Central

Rajagopal, Chithra; Bhuvaneshwaran, Kalaavathi

2015-01-01

WiMAX networks are the most suitable for E-Learning in rural areas through their Broadcast and Multicast Services. Authentication of users is carried out by the AAA server in WiMAX. In E-Learning systems, users must be forced to reauthenticate to overcome the session hijacking problem. Reauthentication introduces frequent delays in data access, which is a serious problem for delay-sensitive applications such as E-Learning. To perform fast reauthentication, a caching mechanism known as the Key Caching Based Authentication scheme is introduced in this paper. Even though the cache mechanism requires extra storage to keep the user credentials, it reduces the delay occurring during reauthentication by 50%. PMID:26351658
Security Enhancement Using Cache Based Reauthentication in WiMAX Based E-Learning System.

PubMed

Rajagopal, Chithra; Bhuvaneshwaran, Kalaavathi

2015-01-01

WiMAX networks are the most suitable for E-Learning in rural areas through their Broadcast and Multicast Services. Authentication of users is carried out by the AAA server in WiMAX. In E-Learning systems, users must be forced to reauthenticate to overcome the session hijacking problem. Reauthentication introduces frequent delays in data access, which is a serious problem for delay-sensitive applications such as E-Learning. To perform fast reauthentication, a caching mechanism known as the Key Caching Based Authentication scheme is introduced in this paper. Even though the cache mechanism requires extra storage to keep the user credentials, it reduces the delay occurring during reauthentication by 50%.

Caching Joint Shortcut Routing to Improve Quality of Service for Information-Centric Networking.

PubMed

Huang, Baixiang; Liu, Anfeng; Zhang, Chengyuan; Xiong, Naixue; Zeng, Zhiwen; Cai, Zhiping

2018-05-29

Hundreds of thousands of ubiquitous sensing (US) devices have provided an enormous number of data for Information-Centric Networking (ICN), which is an emerging network architecture that has the potential to solve a great variety of issues faced by the traditional network. A Caching Joint Shortcut Routing (CJSR) scheme is proposed in this paper to improve the Quality of service (QoS) for ICN. The CJSR scheme mainly has two innovations which are different from other in-network caching schemes: (1) Two routing shortcuts are set up to reduce the length of routing paths. Because of some inconvenient transmission processes, the routing paths of previous schemes are prolonged, and users can only request data from Data Centers (DCs) until the data have been uploaded from Data Producers (DPs) to DCs. Hence, the first kind of shortcut is built from DPs to users directly. This shortcut could release the burden of whole network and reduce delay. Moreover, in the second shortcut routing method, a Content Router (CR) which could yield shorter length of uploading routing path from DPs to DCs is chosen, and then data packets are uploaded through this chosen CR. In this method, the uploading path shares some segments with the pre-caching path, thus the overall length of routing paths is reduced. (2) The second innovation of the CJSR scheme is that a cooperative pre-caching mechanism is proposed so that QoS could have a further increase. Besides being used in downloading routing, the pre-caching mechanism can also be used when data packets are uploaded towards DCs. Combining uploading and downloading pre-caching, the cooperative pre-caching mechanism exhibits high performance in different situations. Furthermore, to address the scarcity of storage size, an algorithm that could make use of storage from idle CRs is proposed. 
After comparing the proposed scheme with five existing schemes via simulations, experimental results reveal that the CJSR scheme could reduce the total number of processed interest packets by 54.8%, enhance the cache hits of each CR, reduce the total hop count by 51.6%, and cut the length of the routing path for users to obtain their data of interest by 28.6–85.7% compared with the traditional NDN scheme. Moreover, the length of the uploading routing path could be decreased by 8.3–33.3%.
Cache Sharing and Isolation Tradeoffs in Multicore Mixed-Criticality Systems

DTIC Science & Technology

2015-05-01

of lockdown registers, to provide way-based partitioning. These alternatives are illustrated in Fig. 1 with respect to a quad-core ARM Cortex A9...presented a cache-partitioning scheme that allows multiple tasks to share the same cache partition on a single processor (as we do for Level-A and...sets and determined the fraction that were schedulable on our target hardware platform, the quad-core ARM Cortex A9 machine mentioned earlier, the LLC
Constant time worker thread allocation via configuration caching

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eichenberger, Alexandre E; O'Brien, John K. P.

Mechanisms are provided for allocating threads for execution of a parallel region of code. A request for allocation of worker threads to execute the parallel region of code is received from a master thread. Cached thread allocation information identifying prior thread allocations that have been performed for the master thread is accessed. Worker threads are allocated to the master thread based on the cached thread allocation information. The parallel region of code is executed using the allocated worker threads.
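The idea of reusing a prior allocation for the same master thread can be illustrated with a small Python sketch. This is not the mechanism described in the record, whose details the abstract does not give; the `ThreadPool` class, its fields, and the `(master_id, n)` cache key are hypothetical, and the fast path here still checks each cached worker, so it is linear in the team size rather than constant time.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadPool:
    """Toy pool of worker IDs (hypothetical; for illustration only)."""
    free: set = field(default_factory=set)
    # Maps (master_id, team_size) to the worker IDs handed out last time.
    cache: dict = field(default_factory=dict)

    def allocate(self, master_id, n):
        """Return n worker IDs for master_id, reusing a cached team if possible."""
        key = (master_id, n)
        cached = self.cache.get(key)
        if cached is not None and all(w in self.free for w in cached):
            workers = cached  # fast path: the prior allocation is still free
        else:
            if len(self.free) < n:
                raise RuntimeError("not enough free workers")
            workers = tuple(sorted(self.free)[:n])  # slow path: pick a new team
            self.cache[key] = workers
        self.free.difference_update(workers)
        return workers

    def release(self, workers):
        """Return workers to the pool when the parallel region ends."""
        self.free.update(workers)


pool = ThreadPool(free=set(range(8)))
team = pool.allocate(master_id=0, n=4)
pool.release(team)
same_team = pool.allocate(master_id=0, n=4)  # served from the cache
```

Keying the cache on both the master and the team size means a master that alternates between region sizes keeps one remembered team per size.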
Cooperation and information replication in wireless networks.

PubMed

Poularakis, Konstantinos; Tassiulas, Leandros

2016-03-06

A significant portion of today's network traffic is due to recurring downloads of a few popular contents. It has been observed that replicating the latter in caches installed at network edges, close to users, can drastically reduce network bandwidth usage and improve content access delay. Such caching architectures are gaining increasing interest in recent years as a way of dealing with the explosive traffic growth, fuelled further by the downward slope in storage space price. In this work, we provide an overview of caching with a particular emphasis on emerging network architectures that enable caching at the radio access network. In this context, novel challenges arise due to the broadcast nature of the wireless medium, which allows simultaneously serving multiple users tuned into a multicast stream, and the mobility of the users who may be frequently handed off from one cell tower to another. Existing results indicate that caching at the wireless edge has a great potential in removing bottlenecks on the wired backbone networks. Taking into consideration the schedule of multicast service and mobility profiles is crucial to extract maximum benefit in network performance. © 2016 The Author(s).
Turbidity and Total Suspended Solids on the Lower Cache River Watershed, AR.

PubMed

Rosado-Berrios, Carlos A; Bouldin, Jennifer L

2016-06-01

The Cache River Watershed (CRW) in Arkansas is part of one of the largest remaining bottomland hardwood forests in the US. Although wetlands are known to improve water quality, the Cache River is listed as impaired due to sedimentation and turbidity. This study measured turbidity and total suspended solids (TSS) at seven sites of the lower CRW; six sites were located on the Bayou DeView tributary of the Cache River. Turbidity and TSS levels ranged from 1.21 to 896 NTU and from 0.17 to 386.33 mg/L, respectively, and had an increasing trend over the 3-year study. However, a decreasing trend from upstream to downstream in the Bayou DeView tributary was noted. Sediment loading calculated from high precipitation events and mean TSS values indicates that contributions from the Cache River main channel were approximately 6.6 times greater than contributions from Bayou DeView. Land use surrounding this river channel affects water quality as wetlands provide a filter for sediments in the Bayou DeView channel.
Tier 3 batch system data locality via managed caches

NASA Astrophysics Data System (ADS)

Fischer, Max; Giffels, Manuel; Jung, Christopher; Kühn, Eileen; Quast, Günter

2015-05-01

Modern data processing increasingly relies on data locality for performance and scalability, whereas the common HEP approaches aim for uniform resource pools with minimal locality, recently even across site boundaries. To combine advantages of both, the High-Performance Data Analysis (HPDA) Tier 3 concept opportunistically establishes data locality via coordinated caches. In accordance with HEP Tier 3 activities, the design incorporates two major assumptions: First, only a fraction of data is accessed regularly and is thus the deciding factor for overall throughput. Second, data access may fall back to non-local storage, making permanent local data availability an inefficient resource usage strategy. Based on this, the HPDA design generically extends available storage hierarchies into the batch system. Using the batch system itself for scheduling file locality, an array of independent caches on the worker nodes is dynamically populated with high-profile data. Cache state information is exposed to the batch system both for managing caches and scheduling jobs. As a result, users directly work with a regular, adequately sized storage system. However, their automated batch processes are presented with local replications of data whenever possible.
Attitude determination and calibration using a recursive maximum likelihood-based adaptive Kalman filter

NASA Technical Reports Server (NTRS)

Kelly, D. A.; Fermelia, A.; Lee, G. K. F.

1990-01-01

An adaptive Kalman filter design that utilizes recursive maximum likelihood parameter identification is discussed. At the center of this design is the Kalman filter itself, which has the responsibility for attitude determination. At the same time, the identification algorithm is continually identifying the system parameters. The approach is applicable to nonlinear, as well as linear systems. This adaptive Kalman filter design has much potential for real time implementation, especially considering the fast clock speeds, cache memory and internal RAM available today. The recursive maximum likelihood algorithm is discussed in detail, with special attention directed towards its unique matrix formulation. The procedure for using the algorithm is described along with comments on how this algorithm interacts with the Kalman filter.
Indexed triangle strips optimization for real-time visualization using genetic algorithm: preliminary study

NASA Astrophysics Data System (ADS)

Tanaka, Kiyoshi; Takano, Shuichi; Sugimura, Tatsuo

2000-10-01

In this work we focus on indexed triangle strips, an extended representation of triangle strips that improves the efficiency of geometrical transformation of vertices, and present a method to construct optimum indexed triangle strips using a Genetic Algorithm (GA) for real-time visualization. The main objective of this work is to optimally construct indexed triangle strips by improving the reuse ratio of the data stored in the cache memory while simultaneously reducing the total number of indices with the GA. Simulation results verify that the average number of indices and the cache miss ratio per polygon could be small, and consequently the total visualization time required for the optimum solution obtained by this scheme could be remarkably reduced.
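A cache miss ratio per polygon, as mentioned in this record, can be estimated by replaying an index sequence through a simulated vertex cache. The Python sketch below is only an illustration: the FIFO replacement policy, the default cache size of 4, and the function names are assumptions, and the abstract does not say how the authors model the cache or how the GA scores candidates.

```python
from collections import deque


def strip_cache_misses(indices, cache_size=4):
    """Count vertex-cache misses for an index sequence (FIFO policy assumed)."""
    cache = deque(maxlen=cache_size)  # a full deque drops its oldest entry
    misses = 0
    for vertex in indices:
        if vertex not in cache:
            misses += 1
            cache.append(vertex)
    return misses


def miss_ratio_per_triangle(indices, cache_size=4):
    """Misses divided by the number of triangles in a single strip.

    A strip of k indices encodes k - 2 triangles (degenerate ones included).
    """
    triangles = len(indices) - 2
    if triangles <= 0:
        raise ValueError("a strip needs at least three indices")
    return strip_cache_misses(indices, cache_size) / triangles


# A strip that revisits vertices 2, 1, 0 while they are still cached.
ratio = miss_ratio_per_triangle([0, 1, 2, 3, 2, 1, 0], cache_size=4)
```

A search procedure such as a GA could use a quantity like this, together with the total index count, as part of its fitness function; that pairing is a reading of the abstract, not a confirmed detail.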
Pragmatic open space box utilization: asteroid survey model using distributed objects management based articulation (DOMBA)

NASA Astrophysics Data System (ADS)

Mohammad, Atif Farid; Straub, Jeremy

2015-05-01

A multi-craft asteroid survey has significant data synchronization needs. Limited communication speeds drive exacting performance requirements. Tables have been used in relational databases, which are structured; however, DOMBA (Distributed Objects Management Based Articulation) deals with data in terms of collections. With this, no read/write roadblocks to the data exist. A master/slave architecture is created by utilizing the Gossip protocol. This facilitates expanding a mission that makes an important discovery via the launch of another spacecraft. The Open Space Box Framework facilitates the foregoing while also providing a virtual caching layer to make sure that continuously accessed data is available in memory and that, upon closing the data file, recharging is applied to the data.
Data Resilience in the dCache Storage System

DOE PAGES

Rossi, A. L.; Adeyemi, F.; Ashish, A.; ...

2017-11-23

In this study we discuss design, implementation considerations, and performance of a new Resilience Service in the dCache storage system responsible for file availability and durability functionality.
Alerting prefixes for speech warning messages. [in helicopters]

NASA Technical Reports Server (NTRS)

Bucher, N. M.; Voorhees, J. W.; Karl, R. L.; Werner, E.

1984-01-01

A major question posed by the design of an integrated voice information display/warning system for next-generation helicopter cockpits is whether an alerting prefix should precede voice warning messages; if so, the characteristics desirable in such a cue must also be addressed. Attention is presently given to the results of a study which ascertained pilot response time and response accuracy to messages preceded by either neutral cues or the cognitively appropriate semantic cues. Both verbal cues and messages were spoken in direct, phoneme-synthesized speech, and a training manipulation was included to determine the extent to which previous exposure to speech thus produced facilitates these messages' comprehension. Results are discussed in terms of the importance of human factors research in cockpit display design.
GPU accelerated cell-based adaptive mesh refinement on unstructured quadrilateral grid

NASA Astrophysics Data System (ADS)

Luo, Xisheng; Wang, Luying; Ran, Wei; Qin, Fenghua

2016-10-01

A GPU accelerated inviscid flow solver is developed on an unstructured quadrilateral grid in the present work. For the first time, the cell-based adaptive mesh refinement (AMR) is fully implemented on GPU for the unstructured quadrilateral grid, which greatly reduces the frequency of data exchange between GPU and CPU. Specifically, the AMR is processed with atomic operations to parallelize list operations, and null memory recycling is realized to improve the efficiency of memory utilization. It is found that results obtained by GPUs agree very well with the exact or experimental results in literature. An acceleration ratio of 4 is obtained between the parallel code running on the old GPU GT9800 and the serial code running on E3-1230 V2. With the optimization of configuring a larger L1 cache and adopting Shared Memory based atomic operations on the newer GPU C2050, an acceleration ratio of 20 is achieved. The parallelized cell-based AMR processes have achieved 2x speedup on GT9800 and 18x on Tesla C2050, which demonstrates that parallel running of the cell-based AMR method on GPU is feasible and efficient. Our results also indicate that the new development of GPU architecture benefits the fluid dynamics computing significantly.
dCache: Big Data storage for HEP communities and beyond

NASA Astrophysics Data System (ADS)

Millar, A. P.; Behrmann, G.; Bernardt, C.; Fuhrmann, P.; Litvintsev, D.; Mkrtchyan, T.; Petersen, A.; Rossi, A.; Schwank, K.

2014-06-01

With over ten years in production use, the dCache data storage system has evolved to match the ever-changing landscape of continually evolving storage technologies with new solutions to both existing problems and new challenges. In this paper, we present three areas of innovation in dCache: providing efficient access to data with NFS v4.1 pNFS, adoption of CDMI and WebDAV as an alternative to SRM for managing data, and integration with alternative authentication mechanisms.
Wolves, Canis lupus, carry and cache the collars of radio-collared White-tailed Deer, Odocoileus virginianus, they killed

USGS Publications Warehouse

Nelson, Michael E.; Mech, L. David

2011-01-01

Wolves (Canis lupus) in northeastern Minnesota cached six radio-collars (four in winter, two in spring-summer) of 202 radio-collared White-tailed Deer (Odocoileus virginianus) they killed or consumed from 1975 to 2010. A Wolf bedded on top of one collar cached in snow. We found one collar each at a Wolf den and Wolf rendezvous site, 2.5 km and 0.5 km respectively, from each deer's previous locations.
Landscape pattern of seed banks and anthropogenic impacts in forested wetlands of the northern Mississippi River Alluvial Valley

USGS Publications Warehouse

Middleton, B.; Wu, X.B.

2008-01-01

Agricultural development on floodplains contributes to hydrologic alteration and forest fragmentation, which may alter landscape-level processes. These changes may be related to shifts in the seed bank composition of floodplain wetlands. We examined the patterns of seed bank composition across a floodplain watershed by looking at the number of seeds germinating per m² by species in 60 farmed and intact forested wetlands along the Cache River watershed in Illinois. The seed bank composition was compared above and below a water diversion (position), which artificially subdivides the watershed. Position of these wetlands represented the most variability of Axis 1 in a Nonmetric Multidimensional Scaling (NMS) analysis of site environmental variables and their relationship to seed bank composition (coefficient of determination for Axis 1: r² = 0.376; Pearson correlation of position to Axis 1: r = 0.223). The 3 primary axes were also represented by other site environmental variables, including farming status (farmed or unfarmed), distance from the mouth of the river, latitude, and longitude. Spatial analysis based on Mantel correlograms showed that both water-dispersed and wind/water-dispersed seed assemblages had strong spatial structure in the upper Cache (above the water diversion), but the spatial structure of the water-dispersed seed assemblage was diminished in the lower Cache (below the water diversion), which lost floodpulsing. Bearing analysis also suggested that the water-dispersal process had a stronger influence on the overall spatial pattern of seed assemblage in the upper Cache, while the wind/water-dispersal process had a stronger influence in the lower Cache. An analysis of the landscapes along the river showed that the mid-lower Cache (below the water diversion) had undergone greater land cover changes associated with agriculture than did the upper Cache watershed.
Thus, the combination of forest fragmentation and hydrologic changes in the surrounding landscape may have had an influence on the seed bank composition and spatial distribution of the seed banks of the Cache River watershed. Our study suggests that the spatial pattern of seed bank composition may be influenced by landscape-level factors and processes.
Forest rodents provide directed dispersal of Jeffrey pine seeds

USGS Publications Warehouse

Briggs, J.S.; Wall, S.B.V.; Jenkins, S.H.

2009-01-01

Some species of animals provide directed dispersal of plant seeds by transporting them nonrandomly to microsites where their chances of producing healthy seedlings are enhanced. We investigated whether this mutualistic interaction occurs between granivorous rodents and Jeffrey pine (Pinus jeffreyi) in the eastern Sierra Nevada by comparing the effectiveness of random abiotic seed dispersal with the dispersal performed by four species of rodents: deer mice (Peromyscus maniculatus), yellow-pine and long-eared chipmunks (Tamias amoenus and T. quadrimaculatus), and golden-mantled ground squirrels (Spermophilus lateralis). We conducted two caching studies using radio-labeled seeds, the first with individual animals in field enclosures and the second with a community of rodents in open forest. We used artificial caches to compare the fates of seeds placed at the range of microsites and depths used by animals with the fates of seeds dispersed abiotically. Finally, we examined the distribution and survival of naturally establishing seedlings over an eight-year period. Several lines of evidence suggested that this community of rodents provided directed dispersal. Animals preferred to cache seeds in microsites that were favorable for emergence or survival of seedlings and avoided caching in microsites in which seedlings fared worst. Seeds buried at depths typical of animal caches (5–25 mm) produced at least five times more seedlings than did seeds on the forest floor. The four species of rodents differed in the quality of dispersal they provided. Small, shallow caches made by deer mice most resembled seeds dispersed by abiotic processes, whereas many of the large caches made by ground squirrels were buried too deeply for successful emergence of seedlings. Chipmunks made the greatest number of caches within the range of depths and microsites favorable for establishment of pine seedlings.
Directed dispersal is an important element of the population dynamics of Jeffrey pine, a dominant tree species in the eastern Sierra Nevada. Quantifying the occurrence and dynamics of directed dispersal in this and other cases will contribute to better understanding of mutualistic coevolution of plants and animals and to more effective management of ecosystems in which directed dispersal is a keystone process.
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2002-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread level parallelism.
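For readers unfamiliar with the method named in this record, the following is a minimal serial sketch of unpreconditioned CG in plain Python. It is the textbook algorithm, not the parallel or ILU(0)-preconditioned code studied in the paper; the row-dictionary sparse format and the function names are assumptions made for illustration.

```python
def matvec(rows, x):
    """Multiply a sparse matrix, stored as one {column: value} dict per row, by x."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, which equals b when x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(rows, p)
        alpha = rs_old / dot(p, Ap)                      # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small SPD system: [[4, 1], [1, 3]] x = [1, 2]
A = [{0: 4.0, 1: 1.0}, {0: 1.0, 1: 3.0}]
solution = conjugate_gradient(A, [1.0, 2.0])
```

The order in which `matvec` touches entries of `x` depends on how rows and columns are numbered, which is one way to see why the ordering strategies examined in the paper can affect cache reuse.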
Stream Processors

NASA Astrophysics Data System (ADS)

Erez, Mattan; Dally, William J.

Stream processors, like other multicore architectures, partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather-compute-scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors have minimal control consisting of fetching medium- and coarse-grained instructions and executing them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.
Support for Diagnosis of Custom Computer Hardware

NASA Technical Reports Server (NTRS)

Molock, Dwaine S.

2008-01-01

The Coldfire SDN Diagnostics software is a flexible means of exercising, testing, and debugging custom computer hardware. The software is a set of routines that, collectively, serve as a common software interface through which one can gain access to various parts of the hardware under test and/or cause the hardware to perform various functions. The routines can be used to construct tests to exercise, and verify the operation of, various processors and hardware interfaces. More specifically, the software can be used to gain access to memory, to execute timer delays, to configure interrupts, and configure processor cache, floating-point, and direct-memory-access units. The software is designed to be used on diverse NASA projects, and can be customized for use with different processors and interfaces. The routines are supported, regardless of the architecture of a processor that one seeks to diagnose. The present version of the software is configured for Coldfire processors on the Subsystem Data Node processor boards of the Solar Dynamics Observatory. There is also support for the software with respect to Mongoose V, RAD750, and PPC405 processors or their equivalents.
Compiler-Directed File Layout Optimization for Hierarchical Storage Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ding, Wei; Zhang, Yuanrui; Kandemir, Mahmut

File layout of array data is a critical factor that affects the behavior of storage caches, and has so far received little attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O intensive application programs and compared its performance against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.
