Parallel computing for probabilistic fatigue analysis
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.
1993-01-01
This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared- and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism, to keep large numbers of processors busy, and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared memory for achieving large-scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
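A minimal sketch (not the paper's code) of the coarse-grained parallelism such a study exploits: independent Monte Carlo fatigue-life samples distributed across MPI ranks, with a single reduction at the end. The S-N style damage model, its constants, and the design life are hypothetical placeholders.

```c
#include <mpi.h>
#include <stdio.h>

static double urand(unsigned int *s) {        /* tiny LCG, illustration only */
    *s = *s * 1664525u + 1013904223u;
    return (double)(*s >> 8) / 16777216.0;    /* uniform in [0,1) */
}

static double fatigue_life_sample(unsigned int *seed) {
    double stress = 300.0 + 50.0 * (urand(seed) - 0.5);  /* random load (MPa) */
    return 1.0e12 / (stress * stress * stress);          /* cycles to failure */
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n_total = 1000000;       /* total Monte Carlo samples          */
    long n_local = n_total / size;      /* per-rank share (remainder ignored) */
    unsigned int seed = 1234u + 977u * (unsigned int)rank; /* per-rank stream */

    const double design_life = 3.2e4;   /* hypothetical design life (cycles)  */
    long fail_local = 0;
    for (long i = 0; i < n_local; i++)
        if (fatigue_life_sample(&seed) < design_life) fail_local++;

    long fail_total = 0;
    MPI_Reduce(&fail_local, &fail_total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("estimated P(failure) = %g\n", (double)fail_total / (double)n_total);
    MPI_Finalize();
    return 0;
}
```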
Shared versus distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Jordan, Harry F.
1991-01-01
The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation-intensive tasks, and research and implementation trends are considered. It appears that the two types of systems will likely converge to a common form for large-scale multiprocessors.
Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors
NASA Technical Reports Server (NTRS)
Probst, David K.
1993-01-01
A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from degrading performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
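As an illustration of one latency-tolerance technique named in the abstract, the hedged C sketch below overlaps memory fetches with computation via software prefetching; the prefetch distance is a tuning assumption, not a value from the paper.

```c
#include <stddef.h>

#define PF_DIST 16   /* how many iterations ahead to prefetch (assumption) */

double dot(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n) {
            /* GCC/Clang builtin: request the lines early (read, low locality) */
            __builtin_prefetch(&a[i + PF_DIST], 0, 1);
            __builtin_prefetch(&b[i + PF_DIST], 0, 1);
        }
        s += a[i] * b[i];   /* useful work proceeds while lines are in flight */
    }
    return s;
}
```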
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Marquez, Andres; Choudhury, Sutanay
2012-09-01
Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis of large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.
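A hedged sketch of the shared-memory pattern the abstract describes: a full triad census distinguishes all 16 directed triad types, but the simplified kernel below counts just one class (closed triangles) over a dense undirected adjacency matrix, to show the OpenMP work-sharing and reduction.

```c
#include <omp.h>

/* adj is a dense n x n 0/1 adjacency matrix (undirected, illustrative only). */
long count_triangles(const char *adj, int n) {
    long count = 0;
    #pragma omp parallel for schedule(dynamic) reduction(+ : count)
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (adj[i * (long)n + j])               /* edge (i,j) exists */
                for (int k = j + 1; k < n; k++)
                    if (adj[i * (long)n + k] && adj[j * (long)n + k])
                        count++;                    /* closed triad i-j-k */
    return count;
}
```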
Rapid solution of large-scale systems of equations
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1994-01-01
The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computers for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly of element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis, and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.
Hybrid MPI+OpenMP Programming of an Overset CFD Solver and Performance Investigations
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Jin, Haoqiang H.; Biegel, Bryan (Technical Monitor)
2002-01-01
This report describes a two-level parallelization of a Computational Fluid Dynamics (CFD) solver with multi-zone overset structured grids. The approach is based on a hybrid MPI+OpenMP programming model suitable for shared memory machines and clusters of shared memory machines. The performance investigations of the hybrid application on an SGI Origin2000 (O2K) machine are reported using medium- and large-scale test problems.
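A minimal skeleton of the two-level hybrid model described above, assuming MPI distributes zones across processes while OpenMP threads share the loop work within a zone; the zone layout and the Jacobi-style stand-in kernel are illustrative, not the solver's actual code.

```c
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

/* Fine-grained level: OpenMP threads share the sweep over one zone. */
void solve_zone(double *u, double *un, int npts, int steps) {
    for (int t = 0; t < steps; t++) {
        #pragma omp parallel for
        for (int i = 1; i < npts - 1; i++)
            un[i] = 0.5 * (u[i - 1] + u[i + 1]);  /* stand-in for the update */
        double *tmp = u; u = un; un = tmp;        /* swap time levels */
    }
}

int main(int argc, char **argv) {
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Coarse-grained level: each rank owns one (or more) overset zones. */
    enum { NPTS = 1 << 20 };
    double *zone  = calloc(NPTS, sizeof *zone);
    double *zone2 = calloc(NPTS, sizeof *zone2);
    solve_zone(zone, zone2, NPTS, 100);
    /* ... MPI exchange of overset boundary data between ranks goes here ... */
    free(zone); free(zone2);
    MPI_Finalize();
    return 0;
}
```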
NASA Astrophysics Data System (ADS)
Okamoto, Taro; Takenaka, Hiroshi; Nakamura, Takeshi; Aoki, Takayuki
2010-12-01
We adopted the GPU (graphics processing unit) to accelerate the large-scale finite-difference simulation of seismic wave propagation. The simulation can benefit from the high memory bandwidth of the GPU because it is a "memory intensive" problem. In a single-GPU case we achieved a performance of about 56 GFlops, which was about 45-fold faster than that achieved by a single core of the host central processing unit (CPU). We confirmed that the optimized use of fast shared memory and registers was essential for performance. In the multi-GPU case with three-dimensional domain decomposition, the non-contiguous memory alignment of the ghost zones was found to impose a substantial data-transfer time between the GPU and the host node. This problem was solved by using contiguous memory buffers for the ghost zones. We achieved a performance of about 2.2 TFlops by using 120 GPUs and 330 GB of total memory: nearly (or more than) 2200 cores of host CPUs would be required to achieve the same performance. The weak scaling was nearly proportional to the number of GPUs. We therefore conclude that GPU computing for large-scale simulation of seismic wave propagation is a promising approach, as a faster simulation is possible with reduced computational resources compared to CPUs.
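A hedged C sketch of the ghost-zone fix described above, assuming a u[i][j][k] layout with k fastest: a fixed-k plane is strided in memory, so it is packed into a contiguous host buffer and shipped to the device in one copy. The names and the choice of plane are assumptions, not the authors' code.

```c
#include <cuda_runtime.h>

void send_z_ghost_plane(const float *u, int nx, int ny, int nz,
                        int k_plane, float *host_buf, float *dev_buf) {
    /* With layout u[i][j][k] (k fastest), a fixed-k plane is strided with
     * stride nz, so it must be packed before transfer. */
    for (int i = 0; i < nx; i++)
        for (int j = 0; j < ny; j++)
            host_buf[i * ny + j] = u[((long)i * ny + j) * nz + k_plane];

    /* One large contiguous transfer instead of nx*ny tiny strided ones. */
    cudaMemcpy(dev_buf, host_buf, (size_t)nx * ny * sizeof(float),
               cudaMemcpyHostToDevice);
}
```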
NASA Astrophysics Data System (ADS)
Liu, Jiping; Kang, Xiaochen; Dong, Chun; Xu, Shenghua
2017-12-01
Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large scale spatial data, the input/output (I/O) can easily become the bottleneck in parallelizing the algorithm due to the limited physical memory resources and the very slow disk transfer rate. In this paper, we proposed a stream tiling approach to surface area estimation that first decomposed a spatial data set into tiles with topological expansions. With these tiles, the one-to-one mapping relationship between the input and the computing process was broken. Then, we realized a streaming framework for the scheduling of the I/O processes and computing units. Herein, each computing unit encapsulated the same copy of the estimation algorithm, and multiple asynchronous computing units could work individually in parallel. Finally, the performed experiment demonstrated that our stream tiling estimation can efficiently alleviate the heavy pressure from the I/O-bound work, and the measured speedups after optimization greatly outperformed those of the directly parallelized versions in shared memory systems with multi-core processors.
Compiler-directed cache management in multiprocessors
NASA Technical Reports Server (NTRS)
Cheong, Hoichi; Veidenbaum, Alexander V.
1990-01-01
The necessity of finding alternatives to hardware-based cache coherence strategies for large-scale multiprocessor systems is discussed. Three different software-based strategies sharing the same goals and general approach are presented. They consist of a simple invalidation approach, a fast selective invalidation scheme, and a version control scheme. The strategies are suitable for shared-memory multiprocessor systems with interconnection networks and a large number of processors. Results of trace-driven simulations conducted on numerical benchmark routines to compare the performance of the three schemes are presented.
Parallel Clustering Algorithm for Large-Scale Biological Data Sets
Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang
2014-01-01
Background: Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for the cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, the time and space complexity become a great bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose constructing procedure takes long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Methods: Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. The shared-memory architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes. Results: A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. PMID:24705246
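A minimal sketch of the shared-memory similarity-matrix stage, under the usual affinity-propagation convention of negative squared Euclidean distance; function and variable names are illustrative, not from the paper.

```c
#include <omp.h>

/* x: n points of dimension d, row-major; S: n x n similarity matrix (output). */
void build_similarity(const double *x, int n, int d, double *S) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < d; k++) {
                double diff = x[(long)i * d + k] - x[(long)j * d + k];
                s += diff * diff;
            }
            S[(long)i * n + j] = -s;   /* s(i,j) = -||x_i - x_j||^2 */
        }
}
```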
Mnemonic convergence in social networks: The emergent properties of cognition at a collective level.
Coman, Alin; Momennejad, Ida; Drach, Rae D; Geana, Andra
2016-07-19
The development of shared memories, beliefs, and norms is a fundamental characteristic of human communities. These emergent outcomes are thought to occur owing to a dynamic system of information sharing and memory updating, which fundamentally depends on communication. Here we report results on the formation of collective memories in laboratory-created communities. We manipulated conversational network structure in a series of real-time, computer-mediated interactions in fourteen 10-member communities. The results show that mnemonic convergence, measured as the degree of overlap among community members' memories, is influenced by both individual-level information-processing phenomena and by the conversational social network structure created during conversational recall. By studying laboratory-created social networks, we show how large-scale social phenomena (i.e., collective memory) can emerge out of microlevel local dynamics (i.e., mnemonic reinforcement and suppression effects). The social-interactionist approach proposed herein points to optimal strategies for spreading information in social networks and provides a framework for measuring and forging collective memories in communities of individuals.
Cox, Gregory E; Hemmer, Pernille; Aue, William R; Criss, Amy H
2018-04-01
The development of memory theory has been constrained by a focus on isolated tasks rather than the processes and information that are common to situations in which memory is engaged. We present results from a study in which 453 participants took part in five different memory tasks: single-item recognition, associative recognition, cued recall, free recall, and lexical decision. Using hierarchical Bayesian techniques, we jointly analyzed the correlations between tasks within individuals (reflecting the degree to which tasks rely on shared cognitive processes) and within items (reflecting the degree to which tasks rely on the same information conveyed by the item). Among other things, we find that (a) the processes involved in lexical access and episodic memory are largely separate and rely on different kinds of information, (b) access to lexical memory is driven primarily by perceptual aspects of a word, (c) all episodic memory tasks rely to an extent on a set of shared processes which make use of semantic features to encode both single words and associations between words, and (d) recall involves additional processes likely related to contextual cuing and response production. These results provide a large-scale picture of memory across different tasks which can serve to drive the development of comprehensive theories of memory. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Scaling Irregular Applications through Data Aggregation and Software Multithreading
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Tumeo, Antonino; Chavarría-Miranda, Daniel
Bioinformatics, data analytics, semantic databases, and knowledge discovery are emerging high performance application areas that exploit dynamic, linked data structures such as graphs, unbalanced trees or unstructured grids. These data structures usually are very large, requiring significantly more memory than available on single shared memory systems. Additionally, these data structures are difficult to partition on distributed memory systems. They also present poor spatial and temporal locality, thus generating unpredictable memory and network accesses. The Partitioned Global Address Space (PGAS) programming model seems suitable for these applications, because it allows using a shared memory abstraction across distributed-memory clusters. However, current PGAS languages and libraries are built to target regular remote data accesses and block transfers. Furthermore, they usually rely on the Single Program Multiple Data (SPMD) parallel control model, which is not well suited to the fine grained, dynamic and unbalanced parallelism of irregular applications. In this paper we present GMT (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT integrates a PGAS data substrate with simple fork/join parallelism and provides automatic load balancing on a per node basis. It implements multi-level aggregation and lightweight multithreading to maximize memory and network bandwidth with fine-grained data accesses and tolerate long data access latencies. A key innovation in the GMT runtime is its thread specialization (workers, helpers and communication threads) that realizes the overall functionality. We compare our approach with other PGAS models, such as UPC running using GASNet, and hand-optimized MPI code on a set of typical large-scale irregular applications, demonstrating speedups of an order of magnitude.
A simple modern correctness condition for a space-based high-performance multiprocessor
NASA Technical Reports Server (NTRS)
Probst, David K.; Li, Hon F.
1992-01-01
A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
Transactive memory systems scale for couples: development and validation
Hewitt, Lauren Y.; Roberts, Lynne D.
2015-01-01
People in romantic relationships can develop shared memory systems by pooling their cognitive resources, allowing each person access to more information but with less cognitive effort. Research examining such memory systems in romantic couples largely focuses on remembering word lists or performing lab-based tasks, but these types of activities do not capture the processes underlying couples’ transactive memory systems, and may not be representative of the ways in which romantic couples use their shared memory systems in everyday life. We adapted an existing measure of transactive memory systems for use with romantic couples (TMSS-C), and conducted an initial validation study. In total, 397 participants who each identified as being a member of a romantic relationship of at least 3 months duration completed the study. The data provided a good fit to the anticipated three-factor structure of the components of couples’ transactive memory systems (specialization, credibility and coordination), and there was reasonable evidence of both convergent and divergent validity, as well as strong evidence of test–retest reliability across a 2-week period. The TMSS-C provides a valuable tool that can quickly and easily capture the underlying components of romantic couples’ transactive memory systems. It has potential to help us better understand this intriguing feature of romantic relationships, and how shared memory systems might be associated with other important features of romantic relationships. PMID:25999873
Shelton, Jill T.; Elliott, Emily M.; Hill, B. D.; Calamia, Matthew R.; Gouvier, Wm. Drew
2010-01-01
The working memory (WM) construct is conceptualized similarly across domains of psychology, yet the methods used to measure WM function vary widely. The present study examined the relationship between WM measures used in the laboratory and those used in applied settings. A large sample of undergraduates completed three laboratory-based WM measures (operation span, listening span, and n-back), as well as the WM subtests from the Wechsler Adult Intelligence Scale-III and the Wechsler Memory Scale-III. Performance on all of the WM subtests of the clinical batteries shared positive correlations with the lab measures; however, the Arithmetic and Spatial Span subtests shared lower correlations than the other WM tests. Factor analyses revealed that a factor comprising scores from the three lab WM measures and the clinical subtest, Letter-Number Sequencing (LNS), provided the best measurement of WM. Additionally, a latent variable approach was taken using fluid intelligence as a criterion construct to further discriminate between the WM tests. The results revealed that the lab measures, along with the LNS task, were the best predictors of fluid abilities. PMID:20161647
On initial Brain Activity Mapping of episodic and semantic memory code in the hippocampus.
Tsien, Joe Z; Li, Meng; Osan, Remus; Chen, Guifen; Lin, Longian; Wang, Phillip Lei; Frey, Sabine; Frey, Julietta; Zhu, Dajiang; Liu, Tianming; Zhao, Fang; Kuang, Hui
2013-10-01
It has been widely recognized that the understanding of the brain code would require large-scale recording and decoding of brain activity patterns. In 2007, with support from the Georgia Research Alliance, we launched the Brain Decoding Project Initiative, with a basic idea that is now similarly advocated by the BRAIN project and Brain Activity Map proposal. As the planning of the BRAIN project is currently underway, we share our insights and lessons from our efforts in mapping real-time episodic memory traces in the hippocampus of freely behaving mice. We show that appropriate large-scale statistical methods are essential to decipher and measure real-time memory traces and neural dynamics. We also provide an example of how carefully designed, sometimes thinking-outside-the-box behavioral paradigms can be highly instrumental in unraveling the memory-coding cell assembly organizing principle in the hippocampus. Our observations to date have led us to conclude that the specific-to-general categorical and combinatorial feature-coding cell assembly mechanism represents an emergent property for enabling the neural networks to generate and organize not only episodic memory, but also semantic knowledge and imagination. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Exploration versus exploitation in space, mind, and society
Hills, Thomas T.; Todd, Peter M.; Lazer, David; Redish, A. David; Couzin, Iain D.
2015-01-01
Search is a ubiquitous property of life. Although diverse domains have worked on search problems largely in isolation, recent trends across disciplines indicate that the formal properties of these problems share similar structures and, often, similar solutions. Moreover, internal search (e.g., memory search) shows similar characteristics to external search (e.g., spatial foraging), including shared neural mechanisms consistent with a common evolutionary origin across species. Search problems and their solutions also scale from individuals to societies, underlying and constraining problem solving, memory, information search, and scientific and cultural innovation. In summary, search represents a core feature of cognition, with a vast influence on its evolution and processes across contexts and requiring input from multiple domains to understand its implications and scope. PMID:25487706
Importance of balanced architectures in the design of high-performance imaging systems
NASA Astrophysics Data System (ADS)
Sgro, Joseph A.; Stanton, Paul C.
1999-03-01
Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high-performance computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high-performance general-purpose microprocessor technology have produced an abundance of low cost components suitable for use in high-performance computing systems. A common pitfall in the design of high-performance imaging systems, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The characteristic symptom of this problem is the failure of system performance to scale as more processors are added. The problem becomes exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high-performance external memory interfaces makes it practical to design high-performance imaging systems with balanced computational and memory bandwidth. Real-world examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
Memory Network For Distributed Data Processors
NASA Technical Reports Server (NTRS)
Bolen, David; Jensen, Dean; Millard, ED; Robinson, Dave; Scanlon, George
1992-01-01
Universal Memory Network (UMN) is a modular, digital data-communication system enabling computers with differing bus architectures to share 32-bit-wide data between locations up to 3 km apart with less than one millisecond of latency. It makes it possible to design sophisticated real-time and near-real-time data-processing systems without data-transfer "bottlenecks". This enterprise network permits transmission of a volume of data equivalent to an encyclopedia each second. Facilities benefiting from the Universal Memory Network include telemetry stations, simulation facilities, power plants, and large laboratories, or any facility sharing very large volumes of data. The main hub of the UMN is a reflection center that includes smaller hubs called Shared Memory Interfaces.
Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Howison, Mark; Bethel, E. Wes; Childs, Hank
2012-01-01
With the computing industry trending towards multi- and many-core processors, we study how a standard visualization algorithm, ray-casting volume rendering, can benefit from a hybrid parallelism approach. Hybrid parallelism provides the best of both worlds: using distributed-memory parallelism across a large number of nodes increases available FLOPs and memory, while exploiting shared-memory parallelism among the cores within each node ensures that each node performs its portion of the larger calculation as efficiently as possible. We demonstrate results from weak and strong scaling studies, at levels of concurrency ranging up to 216,000, and with datasets as large as 12.2 trillion cells. The greatest benefit from hybrid parallelism lies in the communication portion of the algorithm, the dominant cost at higher levels of concurrency. We show that reducing the number of participants with a hybrid approach significantly improves performance.
Parallel Computing for Probabilistic Response Analysis of High Temperature Composites
NASA Technical Reports Server (NTRS)
Sues, R. H.; Lua, Y. J.; Smith, M. D.
1994-01-01
The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.
Parallel Computation of the Regional Ocean Modeling System (ROMS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, P; Song, Y T; Chao, Y
2005-04-05
The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community to have a ROMS code that can be run on any parallel computer ranging from 10 to hundreds of processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.
Comparison of two paradigms for distributed shared memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levelt, W.G.; Kaashoek, M.F.; Bal, H.E.
1990-08-01
The paper compares two paradigms for Distributed Shared Memory on loosely coupled computing systems: the shared data-object model as used in Orca, a programming language specially designed for loosely coupled computing systems, and the Shared Virtual Memory model. For both paradigms the authors have implemented two systems, one using only point-to-point messages, the other using broadcasting as well. They briefly describe these two paradigms and their implementations. Then they compare their performance on four applications: the traveling salesman problem, alpha-beta search, matrix multiplication and the all pairs shortest paths problem. The measurements show that both paradigms can be used efficiently for programming large-grain parallel applications. Significant speedups were obtained on all applications. The unstructured Shared Virtual Memory paradigm achieves the best absolute performance, although this is largely due to the preliminary nature of the Orca compiler used. The structured shared data-object model achieves the highest speedups and is much easier to program and to debug.
Strategies for Energy Efficient Resource Management of Hybrid Programming Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Dong; Supinski, Bronis de; Schulz, Martin
2013-01-01
Many scientific applications are programmed using hybrid programming models that use both message-passing and shared-memory, due to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared-memory or message-passing, in isolation. The potential solution space, thus the challenge, increases substantially when optimizing hybrid models since the possible resource configurations increase exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74% on average and up to 13.8%) with some performance gain (up to 7.5%) or negligible performance loss.
Toward Enhancing OpenMP's Work-Sharing Directives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chapman, B M; Huang, L; Jin, H
2006-05-17
OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.
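The subteam construct proposed in the paper is not part of standard OpenMP; the hedged sketch below only emulates the idea by splitting one thread team by thread id, so that two different work-shared activities proceed concurrently within a single parallel region. (With a single thread, only the first "subteam" runs.)

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000
static double a[N], b[N];

int main(void) {
    double sa = 0.0, sb = 0.0;
    #pragma omp parallel
    {
        int tid  = omp_get_thread_num();
        int nth  = omp_get_num_threads();
        int half = nth > 1 ? nth / 2 : 1;

        if (tid < half) {                        /* emulated "subteam" 0 */
            double s = 0.0;
            for (int i = tid; i < N; i += half) s += a[i];
            #pragma omp atomic
            sa += s;
        } else {                                 /* emulated "subteam" 1 */
            double s = 0.0;
            for (int i = tid - half; i < N; i += nth - half) s += b[i];
            #pragma omp atomic
            sb += s;
        }
    }
    printf("sums: %g %g\n", sa, sb);
    return 0;
}
```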
Mind-to-mind heteroclinic coordination: Model of sequential episodic memory initiation.
Afraimovich, V S; Zaks, M A; Rabinovich, M I
2018-05-01
Retrieval of episodic memory is a dynamical process in the large scale brain networks. In social groups, the neural patterns, associated with specific events directly experienced by single members, are encoded, recalled, and shared by all participants. Here, we construct and study the dynamical model for the formation and maintaining of episodic memory in small ensembles of interacting minds. We prove that the unconventional dynamical attractor of this process, the nonsmooth heteroclinic torus, is structurally stable within the Lotka-Volterra-like sets of equations. Dynamics on this torus combines the absence of chaos with asymptotic instability of every separate trajectory; its adequate quantitative characteristics are length-related Lyapunov exponents. Variation of the coupling strength between the participants results in different types of sequential switching between metastable states; we interpret them as stages in formation and modification of the episodic memory.
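For reference, a generic form of the Lotka-Volterra-like equations the abstract invokes; the authors' specific coupling structure is not reproduced here, only the standard template such models build on.

```latex
% x_i: activity of state i; \sigma_i: its growth rate; \rho_{ij}: inhibition
% matrix whose asymmetric structure produces the heteroclinic switching.
\dot{x}_i \;=\; x_i \Bigl( \sigma_i - \sum_{j=1}^{N} \rho_{ij}\, x_j \Bigr),
\qquad i = 1, \dots, N .
```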
Mind-to-mind heteroclinic coordination: Model of sequential episodic memory initiation
NASA Astrophysics Data System (ADS)
Afraimovich, V. S.; Zaks, M. A.; Rabinovich, M. I.
2018-05-01
Retrieval of episodic memory is a dynamical process in the large scale brain networks. In social groups, the neural patterns, associated with specific events directly experienced by single members, are encoded, recalled, and shared by all participants. Here, we construct and study the dynamical model for the formation and maintaining of episodic memory in small ensembles of interacting minds. We prove that the unconventional dynamical attractor of this process—the nonsmooth heteroclinic torus—is structurally stable within the Lotka-Volterra-like sets of equations. Dynamics on this torus combines the absence of chaos with asymptotic instability of every separate trajectory; its adequate quantitative characteristics are length-related Lyapunov exponents. Variation of the coupling strength between the participants results in different types of sequential switching between metastable states; we interpret them as stages in formation and modification of the episodic memory.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets
Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De; ...
2017-01-28
Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition techniques at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as a mouse brain, may require hours of computation time with a medium-sized workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Trace provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.
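A hedged sketch of the replication idea behind the replicated reconstruction object: each thread accumulates into its own private copy of the reconstruction grid, lock-free on the hot path, and the copies are merged in a final reduction. The trivial ray-to-voxel map stands in for the real back-projection kernel; names are illustrative.

```c
#include <omp.h>
#include <stdlib.h>

void backproject(const float *sino, int n_rays, float *recon, int n_vox) {
    int nth = omp_get_max_threads();
    /* One private replica of the reconstruction grid per thread. */
    float *replicas = calloc((size_t)nth * n_vox, sizeof *replicas);

    #pragma omp parallel
    {
        float *mine = replicas + (size_t)omp_get_thread_num() * n_vox;
        #pragma omp for schedule(dynamic)
        for (int r = 0; r < n_rays; r++) {
            int v = r % n_vox;          /* stand-in for the ray->voxel map */
            mine[v] += sino[r];         /* update private replica, lock-free */
        }
    }

    /* Reduction step: merge all replicas into the shared reconstruction. */
    for (int t = 0; t < nth; t++)
        for (int v = 0; v < n_vox; v++)
            recon[v] += replicas[(size_t)t * n_vox + v];
    free(replicas);
}
```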
Runtime support for parallelizing data mining algorithms
NASA Astrophysics Data System (ADS)
Jin, Ruoming; Agrawal, Gagan
2002-03-01
With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the technique we have developed starting from a common specification of the algorithm.
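A minimal sketch contrasting two of the named techniques on a histogram-style reduction object: full replication (per-thread copies, merged once per thread) versus fine-grained locking (one lock per bin). The bin count is an assumption, keys are assumed non-negative, and the caller is assumed to have initialized the locks with omp_init_lock.

```c
#include <omp.h>
#include <string.h>

#define NBINS 4096

/* Full replication: no synchronization on the hot path. */
void hist_replicated(const int *keys, long n, long *out) {
    memset(out, 0, NBINS * sizeof *out);
    #pragma omp parallel
    {
        long local[NBINS] = {0};                  /* per-thread replica */
        #pragma omp for
        for (long i = 0; i < n; i++) local[keys[i] % NBINS]++;
        #pragma omp critical                      /* merge once per thread */
        for (int b = 0; b < NBINS; b++) out[b] += local[b];
    }
}

/* Full locking: shared object, one lock per bin. */
void hist_locked(const int *keys, long n, long *out, omp_lock_t *locks) {
    memset(out, 0, NBINS * sizeof *out);
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        int b = keys[i] % NBINS;
        omp_set_lock(&locks[b]);
        out[b]++;
        omp_unset_lock(&locks[b]);
    }
}
```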
Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation
NASA Astrophysics Data System (ADS)
Wu, Baodong; Li, Shigang; Zhang, Yunquan; Nie, Ningming
2017-02-01
The parallel Kinetic Monte Carlo (KMC) algorithm based on domain decomposition has been widely used in large-scale physical simulations. However, the communication overhead of the parallel KMC algorithm is critical, and severely degrades the overall performance and scalability. In this paper, we present a hybrid optimization strategy to reduce the communication overhead for parallel KMC simulations. We first propose a communication aggregation algorithm to reduce the total number of messages and eliminate the communication redundancy. Then, we utilize the shared memory to reduce the memory copy overhead of the intra-node communication. Finally, we optimize the communication scheduling using the neighborhood collective operations. We demonstrate the scalability and high performance of our hybrid optimization strategy by both theoretical and experimental analysis. Results show that the optimized KMC algorithm exhibits better performance and scalability than the well-known open-source library SPPARKS. On a 32-node Xeon E5-2680 cluster (640 cores in total), the optimized algorithm reduces the communication time by 24.8% compared with SPPARKS.
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Taft, James R.
1999-01-01
Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA-based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector-oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16-CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
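A hedged sketch of the general MLP idea (not the NAS implementation): coarse-grained parallelism from forked UNIX processes that communicate through a shared-memory arena, so no message passing is needed within a node. MAP_ANONYMOUS is assumed available (Linux/BSD); error handling is omitted for brevity.

```c
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    const int nproc = 4, n = 1 << 20;
    /* Shared arena visible to the parent and all forked children. */
    double *arena = mmap(NULL, n * sizeof(double),
                         PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    for (int p = 0; p < nproc; p++) {
        if (fork() == 0) {                 /* coarse-grained "MLP process" */
            for (int i = p; i < n; i += nproc)
                arena[i] = (double)i;      /* each process fills its stripe */
            _exit(0);
        }
    }
    for (int p = 0; p < nproc; p++) wait(NULL);
    printf("arena[12345] = %g\n", arena[12345]);
    munmap(arena, n * sizeof(double));
    return 0;
}
```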
Composing Data Parallel Code for a SPARQL Graph Engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste
Big data analytics applications process large amounts of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph-based representation provides several benefits, such as the possibility to perform in-memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which require more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.
NASA Astrophysics Data System (ADS)
Fonseca, R. A.; Vieira, J.; Fiuza, F.; Davidson, A.; Tsung, F. S.; Mori, W. B.; Silva, L. O.
2013-12-01
A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ~10^6 cores and sustained performance of over ~2 PFlops are demonstrated, opening the way for large scale modelling of LWFA scenarios.
NASA Technical Reports Server (NTRS)
Stehle, Roy H.; Ogier, Richard G.
1993-01-01
Alternatives for realizing a packet-based network switch for use on a frequency division multiple access/time division multiplexed (FDMA/TDM) geostationary communication satellite were investigated. Each of the eight downlink beams supports eight directed dwells. The design needed to accommodate multicast packets with very low probability of loss due to contention. Three switch architectures were designed and analyzed. An output-queued, shared bus system yielded a functionally simple system, utilizing a first-in, first-out (FIFO) memory per downlink dwell, but at the expense of a large total memory requirement. A shared memory architecture offered the most efficiency in memory requirements, requiring about half the memory of the shared bus design. The processing requirement for the shared-memory system adds system complexity that may offset the benefits of the smaller memory. An alternative design using a shared memory buffer per downlink beam decreases circuit complexity through a distributed design, and requires at most 1000 packets of memory more than the completely shared memory design. Modifications to the basic packet switch designs were proposed to accommodate circuit-switched traffic, which must be served on a periodic basis with minimal delay. Methods for dynamically controlling the downlink dwell lengths were developed and analyzed. These methods adapt quickly to changing traffic demands, and do not add significant complexity or cost to the satellite and ground station designs. Methods for reducing the memory requirement by not requiring the satellite to store full packets were also proposed and analyzed. In addition, optimal packet and dwell lengths were computed as functions of memory size for the three switch architectures.
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.
Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T
2017-01-01
Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.
Enabling Graph Appliance for Genome Assembly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Rina; Graves, Jeffrey A; Lee, Sangkeun
2015-01-01
In recent years, there has been a huge growth in the amount of genomic data available as reads generated from various genome sequencers. The number of reads generated can be huge, ranging from hundreds to billions of nucleotides, each varying in size. Assembling such large amounts of data is one of the challenging computational problems for both biomedical and data scientists. Most of the genome assemblers developed have used de Bruijn graph techniques. A de Bruijn graph represents a collection of read sequences by billions of vertices and edges, which require large amounts of memory and computational power to store and process. This is the major drawback to de Bruijn graph assembly. Massively parallel, multi-threaded, shared memory systems can be leveraged to overcome some of these issues. The objective of our research is to investigate the feasibility and scalability issues of de Bruijn graph assembly on Cray's Urika-GD system; Urika-GD is a high performance graph appliance with a large shared memory and massively multithreaded custom processor designed for executing SPARQL queries over large-scale RDF data sets. However, to the best of our knowledge, there is no research on representing a de Bruijn graph as an RDF graph or finding Eulerian paths in RDF graphs using SPARQL for potential genome discovery. In this paper, we address the issues involved in representing de Bruijn graphs as RDF graphs and propose an iterative querying approach for finding Eulerian paths in large RDF graphs. We evaluate the performance of our implementation on real world ebola genome datasets and illustrate how genome assembly can be accomplished with Urika-GD using iterative SPARQL queries.
Payne, Brennan R.; Gross, Alden L.; Hill, Patrick L.; Parisi, Jeanine M.; Rebok, George W.; Stine-Morrow, Elizabeth A. L.
2018-01-01
With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2,802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability. PMID:27685541
Payne, Brennan R; Gross, Alden L; Hill, Patrick L; Parisi, Jeanine M; Rebok, George W; Stine-Morrow, Elizabeth A L
2017-07-01
With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability.
The force on the flex: Global parallelism and portability
NASA Technical Reports Server (NTRS)
Jordan, H. F.
1986-01-01
A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000-line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate: high enough to mask detailed architectural differences between multiprocessors, but low enough to give the user control over performance. The system is being ported to a medium-scale multiprocessor, the Flex/32, which is a 20-processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were modified to produce directly the system calls which form the basis for ConCurrent C. The implementation of the Fortran-based system is in step with Flexible Computer Corporation's implementation of a Fortran system in the parallel environment.
Parallel-vector out-of-core equation solver for computational mechanics
NASA Technical Reports Server (NTRS)
Qin, J.; Agarwal, T. K.; Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.
1993-01-01
A parallel/vector out-of-core equation solver is developed for shared-memory computers, such as the Cray Y-MP machine. The input/output (I/O) time is reduced by using the asynchronous BUFFER IN and BUFFER OUT statements, which can be executed simultaneously with the CPU instructions. The parallel and vector capability provided by the supercomputers is also exploited to enhance performance. Numerical applications in large-scale structural analysis are given to demonstrate the efficiency of the present out-of-core solver.
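The overlap of buffered I/O with computation that the abstract describes is, in modern C++ terms, double buffering: start the next read asynchronously and process the current block meanwhile. A hedged sketch under assumed file layout; the "elimination" step is a placeholder.

```c++
#include <cstdio>
#include <future>
#include <vector>

// Double-buffering sketch: the next block is read asynchronously while the
// current block is processed, mimicking BUFFER IN overlapped with CPU work.
std::vector<double> read_block(std::FILE* f, std::size_t n) {
    std::vector<double> buf(n);
    std::fread(buf.data(), sizeof(double), n, f);
    return buf;
}

void process_blocks(std::FILE* f, std::size_t nblocks, std::size_t n) {
    auto pending = std::async(std::launch::async, read_block, f, n);
    for (std::size_t b = 0; b < nblocks; ++b) {
        std::vector<double> block = pending.get();   // wait for block b
        if (b + 1 < nblocks)                         // start reading block b+1
            pending = std::async(std::launch::async, read_block, f, n);
        for (double& v : block) v *= 2.0;            // stand-in for elimination
    }
}
```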
Shared memories reveal shared structure in neural activity across individuals
Chen, J.; Leong, Y.C.; Honey, C.J.; Yong, C.H.; Norman, K.A.; Hasson, U.
2016-01-01
Our lives revolve around sharing experiences and memories with others. When different people recount the same events, how similar are their underlying neural representations? Participants viewed a fifty-minute movie, then verbally described the events during functional MRI, producing unguided detailed descriptions lasting up to forty minutes. As each person spoke, event-specific spatial patterns were reinstated in default-network, medial-temporal, and high-level visual areas. Individual event patterns were both highly discriminable from one another and similar between people, suggesting consistent spatial organization. In many high-order areas, patterns were more similar between people recalling the same event than between recall and perception, indicating systematic reshaping of percept into memory. These results reveal the existence of a common spatial organization for memories in high-level cortical areas, where encoded information is largely abstracted beyond sensory constraints; and that neural patterns during perception are altered systematically across people into shared memory representations for real-life events. PMID:27918531
Wang, Manjie; Saudino, Kimberly J
2013-12-01
This is the first study to explore genetic and environmental contributions to individual differences in emotion regulation in toddlers, and the first to examine the genetic and environmental etiology underlying the association between emotion regulation and working memory. In a sample of 304 same-sex twin pairs (140 MZ, 164 DZ) at age 3, emotion regulation was assessed using the Behavior Rating Scale of the Bayley Scales of Infant Development (BRS; Bayley, 1993), and working memory was measured by the visually cued recall (VCR) task (Zelazo, Jacques, Burack, & Frye, 2002) and several memory tasks from the Mental Scale of the BSID. Based on model-fitting analyses, both emotion regulation and working memory were significantly influenced by genetic and nonshared environmental factors. Shared environmental effects were significant for working memory, but not for emotion regulation. Only genetic factors significantly contributed to the covariation between emotion regulation and working memory.
Wang, Manjie; Saudino, Kimberly J.
2014-01-01
This is the first study to explore genetic and environmental contributions to individual differences in emotion regulation in toddlers, and the first to examine the genetic and environmental etiology underlying the association between emotion regulation and working memory. In a sample of 304 same-sex twin pairs (140 MZ, 164 DZ) at age 3, emotion regulation was assessed using the Behavior Rating Scale of the Bayley Scales of Infant Development (BRS; Bayley, 1993), and working memory was measured by the visually cued recall (VCR) task (Zelazo et al., 2002) and several memory tasks from the Mental Scale of BSID. Based on model-fitting analyses, both emotion regulation and working memory were significantly influenced by genetic and nonshared environmental factors. Shared environmental effects were significant for working memory, but not for emotion regulation. Only genetic factors significantly contributed to the covariation between emotion regulation and working memory. PMID:24098922
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs with compiler directives has improved markedly. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS Parallel Benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
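For readers unfamiliar with directive-based parallelization, the output of such a tool is ordinary code annotated with OpenMP work-sharing directives. An illustrative example of the style (ours, not CAPTools-generated output):

```c++
#include <vector>

// Dependence analysis shows the iterations are independent, so a tool can
// safely insert an OpenMP work-sharing directive over the loop.
void smooth(std::vector<double>& u, const std::vector<double>& f) {
    const int n = static_cast<int>(u.size());
    #pragma omp parallel for
    for (int i = 1; i < n - 1; ++i)
        u[i] = 0.5 * (f[i - 1] + f[i + 1]);
}
```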
States of mind: Emotions, body feelings, and thoughts share distributed neural networks
Oosterwijk, Suzanne; Lindquist, Kristen A.; Anderson, Eric; Dautoff, Rebecca; Moriguchi, Yoshiya; Barrett, Lisa Feldman
2012-01-01
Scientists have traditionally assumed that different kinds of mental states (e.g., fear, disgust, love, memory, planning, concentration, etc.) correspond to different psychological faculties that have domain-specific correlates in the brain. Yet, growing evidence points to the constructionist hypothesis that mental states emerge from the combination of domain-general psychological processes that map to large-scale distributed brain networks. In this paper, we report a novel study testing a constructionist model of the mind in which participants generated three kinds of mental states (emotions, body feelings, or thoughts) while we measured activity within large-scale distributed brain networks using fMRI. We examined the similarity and differences in the pattern of network activity across these three classes of mental states. Consistent with a constructionist hypothesis, a combination of large-scale distributed networks contributed to emotions, thoughts, and body feelings, although these mental states differed in the relative contribution of those networks. Implications for a constructionist functional architecture of diverse mental states are discussed. PMID:22677148
Parallelization of KENO-Va Monte Carlo code
NASA Astrophysics Data System (ADS)
Ramón, Javier; Peña, Jorge
1995-07-01
KENO-Va is a code integrated within the SCALE system developed by Oak Ridge that solves the transport equation through the Monte Carlo method. It is being used at the Consejo de Seguridad Nuclear (CSN) to perform criticality calculations for fuel storage pools and shipping casks. Two parallel versions of the code have been generated: one for shared-memory machines and another for distributed-memory systems using the message-passing library PVM. In both versions the neutrons of each generation are tracked in parallel. In order to preserve the reproducibility of the results in both versions, advanced seeds for random numbers were used. The CONVEX C3440 with four processors and shared memory at CSN was used to implement the shared-memory version. An FDDI network of 6 HP9000/735 workstations was employed to implement the message-passing version using proprietary PVM. The speedup obtained was 3.6 in both cases.
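One way to read "advanced seeds for random numbers" is that each neutron history receives its own deterministically derived random stream, so its draws do not depend on how histories are scheduled across processors. A speculative C++ sketch of that idea (not the KENO-Va implementation):

```c++
#include <random>

// One deterministic RNG stream per history: each history's random draws are
// reproducible for any thread count or schedule. Tracking a single neutron
// is reduced to a stand-in draw here.
double track_generation(int histories, unsigned long long base_seed) {
    double tally = 0.0;
    #pragma omp parallel for reduction(+:tally)
    for (int h = 0; h < histories; ++h) {
        std::mt19937_64 rng(base_seed + static_cast<unsigned long long>(h));
        std::uniform_real_distribution<double> u(0.0, 1.0);
        tally += u(rng);
    }
    return tally / histories;
}
```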
Tuning collective communication for Partitioned Global Address Space programming models
Nishtala, Rajesh; Zheng, Yili; Hargrove, Paul H.; ...
2011-06-12
Partitioned Global Address Space (PGAS) languages offer programmers the convenience of a shared memory programming style combined with the locality control necessary to run on large-scale distributed memory systems. Even within a PGAS language programmers often need to perform global communication operations such as broadcasts or reductions, which are best performed as collective operations in which a group of threads work together to perform the operation. In this study we consider the problem of implementing collective communication within PGAS languages and explore some of the design trade-offs in both the interface and implementation. In particular, PGAS collectives have semantic issues that are different than in send–receive style message passing programs, and different implementation approaches that take advantage of the one-sided communication style in these languages. We present an implementation framework for PGAS collectives as part of the GASNet communication layer, which supports shared memory, distributed memory, and hybrids. The framework supports a broad set of algorithms for each collective, over which the implementation may be automatically tuned. In conclusion, we demonstrate the benefit of optimized GASNet collectives using application benchmarks written in UPC, and demonstrate that the GASNet collectives can deliver scalable performance on a variety of state-of-the-art parallel machines including a Cray XT4, an IBM BlueGene/P, and a Sun Constellation system with InfiniBand interconnect.
Computational Nanotechnology of Materials, Devices, and Machines: Carbon Nanotubes
NASA Technical Reports Server (NTRS)
Srivastava, Deepak; Kwak, Dolhan (Technical Monitor)
2000-01-01
The mechanics and chemistry of carbon nanotubes have relevance for their numerous electronic applications. Mechanical deformations such as bending and twisting affect the nanotube's conductive properties, and at the same time they possess high strength and elasticity. Two principal techniques were utilized including the analysis of large scale classical molecular dynamics on a shared memory architecture machine and a quantum molecular dynamics methodology. In carbon based electronics, nanotubes are used as molecular wires with topological defects which are mediated through various means. Nanotubes can be connected to form junctions.
Computational Nanotechnology of Materials, Electronics and Machines: Carbon Nanotubes
NASA Technical Reports Server (NTRS)
Srivastava, Deepak
2001-01-01
This report presents the goals and research of the Integrated Product Team (IPT) on Devices and Nanotechnology. NASA's needs for this technology are discussed and then related to the research focus of the team. The two areas of focus for technique development are: 1) large scale classical molecular dynamics on a shared memory architecture machine; and 2) quantum molecular dynamics methodology. The areas of focus for research are: 1) nanomechanics/materials; 2) carbon based electronics; 3) BxCyNz composite nanotubes and junctions; 4) nano mechano-electronics; and 5) nano mechano-chemistry.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor-based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
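Schematically, the two implementations differ as below. This is our own toy contrast, not the paper's code; the per-equation update is a stand-in for the actual differential-algebraic solve step.

```c++
#include <mpi.h>

// Shared memory: threads split the equations via a work-sharing loop.
void step_openmp(double* x, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        x[i] += 0.1 * x[i];
}

// Distributed memory: each rank owns a slice and exchanges results.
// Assumes n is divisible by the number of ranks.
void step_mpi(double* x, int n) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int chunk = n / size, lo = rank * chunk;
    for (int i = lo; i < lo + chunk; ++i)
        x[i] += 0.1 * x[i];
    // Make every rank's slice visible everywhere before the next step.
    MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                  x, chunk, MPI_DOUBLE, MPI_COMM_WORLD);
}
```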
Division of attention as a function of the number of steps, visual shifts, and memory load
NASA Technical Reports Server (NTRS)
Chechile, R. A.; Butler, K.; Gutowski, W.; Palmer, E. A.
1986-01-01
The effects on divided attention of visual shifts and long-term memory retrieval during a monitoring task are considered. A concurrent vigilance task was standardized under all experimental conditions. The results show that subjects can perform nearly perfectly on all of the time-shared tasks if long-term memory retrieval is not required for monitoring. With the requirement of memory retrieval, however, there was a large decrease in accuracy for all of the time-shared activities. It was concluded that the attentional demand of long-term memory retrieval is appreciable (even for a well-learned motor sequence), and thus memory retrieval results in a sizable reduction in the capability of subjects to divide their attention. A selected bibliography on the divided attention literature is provided.
The Contribution of Working Memory to Fluid Reasoning: Capacity, Control, or Both?
ERIC Educational Resources Information Center
Chuderski, Adam; Necka, Edward
2012-01-01
Fluid reasoning shares a large part of its variance with working memory capacity (WMC). The literature on working memory (WM) suggests that the capacity of the focus of attention responsible for simultaneous maintenance and integration of information within WM, as well as the effectiveness of executive control exerted over WM, determines…
The FORCE - A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
OpenMP parallelization of a gridded SWAT (SWATG)
NASA Astrophysics Data System (ADS)
Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin
2017-12-01
Large-scale, long-term and high-spatial-resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates a grid modeling scheme with different spatial representations also presents such problems. The long computation times limit applications of very-high-resolution, large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling at the HRU level. Such a parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling a roughly 2000 km2 watershed using one CPU with a 15-thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computation of environmental models is beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management, in addition to offering data fusion and model coupling ability.
States of mind: emotions, body feelings, and thoughts share distributed neural networks.
Oosterwijk, Suzanne; Lindquist, Kristen A; Anderson, Eric; Dautoff, Rebecca; Moriguchi, Yoshiya; Barrett, Lisa Feldman
2012-09-01
Scientists have traditionally assumed that different kinds of mental states (e.g., fear, disgust, love, memory, planning, concentration, etc.) correspond to different psychological faculties that have domain-specific correlates in the brain. Yet, growing evidence points to the constructionist hypothesis that mental states emerge from the combination of domain-general psychological processes that map to large-scale distributed brain networks. In this paper, we report a novel study testing a constructionist model of the mind in which participants generated three kinds of mental states (emotions, body feelings, or thoughts) while we measured activity within large-scale distributed brain networks using fMRI. We examined the similarity and differences in the pattern of network activity across these three classes of mental states. Consistent with a constructionist hypothesis, a combination of large-scale distributed networks contributed to emotions, thoughts, and body feelings, although these mental states differed in the relative contribution of those networks. Implications for a constructionist functional architecture of diverse mental states are discussed. Copyright © 2012 Elsevier Inc. All rights reserved.
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
Low latency memory access and synchronization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.
A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs the subsequent write operation rather than the processor. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
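The acquire-by-a-single-operation behavior can be approximated in software by one atomic read-modify-write, the closest user-level analogue of a locking device whose read triggers the write. A minimal sketch of our emulation, not the patented mechanism:

```c++
#include <atomic>

// One atomic exchange both observes and claims the lock, approximating the
// "one load, hardware does the write" acquire described above.
struct SingleOpLock {
    std::atomic<int> word{0};                        // 0 = free, 1 = held
    bool try_acquire() { return word.exchange(1) == 0; }
    void release() { word.store(0, std::memory_order_release); }
};
```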
Low latency memory access and synchronization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.
A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs the subsequent write operation rather than the processor. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
Attention and Visuospatial Working Memory Share the Same Processing Resources
Feng, Jing; Pratt, Jay; Spence, Ian
2012-01-01
Attention and visuospatial working memory (VWM) share very similar characteristics; both have the same upper bound of about four items in capacity and they recruit overlapping brain regions. We examined whether both attention and VWM share the same processing resources using a novel dual-task costs approach based on a load-varying dual-task technique. With sufficiently large loads on attention and VWM, considerable interference between the two processes was observed. A further load increase on either process produced reciprocal increases in interference on both processes, indicating that attention and VWM share common resources. More critically, comparison among four experiments on the reciprocal interference effects, as measured by the dual-task costs, demonstrates no significant contribution from additional processing other than the shared processes. These results support the notion that attention and VWM share the same processing resources. PMID:22529826
Android Protection Mechanism: A Signed Code Security Mechanism for Smartphone Applications
2011-03-01
status registers, exceptions, endian support, unaligned access support, synchronization primitives, the Jazelle Extension, and saturated integer...supports comprehensive non-blocking shared-memory synchronization primitives that scale for multiple-processor system designs. This is an improvement... synchronization. Memory semaphores can be loaded and altered without interruption because the load and store operations are atomic. Processor
White, Corey N.; Kapucu, Aycan; Bruno, Davide; Rotello, Caren M.; Ratcliff, Roger
2014-01-01
Recognition memory studies often find that emotional items are more likely than neutral items to be labeled as studied. Previous work suggests this bias is driven by increased memory strength/familiarity for emotional items. We explored strength and bias interpretations of this effect with the conjecture that emotional stimuli might seem more familiar because they share features with studied items from the same category. Categorical effects were manipulated in a recognition task by presenting lists with a small, medium, or large proportion of emotional words. The liberal memory bias for emotional words was only observed when a medium or large proportion of categorized words were presented in the lists. Similar, though weaker, effects were observed with categorized words that were not emotional (animal names). These results suggest that liberal memory bias for emotional items may be largely driven by effects of category membership. PMID:24303902
White, Corey N; Kapucu, Aycan; Bruno, Davide; Rotello, Caren M; Ratcliff, Roger
2014-01-01
Recognition memory studies often find that emotional items are more likely than neutral items to be labelled as studied. Previous work suggests this bias is driven by increased memory strength/familiarity for emotional items. We explored strength and bias interpretations of this effect with the conjecture that emotional stimuli might seem more familiar because they share features with studied items from the same category. Categorical effects were manipulated in a recognition task by presenting lists with a small, medium or large proportion of emotional words. The liberal memory bias for emotional words was only observed when a medium or large proportion of categorised words were presented in the lists. Similar, though weaker, effects were observed with categorised words that were not emotional (animal names). These results suggest that liberal memory bias for emotional items may be largely driven by effects of category membership.
Practical Formal Verification of MPI and Thread Programs
NASA Astrophysics Data System (ADS)
Gopalakrishnan, Ganesh; Kirby, Robert M.
Large-scale simulation codes in science and engineering are written using the Message Passing Interface (MPI). Shared memory threads are widely used directly, or to implement higher level programming abstractions. Traditional debugging methods for MPI or thread programs are incapable of providing useful formal guarantees about coverage. They get bogged down in the sheer number of interleavings (schedules), often missing shallow bugs. In this tutorial we will introduce two practical formal verification tools: ISP (for MPI C programs) and Inspect (for Pthread C programs). Unlike other formal verification tools, ISP and Inspect run directly on user source codes (much like a debugger). They pursue only the relevant set of process interleavings, using our own customized Dynamic Partial Order Reduction (DPOR) algorithms. For a given test harness, DPOR allows these tools to guarantee the absence of deadlocks, instrumented MPI object leaks and communication races (using ISP), and shared memory races (using Inspect). ISP and Inspect have been used to verify large pieces of code: in excess of 10,000 lines of MPI/C for ISP in under 5 seconds, and about 5,000 lines of Pthread/C code in a few hours (and much faster with the use of a cluster or by exploiting special cases such as symmetry) for Inspect. We will also demonstrate the Microsoft Visual Studio and Eclipse Parallel Tools Platform integrations of ISP (these will be available on the LiveCD).
Efficient ICCG on a shared memory multiprocessor
NASA Technical Reports Server (NTRS)
Hammond, Steven W.; Schreiber, Robert
1989-01-01
Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
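The "static analysis to determine data dependences in the triangular solve" mentioned above is commonly realized as level scheduling: rows are grouped so that each row depends only on rows in earlier groups. A hedged sketch of the resulting parallel solve, with an assumed data layout that is not from the paper:

```c++
#include <omp.h>
#include <vector>

// Strictly-lower-triangular part of L in row-wise index/value arrays, with
// the diagonal kept separately. levels[l] lists the rows whose dependencies
// all lie in earlier levels, so rows within one level solve in parallel.
// On entry x holds the right-hand side; on exit it holds the solution.
void level_scheduled_solve(const std::vector<std::vector<int>>& cols,
                           const std::vector<std::vector<double>>& vals,
                           const std::vector<double>& diag,
                           const std::vector<std::vector<int>>& levels,
                           std::vector<double>& x) {
    for (const auto& level : levels) {
        #pragma omp parallel for
        for (long k = 0; k < static_cast<long>(level.size()); ++k) {
            const int row = level[k];
            double s = x[row];
            for (std::size_t j = 0; j < cols[row].size(); ++j)
                s -= vals[row][j] * x[cols[row][j]];   // deps already solved
            x[row] = s / diag[row];
        }
    }
}
```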
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes
NASA Technical Reports Server (NTRS)
Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs with compiler directives has improved markedly. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds that of some of the commercial tools.
Method for prefetching non-contiguous data structures
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Brewster, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2009-05-05
A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs the subsequent write operation rather than the processor. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
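A software caricature of the pointer-per-memory-line idea: each node carries an explicit hint to the next line, prefetched before the current node is consumed. The GCC/Clang `__builtin_prefetch` intrinsic stands in for the patented hardware mechanism:

```c++
// Each node carries an explicit next-line hint that is prefetched before
// the current node is consumed, hiding the latency of pointer chasing.
struct Line {
    double payload[7];
    const Line* next_hint;                           // the in-line pointer
};

double walk(const Line* p) {
    double sum = 0.0;
    while (p) {
        if (p->next_hint)
            __builtin_prefetch(p->next_hint);        // hide next line's latency
        sum += p->payload[0];
        p = p->next_hint;
    }
    return sum;
}
```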
Levy, Scott; Ferreira, Kurt B.; Bridges, Patrick G.; ...
2014-12-09
Building the next generation of extreme-scale distributed systems will require overcoming several challenges related to system resilience. As the number of processors in these systems grows, the failure rate increases proportionally. One of the most common sources of failure in large-scale systems is memory. In this paper, we propose a novel runtime for transparently exploiting memory content similarity to improve system resilience by reducing the rate at which memory errors lead to node failure. We evaluate the viability of this approach by examining memory snapshots collected from eight high-performance computing (HPC) applications and two important HPC operating systems. Based on the characteristics of the similarity uncovered, we conclude that our proposed approach shows promise for addressing system resilience in large-scale systems.
Koppel, Jeremy; Goldberg, Terry E; Gordon, Marc L; Huey, Edward; Davies, Peter; Keehlisen, Linda; Huet, Sara; Christen, Erica; Greenwald, Blaine S
2012-11-01
Behavioral disturbances occur in nearly all Alzheimer disease (AD) patients together with an array of cognitive impairments. Prior investigations have failed to demonstrate specific associations between them, suggesting an independent, rather than shared, pathophysiology. The objective of this study was to reexamine this issue using an extensive cognitive battery together with a sensitive neurobehavioral and functional rating scale to correlate behavioral syndromes and cognitive domains across the spectrum of impairment in dementia. Design: cross-sectional study of comprehensive cognitive and behavioral ratings in subjects with AD and mild cognitive impairment. Setting: a memory disorders research center. Participants: fifty subjects with AD and 26 subjects with mild cognitive impairment, and their caregivers. Measurements: cognitive rating scales administered included the Mini-Mental State Examination; the Modified Mini-Mental State Examination; the Boston Naming Test; the Benton Visual Retention Test; the Consortium to Establish a Registry for Alzheimer's Disease Neuropsychology Assessment; the Controlled Oral Word Test; the Wechsler Memory Scale logical memory I and logical memory II tasks; the Wechsler Memory Scale-Revised digit span; the Wechsler Adult Intelligence Scale-Revised digit symbol task; and the Clock Drawing Task, together with the Clinical Dementia Rating Scale and the Neuropsychiatric Inventory. Results: stepwise regression of cognitive domains with symptom domains revealed significant associations of mood with impaired executive function/speed of processing (Δr = 0.22); impaired working memory (Δr = 0.05); impaired visual memory (Δr = 0.07); and worsened Clinical Dementia Rating Scale (Δr = 0.08). Psychosis was significantly associated with impaired working memory (Δr = 0.13). Mood symptoms appear to impact diverse cognitive realms and to compromise functional performance. Among neuropsychological indices, the unique relationship between working memory and psychosis suggests a possible common underlying neurobiology. Copyright 2012 American Association for Geriatric Psychiatry.
Effects of motor congruence on visual working memory.
Quak, Michel; Pecher, Diane; Zeelenberg, Rene
2014-10-01
Grounded-cognition theories suggest that memory shares processing resources with perception and action. The motor system could be used to help memorize visual objects. In two experiments, we tested the hypothesis that people use motor affordances to maintain object representations in working memory. Participants performed a working memory task on photographs of manipulable and nonmanipulable objects. The manipulable objects were objects that required either a precision grip (i.e., small items) or a power grip (i.e., large items) to use. A concurrent motor task that could be congruent or incongruent with the manipulable objects caused no difference in working memory performance relative to nonmanipulable objects. Moreover, the precision- or power-grip motor task did not affect memory performance on small and large items differently. These findings suggest that the motor system plays no part in visual working memory.
UPC++ Programmer’s Guide (v1.0 2017.9)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bachan, J.; Baden, S.; Bonachea, D.
UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
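A minimal example in the style the guide describes, using the documented upcxx API (global pointers, explicit asynchronous remote access, futures); this is our illustration, not an excerpt from the guide:

```c++
#include <upcxx/upcxx.hpp>
#include <iostream>

int main() {
    upcxx::init();
    upcxx::global_ptr<int> gp;
    if (upcxx::rank_me() == 0) gp = upcxx::new_<int>(0);
    gp = upcxx::broadcast(gp, 0).wait();             // everyone learns the pointer
    if (upcxx::rank_me() == upcxx::rank_n() - 1)
        upcxx::rput(42, gp).wait();                  // explicit, asynchronous write
    upcxx::barrier();
    if (upcxx::rank_me() == 0)
        std::cout << "value = " << *gp.local() << '\n';
    upcxx::finalize();
}
```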
UPC++ Programmer’s Guide, v1.0-2018.3.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bachan, J.; Baden, S.; Bonachea, Dan
UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
Li, Yong; Yuan, Gonglin; Wei, Zengxin
2015-01-01
In this paper, a trust-region algorithm is proposed for large-scale nonlinear equations, where the limited-memory BFGS (L-M-BFGS) update matrix is used in the trust-region subproblem to improve the effectiveness of the algorithm for large-scale problems. The global convergence of the presented method is established under suitable conditions. The numerical results of the test problems show that the method is competitive with the norm method.
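For reference, the trust-region subproblem the abstract refers to has the standard form (our notation, not the paper's):

```latex
\min_{s \in \mathbb{R}^n} \; m_k(s) = g_k^{\top} s + \tfrac{1}{2}\, s^{\top} B_k s
\quad \text{subject to} \quad \lVert s \rVert \le \Delta_k ,
```

where, for the nonlinear system $F(x)=0$ with merit function $f(x)=\tfrac{1}{2}\lVert F(x)\rVert^2$, $g_k = \nabla f(x_k)$ is the gradient, $\Delta_k$ is the trust-region radius, and $B_k$ is the limited-memory BFGS approximation to the Hessian built from the most recent correction pairs.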
Visualization in aerospace research with a large wall display system
NASA Astrophysics Data System (ADS)
Matsuo, Yuichi
2002-05-01
The National Aerospace Laboratory of Japan has built a large-scale visualization system with a large wall-type display. The system has been operational since April 2001 and comprises a 4.6x1.5-meter (15x5-foot) rear-projection screen with 3 BARCO 812 high-resolution CRT projectors. The reasons we adopted the 3-gun CRT projectors are support for stereoscopic viewing, ease of color/luminosity matching, and accuracy of edge-blending. The system is driven by a new SGI Onyx 3400 server of distributed shared-memory architecture with 32 CPUs, 64 GBytes of memory, 1.5 TBytes of FC RAID disk, and 6 IR3 graphics pipelines. Software is another important issue in making full use of the system. We have introduced some applications available in a multi-projector environment, such as AVS/MPE, EnSight Gold and COVISE, and have been developing software tools that create volumetric images using SGI graphics libraries. The system is mainly used for visualization of computational fluid dynamics (CFD) simulations in aerospace research. Visualized CFD results help us design improved configurations of aerospace vehicles and analyze their aerodynamic performance. These days we also use it for various collaborations among researchers.
Oscillatory mechanisms of process binding in memory.
Klimesch, Wolfgang; Freunberger, Roman; Sauseng, Paul
2010-06-01
A central topic in cognitive neuroscience is the question of which processes underlie large-scale communication within and between different neural networks. The basic assumption is that oscillatory phase synchronization plays an important role for process binding--the transient linking of different cognitive processes--which may be considered a special type of large-scale communication. We investigate this question for memory processes on the basis of different types of oscillatory synchronization mechanisms. The reviewed findings suggest that theta and alpha phase coupling (and phase reorganization) reflect control processes in two large memory systems: a working memory and a complex knowledge system that comprises semantic long-term memory. It is suggested that alpha phase synchronization may be interpreted in terms of processes that coordinate top-down control (a process guided by expectancy to focus on relevant search areas) and access to memory traces (a process leading to the activation of a memory trace). An analogous interpretation is suggested for theta oscillations and the controlled access to episodic memories. Copyright (c) 2009 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Georgiev, K.; Zlatev, Z.
2010-11-01
The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model's computational domain covers Europe and neighbouring parts of the Atlantic Ocean, Asia and Africa. If the DEM is to be applied using fine grids, its discretization leads to a huge computational problem, which implies that such a model must be run on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, some results of running this model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) will be presented. The main idea in the parallel version of DEM is a domain partitioning approach. The effective use of the cache and hierarchical memories of modern computers, as well as the performance, speed-ups and efficiency achieved, will be discussed. The parallel code of DEM, created by using the standard MPI library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the computer model output are presented briefly.
Memory Transmission in Small Groups and Large Networks: An Agent-Based Model.
Luhmann, Christian C; Rajaram, Suparna
2015-12-01
The spread of social influence in large social networks has long been an interest of social scientists. In the domain of memory, collaborative memory experiments have illuminated cognitive mechanisms that allow information to be transmitted between interacting individuals, but these experiments have focused on small-scale social contexts. In the current study, we took a computational approach, circumventing the practical constraints of laboratory paradigms and providing novel results at scales unreachable by laboratory methodologies. Our model embodied theoretical knowledge derived from small-group experiments and replicated foundational results regarding collaborative inhibition and memory convergence in small groups. Ultimately, we investigated large-scale, realistic social networks and found that agents are influenced by the agents with which they interact, but we also found that agents are influenced by nonneighbors (i.e., the neighbors of their neighbors). The similarity between these results and the reports of behavioral transmission in large networks offers a major theoretical insight by linking behavioral transmission to the spread of information. © The Author(s) 2015.
NASA Astrophysics Data System (ADS)
Zheng, Maoteng; Zhang, Yongjun; Zhou, Shunping; Zhu, Junfeng; Xiong, Xiaodong
2016-07-01
In recent years, new platforms and sensors in the photogrammetry, remote sensing and computer vision areas have become available, such as Unmanned Aerial Vehicles (UAVs), oblique camera systems, common digital cameras and even mobile phone cameras. Images collected by all these kinds of sensors can be used as remote sensing data sources. These sensors can obtain large-scale remote sensing data consisting of a great number of images. Bundle block adjustment of large-scale data with the conventional algorithm is very time and space (memory) consuming due to the super-large normal matrix arising from large-scale data. In this paper, an efficient Block-based Sparse Matrix Compression (BSMC) method combined with the Preconditioned Conjugate Gradient (PCG) algorithm is chosen to develop a stable and efficient bundle block adjustment system in order to deal with large-scale remote sensing data. The main contribution of this work is the BSMC-based PCG algorithm, which is more efficient in time and memory than the traditional algorithm without compromising accuracy. In total, 8 datasets of real data are used to test our proposed method. Preliminary results have shown that the BSMC method can efficiently decrease the time and memory requirements of large-scale data processing.
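The PCG iteration at the core of such a system is standard. The sketch below shows its generic form with the matrix-vector and preconditioner products abstracted behind callbacks; the BSMC storage itself is not reproduced here.

```c++
#include <cmath>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Op  = std::function<Vec(const Vec&)>;   // y = A x, or y = M^{-1} x

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Generic preconditioned conjugate gradient for A x = b (A SPD).
Vec pcg(const Op& A, const Op& Minv, const Vec& b, int maxit, double tol) {
    Vec x(b.size(), 0.0), r = b, z = Minv(r), p = z;
    double rz = dot(r, z);
    for (int it = 0; it < maxit && std::sqrt(dot(r, r)) > tol; ++it) {
        Vec Ap = A(p);
        double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];                 // update iterate
            r[i] -= alpha * Ap[i];                // update residual
        }
        z = Minv(r);                              // apply preconditioner
        double rz_new = dot(r, z);
        double beta = rz_new / rz;
        rz = rz_new;
        for (std::size_t i = 0; i < p.size(); ++i)
            p[i] = z[i] + beta * p[i];            // new search direction
    }
    return x;
}
```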
The CA3 Network as a Memory Store for Spatial Representations
ERIC Educational Resources Information Center
Papp, Gergely; Witter, Menno P.; Treves, Alessandro
2007-01-01
Comparative neuroanatomy suggests that the CA3 region of the mammalian hippocampus is directly homologous with the medio-dorsal pallium in birds and reptiles, with which it largely shares the basic organization of primitive cortex. Autoassociative memory models, which are generically applicable to cortical networks, then help assess how well CA3…
Spiking neural network simulation: memory-optimal synaptic event scheduling.
Stewart, Robert D; Gurney, Kevin N
2011-06-01
Spiking neural network simulations incorporating variable transmission delays require synaptic events to be scheduled prior to delivery. Conventional methods have memory requirements that scale with the total number of synapses in a network. We introduce novel scheduling algorithms for both discrete and continuous event delivery, where the memory requirement scales instead with the number of neurons. Superior algorithmic performance is demonstrated using large-scale, benchmarking network simulations.
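The neuron-scaled scheduling idea can be pictured as one small ring buffer per neuron, indexed by delivery time modulo the maximum delay, so queued input grows with neurons times maximum delay rather than with synapses. A hedged sketch of that scheme, not the paper's algorithms:

```c++
#include <vector>

// Per-neuron ring-buffer scheduler: memory is neurons x (max_delay + 1)
// slots instead of one queue entry per synaptic event.
struct ScheduledInputs {
    int slots, now = 0;
    std::vector<std::vector<double>> ring;       // ring[neuron][time slot]
    ScheduledInputs(int neurons, int max_delay)
        : slots(max_delay + 1), ring(neurons, std::vector<double>(slots, 0.0)) {}

    void deliver(int neuron, int delay, double w) {  // 1 <= delay <= max_delay
        ring[neuron][(now + delay) % slots] += w;    // accumulate, don't enqueue
    }
    double collect(int neuron) {                     // drain this tick's slot
        double w = ring[neuron][now % slots];
        ring[neuron][now % slots] = 0.0;
        return w;
    }
    void tick() { ++now; }                           // advance simulation time
};
```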
Social Transmission of False Memory in Small Groups and Large Networks.
Maswood, Raeya; Rajaram, Suparna
2018-05-21
Sharing information and memories is a key feature of social interactions, making social contexts important for developing and transmitting accurate memories and also false memories. False memory transmission can have wide-ranging effects, including shaping personal memories of individuals as well as collective memories of a network of people. This paper reviews a collection of key findings and explanations in cognitive research on the transmission of false memories in small groups. It also reviews the emerging experimental work on larger networks and collective false memories. Given the reconstructive nature of memory, the abundance of misinformation in everyday life, and the variety of social structures in which people interact, an understanding of transmission of false memories has both scientific and societal implications. © 2018 Cognitive Science Society, Inc.
Li, Chen; Yongbo, Lv; Chi, Chen
2015-01-01
Based on data from 30 provincial regions in China, an assessment and empirical analysis was carried out on the utilization and sharing of large-scale scientific equipment, using a comprehensive assessment model established on three dimensions: equipment, utilization and sharing. The assessment results were interpreted in light of relevant policies. The results showed that, on the whole, the overall development level in the provincial regions of eastern and central China is higher than that in western China, mostly because of the large gap among the different provincial regions with respect to the equipped level. But in terms of utilization and sharing, some of the western provincial regions, such as Ningxia, perform well, which is worthy of attention. Policy adjustments that target these regional differences, improve the capabilities of equipment-management personnel, refine the sharing and cooperation platforms, and promote the establishment of open sharing funds are all important measures for promoting the utilization and sharing of large-scale scientific equipment and narrowing the gap among regions. PMID:25937850
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sewell, Christopher Meyer
This is a set of slides from a guest lecture for a class at the University of Texas, El Paso on visualization and data analysis for high-performance computing. The topics covered are the following: trends in high-performance computing; scientific visualization, such as OpenGL, ray tracing and volume rendering, VTK, and ParaView; data science at scale, such as in-situ visualization, image databases, distributed memory parallelism, shared memory parallelism, VTK-m, "big data", and then an analysis example.
NASA Astrophysics Data System (ADS)
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPUs). Then, we designed an imaging-point parallel strategy to achieve optimal parallel computing performance. Afterward, we adopted an asynchronous double-buffering scheme for multi-stream GPU/CPU parallel computing. Moreover, several key optimization strategies for computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and use of special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging of large-scale seismic data.
Performing an allreduce operation using shared memory
Archer, Charles J [Rochester, MN; Dozsa, Gabor [Ardsley, NY; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
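In miniature, the work-unit scheme above can be read as cores atomically claiming fixed-size slices of the buffers from a shared counter and reducing each claimed slice. A hedged C++ sketch of that reading, with an atomic counter standing in for the job status object:

```c++
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Cores claim chunk-sized "allreduce work units" from a shared counter and
// each core sums its claimed slice of the input buffers into the output.
void allreduce_sum(const std::vector<std::vector<double>>& inputs,
                   std::vector<double>& out, int ncores) {
    const std::size_t chunk = 1024;                 // size of one work unit
    std::atomic<std::size_t> next_unit{0};          // the shared job object
    auto worker = [&] {
        for (;;) {
            std::size_t lo = next_unit.fetch_add(1) * chunk;
            if (lo >= out.size()) return;           // no work units left
            std::size_t hi = std::min(lo + chunk, out.size());
            for (std::size_t i = lo; i < hi; ++i) { // perform this unit
                double s = 0.0;
                for (const auto& in : inputs) s += in[i];
                out[i] = s;
            }
        }
    };
    std::vector<std::thread> pool;
    for (int c = 0; c < ncores; ++c) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```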
Performing an allreduce operation using shared memory
Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E
2014-06-10
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
Sensorimotor memory of object weight distribution during multidigit grasp.
Albert, Frederic; Santello, Marco; Gordon, Andrew M
2009-10-09
We studied the ability to transfer three-digit force sharing patterns learned through consecutive lifts of an object with an asymmetric center of mass (CM). After several object lifts, we asked subjects to rotate and translate the object to the contralateral hand and perform one additional lift. This task was performed under two weight conditions (550 and 950 g) to determine the extent to which subjects would be able to transfer weight and CM information. Learning transfer was quantified by measuring the extent to which force sharing patterns and peak object roll on the first post-translation trial resembled those measured on the pre-translation trial with the same CM. We found that the overall gain of fingertip forces was transferred following object rotation, but that the scaling of individual digit forces was specific to the learned digit-object configuration, and thus was not transferred following rotation. As a result, on the first post-translation trial there was a significantly larger object roll following object lift-off than on the pre-translation trial. This suggests that sensorimotor memories for weight, requiring scaling of fingertip force gain, may differ from memories for mass distribution.
Enabling large-scale next-generation sequence assembly with Blacklight
Couger, M. Brian; Pipes, Lenore; Squina, Fabio; Prade, Rolf; Siepel, Adam; Palermo, Robert; Katze, Michael G.; Mason, Christopher E.; Blood, Philip D.
2014-01-01
A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems. PMID:25294974
NASA Astrophysics Data System (ADS)
Tabik, S.; Romero, L. F.; Mimica, P.; Plata, O.; Zapata, E. L.
2012-09-01
A broad area of astronomy focuses on simulating extragalactic objects based on Very Long Baseline Interferometry (VLBI) radio-maps. Several algorithms in this scope simulate the radio-maps that would be observed if emitted from a predefined extragalactic object. This work analyzes the performance and scaling of this kind of algorithm on multi-socket, multi-core architectures. In particular, we evaluate a sharing approach, a privatizing approach and a hybrid approach on systems with a complex memory hierarchy that includes a shared Last Level Cache (LLC). In addition, we investigate which manual processes can be systematized and then automated in future work. The experiments show that the data-privatizing model scales efficiently on medium-scale multi-socket, multi-core systems (up to 48 cores), whereas the sharing approach, regardless of algorithmic and scheduling optimizations, is unable to reach acceptable scalability on more than one socket. The hybrid model with a specific level of data-sharing, however, provides the best scalability across all of the multi-socket, multi-core systems used.
MPgrafic: A parallel MPI version of Grafic-1
NASA Astrophysics Data System (ADS)
Prunet, Simon; Pichon, Christophe
2013-04-01
MPgrafic is a parallel MPI version of Grafic-1 which can produce large cosmological initial conditions on a cluster without requiring shared memory. The real Fourier transforms are carried out in place using FFTW while minimizing the amount of memory used (at the expense of performance), in the spirit of Grafic-1. The writing of the output file is also carried out in parallel. In addition to the technical parallelization, it provides three extensions over Grafic-1: it can produce power spectra with baryon wiggles (D. J. Eisenstein and W. Hu, Ap. J. 496); it has the optional ability to load a lower-resolution noise map corresponding to the low-frequency component, which will fix the larger-scale modes of the simulation (extra 0/1 flag at the end of the input process), in the spirit of Grafic-2; and it can be used in conjunction with constrfield, which generates initial-condition phases from a list of local constraints on density, tidal field, density gradient and velocity.
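The enabling piece of such a code is a distributed real-to-complex FFT, which FFTW's MPI interface provides by slab-decomposing the grid across ranks so no shared memory is needed. A generic FFTW 3 MPI sketch is shown below (not MPgrafic's source; the grid size N and the elided noise-filling step are placeholders):

```cpp
// Generic slab-decomposed, in-place r2c transform with FFTW's MPI
// interface (illustrative; not MPgrafic's actual code).
#include <fftw3-mpi.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    fftw_mpi_init();
    const ptrdiff_t N = 256;  // placeholder grid size per dimension
    ptrdiff_t local_n0, local_0_start;
    // FFTW reports which slab of the first axis this rank owns and how
    // much (padded, complex) storage the in-place transform needs.
    ptrdiff_t alloc = fftw_mpi_local_size_3d(N, N, N / 2 + 1,
                                             MPI_COMM_WORLD,
                                             &local_n0, &local_0_start);
    fftw_complex* cdata = fftw_alloc_complex(alloc);
    double* rdata = (double*)cdata;  // in-place, padded real layout
    fftw_plan plan = fftw_mpi_plan_dft_r2c_3d(N, N, N, rdata, cdata,
                                              MPI_COMM_WORLD,
                                              FFTW_ESTIMATE);
    // ... fill rdata with this rank's slab of the white-noise field ...
    fftw_execute(plan);              // distributed forward transform
    fftw_destroy_plan(plan);
    fftw_free(cdata);
    MPI_Finalize();
}
```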
NASA Astrophysics Data System (ADS)
Vincenti, Henri; Vay, Jean-Luc
2018-07-01
The advent of massively parallel supercomputers, with their distributed-memory technology using many processing units, has favored the development of highly scalable local low-order solvers at the expense of harder-to-scale global very high-order spectral methods. Indeed, FFT-based methods, which were very popular on shared memory computers, have been largely replaced by finite-difference (FD) methods for the solution of many problems, including plasma simulations with electromagnetic Particle-In-Cell methods. For some problems, such as the modeling of so-called "plasma mirrors" for the generation of high-energy particles and ultra-short radiation, we have shown that the inaccuracies of standard FD-based PIC methods prevent modeling on present supercomputers at sufficient accuracy. We demonstrate here that a new method, based on the use of local FFTs, enables ultrahigh-order accuracy with unprecedented scalability, and thus for the first time the accurate modeling of plasma mirrors in 3D.
GPU-accelerated computing for Lagrangian coherent structures of multi-body gravitational regimes
NASA Astrophysics Data System (ADS)
Lin, Mingpei; Xu, Ming; Fu, Xiaoyu
2017-04-01
Based on a well-established theoretical foundation, Lagrangian Coherent Structures (LCSs) have elicited widespread research on the intrinsic structures of dynamical systems in many fields, including the field of astrodynamics. Although the application of LCSs in dynamical problems seems straightforward theoretically, its associated computational cost is prohibitive. We propose a block decomposition algorithm developed on Compute Unified Device Architecture (CUDA) platform for the computation of the LCSs of multi-body gravitational regimes. In order to take advantage of GPU's outstanding computing properties, such as Shared Memory, Constant Memory, and Zero-Copy, the algorithm utilizes a block decomposition strategy to facilitate computation of finite-time Lyapunov exponent (FTLE) fields of arbitrary size and timespan. Simulation results demonstrate that this GPU-based algorithm can satisfy double-precision accuracy requirements and greatly decrease the time needed to calculate final results, increasing speed by approximately 13 times. Additionally, this algorithm can be generalized to various large-scale computing problems, such as particle filters, constellation design, and Monte-Carlo simulation.
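For context, the quantity such a code evaluates at every grid point is the standard finite-time Lyapunov exponent (a textbook definition, not specific to this paper). With \(\boldsymbol{\phi}_{t_0}^{t_0+T}\) the flow map that advects a point \(\mathbf{x}\) from \(t_0\) to \(t_0+T\),

```latex
\sigma_{t_0}^{T}(\mathbf{x}) \;=\; \frac{1}{|T|}\,\ln\!\sqrt{\lambda_{\max}(\Delta)},
\qquad
\Delta \;=\;
\left(\frac{\mathrm{d}\boldsymbol{\phi}_{t_0}^{t_0+T}(\mathbf{x})}{\mathrm{d}\mathbf{x}}\right)^{\!\top}
\frac{\mathrm{d}\boldsymbol{\phi}_{t_0}^{t_0+T}(\mathbf{x})}{\mathrm{d}\mathbf{x}}
```

LCSs are then extracted as ridges of the \(\sigma\) field; the block decomposition tiles this grid so that each GPU pass fits in device memory regardless of the total field size.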
Keerativittayayut, Ruedeerat; Aoki, Ryuta; Sarabi, Mitra Taghizadeh; Jimura, Koji; Nakahara, Kiyoshi
2018-06-18
Although activation/deactivation of specific brain regions have been shown to be predictive of successful memory encoding, the relationship between time-varying large-scale brain networks and fluctuations of memory encoding performance remains unclear. Here we investigated time-varying functional connectivity patterns across the human brain in periods of 30-40 s, which have recently been implicated in various cognitive functions. During functional magnetic resonance imaging, participants performed a memory encoding task, and their performance was assessed with a subsequent surprise memory test. A graph analysis of functional connectivity patterns revealed that increased integration of the subcortical, default-mode, salience, and visual subnetworks with other subnetworks is a hallmark of successful memory encoding. Moreover, multivariate analysis using the graph metrics of integration reliably classified the brain network states into the period of high (vs. low) memory encoding performance. Our findings suggest that a diverse set of brain systems dynamically interact to support successful memory encoding. © 2018, Keerativittayayut et al.
Multiple-User, Multitasking, Virtual-Memory Computer System
NASA Technical Reports Server (NTRS)
Generazio, Edward R.; Roth, Don J.; Stang, David B.
1993-01-01
Computer system designed and programmed to serve multiple users in research laboratory. Provides for computer control and monitoring of laboratory instruments, acquisition and analysis of data from those instruments, and interaction with users via remote terminals. System provides fast access to shared central processing units and associated large (from megabytes to gigabytes) memories. Underlying concept of system also applicable to monitoring and control of industrial processes.
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures
2017-10-04
Report from the University of North Carolina at Chapel Hill on the project "Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures," which developed algorithms for scientific and geometric computing that exploit the power and performance efficiency of heterogeneous shared memory architectures.
NASA Astrophysics Data System (ADS)
Neklyudov, A. A.; Savenkov, V. N.; Sergeyez, A. G.
1984-06-01
Memories are improved by increasing speed or the memory volume on a single chip. The most effective means for increasing speed in bipolar memories are current control circuits with the lowest extraction times for a specific power consumption (1/4 pJ/bit). The control current circuitry involves multistage current switches and circuits accelerating transient processes in storage elements and links. Circuit principles for the design of bipolar memories with maximum speed for an assigned minimum of circuit topology are analyzed. Two main classes of storage with current control are considered: ECL-type storage and super-integrated injection-type storage, with data capacities of N = 1/4 and N = 4/16, respectively. The circuits reduce logic voltage differentials and the volumes of the word and discharge buses and control circuit buses. The limiting speed is determined by the anti-interference requirements of the memory in storage and extraction modes.
Brandon, Nicole R; Beike, Denise R; Cole, Holly E
2017-07-01
Autobiographical memories (AMs) can be used to create and maintain closeness with others [Alea, N., & Bluck, S. (2003). Why are you telling me that? A conceptual model of the social function of autobiographical memory. Memory, 11(2), 165-178]. However, the differential effects of memory specificity are not well established. Two studies with 148 participants tested whether the order in which autobiographical knowledge (AK) and specific episodic AM (EAM) are shared affects feelings of closeness. Participants read two memories hypothetically shared by each of four strangers. The strangers first shared either AK or an EAM, and then shared either AK or an EAM. Participants were randomly assigned to read either positive or negative AMs from the strangers. Findings suggest that people feel closer to those who share positive AMs in the same way they construct memories: starting with general and moving to specific.
Raising Concerns about Sharing and Reusing Large-Scale Mathematics Classroom Observation Video Data
ERIC Educational Resources Information Center
Ing, Marsha; Samkian, Artineh
2018-01-01
There are great opportunities and challenges to sharing large-scale mathematics classroom observation data. This Research Commentary describes the methodological opportunities and challenges and provides a specific example from a mathematics education research project to illustrate how the research questions and framework drove observational…
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Labarta, Jesus; Gimenez, Judit
2004-01-01
With the current trend in parallel computer architectures towards clusters of shared-memory symmetric multiprocessors, parallel programming techniques have evolved to support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation-specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application to employ a new programming paradigm is usually a time-consuming and error-prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse-grained parallelization and OpenMP [9] for fine-grained loop-level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed-memory architectures but is also suitable for shared-memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse-grained process-level parallelization and loop-level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
Testing New Programming Paradigms with NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.
2000-01-01
Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with the increasing complexity of real applications. Technologies have been developed that aim to scale to thousands of processors on both distributed- and shared-memory systems. Developing parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message-passing programs is difficult and error prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared-memory parallelism). Both provide simple and clear extensions to sequential programs and thus greatly simplify the tedious tasks encountered in writing message-passing programs. HPF is independent of the memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although the use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress of the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." To test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large-scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. To overcome the lack of an HPF performance model and to guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as the lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outermost loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195MHz) with the MIPSpro-f77 compiler 7.2.1 for the OpenMP and MPI codes and the PGI pghpf-2.4.3 compiler with MPI interface for the HPF programs.
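The outermost-loop strategy described above is easy to illustrate. In a typical NPB-style triple loop, placing the OpenMP directive on the outermost loop gives each thread the largest contiguous slab of work and minimizes fork/join overhead (an illustrative stencil, not the benchmark source):

```cpp
// Illustrative outermost-loop OpenMP parallelization (not the NPB
// source): the k loop carries the coarsest granularity, so each thread
// owns a large slab of the grid and synchronization stays rare.
void smooth(const double* u, double* unew, int nx, int ny, int nz) {
    #pragma omp parallel for
    for (int k = 1; k < nz - 1; ++k)         // outermost loop: parallel
        for (int j = 1; j < ny - 1; ++j)     // inner loops: serial per thread
            for (int i = 1; i < nx - 1; ++i) {
                long idx = (long)k * ny * nx + (long)j * nx + i;
                unew[idx] = (u[idx - 1] + u[idx + 1] +
                             u[idx - nx] + u[idx + nx] +
                             u[idx - (long)nx * ny] +
                             u[idx + (long)nx * ny]) / 6.0;
            }
}
```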
NASA Technical Reports Server (NTRS)
Horst, R. L.; Nordstrom, M. J.
1972-01-01
The partially populated oligatomic mass memory feasibility model is described and evaluated. A system was desired to verify the feasibility of the oligatomic (mirror) memory approach as applicable to large scale solid state mass memories.
Fault Tolerant Frequent Pattern Mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shohdy, Sameh; Vishnu, Abhinav; Agrawal, Gagan
The FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large-scale datasets. While several researchers have designed distributed-memory FP-Growth algorithms, it is pivotal to consider fault-tolerant FP-Growth, which can address the increasing fault rates in large-scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and advanced MPI features to guarantee an O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use in-memory and disk-based checkpointing, though in many cases the recovery can be completed without any disk access, incurring no memory overhead for checkpointing. We evaluate our FT algorithm on a large-scale InfiniBand cluster with several large datasets using up to 2K cores. Our evaluation demonstrates excellent efficiency for checkpointing and recovery in comparison to the disk-based approach. We have also observed a 20x average speed-up in comparison to Spark, establishing that a well-designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.
Kim, Tae-Wook; Choi, Hyejung; Oh, Seung-Hwan; Jo, Minseok; Wang, Gunuk; Cho, Byungjin; Kim, Dong-Yu; Hwang, Hyunsang; Lee, Takhee
2009-01-14
The resistive switching characteristics of a polyfluorene-derivative polymer material in a sub-micron-scale via-hole device structure were investigated. The scalable via-hole sub-microstructure was fabricated using an e-beam lithographic technique. The polymer non-volatile memory devices varied in size from 40 × 40 μm² to 200 × 200 nm². From the scaling of junction size, the memory mechanism can be attributed to space-charge-limited current with filamentary conduction. Sub-micron-scale polymer memory devices showed excellent resistive switching behaviours such as a large ON/OFF ratio (I_ON/I_OFF ≈ 10⁴), excellent device-to-device switching uniformity, good sweep endurance, and good retention times (more than 10,000 s). The successful operation of sub-micron-scale memory devices of our polyfluorene-derivative polymer shows promise for fabricating high-density polymer memory devices.
Distributed-Memory Fast Maximal Independent Set
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew
The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All of those algorithms were designed for shared-memory machines and are analyzed using the PRAM model; they do not have direct, efficient distributed-memory implementations. In this paper, we extend two of Luby's seminal MIS algorithms, "Luby(A)" and "Luby(B)," to distributed-memory execution, and we evaluate their performance. We compare our results with the "Filtered MIS" implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
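For context, one round of the random-priority formulation often associated with Luby's method works as follows: every live vertex draws a random priority and joins the MIS if its priority beats those of all live neighbors; winners and their neighbors then leave the graph. A sequential shared-memory sketch of this classic scheme (not the paper's distributed implementation) is:

```cpp
// Random-priority MIS rounds (sketch of the classic Luby-style scheme;
// the paper's distributed-memory algorithms differ). Ties between equal
// priorities are ignored here, which is safe with continuous randoms.
#include <random>
#include <vector>

std::vector<int> luby_mis(const std::vector<std::vector<int>>& adj) {
    const size_t n = adj.size();
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::vector<int> state(n, 0);  // 0 live, 1 in MIS, -1 removed
    std::vector<double> r(n);
    std::vector<char> cand(n);
    for (;;) {
        bool any = false;
        for (size_t v = 0; v < n; ++v)
            if (state[v] == 0) { r[v] = uni(rng); any = true; }
        if (!any) break;
        for (size_t v = 0; v < n; ++v) {   // decide from a fixed snapshot
            cand[v] = (state[v] == 0);
            if (!cand[v]) continue;
            for (int u : adj[v])
                if (state[u] == 0 && r[u] < r[v]) { cand[v] = 0; break; }
        }
        for (size_t v = 0; v < n; ++v)
            if (cand[v]) state[v] = 1;     // round winners join the MIS
        for (size_t v = 0; v < n; ++v)     // neighbors of winners drop out
            if (state[v] == 0)
                for (int u : adj[v])
                    if (state[u] == 1) { state[v] = -1; break; }
    }
    std::vector<int> mis;
    for (size_t v = 0; v < n; ++v)
        if (state[v] == 1) mis.push_back((int)v);
    return mis;
}
```

Because each vertex only compares priorities with its neighbors, the candidate test is exactly what must be exchanged as messages in a distributed-memory setting, which is where the engineering effort of the paper lies.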
The broadcast of shared attention and its impact on political persuasion.
Shteynberg, Garriy; Bramlett, James M; Fles, Elizabeth H; Cameron, Jaclyn
2016-11-01
In democracies where multitudes wield political influence, so does broadcast media that reaches those multitudes. However, broadcast media may not be powerful simply because it reaches a certain audience, but because each of the recipients is aware of that fact. That is, watching broadcast media can evoke a state of shared attention, or the perception of simultaneous coattention with others. Whereas past research has investigated the effects of shared attention with a few socially close others (i.e., friends, acquaintances, minimal ingroup members), we examine the impact of shared attention with a multitude of unfamiliar others in the context of televised broadcasting. In this paper, we explore whether shared attention increases the psychological impact of televised political speeches, and whether fewer numbers of coattending others diminishes this effect. Five studies investigate whether the perception of simultaneous coattention, or shared attention, on a mass broadcasted political speech leads to more extreme judgments. The results indicate that the perception of synchronous coattention (as compared with coattending asynchronously and attending alone) renders persuasive speeches more persuasive, and unpersuasive speeches more unpersuasive. We also find that recall memory for the content of the speech mediates the effect of shared attention on political persuasion. The results are consistent with the notion that shared attention on mass broadcasted information results in deeper processing of the content, rendering judgments more extreme. In all, our findings imply that shared attention is a cognitive capacity that supports large-scale social coordination, where multitudes of people can cognitively prioritize simultaneously coattended information. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Performance upgrades to the MCNP6 burnup capability for large scale depletion calculations
Fensin, M. L.; Galloway, J. D.; James, M. R.
2015-04-11
The first MCNP-based inline Monte Carlo depletion capability was officially released from the Radiation Safety Information and Computational Center as MCNPX 2.6.0. With the merger of MCNPX and MCNP5, MCNP6 combined the capabilities of both simulation tools, as well as providing new advanced technology, in a single radiation transport code. The new MCNP6 depletion capability was first showcased at the International Congress for Advancements in Nuclear Power Plants (ICAPP) meeting in 2012. At that conference, the new capabilities addressed included the combined distributed- and shared-memory parallel architecture for the burnup capability, improved memory management, physics enhancements, and new predictability as compared to the H.B. Robinson benchmark. At Los Alamos National Laboratory, a special-purpose cluster named "tebow" was constructed to maximize available RAM per CPU, as well as to leverage swap space with solid-state hard drives, allowing larger-scale depletion calculations (with significantly more burnable regions than previously examined). As the MCNP6 burnup capability was scaled to larger numbers of burnable regions, a noticeable slowdown was observed. This paper details two specific computational performance strategies for improving calculation speed: (1) retrieving cross sections during transport; and (2) tallying mechanisms specific to burnup in MCNP. To combat this slowdown, new performance upgrades were developed and integrated into MCNP6 1.2.
Dendrites, deep learning, and sequences in the hippocampus.
Bhalla, Upinder S
2017-10-12
The hippocampus places us both in time and space. It does so over remarkably large spans: milliseconds to years, and centimeters to kilometers. This works for sensory representations, for memory, and for behavioral context. How does it fit in such wide ranges of time and space scales, and keep order among the many dimensions of stimulus context? A key organizing principle for a wide sweep of scales and stimulus dimensions is that of order in time, or sequences. Sequences of neuronal activity are ubiquitous in sensory processing, in motor control, in planning actions, and in memory. Against this strong evidence for the phenomenon, there are currently more models than definite experiments about how the brain generates ordered activity. The flip side of sequence generation is discrimination. Discrimination of sequences has been extensively studied at the behavioral, systems, and modeling level, but again physiological mechanisms are fewer. It is against this backdrop that I discuss two recent developments in neural sequence computation, that at face value share little beyond the label "neural." These are dendritic sequence discrimination, and deep learning. One derives from channel physiology and molecular signaling, the other from applied neural network theory - apparently extreme ends of the spectrum of neural circuit detail. I suggest that each of these topics has deep lessons about the possible mechanisms, scales, and capabilities of hippocampal sequence computation. © 2017 Wiley Periodicals, Inc.
A hybrid parallel framework for the cellular Potts model simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Yi; He, Kejing; Dong, Shoubin
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, and cannot be used for large-scale, complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving, and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for large-scale simulation (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).
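The hybrid pattern here is the standard one: MPI ranks own lattice subdomains and exchange boundary data, while OpenMP threads sweep each rank's local lattice. A generic skeleton of that structure is shown below (not the authors' CPM code; a race-free relaxation sweep stands in for the Monte Carlo update, which needs conflict handling in a real CPM):

```cpp
// Generic hybrid MPI+OpenMP skeleton: MPI ranks own subdomains,
// OpenMP threads sweep the local lattice (illustrative, not the CPM).
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nx = 128, ny = 128;  // this rank's subdomain
    std::vector<double> lat(nx * ny, rank), next(lat);

    for (int step = 0; step < 100; ++step) {
        // Cluster level (MPI): exchange subdomain boundaries with
        // neighbor ranks here (MPI_Sendrecv halo exchange, elided).

        // Node level (OpenMP): threads update the local lattice.
        #pragma omp parallel for
        for (int i = 1; i < nx - 1; ++i)
            for (int j = 1; j < ny - 1; ++j)
                next[i * ny + j] =
                    0.25 * (lat[(i - 1) * ny + j] + lat[(i + 1) * ny + j] +
                            lat[i * ny + j - 1] + lat[i * ny + j + 1]);
        lat.swap(next);
    }
    MPI_Finalize();
}
```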
A microkernel design for component-based parallel numerical software systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, S.
1999-01-13
What is the minimal software infrastructure, and what type of conventions are needed, to simplify the development of sophisticated parallel numerical application codes using a variety of software components that are not necessarily available as source code? We propose an opaque, object-based model where the objects are dynamically loadable from the file system or network. The microkernel required to manage such a system needs to include, at most: (1) a few basic services, namely a mechanism for loading objects at run time via dynamic link libraries, and consistent schemes for error handling and memory management; and (2) selected methods that all objects share, to deal with object life (destruction, reference counting, relationships) and object observation (viewing, profiling, tracing). We are experimenting with these ideas in the context of extensible numerical software within the ALICE (Advanced Large-scale Integrated Computational Environment) project, where we are building the microkernel to manage the interoperability among various tools for large-scale scientific simulations. This paper presents some preliminary observations and conclusions from our work with microkernel design.
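On POSIX systems, the run-time loading service described above reduces to dlopen/dlsym. A minimal sketch follows (the component name "solver.so" and its factory symbol create_solver are hypothetical illustrations, not part of the ALICE design):

```cpp
// Minimal POSIX sketch of the run-time object-loading service:
// open a component, look up its factory symbol, instantiate an object.
// "solver.so" and create_solver() are hypothetical names.
#include <dlfcn.h>
#include <cstdio>

struct Solver;  // opaque type: no source code for the component needed
typedef Solver* (*factory_fn)(void);

int main() {
    void* lib = dlopen("./solver.so", RTLD_NOW);
    if (!lib) {
        std::fprintf(stderr, "load error: %s\n", dlerror());
        return 1;
    }
    factory_fn create = (factory_fn)dlsym(lib, "create_solver");
    if (!create) {
        std::fprintf(stderr, "symbol error: %s\n", dlerror());
        return 1;
    }
    Solver* s = create();  // use s through an agreed-upon interface
    // ... viewing/profiling/tracing hooks would also be symbols ...
    dlclose(lib);
    return 0;
}
```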
Cross-Organizational Knowledge Sharing: Information Reuse in Small Organizations
ERIC Educational Resources Information Center
White, Kevin Forsyth
2010-01-01
Despite the potential value of leveraging organizational memory and expertise, small organizations have been unable to capitalize on its promised value. Existing solutions have largely side-stepped the unique needs of these organizations, which are relegated to systems designed to take advantage of large pools of experts or to use Internet sources…
Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Tumeo, Antonino; Secchi, Simone
Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures, due to issues such as the size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform for simulating large-scale machines. In this paper, we introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run time and includes a network model that takes contention into account. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10% accuracy. Emulation is only 25 to 200 times slower than real time.
Temporal Organization of Sound Information in Auditory Memory.
Song, Kun; Luo, Huan
2017-01-01
Memory is a constructive and organizational process. Instead of being stored with all the fine details, external information is reorganized and structured at certain spatiotemporal scales. It is well acknowledged that time plays a central role in audition by segmenting sound inputs into temporal chunks of appropriate length. However, it remains largely unknown whether critical temporal structures exist to mediate sound representation in auditory memory. To address the issue, here we designed an auditory memory transferring study, by combining a previously developed unsupervised white noise memory paradigm with a reversed sound manipulation method. Specifically, we systematically measured the memory transferring from a random white noise sound to its locally temporal reversed version on various temporal scales in seven experiments. We demonstrate a U-shape memory-transferring pattern with the minimum value around temporal scale of 200 ms. Furthermore, neither auditory perceptual similarity nor physical similarity as a function of the manipulating temporal scale can account for the memory-transferring results. Our results suggest that sounds are not stored with all the fine spectrotemporal details but are organized and structured at discrete temporal chunks in long-term auditory memory representation.
Distributed shared memory for roaming large volumes.
Castanié, Laurent; Mion, Christophe; Cavin, Xavier; Lévy, Bruno
2006-01-01
We present a cluster-based volume rendering system for roaming very large volumes. This system makes it possible to move a gigabyte-sized probe inside a total volume of several tens or hundreds of gigabytes in real time. While the size of the probe is limited by the total amount of texture memory on the cluster, the size of the total data set has no theoretical limit. The cluster is used as a distributed graphics processing unit that aggregates both graphics power and graphics memory. A hardware-accelerated volume renderer runs in parallel on the cluster nodes, and the final image compositing is implemented using a pipelined sort-last rendering algorithm. Meanwhile, volume bricking and volume paging allow efficient data caching. On each rendering node, a distributed hierarchical cache system implements a global, software-based distributed shared memory on the cluster. In case of a cache miss, this system first checks page residency on the other cluster nodes instead of directly accessing local disks. Using two Gigabit Ethernet network interfaces per node, we accelerate data fetching by a factor of 4 compared to directly accessing local disks. The system also implements asynchronous disk access and texture loading, which makes it possible to overlap data loading, volume slicing and rendering for optimal volume roaming.
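The page-resolution policy amounts to a three-level lookup that prefers remote RAM over local disk, since a Gigabit Ethernet fetch from a peer's memory is faster than a disk read. A schematic sketch (Page, fetch_from_peer and read_from_disk are hypothetical stand-ins, not the system's API):

```cpp
// Schematic three-level page lookup: local cache, then peer nodes'
// memory over the network, then local disk as the slowest fallback.
// fetch_from_peer() and read_from_disk() are hypothetical stand-ins.
#include <cstdint>
#include <unordered_map>
#include <vector>

using Page = std::vector<uint8_t>;
std::unordered_map<uint64_t, Page> local_cache;

bool fetch_from_peer(uint64_t id, Page& out);  // ask other cluster nodes
Page read_from_disk(uint64_t id);              // last resort: local disk

const Page& get_page(uint64_t id) {
    auto hit = local_cache.find(id);
    if (hit != local_cache.end())
        return hit->second;                    // level 1: local RAM
    Page p;
    if (!fetch_from_peer(id, p))               // level 2: remote RAM
        p = read_from_disk(id);                // level 3: disk
    return local_cache.emplace(id, std::move(p)).first->second;
}
```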
A mixed parallel strategy for the solution of coupled multi-scale problems at finite strains
NASA Astrophysics Data System (ADS)
Lopes, I. A. Rodrigues; Pires, F. M. Andrade; Reis, F. J. P.
2018-02-01
A mixed parallel strategy for the solution of homogenization-based multi-scale constitutive problems undergoing finite strains is proposed. The approach aims to reduce the computational time and memory requirements of non-linear coupled simulations that use finite element discretization at both scales (FE^2). In the first level of the algorithm, a non-conforming domain decomposition technique, based on the FETI method combined with a mortar discretization at the interface of macroscopic subdomains, is employed. A master-slave scheme, which distributes tasks by macroscopic element and adopts dynamic scheduling, is then used for each macroscopic subdomain composing the second level of the algorithm. This strategy allows the parallelization of FE^2 simulations in computers with either shared memory or distributed memory architectures. The proposed strategy preserves the quadratic rates of asymptotic convergence that characterize the Newton-Raphson scheme. Several examples are presented to demonstrate the robustness and efficiency of the proposed parallel strategy.
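The second level of this strategy is a classic MPI master-worker loop with dynamic scheduling: the master hands out one macroscopic element at a time as workers report back, so load imbalance between micro-scale problems is absorbed automatically. A generic skeleton of such a scheme (not the authors' FE^2 code; the element count is a placeholder):

```cpp
// Generic MPI master-worker loop with dynamic scheduling, as used in
// the second level of the strategy above (skeleton, not the FE^2 code).
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    const int n_elems = 1000, DONE = -1;  // placeholder problem size

    if (rank == 0) {                       // master: deal out element IDs
        int next = 0, active = size - 1, elem;
        MPI_Status st;
        while (active > 0) {
            MPI_Recv(&elem, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &st);
            int assign = (next < n_elems) ? next++ : DONE;
            if (assign == DONE) --active;
            MPI_Send(&assign, 1, MPI_INT, st.MPI_SOURCE, 0,
                     MPI_COMM_WORLD);
        }
    } else {                               // worker: one element at a time
        int elem = DONE;                   // first send just requests work
        for (;;) {
            MPI_Send(&elem, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            MPI_Recv(&elem, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (elem == DONE) break;
            // ... solve the micro-scale problem for this element ...
        }
    }
    MPI_Finalize();
}
```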
Aho-Corasick String Matching on Shared and Distributed Memory Parallel Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel
String matching is at the core of many critical applications, including network intrusion detection systems, search engines, virus scanners, spam filters, DNA and protein sequencing, and data mining. For all of these applications, string matching requires a combination of (sometimes all of) the following characteristics: high and/or predictable performance, support for large data sets, and flexibility of integration and customization. Many software-based implementations targeting conventional cache-based microprocessors fail to achieve high and predictable performance, while Field-Programmable Gate Array (FPGA) implementations and dedicated hardware solutions fail to support large data sets (dictionary sizes) and are difficult to integrate and customize. The advent of multicore, multithreaded, and GPU-based systems is opening the possibility for software-based solutions to reach very high performance at a sustained rate. This paper compares several software-based implementations of the Aho-Corasick string searching algorithm for high performance systems. We discuss the implementation of the algorithm on several types of shared-memory high-performance architectures (Niagara 2, large x86 SMPs and the Cray XMT), distributed memory with homogeneous processing elements (an InfiniBand cluster of x86 multicores) and heterogeneous processing elements (an InfiniBand cluster of x86 multicores with NVIDIA Tesla C10 GPUs). We describe in detail how each solution achieves the objectives of supporting large dictionaries, sustaining high performance, and enabling customization and flexibility, using various data sets.
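For reference, the textbook form of the automaton that all these implementations parallelize is compact: build a trie of the patterns, add failure links by breadth-first search, and then scan the text with exactly one state transition per input byte (a sequential sketch, not any of the paper's optimized variants):

```cpp
// Textbook Aho-Corasick automaton (sequential; the paper benchmarks
// optimized parallel variants of the same structure).
#include <algorithm>
#include <cstdio>
#include <queue>
#include <string>
#include <vector>

struct AhoCorasick {
    struct Node { int next[256]; std::vector<int> out; };
    std::vector<Node> t;

    explicit AhoCorasick(const std::vector<std::string>& pats) {
        t.emplace_back();
        std::fill(std::begin(t[0].next), std::end(t[0].next), -1);
        for (int p = 0; p < (int)pats.size(); ++p) {   // 1) build trie
            int v = 0;
            for (unsigned char c : pats[p]) {
                if (t[v].next[c] < 0) {
                    t[v].next[c] = (int)t.size();
                    t.emplace_back();
                    std::fill(std::begin(t.back().next),
                              std::end(t.back().next), -1);
                }
                v = t[v].next[c];
            }
            t[v].out.push_back(p);                     // pattern p ends here
        }
        std::vector<int> fail(t.size(), 0);            // 2) failure links
        std::queue<int> q;
        for (int c = 0; c < 256; ++c) {
            int u = t[0].next[c];
            if (u < 0) t[0].next[c] = 0; else q.push(u);
        }
        while (!q.empty()) {
            int v = q.front(); q.pop();
            t[v].out.insert(t[v].out.end(),            // inherit matches
                            t[fail[v]].out.begin(), t[fail[v]].out.end());
            for (int c = 0; c < 256; ++c) {
                int u = t[v].next[c];
                if (u < 0) t[v].next[c] = t[fail[v]].next[c];
                else { fail[u] = t[fail[v]].next[c]; q.push(u); }
            }
        }
    }

    void search(const std::string& text) const {       // O(|text|) scan
        int v = 0;
        for (size_t i = 0; i < text.size(); ++i) {
            v = t[v].next[(unsigned char)text[i]];
            for (int p : t[v].out)
                std::printf("pattern %d ends at offset %zu\n", p, i);
        }
    }
};

int main() {
    AhoCorasick ac({"he", "she", "his", "hers"});
    ac.search("ushers");  // reports "she", "he", then "hers"
}
```

The dense 256-entry transition table is what makes the scan branch-free per byte; the trade-off against node memory (and hence dictionary size) is exactly the tension the paper explores across architectures.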
The Development of Time-Based Prospective Memory in Childhood: The Role of Working Memory Updating
ERIC Educational Resources Information Center
Voigt, Babett; Mahy, Caitlin E. V.; Ellis, Judi; Schnitzspahn, Katharina; Krause, Ivonne; Altgassen, Mareike; Kliegel, Matthias
2014-01-01
This large-scale study examined the development of time-based prospective memory (PM) across childhood and the roles that working memory updating and time monitoring play in driving age effects in PM performance. One hundred and ninety-seven children aged 5 to 14 years completed a time-based PM task where working memory updating load was…
Shared Semantics and the Use of Organizational Memories for E-Mail Communications.
ERIC Educational Resources Information Center
Schwartz, David G.
1998-01-01
Examines the use of shared semantics information to link concepts in an organizational memory to e-mail communications. Presents a framework for determining shared semantics based on organizational and personal user profiles. Illustrates how shared semantics are used by the HyperMail system to help link organizational memories (OM) content to…
Wilhelm, Jan; Seewald, Patrick; Del Ben, Mauro; Hutter, Jürg
2016-12-13
We present an algorithm for computing the correlation energy in the random phase approximation (RPA) in a Gaussian basis requiring O(N³) operations and O(N²) memory. The method is based on the resolution of the identity (RI) with the overlap metric, a reformulation of RI-RPA in the Gaussian basis, imaginary-time and imaginary-frequency integration techniques, and the use of sparse linear algebra. Additional memory reduction without extra computation can be achieved by an iterative scheme that overcomes the memory bottleneck of canonical RPA implementations. We report a massively parallel implementation that is the key for the application to large systems. Finally, cubic-scaling RPA is applied to a thousand water molecules using a correlation-consistent triple-ζ quality basis.
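The quantity being evaluated is the standard imaginary-frequency expression for the RPA correlation energy (a generic textbook form, not the paper's reformulation; the paper's contribution is evaluating it at cubic cost with RI, sparsity, and imaginary-time techniques):

```latex
E_c^{\text{RPA}} \;=\; \frac{1}{2\pi} \int_0^{\infty} \mathrm{d}\omega\;
\operatorname{Tr}\!\left[\,\ln\!\bigl(1 - \chi^{0}(\mathrm{i}\omega)\,v\bigr)
\;+\; \chi^{0}(\mathrm{i}\omega)\,v \,\right]
```

Here \(\chi^{0}(\mathrm{i}\omega)\) is the non-interacting density response function evaluated at imaginary frequency and \(v\) is the Coulomb operator; the logarithm resums the ring diagrams to all orders.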
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nelson, Andrew F.; Wetzstein, M.; Naab, T.
2009-10-01
We continue our presentation of VINE. In this paper, we begin with a description of relevant architectural properties of the serial and shared-memory parallel computers on which VINE is intended to run, and describe their influence on the design of the code itself. We continue with a detailed description of a number of optimizations made to the layout of the particle data in memory and to our implementation of a binary tree used to access that data for use in gravitational force calculations and searches for smoothed particle hydrodynamics (SPH) neighbor particles. We describe the modifications to the code necessary to obtain forces efficiently from special-purpose 'GRAPE' hardware, the interfaces required to allow transparent substitution of those forces in the code instead of those obtained from the tree, and the modifications necessary to use both tree and GRAPE together as a fused GRAPE/tree combination. We conclude with an extensive series of performance tests, which demonstrate that the code can be run efficiently and without modification in serial on small workstations or in parallel using the OpenMP compiler directives on large-scale, shared-memory parallel machines. We analyze the effects of the code optimizations and estimate that they improve the code's overall performance by more than an order of magnitude over that obtained by many other tree codes. Scaled parallel performance of the gravity and SPH calculations, together the most costly components of most simulations, is nearly linear up to at least 120 processors on moderately sized test problems using the Origin 3000 architecture, and up to the maximum machine sizes available to us on several other architectures. At similar accuracy, performance of VINE used in GRAPE-tree mode is approximately a factor of 2 slower than that of VINE used in host-only mode. Further optimization of the GRAPE/host communications could improve the speed by as much as a factor of 3, but has not yet been implemented in VINE. Finally, we find that although parallel performance on small problems may reach a plateau beyond which more processors bring no additional speedup, performance never decreases, a factor important for running large simulations on many processors with individual time steps, where only a small fraction of the total particles require updates at any given moment.
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1995-01-01
The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared-memory communication paradigm, has been developed and evaluated. The distributed shared-memory model facilitates porting existing parallel algorithms that have been designed for shared-memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations generally do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations; very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative computing stations presents an alternative to shared-memory multiprocessor systems. In this project we designed and evaluated such a system.
Individual differences in false memory from misinformation: cognitive factors.
Zhu, Bi; Chen, Chuansheng; Loftus, Elizabeth F; Lin, Chongde; He, Qinghua; Chen, Chunhui; Li, He; Xue, Gui; Lu, Zhonglin; Dong, Qi
2010-07-01
This research investigated the cognitive correlates of false memories that are induced by the misinformation paradigm. A large sample of Chinese college students (N=436) participated in a misinformation procedure and also took a battery of cognitive tests. Results revealed sizable and systematic individual differences in false memory arising from exposure to misinformation. False memories were significantly and negatively correlated with measures of intelligence (measured with Raven's Advanced Progressive Matrices and Wechsler Adult Intelligence Scale), perception (Motor-Free Visual Perception Test, Change Blindness, and Tone Discrimination), memory (Wechsler Memory Scales and 2-back Working Memory tasks), and face judgement (Face Recognition and Facial Expression Recognition). These findings suggest that people with relatively low intelligence and poor perceptual abilities might be more susceptible to the misinformation effect.
GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simmhan, Yogesh; Kumbhare, Alok; Wickramaarachchi, Charith
2014-08-25
Large-scale graph processing is a major research area for Big Data exploration. Vertex-centric programming models like Pregel are gaining traction due to their simple abstraction that naturally allows for scalable execution on distributed systems. However, there are limitations to this approach which cause vertex-centric algorithms to under-perform, due to a poor compute-to-communication ratio and slow convergence of iterative supersteps. In this paper we introduce GoFFish, a scalable sub-graph-centric framework co-designed with a distributed persistent graph storage for large-scale graph analytics on commodity clusters. We introduce a sub-graph-centric programming abstraction that combines the scalability of a vertex-centric approach with the flexibility of shared-memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to this model to illustrate its flexibility. Further, we empirically analyze GoFFish using several real-world graphs and demonstrate its significant performance improvement, orders of magnitude in some cases, compared to Apache Giraph, the leading open-source vertex-centric implementation.
Gervais, Roger O; Ben-Porath, Yossef S; Wygant, Dustin B; Green, Paul
2008-12-01
The MMPI-2 Response Bias Scale (RBS) is designed to detect response bias in forensic neuropsychological and disability assessment settings. Validation studies have demonstrated that the scale is sensitive to cognitive response bias as determined by failure on the Word Memory Test (WMT) and other symptom validity tests. Exaggerated memory complaints are a common feature of cognitive response bias. The present study was undertaken to determine the extent to which the RBS is sensitive to memory complaints and how it compares in this regard to other MMPI-2 validity scales and indices. This archival study used MMPI-2 and Memory Complaints Inventory (MCI) data from 1550 consecutive non-head-injury disability-related referrals to the first author's private practice. ANOVA results indicated significant increases in memory complaints across increasing RBS score ranges with large effect sizes. Regression analyses indicated that the RBS was a better predictor of the mean memory complaints score than the F, F(B), and F(P) validity scales and the FBS. There was no correlation between the RBS and the CVLT, an objective measure of verbal memory. These findings suggest that elevated scores on the RBS are associated with over-reporting of memory problems, which provides further external validation of the RBS as a sensitive measure of cognitive response bias. Interpretive guidelines for the RBS are provided.
Recollection-Dependent Memory for Event Duration in Large-Scale Spatial Navigation
ERIC Educational Resources Information Center
Brunec, Iva K.; Ozubko, Jason D.; Barense, Morgan D.; Moscovitch, Morris
2017-01-01
Time and space represent two key aspects of episodic memories, forming the spatiotemporal context of events in a sequence. Little is known, however, about how temporal information, such as the duration and the order of particular events, are encoded into memory, and if it matters whether the memory representation is based on recollection or…
Semihierarchical quantum repeaters based on moderate lifetime quantum memories
NASA Astrophysics Data System (ADS)
Liu, Xiao; Zhou, Zong-Quan; Hua, Yi-Lin; Li, Chuan-Feng; Guo, Guang-Can
2017-01-01
The construction of large-scale quantum networks relies on the development of practical quantum repeaters. Many approaches have been proposed with the goal of outperforming the direct transmission of photons, but most of them are inefficient or difficult to implement with current technology. Here, we present a protocol that uses a semihierarchical structure to improve the entanglement distribution rate while reducing the requirement of memory time to a range of tens of milliseconds. This protocol can be implemented with a fixed distance of elementary links and fixed requirements on quantum memories, which are independent of the total distance. This configuration is especially suitable for scalable applications in large-scale quantum networks.
Applications considerations in the system design of highly concurrent multiprocessors
NASA Technical Reports Server (NTRS)
Lundstrom, Stephen F.
1987-01-01
A flow model processor approach to parallel processing is described, using very-high-performance individual processors, high-speed circuit switched interconnection networks, and a high-speed synchronization capability to minimize the effect of the inherently serial portions of applications on performance. Design studies related to the determination of the number of processors, the memory organization, and the structure of the networks used to interconnect the processor and memory resources are discussed. Simulations indicate that applications centered on the large shared data memory should be able to sustain over 500 million floating point operations per second.
A real-time multi-scale 2D Gaussian filter based on FPGA
NASA Astrophysics Data System (ADS)
Luo, Haibo; Gai, Xingqin; Chang, Zheng; Hui, Bin
2014-11-01
Multi-scale 2-D Gaussian filters have been widely used in feature extraction (e.g. SIFT, edge detection), image segmentation, image enhancement, image noise removal, multi-scale shape description, etc. However, their computational complexity remains an issue for real-time image processing systems. To address this problem, we propose a framework for a multi-scale 2-D Gaussian filter based on an FPGA in this paper. Firstly, a full-hardware architecture based on a parallel pipeline was designed to achieve a high throughput rate. Secondly, in order to save multipliers, the 2-D convolution is separated into two 1-D convolutions. Thirdly, a dedicated first-in-first-out memory named CAFIFO (Column Addressing FIFO) was designed to avoid the error propagation induced by spikes on the clock. Finally, a shared memory framework was designed to reduce memory costs. As a demonstration, we realized a three-scale 2-D Gaussian filter on a single ALTERA Cyclone III FPGA chip. Experimental results show that the proposed framework can compute a multi-scale 2-D Gaussian filter within one pixel clock period, and is therefore suitable for real-time image processing. Moreover, the main principle can be generalized to other convolution-based operators, such as the Gabor filter, the Sobel operator, and so on.
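The separability trick used above follows from G(x, y) = g(x)g(y): a K×K 2-D convolution becomes two K-tap 1-D passes, cutting multiplies from K² to 2K per pixel. A plain software sketch of the same decomposition (the paper implements this pipeline in FPGA hardware; kernel radius and border clamping are ordinary conventions, not taken from the paper):

```cpp
// Software sketch of the separable 2-D Gaussian: one horizontal and one
// vertical 1-D pass replace the full K x K kernel (2K vs K^2 MACs/pixel).
#include <algorithm>
#include <cmath>
#include <vector>

std::vector<float> gauss_kernel(int k, float sigma) {
    std::vector<float> g(2 * k + 1);
    float s = 0.f;
    for (int i = -k; i <= k; ++i)
        s += g[i + k] = std::exp(-0.5f * i * i / (sigma * sigma));
    for (float& v : g) v /= s;            // normalize to unit gain
    return g;
}

void gaussian2d(std::vector<float>& img, int w, int h, float sigma) {
    int k = (int)std::ceil(3 * sigma);    // conventional 3-sigma radius
    auto g = gauss_kernel(k, sigma);
    std::vector<float> tmp(img.size(), 0.f);
    for (int y = 0; y < h; ++y)           // pass 1: rows
        for (int x = 0; x < w; ++x) {
            float acc = 0.f;
            for (int i = -k; i <= k; ++i) {
                int xx = std::min(std::max(x + i, 0), w - 1);  // clamp
                acc += g[i + k] * img[y * w + xx];
            }
            tmp[y * w + x] = acc;
        }
    for (int y = 0; y < h; ++y)           // pass 2: columns
        for (int x = 0; x < w; ++x) {
            float acc = 0.f;
            for (int i = -k; i <= k; ++i) {
                int yy = std::min(std::max(y + i, 0), h - 1);
                acc += g[i + k] * tmp[yy * w + x];
            }
            img[y * w + x] = acc;
        }
}
```

Running several such passes with different sigma values yields the multi-scale filter bank; on the FPGA, the column pass is what motivates the paper's column-addressing FIFO.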
Liu, Qin; Ulloa, Antonio; Horwitz, Barry
2017-11-01
Many cognitive and computational models have been proposed to help understand working memory. In this article, we present a simulation study of cortical processing of visual objects during several working memory tasks using an extended version of a previously constructed large-scale neural model [Tagamets, M. A., & Horwitz, B. Integrating electrophysiological and anatomical experimental data to create a large-scale model that simulates a delayed match-to-sample human brain imaging study. Cerebral Cortex, 8, 310-320, 1998]. The original model consisted of arrays of Wilson-Cowan-type neuronal populations representing primary and secondary visual cortices, inferotemporal (IT) cortex, and pFC. We added a module representing entorhinal cortex, which functions as a gating module. We successfully implemented multiple working memory tasks using the same model and produced neuronal patterns in visual cortex, IT cortex, and pFC that match experimental findings. These working memory tasks can include distractor stimuli or can require that multiple items be retained in mind during a delay period (Sternberg's task). Besides electrophysiological and behavioral data, we also generated fMRI BOLD time series from our simulation. Our results support the involvement of IT cortex in working memory maintenance and suggest the cortical architecture underlying the neural mechanisms mediating particular working memory tasks. Furthermore, we noticed that, during simulations of memorizing a list of objects, the first and last items in the sequence were recalled best, which may implicate the neural mechanism behind this important psychological effect (i.e., the primacy and recency effects).
NASA Technical Reports Server (NTRS)
Plesea, Lucian
2006-01-01
A computer program automatically builds large, full-resolution mosaics of multispectral images of Earth landmasses from images acquired by Landsat 7, complete with matching of colors and blending between adjacent scenes. While the code has been used extensively for Landsat, it could also be used for other data sources. A single mosaic of as many as 8,000 scenes (more than 5 terabytes of data, the largest set produced in this work) demonstrated the code's ability to provide global coverage. The program first statistically analyzes input images to determine areas of coverage and data-value distributions. It then transforms the input images from their original universal transverse Mercator coordinates to other geographical coordinates, with scaling. It applies a first-order polynomial brightness correction to each band in each scene. It uses a data-mask image for selecting data and blending input scenes. Under user control, the program can be made to operate on small parts of the output image space, with checkpoint and restart capabilities. The program runs on SGI IRIX computers. It is capable of parallel processing using shared-memory code, large memories, and tens of central processing units. It can retrieve input data and store output data at locations remote from the processors on which it is executed.
Xyce parallel electronic simulator users guide, version 6.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of Sandia National Laboratories' electrical designers. This development has focused on improving capability over the current state of the art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users' guide, Version 6.0.1.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of Sandia National Laboratories' electrical designers. This development has focused on improving capability over the current state of the art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.0.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory, and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Data management strategies for multinational large-scale systems biology projects.
Wruck, Wasco; Peuker, Martin; Regenbrecht, Christian R A
2014-01-01
Good accessibility of publicly funded research data is essential to secure an open scientific system and will eventually become mandatory [Wellcome Trust will Penalise Scientists Who Don't Embrace Open Access. The Guardian 2012]. With the use of high-throughput methods in many research areas, from physics to systems biology, large data collections are increasingly important as raw material for research. Here, we present strategies worked out by international and national institutions targeting open access to publicly funded research data via incentives or obligations to share data. Funding organizations such as the British Wellcome Trust have therefore developed data sharing policies and request commitment to data management and sharing in grant applications. Increased citation rates are a strong argument for sharing publication data. Pre-publication sharing might be rewarded by a data citation credit system via digital object identifiers (DOIs), which initially came into use for data objects. Besides policies and incentives, good practice in data management is indispensable. However, appropriate systems for data management of large-scale projects, for example in systems biology, are hard to find. Here, we give an overview of a selection of open-source data management systems that have been employed successfully in large-scale projects.
The influence of cognitive load on spatial search performance.
Longstaffe, Kate A; Hood, Bruce M; Gilchrist, Iain D
2014-01-01
During search, executive function enables individuals to direct attention to potential targets, remember locations visited, and inhibit distracting information. In the present study, we investigated these executive processes in large-scale search. In our tasks, participants searched a room containing an array of illuminated locations embedded in the floor. The participants' task was to press the switches at the illuminated locations on the floor so as to locate a target that changed color when pressed. The perceptual salience of the search locations was manipulated by having some locations flashing and some static. Participants were more likely to search at flashing locations, even when they were explicitly informed that the target was equally likely to be at any location. In large-scale search, attention was captured by the perceptual salience of the flashing lights, leading to a bias to explore these targets. Despite this failure of inhibition, participants were able to restrict returns to previously visited locations, a measure of spatial memory performance. Participants were more able to inhibit exploration to flashing locations when they were not required to remember which locations had previously been visited. A concurrent digit-span memory task further disrupted inhibition during search, as did a concurrent auditory attention task. These experiments extend a load theory of attention to large-scale search, which relies on egocentric representations of space. High cognitive load on working memory leads to increased distractor interference, providing evidence for distinct roles for the executive subprocesses of memory and inhibition during large-scale search.
C-MOS array design techniques: SUMC multiprocessor system study
NASA Technical Reports Server (NTRS)
Clapp, W. A.; Helbig, W. A.; Merriam, A. S.
1972-01-01
The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four I/O processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through I/O processors, and multiport access to multiple and shared memory units.
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing
NASA Technical Reports Server (NTRS)
Fricker, David M.
1997-01-01
The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
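For reference, the parallel efficiency such a benchmark reports reduces to the standard strong-scaling formulas; a minimal sketch with made-up timing values:

```python
def strong_scaling(t_serial, t_parallel, n_procs):
    """Speedup S = T1 / Tp and parallel efficiency E = S / p."""
    speedup = t_serial / t_parallel
    return speedup, speedup / n_procs

# Hypothetical timings: 900 s serial, 150 s on 8 workstations.
s, e = strong_scaling(900.0, 150.0, 8)   # s = 6.0, e = 0.75
```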
NASA Technical Reports Server (NTRS)
Bates, Kevin R.; Daniels, Andrew D.; Scuseria, Gustavo E.
1998-01-01
We report a comparison of two linear-scaling methods which avoid the diagonalization bottleneck of traditional electronic structure algorithms. The Chebyshev expansion method (CEM) is implemented for carbon tight-binding calculations of large systems, and its memory and timing requirements are compared to those of our previously implemented conjugate gradient density matrix search (CG-DMS). Benchmark calculations are carried out on icosahedral fullerenes from C60 to C8640, and the linear-scaling memory and CPU requirements of the CEM are demonstrated. We show that the CPU requirements of the CEM and CG-DMS are similar for calculations with comparable accuracy.
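A minimal dense-matrix sketch of the CEM idea (the paper's tight-binding implementation exploits sparsity and is far more elaborate): expand the Fermi function of the Hamiltonian in Chebyshev polynomials so the density matrix is built from matrix products alone, with no diagonalization. All names and parameter values here are illustrative.

```python
import numpy as np

def fermi(e, mu, beta):
    # Clip the exponent to avoid overflow for extreme beta values.
    return 1.0 / (1.0 + np.exp(np.clip(beta * (e - mu), -700, 700)))

def chebyshev_density_matrix(H, mu, beta, order=60):
    """Approximate P = f(H) by a truncated Chebyshev series; only
    matrix products are needed, so sparsity can be exploited."""
    # Rough spectral bounds (Gershgorin) to rescale H into [-1, 1].
    r = np.sum(np.abs(H), axis=1) - np.abs(np.diag(H))
    emin, emax = np.min(np.diag(H) - r), np.max(np.diag(H) + r)
    a, b = (emax - emin) / 2.0, (emax + emin) / 2.0
    Hs = (H - b * np.eye(len(H))) / a

    # Chebyshev coefficients of the Fermi function on [-1, 1].
    k = np.arange(order)
    theta = np.pi * (k + 0.5) / order
    f = fermi(a * np.cos(theta) + b, mu, beta)
    c = (2.0 / order) * (np.cos(np.outer(k, theta)) @ f)
    c[0] *= 0.5

    # Three-term recurrence: T0 = I, T1 = Hs, T_{k+1} = 2 Hs Tk - T_{k-1}.
    T_prev, T_cur = np.eye(len(H)), Hs
    P = c[0] * T_prev + c[1] * T_cur
    for ck in c[2:]:
        T_prev, T_cur = T_cur, 2.0 * Hs @ T_cur - T_prev
        P += ck * T_cur
    return P
```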
Implementing High-Performance Geometric Multigrid Solver with Naturally Grained Messages
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; Zheng, Yili
2015-10-26
Structured-grid linear solvers often require manual packing and unpacking of communication data to achieve high performance. Orchestrating this process efficiently is challenging, labor-intensive, and potentially error-prone. In this paper, we explore an alternative approach that communicates the data with naturally grained message sizes without manual packing and unpacking. This approach is the distributed analogue of shared-memory programming, taking advantage of the global address space in PGAS languages to provide substantial programming ease. However, its performance may suffer from the large number of small messages. We investigate the runtime support required in the UPC++ library for this naturally grained version to close the performance gap between the two approaches and attain comparable performance at scale, using the High-Performance Geometric Multigrid (HPGMG-FV) benchmark as a driver.
The neural basis of involuntary episodic memories.
Hall, Shana A; Rubin, David C; Miles, Amanda; Davis, Simon W; Wing, Erik A; Cabeza, Roberto; Berntsen, Dorthe
2014-10-01
Voluntary episodic memories require an intentional memory search, whereas involuntary episodic memories come to mind spontaneously without conscious effort. Cognitive neuroscience has largely focused on voluntary memory, leaving the neural mechanisms of involuntary memory largely unknown. We hypothesized that, because the main difference between voluntary and involuntary memory is the controlled retrieval processes required by the former, there would be greater frontal activity for voluntary than involuntary memories. Conversely, we predicted that other components of the episodic retrieval network would be similarly engaged in the two types of memory. During encoding, all participants heard sounds, half paired with pictures of complex scenes and half presented alone. During retrieval, paired and unpaired sounds were presented, panned to the left or to the right. Participants in the involuntary group were instructed to indicate the spatial location of the sound, whereas participants in the voluntary group were asked to additionally recall the pictures that had been paired with the sounds. All participants reported the incidence of their memories in a postscan session. Consistent with our predictions, voluntary memories elicited greater activity in dorsal frontal regions than involuntary memories, whereas other components of the retrieval network, including medial-temporal, ventral occipitotemporal, and ventral parietal regions were similarly engaged by both types of memories. These results clarify the distinct role of dorsal frontal and ventral occipitotemporal regions in predicting strategic retrieval and recalled information, respectively, and suggest that, although there are neural differences in retrieval, involuntary memories share neural components with established voluntary memory systems.
Unraveling Network-induced Memory Contention: Deeper Insights with Machine Learning
Groves, Taylor Liles; Grant, Ryan; Gonzales, Aaron; ...
2017-11-21
Remote Direct Memory Access (RDMA) is expected to be an integral communication mechanism for future exascale systems, enabling asynchronous data transfers so that applications may fully utilize CPU resources while simultaneously sharing data amongst remote nodes. We examine Network-induced Memory Contention (NiMC) on InfiniBand networks. We expose the interactions between RDMA, main memory, and cache when applications and out-of-band services compete for memory resources. We then explore NiMC's resulting impact on application-level performance. For a range of hardware technologies and HPC workloads, we quantify NiMC and show that NiMC's impact grows with scale, resulting in up to 3X performance degradation at scales as small as 8K processes, even in applications that previously have been shown to be performance resilient in the presence of noise. In addition, this work examines the problem of predicting NiMC's impact on applications by leveraging machine learning and easily accessible performance counters. This approach provides additional insights about the root cause of NiMC and facilitates dynamic selection of potential solutions. Finally, we evaluated three potential techniques to reduce NiMC's impact, namely hardware offloading, core reservation, and network throttling.
NASA Astrophysics Data System (ADS)
Leggett, C.; Binet, S.; Jackson, K.; Levinthal, D.; Tatarkhanov, M.; Yao, Y.
2011-12-01
Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Further, the cores themselves can run multiple threads with a zero-overhead context switch, allowing low-level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non-uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the ATLAS event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores by means of event-based parallelism and final-stage I/O synchronization. However, initial studies on 8 and 16 core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware-based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which, due to its size, places huge burdens on the memory infrastructure of today's processors.
Large-Scale Image Analytics Using Deep Learning
NASA Astrophysics Data System (ADS)
Ganguly, S.; Nemani, R. R.; Basu, S.; Mukhopadhyay, S.; Michaelis, A.; Votava, P.
2014-12-01
High-resolution land cover classification maps are needed to increase the accuracy of current land ecosystem and climate model outputs. Few studies demonstrate the state of the art in deriving very high resolution (VHR) land cover products. In addition, most methods rely heavily on commercial software that is difficult to scale given the region of study (e.g., continents to globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. In addition, VHR satellite datasets are of the order of terabytes, and features extracted from these datasets are of the order of petabytes. In our present study, we have acquired the National Agricultural Imaging Program (NAIP) dataset for the Continental United States at a spatial resolution of 1 m. This data comes as image tiles (a total of a quarter million image scenes with ~60 million pixels each) and has a total size of ~100 terabytes for a single acquisition. Features extracted from the entire dataset would amount to ~8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted in the principles of "deep learning" to delineate the percentage of tree cover. In order to perform image analytics in such a granular system, it is mandatory to devise an intelligent archiving and query system for image retrieval, file structuring, metadata processing, and filtering of all available image scenes. Using the Open NASA Earth Exchange (NEX) initiative, which is a partnership with Amazon Web Services (AWS), we have developed an end-to-end architecture for designing the database and the deep belief network (following the DistBelief computing model) to solve the grand challenge of scaling this process across a quarter million NAIP tiles that cover the entire Continental United States. The AWS core components that we use to solve this problem are DynamoDB along with S3 for database query and storage, the ElastiCache shared-memory architecture for image segmentation, Elastic MapReduce (EMR) for image feature extraction, and the memory-optimized Elastic Cloud Compute (EC2) for the learning algorithm.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Xiaoliang; Stauffer, Philip H.
This effort is designed to expedite learning from existing and planned large demonstration projects and their associated research through effective knowledge sharing among participants in the US and China.
Episodic memory in aspects of large-scale brain networks
Jeong, Woorim; Chung, Chun Kee; Kim, June Sic
2015-01-01
Understanding human episodic memory in aspects of large-scale brain networks has become one of the central themes in neuroscience over the last decade. Traditionally, episodic memory was regarded as mostly relying on medial temporal lobe (MTL) structures. However, recent studies have suggested involvement of more widely distributed cortical network and the importance of its interactive roles in the memory process. Both direct and indirect neuro-modulations of the memory network have been tried in experimental treatments of memory disorders. In this review, we focus on the functional organization of the MTL and other neocortical areas in episodic memory. Task-related neuroimaging studies together with lesion studies suggested that specific sub-regions of the MTL are responsible for specific components of memory. However, recent studies have emphasized that connectivity within MTL structures and even their network dynamics with other cortical areas are essential in the memory process. Resting-state functional network studies also have revealed that memory function is subserved by not only the MTL system but also a distributed network, particularly the default-mode network (DMN). Furthermore, researchers have begun to investigate memory networks throughout the entire brain not restricted to the specific resting-state network (RSN). Altered patterns of functional connectivity (FC) among distributed brain regions were observed in patients with memory impairments. Recently, studies have shown that brain stimulation may impact memory through modulating functional networks, carrying future implications of a novel interventional therapy for memory impairment. PMID:26321939
Volatility return intervals analysis of the Japanese market
NASA Astrophysics Data System (ADS)
Jung, W.-S.; Wang, F. Z.; Havlin, S.; Kaizoji, T.; Moon, H.-T.; Stanley, H. E.
2008-03-01
We investigate scaling and memory effects in return intervals between price volatilities above a certain threshold q for the Japanese stock market using daily and intraday data sets. We find that the distribution of return intervals can be approximated by a scaling function that depends only on the ratio between the return interval τ and its mean <τ>. We also find memory effects such that a large (or small) return interval follows a large (or small) interval by investigating the conditional distribution and mean return interval. The results are similar to previous studies of other markets and indicate that similar statistical features appear in different financial markets. We also compare our results between the period before and after the big crash at the end of 1989. We find that scaling and memory effects of the return intervals show similar features although the statistical properties of the returns are different.
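A minimal sketch of the measurement described above (illustrative, not the authors' pipeline): extract the intervals between volatilities exceeding the threshold q and histogram them in units of the mean interval, which is the scaling variable τ/⟨τ⟩.

```python
import numpy as np

def return_intervals(volatility, q):
    """Intervals (in samples) between successive volatilities above q."""
    times = np.flatnonzero(volatility > q)
    return np.diff(times)

def scaled_distribution(tau, bins=30):
    """Histogram of tau / <tau>; a scaling collapse means curves for
    different thresholds q fall on a single function of this ratio."""
    x = tau / tau.mean()
    hist, edges = np.histogram(x, bins=bins, density=True)
    return 0.5 * (edges[1:] + edges[:-1]), hist
```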
Visualization Co-Processing of a CFD Simulation
NASA Technical Reports Server (NTRS)
Vaziri, Arsi
1999-01-01
OVERFLOW, a widely used CFD simulation code, is combined with a visualization system, pV3, to experiment with an environment for simulation/visualization co-processing on an SGI Origin 2000 (O2K) computer system. The shared-memory version of the solver is used, with the O2K 'pfa' preprocessor invoked to automatically discover parallelism in the source code. No other explicit parallelism is enabled. In order to study the scaling and performance of the visualization co-processing system, sample runs are made with different processor groups in the range of 1 to 254 processors. The data exchange between the visualization system and the simulation system is rapid enough for user interactivity when the problem size is small. This shared-memory version of OVERFLOW, with minimal parallelization, does not scale well to an increasing number of available processors. The visualization task takes about 18 to 30% of the total processing time and does not appear to be a major contributor to the poor scaling. Improper load balancing and inter-processor communication overhead are contributors to this poor performance. Work is in progress aimed at obtaining improved parallel performance of the solver and removing the limitations of serial data transfer to pV3 by examining various parallelization/communication strategies, including the use of explicit message passing.
Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Balley, David H. (Technical Monitor)
1997-01-01
Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. Still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and they have limited utility in a shared-memory environment, due primarily to the fact that message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g., clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication into message passing, with the accompanying performance degradation in shared-memory environments, or they must check each communication at run time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accommodated. CDS does not require special use of an MMU, which can add overhead to some DSM systems, and does not require an SPMD programming model. Unlike some message-passing interfaces, CDS allows the user to implement efficient demand-driven applications where processes must "fight" over data, and it does not perform copying if processes share memory and do not attempt concurrent writes. CDS also supports heterogeneous computing, dynamic process creation, handlers, and a very simple thread-arbitration mechanism. Additional support for array subsections is currently being considered. The CDS1 API, which forms the kernel of CDS, is built primarily upon only two communication primitives, one process initiation primitive, and some data translation (and marshalling) routines, memory allocation routines, and priority control routines. The entire current collection of 28 routines provides enough functionality to implement most (or all) of MPI 1 and 2, which has a much larger interface consisting of hundreds of routines. Still, the API is small enough to consider integrating into standard OS interfaces for handling inter-process communication in a network-independent way. This approach would also help to solve many of the problems plaguing other higher-level standards such as MPI and PVM, which must, in some cases, "play OS" to adequately address progress and process control issues. The CDS2 API, a higher-level interface roughly equivalent in functionality to MPI and to be built entirely upon CDS1, is still being designed.
It is intended to add support for the equivalent of communicators, reduction and other collective operations, process topologies, additional support for process creation, and some automatic memory management. CDS2 will not exactly match MPI, because the copy-free semantics of communication from CDS1 will be supported, and CDS2 application programs will remain free to use CDS1 directly. CDS1 has been implemented on networks of workstations running unmodified Unix-based operating systems, using UDP/IP and vendor-supplied high-performance locks. Although its inter-node performance is currently unimpressive due to rudimentary implementation techniques, it even now outperforms highly optimized MPI implementations on intra-node communication due to its support for non-copy communication. The similarity of the CDS1 architecture to that of other projects such as UNET and TRAP suggests that the inter-node performance can be increased significantly, to surpass MPI or PVM, and it may be possible to migrate some of its functionality to communication controllers.
Three dimensions of dissociative amnesia.
Dell, Paul F
2013-01-01
Principal axis factor analysis with promax rotation extracted 3 factors from the 42 memory and amnesia items of the Multidimensional Inventory of Dissociation (MID) database (N = 2,569): Discovering Dissociated Actions, Lapses of Recent Memory and Skills, and Gaps in Remote Memory. The 3 factors' shared variance ranged from 36% to 64%. Construed as scales, the 3 factor scales had Cronbach's alpha coefficients of .96, .94, and .93, respectively. The scales correlated strongly with mean Dissociative Experiences Scale scores, mean MID scores, and total scores on the Structured Clinical Interview for DSM-IV Dissociative Disorders-Revised (SCID-D-R). Interestingly, the 3 amnesia factors exhibited a range of correlations with SCID-D-R Amnesia scores (.52, .63, and .70, respectively), suggesting that the SCID-D-R Amnesia score emphasizes gaps in remote memory over amnesias related to dissociative identity disorder. The 3 amnesia factor scales exhibited a clinically meaningful pattern of significant differences among dissociative identity disorder, dissociative disorder not otherwise specified-1, dissociative amnesia, depersonalization disorder, and nonclinical participants. The 3 amnesia factors may have greater clinical utility for frontline clinicians than (a) amnesia as discussed in the context of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, nosology of the dissociative disorders or (b) P. Janet's (1893/1977) 4-fold classification of dissociative amnesia. The author recommends systematic study of the phenomenological differences within specific dissociative symptoms and their differential relationship to specific dissociative disorders.
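For readers who want to reproduce this style of analysis, a principal-axis extraction with promax rotation can be run with the third-party factor_analyzer package; a sketch assuming a hypothetical CSV holding the 42 item scores:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer  # pip install factor_analyzer

items = pd.read_csv("mid_amnesia_items.csv")  # hypothetical 42-column item matrix

# Principal-axis extraction of 3 factors with oblique (promax) rotation.
fa = FactorAnalyzer(n_factors=3, method="principal", rotation="promax")
fa.fit(items)
loadings = pd.DataFrame(fa.loadings_, index=items.columns)
print(loadings.round(2))
```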
Low rank approximation methods for MR fingerprinting with large scale dictionaries.
Yang, Mingrui; Ma, Dan; Jiang, Yun; Hamilton, Jesse; Seiberlich, Nicole; Griswold, Mark A; McGivney, Debra
2018-04-01
This work proposes new low-rank approximation approaches with significant memory savings for large-scale MR fingerprinting (MRF) problems. We introduce a compressed MRF with randomized singular value decomposition method to significantly reduce the memory requirement for calculating a low-rank approximation of large MRF dictionaries. We further relax this requirement by exploiting the structures of MRF dictionaries in the randomized singular value decomposition space and fitting them to low-degree polynomials to generate high-resolution MRF parameter maps. In vivo 1.5T and 3T brain scan data are used to validate the approaches. T1, T2, and off-resonance maps are in good agreement with those of the standard MRF approach. Moreover, the memory savings are up to 1000 times for the MRF fast imaging with steady-state precession sequence and more than 15 times for the MRF balanced steady-state free precession sequence. The proposed compressed MRF with randomized singular value decomposition and dictionary fitting methods are memory-efficient low-rank approximation methods, which can benefit the usage of MRF in clinical settings. They also have great potential in large-scale MRF problems, such as problems considering multi-component MRF parameters or high resolution in the parameter space. Magn Reson Med 79:2392-2400, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
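The core compression step can be sketched in a few lines of NumPy (a simplified stand-in for the paper's method; the polynomial-fitting refinement is omitted): a randomized SVD finds a rank-r subspace of the dictionary without ever factorizing the full matrix densely.

```python
import numpy as np

def randomized_svd(D, rank, oversample=10, power_iters=2, seed=0):
    """Randomized rank-r SVD in the Halko et al. style: sample the range
    of D with a random projection, so the full dictionary never has to be
    factorized densely; this is where the memory savings come from."""
    rng = np.random.default_rng(seed)
    k = rank + oversample
    Q = np.linalg.qr(D @ rng.standard_normal((D.shape[1], k)))[0]
    for _ in range(power_iters):          # power iterations sharpen the basis
        Q = np.linalg.qr(D @ (D.T @ Q))[0]
    U_hat, s, Vt = np.linalg.svd(Q.T @ D, full_matrices=False)
    return (Q @ U_hat)[:, :rank], s[:rank], Vt[:rank, :]
```

Matching then proceeds in the r-dimensional subspace: both the measured signals and the dictionary atoms are projected onto the rank-r basis before inner-product comparison, shrinking both memory and compute.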
Seidman, Larry J.; Hellemann, Gerhard; Nuechterlein, Keith H.; Greenwood, Tiffany A.; Braff, David L.; Cadenhead, Kristin S.; Calkins, Monica E.; Freedman, Robert; Gur, Raquel E.; Gur, Ruben C.; Lazzeroni, Laura C.; Light, Gregory A.; Olincy, Ann; Radant, Allen D.; Siever, Larry J.; Silverman, Jeremy M.; Sprock, Joyce; Stone, William S.; Sugar, Catherine; Swerdlow, Neal R.; Tsuang, Debby W.; Tsuang, Ming T.; Turetsky, Bruce I.; Green, Michael F.
2018-01-01
Background: Although many endophenotypes for schizophrenia have been studied individually, few studies have examined the extent to which common neurocognitive and neurophysiological measures reflect shared versus unique endophenotypic factors. It may be possible to distill individual endophenotypes into composite measures that reflect dissociable, genetically informative elements. Methods: The first phase of the Consortium on the Genetics of Schizophrenia (COGS-1) is a multisite family study that collected neurocognitive and neurophysiological data between 2003 and 2008. For these analyses, participants included schizophrenia probands (n=83), their nonpsychotic siblings (n=151), and community comparison subjects (n=209) with complete data on a battery of 12 neurocognitive tests (assessing domains of working memory, declarative memory, vigilance, spatial ability, abstract reasoning, facial emotion processing, and motor speed) and 3 neurophysiological tasks reflecting inhibitory processing (P50 gating, prepulse inhibition, and antisaccade tasks). Factor analyses were conducted on the measures for each subject group and across the entire sample. Heritability analyses of factors were performed using SOLAR. Results: Analyses yielded 5 distinct factors: 1) Episodic Memory, 2) Working Memory, 3) Perceptual Vigilance, 4) Visual Abstraction, and 5) Inhibitory Processing. Neurophysiological measures had low associations with these factors. The factor structure of endophenotypes was largely comparable across probands, siblings, and controls. Significant heritability estimates for the factors ranged from 22% (Episodic Memory) to 39% (Visual Abstraction). Conclusions: Neurocognitive measures reflect a meaningful amount of shared variance whereas the neurophysiological measures reflect largely unique contributions as endophenotypes for schizophrenia. Composite endophenotype measures may inform our neurobiological and genetic understanding of schizophrenia. PMID:25682549
NASA Astrophysics Data System (ADS)
Lai, Siyan; Xu, Ying; Shao, Bo; Guo, Menghan; Lin, Xiaola
2017-04-01
In this paper we study a Monte Carlo method for solving systems of linear algebraic equations (SLAE) based on shared memory. Prior research demonstrated that GPUs can effectively speed up the computations for this problem. Our purpose is to optimize the Monte Carlo simulation specifically for the GPU memory architecture. Random numbers are organized and stored in shared memory, which accelerates the parallel algorithm. Bank conflicts are avoided by our Collaborative Thread Arrays (CTA) scheme. The results of experiments show that the shared-memory-based strategy can speed up the computations by more than 3X in the best case.
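The underlying estimator, independent of the GPU mapping, is the classic random-walk (Ulam-von Neumann) method for x = Hx + b; a plain-NumPy sketch under the usual convergence assumption that the Neumann series converges (roughly, spectral radius of |H| below 1), with all names illustrative:

```python
import numpy as np

def mc_solve_component(H, b, i, n_walks=20000, max_steps=100, seed=0):
    """Estimate component x_i of the solution of x = Hx + b by averaging
    weighted random walks over the index set."""
    rng = np.random.default_rng(seed)
    n = len(b)
    # Transition probabilities proportional to |H_ij|; uniform fallback
    # for all-zero rows so every row is a valid distribution.
    P = np.abs(H)
    row = P.sum(axis=1, keepdims=True)
    P = np.where(row > 0, P / np.maximum(row, 1e-300), 1.0 / n)

    est = 0.0
    for _ in range(n_walks):
        state, weight, total = i, 1.0, b[i]
        for _ in range(max_steps):
            nxt = rng.choice(n, p=P[state])
            weight *= H[state, nxt] / P[state, nxt]   # importance weight
            state = nxt
            total += weight * b[state]
        est += total
    return est / n_walks
```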
CaLRS: A Critical-Aware Shared LLC Request Scheduling Algorithm on GPGPU
Ma, Jianliang; Meng, Jinglei; Chen, Tianzhou; Wu, Minghui
2015-01-01
Ultra-high thread-level parallelism in modern GPUs usually generates numerous memory requests simultaneously, so there are always plenty of memory requests waiting at each bank of the shared LLC (L2 in this paper) and global memory. For global memory, various schedulers have already been developed to adjust the request sequence, but we find little work has focused on the service sequence at the shared LLC. We measured that memory requests in many GPU applications queue at the LLC banks for service, which provides an opportunity to optimize the service order at the LLC. By adjusting the service order of GPU memory requests, we can improve the schedulability of the SMs. We therefore propose a critical-aware shared LLC request scheduling algorithm (CaLRS) in this paper. A representative priority for each memory request is central to CaLRS: we use the number of memory requests that originate from the same warp but have not yet been serviced when they arrive at the shared LLC bank to represent the criticality of each warp. Experiments show that the proposed scheme can boost SM schedulability effectively by promoting the scheduling priority of memory requests with high criticality, and thereby improves the performance of the GPU indirectly. PMID:25729772
Time Constraints and Resource Sharing in Adults' Working Memory Spans
ERIC Educational Resources Information Center
Barrouillet, Pierre; Bernardin, Sophie; Camos, Valerie
2004-01-01
This article presents a new model that accounts for working memory spans in adults, the time-based resource-sharing model. The model assumes that both components (i.e., processing and maintenance) of the main working memory tasks require attention and that memory traces decay as soon as attention is switched away. Because memory retrievals are…
Optimized Laplacian image sharpening algorithm based on graphic processing unit
NASA Astrophysics Data System (ADS)
Ma, Tinghuai; Li, Lu; Ji, Sai; Wang, Xin; Tian, Yuan; Al-Dhelaan, Abdullah; Al-Rodhaan, Mznah
2014-12-01
In classical Laplacian image sharpening, all pixels are processed one by one, which leads to a large amount of computation. Traditional Laplacian sharpening on a CPU is considerably time-consuming, especially for large images. In this paper, we propose a parallel implementation of Laplacian sharpening based on the Compute Unified Device Architecture (CUDA), a computing platform for Graphics Processing Units (GPUs), and analyze the impact of image size on performance as well as the relationship between data transfer time and parallel computing time. Further, according to the features of the different memory types, an improved scheme is developed which exploits shared memory on the GPU instead of global memory, further increasing efficiency. Experimental results show that the two novel algorithms outperform the traditional sequential method based on OpenCV in terms of computing speed.
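For reference, the serial operation being accelerated is just a 3x3 stencil; a CPU sketch with SciPy (the paper's contribution is mapping this stencil onto CUDA threads and staging image tiles in GPU shared memory, which is not shown here):

```python
import numpy as np
from scipy.ndimage import convolve

def laplacian_sharpen(img, strength=1.0):
    """Classic sharpening: subtract the Laplacian (a second-derivative
    estimate) from the image so that edges are boosted."""
    kernel = np.array([[0,  1, 0],
                       [1, -4, 1],
                       [0,  1, 0]], dtype=np.float64)
    lap = convolve(img.astype(np.float64), kernel)
    return np.clip(img - strength * lap, 0, 255)
```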
Intrahemispheric theta rhythm desynchronization impairs working memory.
Alekseichuk, Ivan; Pabel, Stefanie Corinna; Antal, Andrea; Paulus, Walter
2017-01-01
There is a growing interest in large-scale connectivity as one of the crucial factors in working memory. Correlative evidence has revealed the anatomical and electrophysiological players in the working memory network, but understanding of the effective role of their connectivity remains elusive. In this double-blind, placebo-controlled study we aimed to identify the causal role of theta phase connectivity in visual-spatial working memory. The frontoparietal network was over- or de-synchronized in the anterior-posterior direction by multi-electrode, 6 Hz transcranial alternating current stimulation (tACS). A decrease in memory performance and increase in reaction time was caused by frontoparietal intrahemispheric desynchronization. According to the diffusion drift model, this originated in a lower signal-to-noise ratio, known as the drift rate index, in the memory system. The EEG analysis revealed a corresponding decrease in phase connectivity between prefrontal and parietal areas after tACS-driven desynchronization. The over-synchronization did not result in any changes in either the behavioral or electrophysiological levels in healthy participants. Taken together, we demonstrate the feasibility of manipulating multi-site large-scale networks in humans, and the disruptive effect of frontoparietal desynchronization on theta phase connectivity and visual-spatial working memory.
Constructing Neuronal Network Models in Massively Parallel Environments.
Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
Power and Performance Trade-offs for Space Time Adaptive Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gawande, Nitin A.; Manzano Franco, Joseph B.; Tumeo, Antonino
Computational efficiency (performance relative to power or energy) is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations for CUDA and OpenMP on two computationally efficient architectures, the Intel Haswell Core i7-4770TE and NVIDIA Kayla with a GK208 GPU. We analyze the power and performance of STAP's computationally intensive kernels across the two hardware testbeds. We also show the impact and trade-offs of GPU optimization techniques. We show that data parallelism can be exploited for efficient implementation on the Haswell CPU architecture. The GPU architecture is able to process large data sets without an increase in power requirement. The use of shared memory has a significant impact on the power requirement for the GPU. A balance between the use of shared memory and main memory access leads to improved performance in a typical STAP application.
A compositional reservoir simulator on distributed memory parallel computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rame, M.; Delshad, M.
1995-12-31
This paper presents the application of distributed-memory parallel computers to field-scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general-purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed-memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed-memory computing performance of the parallel simulator are presented for field-scale applications such as tracer floods and polymer floods. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
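The ghost-cell (overlap) idea at the heart of that decomposition can be sketched serially; in the real simulator each ghost-column assignment below becomes a message between adjacent processors. All names are illustrative.

```python
import numpy as np

def split_with_ghosts(grid, n_sub):
    """Split a 2-D grid column-wise into subdomains, padding each piece
    with one ghost column per side so a stencil can be applied locally.
    Outer ghosts of the end subdomains stay zero (physical boundary)."""
    pieces = np.array_split(grid, n_sub, axis=1)
    return [np.pad(p, ((0, 0), (1, 1))) for p in pieces]

def exchange_ghosts(pieces):
    """Fill ghost columns from neighbours; on a distributed machine each
    assignment here is a message between adjacent processors."""
    for k in range(len(pieces) - 1):
        pieces[k][:, -1] = pieces[k + 1][:, 1]   # right ghost <- neighbour interior
        pieces[k + 1][:, 0] = pieces[k][:, -2]   # neighbour left ghost <- my interior
```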
Wang, Qi; Lee, Dasom; Hou, Yubo
2017-07-01
Internet technology provides a new means of recalling and sharing personal memories in the digital age. What is the mnemonic consequence of posting personal memories online? Theories of transactive memory and autobiographical memory would make contrasting predictions. In the present study, college students completed a daily diary for a week, listing at the end of each day all the events that happened to them on that day. They also reported whether they posted any of the events online. Participants received a surprise memory test after the completion of the diary recording and then another test a week later. At both tests, events posted online were significantly more likely than those not posted online to be recalled. It appears that sharing memories online may provide unique opportunities for rehearsal and meaning-making that facilitate memory retention.
Yang, C L; Wei, H Y; Adler, A; Soleimani, M
2013-06-01
Electrical impedance tomography (EIT) is a fast and cost-effective technique that provides a tomographic conductivity image of a subject from boundary current-voltage data. This paper proposes a time- and memory-efficient method for solving a large-scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large number of measurements can produce a very large Jacobian matrix, which causes difficulties in computer storage and in the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining image quality. Firstly, a sparse matrix reduction technique is proposed that uses thresholding to set very small values of the Jacobian matrix to zero. By converting the Jacobian matrix into a sparse format, the zero elements are eliminated, which reduces the memory requirement. Secondly, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. A sparse Jacobian with a block-wise CG enables the large-scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction on the reconstruction results.
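A compact sketch of the two ideas (thresholded sparse Jacobian, then a CG solve of a regularized normal-equations step), using SciPy's serial CG in place of the authors' parallel block-wise version; the threshold and regularization values are illustrative:

```python
import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import cg

def sparsify_jacobian(J, rel_threshold=1e-3):
    """Zero entries below a fraction of the largest |J| value, then store
    in compressed sparse row format so the zeros cost no memory."""
    keep = np.abs(J) >= rel_threshold * np.abs(J).max()
    return csr_matrix(np.where(keep, J, 0.0))

def gauss_newton_step(J_sparse, residual, alpha=1e-2):
    """Solve the regularized normal equations (J^T J + alpha I) dx = J^T r
    with conjugate gradients; only sparse matrix-vector products are needed."""
    n = J_sparse.shape[1]
    A = J_sparse.T @ J_sparse + alpha * identity(n)
    dx, info = cg(A, J_sparse.T @ residual)
    return dx
```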
Sequoia: A fault-tolerant tightly coupled multiprocessor for transaction processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernstein, P.A.
1988-02-01
The Sequoia computer is a tightly coupled multiprocessor, and thus attains the performance advantages of this style of architecture. It avoids most of the fault-tolerance disadvantages of tight coupling by using a new fault-tolerance design. The Sequoia architecture is similar to other multi-microprocessor architectures, such as those of Encore and Sequent, in that it gives dozens of microprocessors shared access to a large main memory. It resembles the Stratus architecture in its extensive use of hardware fault-detection techniques. It resembles Stratus and Auragen in its ability to quickly recover all processes after a single-point failure, transparently to the user. However, Sequoia is unique in its combination of a large-scale tightly coupled architecture with a hardware approach to fault tolerance. This article gives an overview of how the hardware architecture and operating system (OS) work together to provide a high degree of fault tolerance with good system performance.
Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data.
Li, Wenyuan; Gong, Ke; Li, Qingjiao; Alber, Frank; Zhou, Xianghong Jasmine
2015-03-15
Genome-wide proximity ligation assays, e.g. Hi-C and its variant TCC, have recently become important tools to study spatial genome organization. Removing biases from chromatin contact matrices generated by such techniques is a critical preprocessing step for subsequent analyses. The continuing decline of sequencing costs has led to an ever-improving resolution of Hi-C data, resulting in very large matrices of chromatin contacts. Such large matrices, however, pose a great challenge to the memory usage and speed of their normalization. Therefore, there is an urgent need for fast and memory-efficient methods for normalization of Hi-C data. We developed Hi-Corrector, an easy-to-use, open-source implementation of the Hi-C data normalization algorithm. Its salient features are (i) scalability: the software is capable of normalizing Hi-C data of any size in reasonable time; (ii) memory efficiency: the sequential version can run on any single computer with very limited memory, no matter how little; and (iii) speed: the parallel version can run very fast on multiple computing nodes with limited local memory. The sequential version is implemented in ANSI C and can be easily compiled on any system; the parallel version is implemented in ANSI C with the MPI library (a standardized and portable parallel environment designed for solving large-scale scientific problems). The package is freely available at http://zhoulab.usc.edu/Hi-Corrector/. © The Author 2014. Published by Oxford University Press.
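The normalization itself is a matrix-balancing iteration; a single-machine NumPy sketch in the spirit of the algorithm Hi-Corrector parallelizes (simplified, with convergence checks omitted):

```python
import numpy as np

def iterative_correction(M, n_iter=50):
    """Balance a symmetric contact matrix by repeatedly dividing each
    row/column by its coverage, returning the corrected matrix and the
    accumulated per-bin bias vector."""
    W = M.astype(np.float64).copy()
    bias = np.ones(W.shape[0])
    for _ in range(n_iter):
        s = W.sum(axis=1)
        s /= s[s > 0].mean()      # keep the overall scale stable
        s[s == 0] = 1.0           # leave empty rows untouched
        W /= np.outer(s, s)
        bias *= s
    return W, bias
```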
Discriminative Hierarchical K-Means Tree for Large-Scale Image Classification.
Chen, Shizhi; Yang, Xiaodong; Tian, Yingli
2015-09-01
A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. Learning-based classifiers achieve state-of-the-art accuracies but have been criticized for computational complexity that grows linearly with the number of classes. Nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories, but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, the discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree grows only sublinearly with the number of categories, which is much better than recent hierarchical support vector machine based methods. The memory requirement is an order of magnitude less than in recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves state-of-the-art accuracies, with significantly lower computation cost and memory requirement.
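The hierarchical k-means tree backbone (without the discriminative learning the paper adds on top) can be sketched with scikit-learn; the branching factor and leaf size below are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_hkmeans(X, branching=8, min_leaf=50, depth=0, max_depth=4):
    """Recursively partition features with k-means; internal nodes route
    a query toward one branch, so lookup cost grows sublinearly with the
    number of leaves (and hence with the number of categories)."""
    if len(X) <= min_leaf or depth == max_depth:
        return {"leaf": True, "points": X}
    km = KMeans(n_clusters=branching, n_init=4, random_state=0).fit(X)
    children = [build_hkmeans(X[km.labels_ == c], branching, min_leaf,
                              depth + 1, max_depth)
                for c in range(branching)]
    return {"leaf": False, "centers": km.cluster_centers_, "children": children}

def route(tree, x):
    """Descend to the leaf whose cluster centers are nearest to the query."""
    while not tree["leaf"]:
        c = np.argmin(np.linalg.norm(tree["centers"] - x, axis=1))
        tree = tree["children"][c]
    return tree["points"]
```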
Constructing Memory, Imagination, and Empathy: A Cognitive Neuroscience Perspective
Gaesser, Brendan
2012-01-01
Studies on memory, imagination, and empathy have largely progressed in isolation. Consequently, humans’ empathic tendencies to care about and help other people are considered independent of our ability to remember and imagine events. Despite this theoretical autonomy, work from across psychology and neuroscience suggests that these cognitive abilities may be linked. In the present paper, I tentatively propose that humans’ ability to vividly imagine specific events (as supported by constructive memory) may facilitate prosocial intentions and behavior. Evidence of a relationship between memory, imagination, and empathy comes from research showing that imagination influences the perceived and actual likelihood that an event occurs, improves intergroup relations, and shares a neural basis with memory and empathy. Although many questions remain, this paper outlines a new direction for research that investigates the role of imagination in promoting empathy and prosocial behavior. PMID:23440064
Recent Improvements in Aerodynamic Design Optimization on Unstructured Meshes
NASA Technical Reports Server (NTRS)
Nielsen, Eric J.; Anderson, W. Kyle
2000-01-01
Recent improvements in an unstructured-grid method for large-scale aerodynamic design are presented. Previous work had shown such computations to be prohibitively long in a sequential processing environment. Also, robust adjoint solutions and mesh movement procedures were difficult to realize, particularly for viscous flows. To overcome these limiting factors, a set of design codes based on a discrete adjoint method is extended to a multiprocessor environment using a shared memory approach. A nearly linear speedup is demonstrated, and the consistency of the linearizations is shown to remain valid. The full linearization of the residual is used to precondition the adjoint system, and a significantly improved convergence rate is obtained. A new mesh movement algorithm is implemented and several advantages over an existing technique are presented. Several design cases are shown for turbulent flows in two and three dimensions.
An interactive display system for large-scale 3D models
NASA Astrophysics Data System (ADS)
Liu, Zijian; Sun, Kun; Tao, Wenbing; Liu, Liman
2018-04-01
With the improvement of 3D reconstruction theory and the rapid development of computer hardware technology, reconstructed 3D models are growing in both scale and complexity. Models with tens of thousands of 3D points or triangular meshes are common in practical applications. Due to limitations in storage and computing power, common 3D display software such as MeshLab has difficulty achieving real-time display of, and interaction with, large-scale 3D models. In this paper, we propose a display system for large-scale 3D scene models. We construct the LOD (Levels of Detail) model of the reconstructed 3D scene in advance, and then use an out-of-core, view-dependent multi-resolution rendering scheme to realize real-time display of the large-scale 3D model. With the proposed method, our display system is able to render in real time while roaming through the reconstructed scene, and 3D camera poses can also be displayed. Furthermore, memory consumption can be significantly decreased via an internal/external memory exchange mechanism, making it possible to display a large-scale reconstructed scene with millions of 3D points or triangular meshes on a regular PC with only 4 GB of RAM.
Programmable Direct-Memory-Access Controller
NASA Technical Reports Server (NTRS)
Hendry, David F.
1990-01-01
Proposed programmable direct-memory-access controller (DMAC) operates with computer systems of the 32000 series, which have 32-bit data buses and use addresses of 24 (or potentially 32) bits. Controller functions with or without the help of the central processing unit (CPU) and can start itself. Includes such advanced features as the ability to compare two blocks of memory for equality and to search a block of memory for a specific value. Made as a single very-large-scale integrated-circuit chip.
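The two advanced features named above are easy to state in software terms; the C fragment below shows what the operations compute, with the standard library's memcmp and memchr standing in for work the proposed chip would do in hardware without CPU involvement.

    /* Software stand-ins for the DMAC's block-compare and block-search
     * operations. A hardware DMAC would perform these over memory
     * blocks autonomously; here the C library illustrates the result. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        unsigned char a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        unsigned char b[8] = {1, 2, 3, 4, 5, 6, 7, 9};

        /* Compare two blocks of memory for equality. */
        printf("blocks equal: %s\n", memcmp(a, b, 8) == 0 ? "yes" : "no");

        /* Search a block of memory for a specific value. */
        unsigned char *hit = memchr(a, 5, sizeof a);
        if (hit)
            printf("found value 5 at offset %ld\n", (long)(hit - a));
        return 0;
    }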
Living Design Memory: Framework, Implementation, Lessons Learned.
ERIC Educational Resources Information Center
Terveen, Loren G.; And Others
1995-01-01
Discusses large-scale software development and describes the development of the Designer Assistant to improve software development effectiveness. Highlights include the knowledge management problem; related work, including artificial intelligence and expert systems, software process modeling research, and other approaches to organizational memory;…
Deformation and Failure Mechanisms of Shape Memory Alloys
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daly, Samantha Hayes
2015-04-15
The goal of this research was to understand the fundamental mechanics that drive the deformation and failure of shape memory alloys (SMAs). SMAs are difficult materials to characterize because of the complex phase transformations that give rise to their unique properties, including shape memory and superelasticity. These phase transformations occur across multiple length scales (one example being the martensite-austenite twinning that underlies macroscopic strain localization) and result in a large hysteresis. In order to optimize the use of this hysteretic behavior in energy storage and damping applications, we must first have a quantitative understanding of this transformation behavior. Prior results on shape memory alloys have been largely qualitative (i.e., mapping phase transformations through cracked oxide coatings or surface morphology). The PI developed and utilized new approaches to provide a quantitative, full-field characterization of phase transformation, conducting a comprehensive suite of experiments across multiple length scales and tying these results to theoretical and computational analysis. The research funded by this award utilized new combinations of scanning electron microscopy, diffraction, digital image correlation, and custom testing equipment and procedures to study phase transformation processes at a wide range of length scales, with a focus at small length scales with spatial resolution on the order of 1 nanometer. These experiments probe the basic connections between length scales during phase transformation. In addition to the insights gained on the fundamental mechanisms driving transformations in shape memory alloys, the unique experimental methodologies developed under this award are applicable to a wide range of solid-to-solid phase transformations and other strain localization mechanisms.
ERIC Educational Resources Information Center
Langley, Hillary A.; Coffman, Jennifer L.; Ornstein, Peter A.
2017-01-01
Data from a large-scale, longitudinal research study with an ethnically and socioeconomically diverse sample were utilized to explore linkages between maternal elaborative conversational style and the development of children's autobiographical and deliberate memory. Assessments were made when the children were aged 3, 5, and 6 years old, and the…
ERIC Educational Resources Information Center
Perfect, Timothy J.; Weber, Nathan
2012-01-01
Explorations of memory accuracy control normally contrast forced-report with free-report performance across a set of items and show a trade-off between memory quantity and accuracy. However, this memory control framework has not been tested with lineup identifications that may involve rejection of all alternatives. A large-scale (N = 439) lineup…
Xyce Parallel Electronic Simulator Users' Guide Version 6.8
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory, and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
ERIC Educational Resources Information Center
Vergauwe, Evie; Barrouillet, Pierre; Camos, Valerie
2009-01-01
Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and…
First Applications of the New Parallel Krylov Solver for MODFLOW on a National and Global Scale
NASA Astrophysics Data System (ADS)
Verkaik, J.; Hughes, J. D.; Sutanudjaja, E.; van Walsum, P.
2016-12-01
Integrated high-resolution hydrologic models are increasingly being used for evaluating water management measures at field scale. Their drawbacks are large memory requirements and long run times. Examples of such models are The Netherlands Hydrological Instrument (NHI) model and the PCRaster Global Water Balance (PCR-GLOBWB) model. Typical simulation periods are 30-100 years with daily timesteps. The NHI model predicts water demands in periods of drought, supporting operational and long-term water-supply decisions. The NHI is a state-of-the-art coupling of several models: a 7-layer MODFLOW groundwater model (~6.5M 250 m cells), a MetaSWAP model for the unsaturated zone (a Richards emulator of ~0.5M cells), and a surface water model (MOZART-DM). The PCR-GLOBWB model provides a grid-based representation of global terrestrial hydrology, and this work uses the version that includes a 2-layer MODFLOW groundwater model (~4.5M 10 km cells). The Parallel Krylov Solver (PKS) speeds up computation by both distributed memory parallelization (Message Passing Interface) and shared memory parallelization (Open Multi-Processing). PKS includes conjugate gradient, bi-conjugate gradient stabilized, and generalized minimal residual linear accelerators that use an overlapping additive Schwarz domain decomposition preconditioner. PKS can be used for both structured and unstructured grids and has been fully integrated in MODFLOW-USG using METIS partitioning and in iMODFLOW using RCB partitioning. iMODFLOW is an accelerated version of MODFLOW-2005 that is implicitly and online coupled to MetaSWAP. Results for benchmarks carried out on the Cartesius Dutch supercomputer (https://userinfo.surfsara.nl/systems/cartesius) for the PCR-GLOBWB model and on a 2x16 core Windows machine for the NHI model show speedups of up to 10-20 and 5-10, respectively.
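The hybrid MPI-plus-OpenMP organization described above follows a common skeleton: OpenMP threads share memory within a subdomain while MPI reduces partial results across subdomains. The minimal C sketch below shows that pattern on a global dot product, a core kernel of any Krylov accelerator; it is a generic illustration under those assumptions, not PKS source code.

    /* Hybrid MPI+OpenMP skeleton: each MPI rank owns a subdomain,
     * OpenMP threads compute its local dot-product contribution, and
     * MPI_Allreduce forms the global value needed by a Krylov solver.
     * Compile e.g.: mpicc -fopenmp hybrid.c */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        enum { NLOC = 100000 };          /* cells owned by this rank */
        static double x[NLOC], y[NLOC];
        for (int i = 0; i < NLOC; i++) { x[i] = 1.0; y[i] = 2.0; }

        /* Shared-memory parallel local dot product. */
        double local = 0.0;
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < NLOC; i++)
            local += x[i] * y[i];

        /* Distributed-memory reduction across subdomains. */
        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        if (rank == 0)
            printf("global dot product = %f\n", global);
        MPI_Finalize();
        return 0;
    }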
Jamieson, Matthew; Cullen, Breda; McGee-Lennon, Marilyn; Brewster, Stephen; Evans, Jonathan J
2014-01-01
Technology can compensate for memory impairment. The efficacy of assistive technology for people with memory difficulties and the methodology of selected studies are assessed. A systematic search was performed and all studies that investigated the impact of technology on memory performance for adults with impaired memory resulting from acquired brain injury (ABI) or a degenerative disease were included. Two 10-point scales were used to compare each study to an ideally reported single case experimental design (SCED) study (SCED scale; Tate et al., 2008) or randomised control group study (PEDro-P scale; Maher, Sherrington, Herbert, Moseley, & Elkins, 2003). Thirty-two SCED (mean = 5.9 on the SCED scale) and 11 group studies (mean = 4.45 on the PEDro-P scale) were found. Baseline and intervention performance for each participant in the SCED studies was re-calculated using non-overlap of all pairs (Parker & Vannest, 2009), giving a mean score of 0.85 on a 0 to 1 scale (17 studies, n = 36). A meta-analysis of the efficacy of technology vs. control in seven group studies gave a large effect size (d = 1.27) (n = 147). It was concluded that prosthetic technology can improve performance on everyday tasks requiring memory. There is a specific need for investigations of technology for people with degenerative diseases.
NASA Technical Reports Server (NTRS)
Harper, Richard E.; Butler, Bryan P.
1990-01-01
The Draper fault-tolerant processor with fault-tolerant shared memory (FTP/FTSM), which is designed to allow application tasks to continue execution during the memory alignment process, is described. Processor performance is not affected by memory alignment. In addition, the FTP/FTSM incorporates a hardware scrubber device to perform the memory alignment quickly during unused memory access cycles. The FTP/FTSM architecture is described, followed by an estimate of the time required for channel reintegration.
Wang, Qi
2006-01-01
The relations of maternal reminiscing style and child self-concept to children's shared and independent autobiographical memories were examined in a sample of 189 three-year-olds and their mothers from Chinese families in China, first-generation Chinese immigrant families in the United States, and European American families. Mothers shared memories with their children and completed questionnaires; children recounted autobiographical events and described themselves with a researcher. Independent of culture, gender, child age, and language skills, maternal elaborations and evaluations were associated with children's shared memory reports, and maternal evaluations and child agentic self-focus were associated with children's independent memory reports. Maternal style and child self-concept further mediated cultural influences on children's memory. The findings provide insight into the social-cultural construction of autobiographical memory.
Missing from the "Minority Mainstream": Pahari-Speaking Diaspora in Britain
ERIC Educational Resources Information Center
Hussain, Serena
2015-01-01
Pahari speakers form one of the largest ethnic non-European diasporas in Britain. Despite their size and over 60 years of settlement on British shores, the diaspora is shrouded by confusion regarding official and unofficial categorisations, remaining largely misunderstood as a collective with a shared ethnolinguistic memory. This has had…
A Multi-scale Cognitive Approach to Intrusion Detection and Response
2015-12-28
the behavior of the traffic on the network, either by using mathematical formulas or by replaying packet streams. As a result, simulators depend... large scale. Summary of the most important results: we obtained a powerful machine, which has 768 cores and 1.25 TB memory. RBG has been... time. Each client is configured with 1 GB memory, 10 GB disk space, and one 100M Ethernet interface. The server nodes include web servers
How institutions shaped the last major evolutionary transition to large-scale human societies
Powers, Simon T.; van Schaik, Carel P.; Lehmann, Laurent
2016-01-01
What drove the transition from small-scale human societies centred on kinship and personal exchange, to large-scale societies comprising cooperation and division of labour among untold numbers of unrelated individuals? We propose that the unique human capacity to negotiate institutional rules that coordinate social actions was a key driver of this transition. By creating institutions, humans have been able to move from the default ‘Hobbesian’ rules of the ‘game of life’, determined by physical/environmental constraints, into self-created rules of social organization where cooperation can be individually advantageous even in large groups of unrelated individuals. Examples include rules of food sharing in hunter–gatherers, rules for the usage of irrigation systems in agriculturalists, property rights and systems for sharing reputation between mediaeval traders. Successful institutions create rules of interaction that are self-enforcing, providing direct benefits both to individuals that follow them, and to individuals that sanction rule breakers. Forming institutions requires shared intentionality, language and other cognitive abilities largely absent in other primates. We explain how cooperative breeding likely selected for these abilities early in the Homo lineage. This allowed anatomically modern humans to create institutions that transformed the self-reliance of our primate ancestors into the division of labour of large-scale human social organization. PMID:26729937
Supporting shared data structures on distributed memory architectures
NASA Technical Reports Server (NTRS)
Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John
1990-01-01
Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and IPSC/2 hypercubes is reported.
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)
2002-01-01
The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA-based implementations of selected CFD application benchmark kernels and compare them to corresponding message-passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA-based approach we consider two different libraries supporting this programming model: one is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
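For readers unfamiliar with the RMA style being evaluated, the sketch below uses the MPI-2 one-sided calls (MPI_Win_create, MPI_Put, MPI_Win_fence) to move a value directly into another process's memory without a matching receive. It is a minimal generic example, not code from the study.

    /* MPI-2 one-sided (RMA) communication: each rank exposes a window
     * of memory; rank 0 writes into rank 1's window between fences.
     * Compile: mpicc rma.c ; run with at least 2 ranks. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double buf = -1.0;               /* window memory on every rank */
        MPI_Win win;
        MPI_Win_create(&buf, sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);           /* open access epoch */
        if (rank == 0 && size > 1) {
            double val = 3.14;
            /* Write val directly into rank 1's window; no receive needed. */
            MPI_Put(&val, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
        }
        MPI_Win_fence(0, win);           /* close epoch; data now visible */

        if (rank == 1)
            printf("rank 1 received %f via RMA\n", buf);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }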
Sosso, Gabriele C; Miceli, Giacomo; Caravati, Sebastiano; Giberti, Federico; Behler, Jörg; Bernasconi, Marco
2013-12-19
Phase change materials are of great interest as active layers in rewritable optical disks and novel electronic nonvolatile memories. These applications rest on a fast and reversible transformation between the amorphous and crystalline phases upon heating, taking place on the nanosecond time scale. In this work, we investigate the microscopic origin of the fast crystallization process by means of large-scale molecular dynamics simulations of the phase change compound GeTe. To this end, we use an interatomic potential generated from a Neural Network fitting of a large database of ab initio energies. We demonstrate that in the temperature range of the programming protocols of the electronic memories (500-700 K), nucleation of the crystal in the supercooled liquid is not rate-limiting. In this temperature range, the growth of supercritical nuclei is very fast because of a large atomic mobility, which is, in turn, the consequence of the high fragility of the supercooled liquid and the associated breakdown of the Stokes-Einstein relation between viscosity and diffusivity.
SAHAYOG: A Testbed for Load Sharing under Failure,
1987-07-01
messages, shared memory and semaphores. To communicate using messages, processes create message queues using system-provided primitives. The message... The size of the memory that is to be shared is decided by the process when it makes a request for memory allocation. The semaphore option of IPC can be... used to prevent two or more concurrent processes from executing their critical sections at the same time. Semaphores must be used when the processes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buntinas, D.; Mercier, G.; Gropp, W.
2007-09-01
This paper presents the implementation of MPICH2 over the Nemesis communication subsystem and the evaluation of its shared-memory performance. We describe design issues as well as some of the optimization techniques we employed. We conducted a performance evaluation over shared memory using microbenchmarks. The evaluation shows that MPICH2 Nemesis has very low communication overhead, making it suitable for smaller-grained applications.
Parallel Simulation of Unsteady Turbulent Flames
NASA Technical Reports Server (NTRS)
Menon, Suresh
1996-01-01
Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled within the subgrid (small scales) using a methodology that simulates the mixing, the molecular transport, and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure, and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion, and chemical kinetics; therefore, within each grid cell, a significant amount of computation must be carried out before the large-scale (LES-resolved) effects are incorporated. This approach is therefore uniquely suited for parallel processing and has been implemented on various systems such as the Intel Paragon, IBM SP-2, Cray T3D, and SGI Power Challenge (PC) using the system-independent Message Passing Interface (MPI). In this paper, timing data on these machines are reported along with some characteristic results.
Test and Evaluation of Architecture-Aware Compiler Environment
2011-11-01
biology, medicine, social sciences, and security applications. Challenges include extremely large graphs (the Facebook friend network has over... Operations with Temporal Binning... Memory behavior and Energy per... five challenge problems empirically, exploring their scaling properties, computation and datatype needs, memory behavior, and temporal behavior
Memory Benchmarks for SMP-Based High Performance Parallel Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, A B; de Supinski, B; Mueller, F
2001-11-20
As the speed gap between CPU and main memory continues to grow, memory accesses increasingly dominate the performance of many applications. The problem is particularly acute for symmetric multiprocessor (SMP) systems, where the shared memory may be accessed concurrently by a group of threads running on separate CPUs. Unfortunately, several key issues governing memory system performance in current systems are not well understood. Complex interactions between the levels of the memory hierarchy, buses or switches, DRAM back-ends, system software, and application access patterns can make it difficult to pinpoint bottlenecks and determine appropriate optimizations, and the situation is even more complex for SMP systems. To partially address this problem, we formulated a set of multi-threaded microbenchmarks for characterizing and measuring the performance of the underlying memory system in SMP-based high-performance computers. We report our use of these microbenchmarks on two important SMP-based machines. This paper has four primary contributions. First, we introduce a microbenchmark suite to systematically assess and compare the performance of different levels in SMP memory hierarchies. Second, we present a new tool based on hardware performance monitors to determine a wide array of memory system characteristics, such as cache sizes, quickly and easily; by using this tool, memory performance studies can be targeted to the full spectrum of performance regimes with many fewer data points than is otherwise required. Third, we present experimental results indicating that the performance of applications with large memory footprints remains largely constrained by memory. Fourth, we demonstrate that thread-level parallelism further degrades memory performance, even for the latest SMPs with hardware prefetching and switch-based memory interconnects.
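A minimal example of the microbenchmark style such suites build on is the pointer-chasing loop below: each OpenMP thread walks a chain of dependent loads, so the time per step approximates memory latency for the chosen footprint, and raising the thread count exposes contention. This is an illustrative sketch only; the suite described above measures many more regimes (buffer sizes, cache levels, sharing patterns).

    /* Pointer-chasing latency microbenchmark. Each thread builds a
     * private ring with a large stride (to defeat prefetching) and
     * times a chain of dependent loads.
     * Compile: cc -O2 -fopenmp chase.c */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1 << 22)                 /* 4M entries: well past cache */
    #define STEPS (1L << 24)

    int main(void)
    {
        #pragma omp parallel
        {
            size_t *next = malloc(N * sizeof(size_t));
            for (size_t i = 0; i < N; i++)
                next[i] = (i + 4097) % N;   /* odd stride: full cycle */

            double t0 = omp_get_wtime();
            size_t p = 0;
            for (long s = 0; s < STEPS; s++)
                p = next[p];            /* each load depends on the last */
            double t1 = omp_get_wtime();

            #pragma omp critical
            printf("thread %d: %.1f ns/load (p=%zu)\n",
                   omp_get_thread_num(), (t1 - t0) / STEPS * 1e9, p);
            free(next);
        }
        return 0;
    }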
Manipulations of attention dissociate fragile visual short-term memory from visual working memory.
Vandenbroucke, Annelinde R E; Sligte, Ilja G; Lamme, Victor A F
2011-05-01
People often rely on information that is no longer in view, but maintained in visual short-term memory (VSTM). Traditionally, VSTM is thought to operate on either a short time-scale with high capacity - iconic memory - or a long time scale with small capacity - visual working memory. Recent research suggests that in addition, an intermediate stage of memory in between iconic memory and visual working memory exists. This intermediate stage has a large capacity and a lifetime of several seconds, but is easily overwritten by new stimulation. We therefore termed it fragile VSTM. In previous studies, fragile VSTM has been dissociated from iconic memory by the characteristics of the memory trace. In the present study, we dissociated fragile VSTM from visual working memory by showing a differentiation in their dependency on attention. A decrease in attention during presentation of the stimulus array greatly reduced the capacity of visual working memory, while this had only a small effect on the capacity of fragile VSTM. We conclude that fragile VSTM is a separate memory store from visual working memory. Thus, a tripartite division of VSTM appears to be in place, comprising iconic memory, fragile VSTM and visual working memory. Copyright © 2011 Elsevier Ltd. All rights reserved.
Sugiura, Lisa; Ojima, Shiro; Matsuba-Kurita, Hiroko; Dan, Ippeita; Tsuzuki, Daisuke; Katura, Takusige; Hagiwara, Hiroko
2015-10-01
Previous neuroimaging studies in adults have revealed that first and second languages (L1/L2) share similar neural substrates, and that proficiency is a major determinant of the neural organization of L2 in the lexical-semantic and syntactic domains. However, little is known about neural substrates of children in the phonological domain, or about sex differences. Here, we conducted a large-scale study (n = 484) of school-aged children using functional near-infrared spectroscopy and a word repetition task, which requires a great extent of phonological processing. We investigated cortical activation during word processing, emphasizing sex differences, to clarify similarities and differences between L1 and L2, and proficiency-related differences during early L2 learning. L1 and L2 shared similar neural substrates with decreased activation in L2 compared to L1 in the posterior superior/middle temporal and angular/supramarginal gyri for both sexes. Significant sex differences were found in cortical activation within language areas during high-frequency word but not during low-frequency word processing. During high-frequency word processing, widely distributed areas including the angular/supramarginal gyri were activated in boys, while more restricted areas, excluding the angular/supramarginal gyri were activated in girls. Significant sex differences were also found in L2 proficiency-related activation: activation significantly increased with proficiency in boys, whereas no proficiency-related differences were found in girls. Importantly, cortical sex differences emerged with proficiency. Based on previous research, the present results indicate that sex differences are acquired or enlarged during language development through different cognitive strategies between sexes, possibly reflecting their different memory functions. © 2015 Wiley Periodicals, Inc.
Working memory resources are shared across sensory modalities.
Salmela, V R; Moisala, M; Alho, K
2014-10-01
A common assumption in the working memory literature is that the visual and auditory modalities have separate and independent memory stores. Recent evidence on visual working memory has suggested that resources are shared between representations, and that the precision of representations sets the limit for memory performance. We tested whether memory resources are also shared across sensory modalities. Memory precision for two visual (spatial frequency and orientation) and two auditory (pitch and tone duration) features was measured separately for each feature and for all possible feature combinations. Thus, only the memory load was varied, from one to four features, while keeping the stimuli similar. In Experiment 1, two gratings and two tones-both containing two varying features-were presented simultaneously. In Experiment 2, two gratings and two tones-each containing only one varying feature-were presented sequentially. The memory precision (delayed discrimination threshold) for a single feature was close to the perceptual threshold. However, as the number of features to be remembered was increased, the discrimination thresholds increased more than twofold. Importantly, the decrease in memory precision did not depend on the modality of the other feature(s), or on whether the features were in the same or in separate objects. Hence, simultaneously storing one visual and one auditory feature had an effect on memory precision equal to those of simultaneously storing two visual or two auditory features. The results show that working memory is limited by the precision of the stored representations, and that working memory can be described as a resource pool that is shared across modalities.
ERIC Educational Resources Information Center
Feuer, Michael J.
2011-01-01
Few arguments about education are as effective at galvanizing public attention and motivating political action as those that compare the performance of students with their counterparts in other countries and that connect academic achievement to economic performance. Because data from international large-scale assessments (ILSA) have a powerful…
ERIC Educational Resources Information Center
Kraemer, David J. M.; Schinazi, Victor R.; Cawkwell, Philip B.; Tekriwal, Anand; Epstein, Russell A.; Thompson-Schill, Sharon L.
2017-01-01
Using novel virtual cities, we investigated the influence of verbal and visual strategies on the encoding of navigation-relevant information in a large-scale virtual environment. In 2 experiments, participants watched videos of routes through 4 virtual cities and were subsequently tested on their memory for observed landmarks and their ability to…
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry
1998-01-01
This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With the increasing popularity of shared-address-space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of the NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. We report measurement-based performance of these parallelized benchmarks from four perspectives: the efficacy of the parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized versions of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives, but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
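As a concrete picture of directive-based parallelization, the C/OpenMP fragment below parallelizes a sequential loop with a single directive. The paper itself used SGI Fortran77 directives on the Origin2000; OpenMP in C is shown here only as the analogous directive style, and the loop is an invented example.

    /* Directive-based loop parallelization. The pragma is the only
     * change to the sequential code; on a ccNUMA machine, data
     * placement (which the directive alone does not control) then
     * dominates the achieved speedup. */
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N];
        for (int i = 0; i < N; i++)
            b[i] = i * 0.001;

        /* One directive turns the sequential loop into a parallel one. */
        #pragma omp parallel for
        for (int i = 1; i < N - 1; i++)
            a[i] = 0.5 * (b[i - 1] + b[i + 1]);

        printf("a[1] = %f\n", a[1]);
        return 0;
    }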
Distributed simulation using a real-time shared memory network
NASA Technical Reports Server (NTRS)
Simon, Donald L.; Mattern, Duane L.; Wong, Edmond; Musgrave, Jeffrey L.
1993-01-01
The Advanced Control Technology Branch of the NASA Lewis Research Center performs research in the area of advanced digital controls for aeronautic and space propulsion systems. This work requires the real-time implementation of both control software and complex dynamical models of the propulsion system. We are implementing these systems in a distributed, multi-vendor computer environment. Therefore, a need exists for real-time communication and synchronization between the distributed multi-vendor computers. A shared memory network is a potential solution which offers several advantages over other real-time communication approaches. A candidate shared memory network was tested for basic performance. The shared memory network was then used to implement a distributed simulation of a ramjet engine. The accuracy and execution time of the distributed simulation were measured and compared to the performance of the non-partitioned simulation. The ease of partitioning the simulation, the minimal time required to develop communication between the processors, and the resulting execution time all indicate that the shared memory network is a real-time communication technique worthy of serious consideration.
Some aspects of control of a large-scale dynamic system
NASA Technical Reports Server (NTRS)
Aoki, M.
1975-01-01
Techniques of predicting and/or controlling the dynamic behavior of large scale systems are discussed in terms of decentralized decision making. Topics discussed include: (1) control of large scale systems by dynamic team with delayed information sharing; (2) dynamic resource allocation problems by a team (hierarchical structure with a coordinator); and (3) some problems related to the construction of a model of reduced dimension.
Multiprocessor shared-memory information exchange
DOE Office of Scientific and Technical Information (OSTI.GOV)
Santoline, L.L.; Bowers, M.D.; Crew, A.W.
1989-02-01
In distributed microprocessor-based instrumentation and control systems, the inter- and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically, the protocol allows multiple processors to exchange information via a shared-memory interface. The authors' primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, is designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.
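The sketch below illustrates, in portable C11, one way a master can read unidirectional buffers published by a slave through shared memory without blocking: a sequence counter marks writes in progress so the reader retries torn snapshots. This is a generic single-writer exchange (a seqlock), shown only to make the idea concrete; it is not the MSMIE protocol itself, whose buffer management differs.

    /* Single-writer/single-reader shared-buffer exchange using a
     * sequence counter: odd means "being written", even means stable. */
    #include <stdatomic.h>
    #include <stdio.h>

    typedef struct {
        _Atomic unsigned seq;   /* even: stable; odd: being written */
        double data[4];         /* the shared information buffer */
    } shared_buf;

    /* Slave side: publish a new buffer of readings. */
    void slave_publish(shared_buf *b, const double *vals)
    {
        atomic_fetch_add(&b->seq, 1);        /* mark "writing" (odd) */
        for (int i = 0; i < 4; i++)
            b->data[i] = vals[i];
        atomic_fetch_add(&b->seq, 1);        /* mark "stable" (even) */
    }

    /* Master side: read a consistent snapshot, retrying if torn. */
    void master_read(shared_buf *b, double *out)
    {
        unsigned s1, s2;
        do {
            s1 = atomic_load(&b->seq);
            for (int i = 0; i < 4; i++)
                out[i] = b->data[i];
            s2 = atomic_load(&b->seq);
        } while (s1 != s2 || (s1 & 1));      /* repeat on overlap */
    }

    int main(void)
    {
        shared_buf b = { 0 };
        double in[4] = {1, 2, 3, 4}, out[4];
        slave_publish(&b, in);
        master_read(&b, out);
        printf("master saw %.0f %.0f %.0f %.0f\n",
               out[0], out[1], out[2], out[3]);
        return 0;
    }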
The influence of super-horizon scales on cosmological observables generated during inflation
NASA Astrophysics Data System (ADS)
Matarrese, Sabino; Musso, Marcello A.; Riotto, Antonio
2004-05-01
Using the techniques of out-of-equilibrium field theory, we study how cosmological perturbations generated during inflation on observable scales are influenced by fluctuations corresponding today to scales much bigger than the present Hubble radius. We write the effective action for the coarse-grained inflaton perturbations, integrating out the sub-horizon modes, which manifest themselves as a coloured noise and lead to memory effects. Using the simple model of a scalar field with cubic self-interactions evolving in a fixed de Sitter background, we evaluate the two- and three-point correlation functions on observable scales. Our basic procedure shows that perturbations do preserve some memory of the super-horizon-scale dynamics, in the form of scale-dependent imprints in the statistical moments. In particular, we find a blue tilt of the power spectrum on large scales, in agreement with the recent results of the WMAP collaboration which show a suppression of the lower multipoles in the cosmic microwave background anisotropies, and a substantial enhancement of the intrinsic non-Gaussianity on large scales.
A pervasive parallel framework for visualization: final report for FWP 10-014707
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.
2014-01-01
We are on the threshold of a transformative change in the basic architecture of high-performance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high performance computing. These accelerators represent significant challenges in updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message passing processes to much more fine-grained thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementation; processor and compiler technology is currently changing rapidly. This report documents the results of our three-year ASCR project to address these challenges. Our project includes the development of the Dax toolkit, which contains the beginnings of new algorithms for a new generation of computers and the underlying infrastructure to rapidly prototype and build further algorithms as necessary.
Large-Scale Fluorescence Calcium-Imaging Methods for Studies of Long-Term Memory in Behaving Mammals
Jercog, Pablo; Rogerson, Thomas; Schnitzer, Mark J.
2016-01-01
During long-term memory formation, cellular and molecular processes reshape how individual neurons respond to specific patterns of synaptic input. It remains poorly understood how such changes impact information processing across networks of mammalian neurons. To observe how networks encode, store, and retrieve information, neuroscientists must track the dynamics of large ensembles of individual cells in behaving animals, over timescales commensurate with long-term memory. Fluorescence Ca2+-imaging techniques can monitor hundreds of neurons in behaving mice, opening exciting avenues for studies of learning and memory at the network level. Genetically encoded Ca2+ indicators allow neurons to be targeted by genetic type or connectivity. Chronic animal preparations permit repeated imaging of neural Ca2+ dynamics over multiple weeks. Together, these capabilities should enable unprecedented analyses of how ensemble neural codes evolve throughout memory processing and provide new insights into how memories are organized in the brain. PMID:27048190
Memory access in shared virtual memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berrendorf, R.
1992-01-01
Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Paradigms and strategies for scientific computing on distributed memory concurrent computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Foster, I.T.; Walker, D.W.
1994-06-01
In this work we examine recent advances in parallel languages and abstractions that have the potential for improving the programmability and maintainability of large-scale, parallel, scientific applications running on high performance architectures and networks. This paper focuses on Fortran M, a set of extensions to Fortran 77 that supports the modular design of message-passing programs. We describe the Fortran M implementation of a particle-in-cell (PIC) plasma simulation application, and discuss issues in the optimization of the code. The use of two other methodologies for parallelizing the PIC application is also considered. The first is based on the shared object abstraction as embodied in the Orca language. The second approach is the Split-C language. In Fortran M, Orca, and Split-C, the ability of the programmer to control the granularity of communication is important in designing an efficient implementation.
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
1998-01-01
A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
Gao, Shuang; Liu, Gang; Chen, Qilai; Xue, Wuhong; Yang, Huali; Shang, Jie; Chen, Bin; Zeng, Fei; Song, Cheng; Pan, Feng; Li, Run-Wei
2018-02-21
Resistive random access memory (RRAM) with inherent logic-in-memory capability exhibits great potential for constructing beyond-von-Neumann computers. Unipolar RRAM is particularly promising because its single-polarity operation enables large-scale crossbar logic-in-memory circuits with the highest integration density and simpler peripheral control circuits. However, unipolar RRAM usually exhibits poor switching uniformity because of random activation of conducting filaments, and consequently cannot meet the strict uniformity requirement for logic-in-memory applications. In this contribution, a new methodology that constructs cone-shaped conducting filaments by using a chemically active metal cathode is proposed to improve unipolar switching uniformity. Such a metal cathode reacts spontaneously with the oxide switching layer to form an interfacial layer, which together with the metal cathode itself can act as a load resistor to prevent the overgrowth of conducting filaments and thus make them more cone-like. In this way, the rupture of conducting filaments can be strictly limited to the tip region, making their residual parts favorable locations for subsequent filament growth and thus suppressing their random regeneration. As such, a novel "one switch + one unipolar RRAM cell" hybrid structure is capable of realizing all 16 Boolean logic functions for large-scale logic-in-memory circuits.
Computational Issues in Damping Identification for Large Scale Problems
NASA Technical Reports Server (NTRS)
Pilkey, Deborah L.; Roe, Kevin P.; Inman, Daniel J.
1997-01-01
Two damping identification methods are tested for efficiency in large-scale applications. One is an iterative routine, and the other a least squares method. Numerical simulations have been performed on multiple degree-of-freedom models to test the effectiveness of the algorithm and the usefulness of parallel computation for the problems. High Performance Fortran is used to parallelize the algorithm. Tests were performed using the IBM SP2 at NASA Ames Research Center. The least squares method tested incurs high communication costs, which reduces the benefit of high performance computing. This method's memory requirement grows at a very rapid rate, meaning that larger problems can quickly exceed available computer memory. The iterative method's memory requirement grows at a much slower pace and is able to handle problems with 500+ degrees of freedom on a single processor. This method benefits from parallelization, and significant speedup can be seen for problems of 100+ degrees of freedom.
Using VirtualGL/TurboVNC Software on the Peregrine System
High-Performance Computing | NREL
VirtualGL/TurboVNC allows users to access and share large-memory visualization nodes with high-end graphics processing units, and may be better than just using X11 forwarding when connecting from a remote site with low bandwidth and...
Schreier, Amy L; Grove, Matt
2014-05-01
The benefits of spatial memory for foraging animals can be assessed on two distinct spatial scales: small-scale space (travel within patches) and large-scale space (travel between patches). While the patches themselves may be distributed at low density, within patches resources are likely densely distributed. We propose, therefore, that spatial memory for recalling the particular locations of previously visited feeding sites will be more advantageous during between-patch movement, where it may reduce the distances traveled by animals that possess this ability compared to those that must rely on random search. We address this hypothesis by employing descriptive statistics and spectral analyses to characterize the daily foraging routes of a band of wild hamadryas baboons in Filoha, Ethiopia. The baboons slept on two main cliffs--the Filoha cliff and the Wasaro cliff--and daily travel began and ended on a cliff; thus four daily travel routes exist: Filoha-Filoha, Filoha-Wasaro, Wasaro-Wasaro, Wasaro-Filoha. We use newly developed partial sum methods and distribution-fitting analyses to distinguish periods of area-restricted search from more extensive movements. The results indicate a single peak in travel activity in the Filoha-Filoha and Wasaro-Filoha routes, three peaks of travel activity in the Filoha-Wasaro routes, and two peaks in the Wasaro-Wasaro routes; and are consistent with on-the-ground observations of foraging and ranging behavior of the baboons. In each of the four daily travel routes the "tipping points" identified by the partial sum analyses indicate transitions between travel in small- versus large-scale space. The correspondence between the quantitative analyses and the field observations suggest great utility for using these types of analyses to examine primate travel patterns and especially in distinguishing between movement in small versus large-scale space. Only the distribution-fitting analyses are inconsistent with the field observations, which may be due to the scale at which these analyses were conducted. © 2013 Wiley Periodicals, Inc.
77 FR 58415 - Large Scale Networking (LSN); Joint Engineering Team (JET)
Federal Register 2010, 2011, 2012, 2013, 2014
2012-09-20
... NATIONAL SCIENCE FOUNDATION Large Scale Networking (LSN); Joint Engineering Team (JET) AGENCY: The Networking and Information Technology Research and Development (NITRD) National Coordination Office (NCO..._Engineering_Team_ (JET). SUMMARY: The JET, established in 1997, provides for information sharing among Federal...
78 FR 70076 - Large Scale Networking (LSN)-Joint Engineering Team (JET)
Federal Register 2010, 2011, 2012, 2013, 2014
2013-11-22
... NATIONAL SCIENCE FOUNDATION Large Scale Networking (LSN)--Joint Engineering Team (JET) AGENCY: The Networking and Information Technology Research and Development (NITRD) National Coordination Office (NCO..._Engineering_Team_ (JET)#title. SUMMARY: The JET, established in 1997, provides for information sharing among...
Working Memory Span Development: A Time-Based Resource-Sharing Model Account
ERIC Educational Resources Information Center
Barrouillet, Pierre; Gavens, Nathalie; Vergauwe, Evie; Gaillard, Vinciane; Camos, Valerie
2009-01-01
The time-based resource-sharing model (P. Barrouillet, S. Bernardin, & V. Camos, 2004) assumes that during complex working memory span tasks, attention is frequently and surreptitiously switched from processing to reactivate decaying memory traces before their complete loss. Three experiments involving children from 5 to 14 years of age…
Direct access inter-process shared memory
Brightwell, Ronald B; Pedretti, Kevin; Hudson, Trammell B
2013-10-22
A technique for directly sharing physical memory between processes executing on processor cores is described. The technique includes loading a plurality of processes into the physical memory for execution on a corresponding plurality of processor cores sharing the physical memory. An address space is mapped to each of the processes by populating a first entry in a top level virtual address table for each of the processes. The address space of each of the processes is cross-mapped into each of the processes by populating one or more subsequent entries of the top level virtual address table with the first entry in the top level virtual address table from other processes.
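On a stock POSIX system, the closest everyday analogue to this technique is a named shared-memory object mapped into both processes' address spaces. The sketch below (shm_open plus mmap) shows that standard mechanism for orientation only; it is not the patented cross-mapping of top-level page-table entries, and the object name /demo_shm is invented.

    /* Standard POSIX shared memory between processes: create a named
     * object, size it, and map it into the address space. A second
     * process opening "/demo_shm" and mmap-ing it would see the same
     * bytes directly. Minimal error handling; illustration only.
     * Compile: cc shm.c  (some systems need -lrt) */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        const char *name = "/demo_shm";     /* hypothetical object name */
        int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("shm_open"); return 1; }
        ftruncate(fd, 4096);                /* size the shared region */

        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p, "hello from one process"); /* directly shared write */
        printf("region contains: %s\n", p);

        munmap(p, 4096);
        close(fd);
        shm_unlink(name);                   /* remove the named object */
        return 0;
    }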
Grouping and binding in visual short-term memory.
Quinlan, Philip T; Cohen, Dale J
2012-09-01
Findings of 2 experiments are reported that challenge the current understanding of visual short-term memory (VSTM). In both experiments, a single study display, containing 6 colored shapes, was presented briefly and then probed with a single colored shape. At stake is how VSTM retains a record of different objects that share common features. In the 1st experiment, 2 study items sometimes shared a common feature (either a shape or a color). The data revealed a color sharing effect, in which memory was much better for items that shared a common color than for items that did not. The 2nd experiment showed that the size of the color sharing effect depended on whether a single pair of items shared a common color or whether 2 pairs of items were so defined; memory for all items improved when 2 color groups were presented. In explaining performance, an account is advanced in which items compete for a fixed number of slots, but then memory recall for any given stored item is prone to error. A critical assumption is that items that share a common color are stored together in a slot as a chunk. The evidence provides further support for the idea that principles of perceptual organization may determine the manner in which items are stored in VSTM. PsycINFO Database Record (c) 2012 APA, all rights reserved.
Svob, Connie; Brown, Norman R; Takšić, Vladimir; Katulić, Katarina; Žauhar, Valnea
2016-08-01
Intergenerational transmission of memory is a process by which biographical knowledge contributes to the construction of collective memory (representation of a shared past). We investigated the intergenerational transmission of war-related memories and social-distance attitudes in second-generation post-war Croatians. We compared 2 groups of young adults from (1) Eastern Croatia (extensively affected by the war) and (2) Western Croatia (affected relatively less by the war). Participants were asked to (a) recall the 10 most important events that occurred in one of their parents' lives, (b) estimate the calendar years of each, and (c) provide scale ratings on them. Additionally, (d) all participants completed a modified Bogardus Social Distance scale, as well as an (e) War Events Checklist for their parents' lives. There were several findings. First, approximately two-thirds of Eastern Croatians and one-half of Western Croatians reported war-related events from their parents' lives. Second, war-related memories impacted the second-generation's identity to a greater extent than did non-war-related memories; this effect was significantly greater in Eastern Croatians than in Western Croatians. Third, war-related events displayed markedly different mnemonic characteristics than non-war-related events. Fourth, the temporal distribution of events surrounding the war produced an upheaval bump, suggesting major transitions (e.g., war) contribute to the way collective memory is formed. And, finally, outright social ostracism and aggression toward out-groups were rarely expressed, independent of region. Nonetheless, social-distance scores were notably higher in Eastern Croatia than in Western Croatia.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-01-17
This library is an implementation of the previously introduced Sparse Approximate Matrix Multiplication (SpAMM) algorithm. It provides a matrix data type and an approximate matrix product that exhibits linear-scaling computational complexity for matrices with decay. The product error and the performance of the multiply can be tuned by choosing an appropriate tolerance. The library can be compiled for serial execution, or for parallel execution on shared-memory systems with an OpenMP-capable compiler.
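A minimal sketch of the SpAMM idea in C follows: recurse over matrix quadrants and skip any block product whose norm bound ||A||*||B|| falls below the tolerance, which is what yields linear scaling for matrices with decay. The dense fixed-size storage and all names here are illustrative assumptions, not the library's actual data type or API.

    /* Recursive sparse approximate matrix multiply (SpAMM-style):
     * prune quadrant products whose Frobenius-norm bound is below tol.
     * Compile: cc spamm.c -lm */
    #include <math.h>
    #include <stdio.h>

    #define N 8                      /* power of two for clean recursion */
    static double A[N][N], B[N][N], C[N][N];

    /* Frobenius norm of an s-by-s block starting at (r,c). */
    static double blknorm(double m[N][N], int r, int c, int s)
    {
        double sum = 0.0;
        for (int i = r; i < r + s; i++)
            for (int j = c; j < c + s; j++)
                sum += m[i][j] * m[i][j];
        return sqrt(sum);
    }

    /* C(rc,cc) += A(ra,ca) * B(rb,cb), blocks of size s, with pruning. */
    static void spamm(int ra, int ca, int rb, int cb, int rc, int cc,
                      int s, double tol)
    {
        if (blknorm(A, ra, ca, s) * blknorm(B, rb, cb, s) < tol)
            return;                  /* contribution too small: skip */
        if (s == 2) {                /* base case: direct multiply */
            for (int i = 0; i < s; i++)
                for (int j = 0; j < s; j++)
                    for (int k = 0; k < s; k++)
                        C[rc + i][cc + j] +=
                            A[ra + i][ca + k] * B[rb + k][cb + j];
            return;
        }
        int h = s / 2;               /* recurse over quadrant products */
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 2; k++)
                    spamm(ra + i * h, ca + k * h, rb + k * h, cb + j * h,
                          rc + i * h, cc + j * h, h, tol);
    }

    int main(void)
    {
        for (int i = 0; i < N; i++)  /* banded test matrices with decay */
            for (int j = 0; j < N; j++)
                A[i][j] = B[i][j] = exp(-fabs(i - j));
        spamm(0, 0, 0, 0, 0, 0, N, 1e-3);
        printf("C[0][0] = %f\n", C[0][0]);
        return 0;
    }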
Towards Scalable 1024 Processor Shared Memory Systems
NASA Technical Reports Server (NTRS)
Ciotti, Robert B.; Thigpen, William W. (Technical Monitor)
2001-01-01
Over the past 3 years, NASA Ames has been involved in a cooperative effort with SGI to develop the largest single system image systems available. Currently a 1024-processor Origin3000 is under development, with first boot expected later in the summer of 2001. This paper discusses some early results with a 512-processor Origin3000 system and some arcane IRIX system calls that can dramatically improve scaling performance.
SKIRT: Hybrid parallelization of radiative transfer simulations
NASA Astrophysics Data System (ADS)
Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.
2017-07-01
We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
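The structure of such a hybrid scheme can be sketched as follows. This is not SKIRT's code: it assumes mpi4py, must be launched with mpiexec, and uses thread-local accumulation as a stand-in for the lock-free atomic updates the paper describes, since Python has no direct equivalent. The key point survives: each MPI process holds one copy of the shared array, and threads within it add work without duplicating data structures.

    # Hybrid parallelism sketch: MPI between processes, threads within each.
    # Run with e.g.: mpiexec -n 4 python hybrid.py
    from concurrent.futures import ThreadPoolExecutor
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    NCELLS, NPACKETS, NTHREADS = 1000, 100000, 4
    grid = np.zeros(NCELLS)                  # one copy per process, shared by threads

    def absorb(seed):
        """Deposit one thread's batch of photon packets into a local grid."""
        rng = np.random.default_rng(seed)
        local = np.zeros(NCELLS)             # thread-local: no locking needed
        cells = rng.integers(0, NCELLS, NPACKETS // (nprocs * NTHREADS))
        np.add.at(local, cells, 1.0)
        return local

    with ThreadPoolExecutor(NTHREADS) as pool:
        for partial in pool.map(absorb, range(rank * NTHREADS, (rank + 1) * NTHREADS)):
            grid += partial                  # combine thread results in-process

    total = np.zeros_like(grid)
    comm.Allreduce(grid, total)              # combine across MPI processes
    if rank == 0:
        print(int(total.sum()))              # ~NPACKETS, up to the integer split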
A kilobyte rewritable atomic memory
NASA Astrophysics Data System (ADS)
Kalff, Floris; Rebergen, Marnix; Fahrenfort, Nora; Girovsky, Jan; Toskovic, Ranko; Lado, Jose; Fernández-Rossier, Joaquín; Otte, Sander
The ability to manipulate individual atoms by means of scanning tunneling microscopy (STM) opens up opportunities for storage of digital data on the atomic scale. Recent achievements in this direction include data storage based on bits encoded in the charge state, the magnetic state, or the local presence of single atoms or atomic assemblies. However, a key challenge at this stage is the extension of such technologies into large-scale rewritable bit arrays. We demonstrate a digital atomic-scale memory of up to 1 kilobyte (8000 bits) using an array of individual surface vacancies in a chlorine-terminated Cu(100) surface. The chlorine vacancies are found to be stable at temperatures up to 77 K. The memory, crafted using scanning tunneling microscopy at low temperature, can be read and re-written automatically by means of atomic-scale markers, and offers an areal density of 502 terabits per square inch, outperforming state-of-the-art hard disk drives by three orders of magnitude.
A biometric latent curve analysis of memory decline in older men of the NAS-NRC twin registry.
McArdle, John J; Plassman, Brenda L
2009-09-01
Previous research has shown cognitive abilities to have different biometric patterns of age changes. We examined the variation in episodic memory (word recall task) for over 6,000 twin pairs who were initially aged 59-75, and were subsequently re-assessed up to three more times over 12 years. In cross-sectional analyses, variation in the number of words recalled independent of age was explained largely by non-shared influences (65-72%), with clear additive genetic influences (12-32%) and marginal shared family influences (1-18%). The longitudinal phenotypic analysis of the word recall task showed systematic linear declines over age, but several nonlinear models with more dramatic changes at later ages improved the overall fit. A two-part spline model for the longitudinal twin data with an optimal turning point at age 74 led to: (a) a separation of non-shared environmental influences and transient measurement error (~50%); (b) strong additive genetic components of this latent curve (~44% at age 60) with increases (over 50%) up to age 74, but with no additional genetic variation after age 74; (c) the smaller influences of shared family environment (~15% at age 74) were constant over all ages; (d) non-shared effects play an important role over most of the life-span but diminish after age 74.
Parallelization Issues and Particle-In-Cell Codes.
NASA Astrophysics Data System (ADS)
Elster, Anne Cathrine
1994-01-01
"Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field, show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks, have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies, becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache -line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. A consideration of the input data's effect on the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters, which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.
Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul
2002-07-29
Both shared memory and distributed memory models have advantages and shortcomings. The shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic can have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing [Singh], however at the cost of compromising ease of use. Distributed memory models, such as message passing or one-sided communication, offer performance and scalability but compromise ease of use. In this context, the message-passing model is sometimes referred to as "assembly programming for scientific computing". The Global Arrays toolkit [GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model, the capabilities of the toolkit, and its evolution.
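As a toy illustration of this programming style (not the actual GA API), the sketch below block-distributes a 1D global index space and funnels every access through explicit get/put, which is exactly where the global-to-local data movement, and hence its cost, becomes visible to the programmer:

    import numpy as np

    class ToyGlobalArray:
        """Toy model of the Global Arrays idea: a 1D global index space
        block-distributed over nprocs local chunks, accessed only through
        explicit get/put calls."""

        def __init__(self, n, nprocs):
            self.chunk = (n + nprocs - 1) // nprocs
            self.local = [np.zeros(self.chunk) for _ in range(nprocs)]

        def _locate(self, i):
            return i // self.chunk, i % self.chunk   # (owner rank, local offset)

        def put(self, i, value):                     # explicit global -> local
            p, off = self._locate(i)
            self.local[p][off] = value

        def get(self, i):                            # explicit local <- global
            p, off = self._locate(i)
            return self.local[p][off]

    ga = ToyGlobalArray(100, nprocs=4)
    ga.put(42, 3.14)          # owner rank is computed from the global index
    print(ga.get(42))         # remote access is explicit, never implicit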
A constitutive theory for shape memory polymers: coupling of small and large deformation
NASA Astrophysics Data System (ADS)
Tan, Qiao; Liu, Liwu; Liu, Yanju; Leng, Jinsong; Yan, Xiangqiao; Wang, Haifang
2013-04-01
At high temperatures, SMPs exhibit rubber-like attributes and long-range reversibility. In contrast, at low temperatures they become very rigid and susceptible to plastic deformation, and only small strains are allowable. However, relatively little literature has considered the unique coupling of the small-strain (glassy phase) and large-strain (rubbery phase) regimes of SMPs when developing constitutive models. In this work, we present a 3D constitutive model for shape memory polymers covering both the low-temperature small-strain regime and the high-temperature large-strain regime. The theory is based on the work of Liu et al. [15]. All four steps of the SMP thermomechanical loading cycle are considered in the constitutive model. The linear elastic and hyperelastic effects of SMPs at different temperatures are also fully accounted for in the proposed model by adopting the neo-Hookean model and the generalized Hooke's law.
NASA Astrophysics Data System (ADS)
Raghib, Michael; Levin, Simon; Kevrekidis, Ioannis
2010-05-01
Self-propelled particle models (SPPs) are a class of agent-based simulations that have been successfully used to explore questions related to various flavors of collective motion, including flocking, swarming, and milling. These models typically consist of particle configurations where each particle moves with constant speed but changes its orientation in response to local averages of the positions and orientations of its neighbors found within some interaction region. These local averages are based on 'social interactions', which include avoidance of collisions, attraction, and polarization, and are designed to generate configurations that move as a single object. Errors made by the individuals in the estimates of the state of the local configuration are modeled as a random rotation of the updated orientation resulting from the social rules. More recently, SPPs have been introduced in the context of collective decision-making, where the main innovation consists of dividing the population into naïve and 'informed' individuals. Whereas naïve individuals follow the classical collective motion rules, members of the informed sub-population update their orientations according to a weighted average of the social rules and a fixed 'preferred' direction shared by all the informed individuals. Collective decision-making is then understood in terms of the ability of the informed sub-population to steer the whole group along the preferred direction. Summary statistics of collective decision-making are defined in terms of the stochastic properties of the random walk followed by the centroid of the configuration as the particles move about, in particular the scaling behavior of the mean squared displacement (msd). For the region of parameters where the group remains coherent, we note that there are two characteristic time scales: first, there is an anomalous transient shared by both purely naïve and informed configurations, i.e., the scaling exponent lies between 1 and 2. The long-time behavior of the msd of the centroid walk scales linearly with time for naïve groups (diffusion), but shows a sharp transition to quadratic scaling (advection) for informed ones. These observations suggest that the mesoscopic variables of interest are the magnitude of the drift, the diffusion coefficient, and the time scales at which the anomalous and the asymptotic behavior respectively dominate transport, the latter being linked to the time scale at which the group reaches a decision. In order to estimate these summary statistics from the msd, we assumed that the configuration centroid follows an uncoupled Continuous Time Random Walk (CTRW) with smooth jump and waiting time pdfs. The mesoscopic transport equation for this type of random walk corresponds to an Advection-Diffusion Equation with Memory (ADEM). The introduction of the memory, and thus of non-Markovian effects, is necessary in order to correctly account for the two time scales present. Although we were not able to calculate the memory directly from the individual-level rules, we show that it can be estimated from a single, relatively short simulation run using a Mittag-Leffler function as a template. With this function it is possible to predict accurately the behavior of the msd, as well as the full pdf for the position of the centroid.
The resulting ADEM is self-consistent in the sense that transport parameters estimated from the memory via a Kubo relationship coincide with those estimated from the moments of the jump size pdf of the associated CTRW for a large number of group sizes, proportions of informed individuals, and degrees of bias along the preferred direction. We also discuss the phase diagrams for the transport coefficients estimated from this method, where we notice velocity-precision trade-offs, where precision is a measure of the deviation of realized group orientations with respect to the informed direction. We also note that the time scale to collective decision is invariant with respect to group size, and depends only on the proportion of informed individuals and the strength of the coupling along the informed direction.
NASA Astrophysics Data System (ADS)
McFarland, Jacob A.; Reilly, David; Black, Wolfgang; Greenough, Jeffrey A.; Ranjan, Devesh
2015-07-01
The interaction of a small-wavelength multimodal perturbation with a large-wavelength inclined-interface perturbation is investigated for the reshocked Richtmyer-Meshkov instability using three-dimensional simulations. The ares code, developed at Lawrence Livermore National Laboratory, was used for these simulations, and a detailed comparison of simulation results and experiments performed at the Georgia Tech Shock Tube facility is presented first for code validation. Simulation results are presented for four cases that vary in large-wavelength perturbation amplitude and in the presence of secondary small-wavelength multimode perturbations. Previously developed measures of mixing and turbulence quantities are presented that highlight the large variation in perturbation length scales created by the inclined interface and the multimode complex perturbation. Measures of entrainment and turbulence anisotropy are developed that help to identify the effects of, and competition between, each perturbation type. It is shown through multiple measures that before reshock the flow possesses a distinct memory of the initial conditions that is present in both large-scale-driven entrainment measures and small-scale-driven mixing measures. After reshock the flow develops to a turbulent-like state that retains a memory of high-amplitude but not low-amplitude large-wavelength perturbations. It is also shown that the high-amplitude large-wavelength perturbation is capable of producing small-scale mixing and turbulent features similar to those of the small-wavelength multimode perturbations.
Choi, Hae-Yoon; Kensinger, Elizabeth A; Rajaram, Suparna
2017-09-01
Social transmission of memory and its consequence on collective memory have generated enduring interdisciplinary interest because of their widespread significance in interpersonal, sociocultural, and political arenas. We tested the influence of 3 key factors (emotional salience of information, group structure, and information distribution) on mnemonic transmission, social contagion, and collective memory. Participants individually studied emotionally salient (negative or positive) and nonemotional (neutral) picture-word pairs that were completely shared, partially shared, or unshared within participant triads, and then completed 3 consecutive recalls in 1 of 3 conditions: individual-individual-individual (control); collaborative-collaborative-individual with identical groups (insular structure); and collaborative-collaborative-individual with reconfigured groups (diverse structure). Collaboration enhanced negative memories, especially in the insular group structure and especially for shared information, and promoted collective forgetting of positive memories. Diverse group structure reduced this negativity effect. Unequally distributed information led to social contagion that created false memories; diverse structure propagated a greater variety of false memories, whereas insular structure promoted confidence in false recognition and false collective memory. A simultaneous assessment of network structure, information distribution, and emotional valence breaks new ground in specifying how network structure shapes the spread of negative memories and false memories, and the emergence of collective memory.
Effects of cacheing on multitasking efficiency and programming strategy on an ELXSI 6400
DOE Office of Scientific and Technical Information (OSTI.GOV)
Montry, G.R.; Benner, R.E.
1985-12-01
The impact of a cache/shared memory architecture, and, in particular, the cache coherency problem, upon concurrent algorithm and program development is discussed. In this context, a simple set of programming strategies is proposed that streamlines code development and improves code performance when multitasking in a cache/shared memory or distributed memory environment.
NASA Technical Reports Server (NTRS)
Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.
1992-01-01
An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.
Azad, Ariful; Ouzounis, Christos A; Kyrpides, Nikos C; Buluç, Aydin
2018-01-01
Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license. PMID:29315405
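For orientation, the iteration that HipMCL parallelizes can be sketched densely with NumPy; actual MCL and HipMCL operate on sparse (and, for HipMCL, distributed) matrices, and the parameter values here are illustrative:

    import numpy as np

    def mcl(A, inflation=2.0, prune=1e-5, iters=50):
        """Dense sketch of the Markov Clustering iteration: expansion (M @ M),
        inflation (elementwise power plus column renormalization), and
        pruning of near-zero entries to keep the matrix sparse."""
        M = A + np.eye(len(A))             # self-loops stabilize the walk
        M /= M.sum(axis=0)                 # column-stochastic
        for _ in range(iters):
            M = M @ M                      # expansion: longer random walks
            M = M ** inflation             # inflation: strengthen strong flows
            M[M < prune] = 0.0             # pruning
            M /= M.sum(axis=0)
        # Attractor rows define the clusters (a cluster may appear once
        # per attractor in this simple readout).
        return [tuple(np.nonzero(row)[0]) for row in M if row.sum() > 1e-9]

    # Two triangles joined by a single edge resolve into two clusters.
    A = np.array([[0,1,1,0,0,0],
                  [1,0,1,0,0,0],
                  [1,1,0,1,0,0],
                  [0,0,1,0,1,1],
                  [0,0,0,1,0,1],
                  [0,0,0,1,1,0]], float)
    print(mcl(A))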
Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
How the amygdala affects emotional memory by altering brain network properties.
Hermans, Erno J; Battaglia, Francesco P; Atsak, Piray; de Voogd, Lycia D; Fernández, Guillén; Roozendaal, Benno
2014-07-01
The amygdala has long been known to play a key role in supporting memory for emotionally arousing experiences. For example, classical fear conditioning depends on neural plasticity within this anterior medial temporal lobe region. Beneficial effects of emotional arousal on memory, however, are not restricted to simple associative learning. Our recollection of emotional experiences often includes rich representations of, e.g., spatiotemporal context, visceral states, and stimulus-response associations. Critically, such memory features are known to bear heavily on regions elsewhere in the brain. These observations led to the modulation account of amygdala function, which postulates that amygdala activation enhances memory consolidation by facilitating neural plasticity and information storage processes in its target regions. Rodent work in past decades has identified the most important brain regions and neurochemical processes involved in these modulatory actions, and neuropsychological and neuroimaging work in humans has produced a large body of convergent data. Importantly, recent methodological developments make it increasingly realistic to monitor neural interactions underlying such modulatory effects as they unfold. For instance, functional connectivity network modeling in humans has demonstrated how information exchanges between the amygdala and specific target regions occur within the context of large-scale neural network interactions. Furthermore, electrophysiological and optogenetic techniques in rodents are beginning to make it possible to quantify and even manipulate such interactions with millisecond precision. In this paper we argue that these developments will likely lead to an updated view of the amygdala as a critical nexus within large-scale networks supporting different aspects of memory processing for emotionally arousing experiences.
Inductive reasoning and implicit memory: evidence from intact and impaired memory systems.
Girelli, Luisa; Semenza, Carlo; Delazer, Margarete
2004-01-01
In this study, we modified a classic problem solving task, number series completion, in order to explore the contribution of implicit memory to inductive reasoning. Participants were required to complete number series sharing the same underlying algorithm (e.g., +2), differing in both constituent elements (e.g., 2468 versus 57911) and correct answers (e.g., 10 versus 13). In Experiment 1, reliable priming effects emerged, whether primes and targets were separated by four or ten fillers. Experiment 2 provided direct evidence that the observed facilitation arises at central stages of problem solving, namely the identification of the algorithm and its subsequent extrapolation. The observation of analogous priming effects in a severely amnesic patient strongly supports the hypothesis that the facilitation in number series completion was largely determined by implicit memory processes. These findings demonstrate that the influence of implicit processes extends to higher-level cognitive domains such as inductive reasoning.
A 10-year ecosystem restoration community of practice tracks large-scale restoration trends
In 2004, a group of large-scale ecosystem restoration practitioners across the United States convened to start the process of sharing restoration science, management, and best practices under the auspices of a traditional conference umbrella. This forum allowed scientists and dec...
NASA Astrophysics Data System (ADS)
Akanda, A. S.; Jutla, A. S.; Islam, S.
2009-12-01
Despite ravaging the continents through seven global pandemics in past centuries, the seasonal and interannual variability of cholera outbreaks remains a mystery. Previous studies have focused on the role of various environmental and climatic factors, but provided little or no predictive capability. Recent findings suggest a more prominent role of large-scale hydroclimatic extremes (droughts and floods) and attempt to explain the seasonality and the unique dual cholera peaks in the Bengal Delta region of South Asia. We investigate the seasonal and interannual nature of cholera epidemiology in three geographically distinct locations within the region to identify the larger-scale hydroclimatic controls that can set the ecological and environmental ‘stage’ for outbreaks and have significant memory on a seasonal scale. Here we show that two distinctly different, pre- and post-monsoon, cholera transmission mechanisms related to large-scale climatic controls prevail in the region. An implication of our findings is that extreme climatic events such as prolonged droughts, record floods, and major cyclones may cause major disruption in the ecosystem and trigger large epidemics. We postulate that a quantitative understanding of the large-scale hydroclimatic controls and dominant processes with significant system memory will form the basis for forecasting such epidemic outbreaks. A multivariate regression method using these predictor variables to develop probabilistic forecasts of cholera outbreaks will be explored. Forecasts from such a system with a seasonal lead-time are likely to have measurable impact on early cholera detection and prevention efforts in endemic regions.
Schapiro, Anna C; McDevitt, Elizabeth A; Chen, Lang; Norman, Kenneth A; Mednick, Sara C; Rogers, Timothy T
2017-11-01
Semantic memory encompasses knowledge about both the properties that typify concepts (e.g. robins, like all birds, have wings) and the properties that individuate conceptually related items (e.g. robins, in particular, have red breasts). We investigate the impact of sleep on new semantic learning using a property inference task in which both kinds of information are initially acquired equally well. Participants learned about three categories of novel objects possessing some properties that were shared among category exemplars and others that were unique to an exemplar, with exposure frequency varying across categories. In Experiment 1, memory for shared properties improved and memory for unique properties was preserved across a night of sleep, while memory for both feature types declined over a day awake. In Experiment 2, memory for shared properties improved across a nap, but only for the lower-frequency category, suggesting a prioritization of weakly learned information early in a sleep period. The increase was significantly correlated with amount of REM, but was also observed in participants who did not enter REM, suggesting involvement of both REM and NREM sleep. The results provide the first evidence that sleep improves memory for the shared structure of object categories, while simultaneously preserving object-unique information.
Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver
NASA Astrophysics Data System (ADS)
Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre
2014-06-01
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, which usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 10^6 spatial cells and 1 × 10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops, and 40.74% of the SMP node peak performance, for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-node nuclear simulation tool.
Flexible language constructs for large parallel programs
NASA Technical Reports Server (NTRS)
Rosing, Matthew; Schnabel, Robert
1993-01-01
The goal of the research described is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and a discussion of some of the critical implementation details are given.
ERIC Educational Resources Information Center
Hayes-Roth, Barbara
Two kinds of memory organization are distinguished: segregated versus integrated. In segregated memory organizations, related learned propositions have separate memory representations. In integrated memory organizations, memory representations of related propositions share common subrepresentations. Segregated memory organizations facilitate…
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study
Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...
2015-01-01
This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the performance bottleneck was our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure; Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
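The fix reported here is a standard one: a pairwise (binary-tree) reduction needs only log2(P) combining steps, and the pairings within each level are independent, hence parallel, whereas a running sum needs P-1 strictly ordered steps. A language-neutral sketch, with plain list elements standing in for coarray images:

    def tree_reduce(values):
        """Pairwise (binary-tree) summation: log2(P) combining levels instead
        of the P-1 sequential steps of a running sum. In the coarray code each
        value lives on its own image and each pairing is one remote get."""
        vals = list(values)
        stride = 1
        while stride < len(vals):
            for i in range(0, len(vals) - stride, 2 * stride):
                vals[i] += vals[i + stride]   # partners at one level combine in parallel
            stride *= 2
        return vals[0]

    print(tree_reduce([1.0] * 32))   # -> 32.0 after 5 combining levels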
Anomalous Fluctuations in Autoregressive Models with Long-Term Memory
NASA Astrophysics Data System (ADS)
Sakaguchi, Hidetsugu; Honjo, Haruo
2015-10-01
An autoregressive model with a power-law type memory kernel is studied as a stochastic process that exhibits self-affine-fractal-like behavior on small time scales. We find numerically that the root-mean-square displacement Δ(m) for the time interval m increases as a power law m^α with α < 1/2 for small m, but saturates at sufficiently large m. The exponent α changes with the power exponent of the memory kernel.
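A minimal numerical reconstruction of this experiment is sketched below; the kernel form c_k proportional to k^(-gamma) and its normalization are our assumptions, since the abstract does not fix them:

    import numpy as np

    rng = np.random.default_rng(0)
    K, T, gamma = 100, 20000, 1.5
    c = np.arange(1, K + 1) ** -gamma    # power-law memory kernel (assumed form)
    c *= 0.9 / c.sum()                   # total memory weight < 1 => stationary

    x = np.zeros(T)
    for t in range(K, T):
        # AR step: weighted sum over the last K values, most recent first
        x[t] = c @ x[t - K:t][::-1] + rng.normal()

    def rms_displacement(x, m):
        d = x[m:] - x[:-m]
        return np.sqrt(np.mean(d ** 2))

    for m in (1, 4, 16, 64, 256, 1024):
        # grows roughly as m^alpha for small m, then saturates, mirroring
        # the behavior reported in the abstract
        print(m, rms_displacement(x, m))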
Wiese, Holger; Schweinberger, Stefan R
2015-01-01
The present study examined whether semantic memory for newly learned people is structured by visual co-occurrence, shared semantics, or both. Participants were trained with pairs of simultaneously presented (i.e., co-occurring) preexperimentally unfamiliar faces, which either did or did not share additionally provided semantic information (occupation, place of living, etc.). Semantic information could also be shared between faces that did not co-occur. A subsequent priming experiment revealed faster responses for both co-occurrence/no shared semantics and no co-occurrence/shared semantics conditions, than for an unrelated condition. Strikingly, priming was strongest in the co-occurrence/shared semantics condition, suggesting additive effects of these factors. Additional analysis of event-related brain potentials yielded priming in the N400 component only for combined effects of visual co-occurrence and shared semantics, with more positive amplitudes in this than in the unrelated condition. Overall, these findings suggest that both semantic relatedness and visual co-occurrence are important when novel information is integrated into person-related semantic memory.
Luckey, Chance John; Bhattacharya, Deepta; Goldrath, Ananda W.; Weissman, Irving L.; Benoist, Christophe; Mathis, Diane
2006-01-01
The only cells of the hematopoietic system that undergo self-renewal for the lifetime of the organism are long-term hematopoietic stem cells and memory T and B cells. To determine whether there is a shared transcriptional program among these self-renewing populations, we first compared the gene-expression profiles of naïve, effector and memory CD8+ T cells with those of long-term hematopoietic stem cells, short-term hematopoietic stem cells, and lineage-committed progenitors. Transcripts augmented in memory CD8+ T cells relative to naïve and effector T cells were selectively enriched in long-term hematopoietic stem cells and were progressively lost in their short-term and lineage-committed counterparts. Furthermore, transcripts selectively decreased in memory CD8+ T cells were selectively down-regulated in long-term hematopoietic stem cells and progressively increased with differentiation. To confirm that this pattern was a general property of immunologic memory, we turned to independently generated gene expression profiles of memory, naïve, germinal center, and plasma B cells. Once again, memory-enriched and -depleted transcripts were also appropriately augmented and diminished in long-term hematopoietic stem cells, and their expression correlated with progressive loss of self-renewal function. Thus, there appears to be a common signature of both up- and down-regulated transcripts shared between memory T cells, memory B cells, and long-term hematopoietic stem cells. This signature was not consistently enriched in neural or embryonic stem cell populations and, therefore, appears to be restricted to the hematopoietic system. These observations provide evidence that the shared phenotype of self-renewal in the hematopoietic system is linked at the molecular level. PMID:16492737
Coane, Jennifer H; McBride, Dawn M; Termonen, Miia-Liisa; Cutting, J Cooper
2016-01-01
The goal of the present study was to examine the contributions of associative strength and similarity in terms of shared features to the production of false memories in the Deese/Roediger-McDermott list-learning paradigm. Whereas the activation/monitoring account suggests that false memories are driven by automatic associative activation from list items to nonpresented lures, combined with errors in source monitoring, other accounts (e.g., fuzzy trace theory, global-matching models) emphasize the importance of semantic-level similarity, and thus predict that shared features between list and lure items will increase false memory. Participants studied lists of nine items related to a nonpresented lure. Half of the lists consisted of items that were associated but did not share features with the lure, and the other half included items that were equally associated but also shared features with the lure (in many cases, these were taxonomically related items). The two types of lists were carefully matched in terms of a variety of lexical and semantic factors, and the same lures were used across list types. In two experiments, false recognition of the critical lures was greater following the study of lists that shared features with the critical lure, suggesting that similarity at a categorical or taxonomic level contributes to false memory above and beyond associative strength. We refer to this phenomenon as a "feature boost" that reflects additive effects of shared meaning and association strength and is generally consistent with accounts of false memory that have emphasized thematic or feature-level similarity among studied and nonstudied representations.
System and method for programmable bank selection for banked memory subsystems
Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan
2010-09-07
A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment to access memory storage distributed across the one or more memory storage structures.
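The select-signal generation described, extracting pre-determined address bit values at programmable bit positions, reduces to simple bit manipulation; the bit positions in this sketch are illustrative choices, not taken from the patent:

    def make_bank_selector(bit_positions):
        """Build a programmable bank-select function: the chosen physical
        address bits are extracted and concatenated into a bank number."""
        def select(addr):
            bank = 0
            for n, pos in enumerate(bit_positions):
                bank |= ((addr >> pos) & 1) << n
            return bank
        return select

    # Program the selector to decode banks from address bits 7 and 13:
    select = make_bank_selector([7, 13])
    for addr in (0x0000, 0x0080, 0x2000, 0x2080):
        print(hex(addr), '-> bank', select(addr))   # banks 0, 1, 2, 3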
High Performance Programming Using Explicit Shared Memory Model on Cray T3D
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Saini, Subhash; Grassi, Charles
1994-01-01
The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research Adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message-passing using PVM, and the explicit shared memory model) are available to users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than that obtained using the explicit shared memory model. A similar degradation is seen on the CM-5, where the performance of applications using the native message-passing library CMMD is also about 4 to 5 times less than that of data parallel methods. The issues involved in programming in the explicit shared memory model (such as barriers, synchronization, and invalidating and aligning the data cache) are discussed. Comparative performance of the NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems, such as the TMC CM-5, Intel Paragon, Cray C90, and IBM SP1, is presented.
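The explicit shared memory model in question is one-sided put/get: a processor writes directly into another processor's memory with no matching receive. As a modern stand-in for the T3D's interface (not the original Cray library), the same idiom in MPI one-sided form via mpi4py:

    # One-sided communication sketch (explicit shared memory style).
    # Run with: mpiexec -n 2 python put_demo.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    buf = np.zeros(4, dtype='i')            # window memory exposed by every rank
    win = MPI.Win.Create(buf, comm=comm)

    win.Fence()                             # open the access epoch
    if rank == 0:
        win.Put(np.arange(4, dtype='i'), 1) # direct remote write into rank 1
    win.Fence()                             # close it; the Put is now visible

    if rank == 1:
        print(buf)                          # -> [0 1 2 3]
    win.Free()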
Continuous-Time Random Walk with multi-step memory: an application to market dynamics
NASA Astrophysics Data System (ADS)
Gubiec, Tomasz; Kutner, Ryszard
2017-11-01
An extended version of the Continuous-Time Random Walk (CTRW) model with memory is developed herein. This memory involves dependence between an arbitrary number of successive jumps of the process, while waiting times between jumps are considered i.i.d. random variables. The dependence was established by analyzing empirical histograms for the stochastic process of a single share price on a market at the high-frequency time scale. It was then justified theoretically by considering the bid-ask bounce mechanism, which contains a delay characteristic of any double-auction market. Our model turns out to be exactly solvable analytically. It therefore enables a direct comparison of its predictions with their empirical counterparts, for instance the empirical velocity autocorrelation function. The present research thus significantly extends the capabilities of the CTRW formalism. Contribution to the Topical Issue "Continuous Time Random Walk Still Trendy: Fifty-year History, Current State and Outlook", edited by Ryszard Kutner and Jaume Masoliver.
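A hedged sketch of the simplest, one-step case of such memory (the paper treats dependence across an arbitrary number of jumps): each jump reverses the previous one with probability p, mimicking bid-ask bounce, over i.i.d. exponential waiting times, so the lag-one jump autocorrelation is 1 - 2p:

    import numpy as np

    rng = np.random.default_rng(1)
    N, p = 200000, 0.7        # p: probability the next jump reverses the last

    waits = rng.exponential(1.0, N)     # i.i.d. waiting times between jumps
    times = np.cumsum(waits)            # event times of the walk
    jumps = np.empty(N)
    jumps[0] = 1.0
    for k in range(1, N):
        # bid-ask bounce: reverse the previous jump with probability p
        jumps[k] = -jumps[k - 1] if rng.random() < p else jumps[k - 1]

    price = np.cumsum(jumps)            # the share-price trajectory
    c1 = np.mean(jumps[1:] * jumps[:-1])
    print(c1, price[-1])                # c1 approx 1 - 2p = -0.4 here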
High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterka, Tom; Morozov, Dmitriy; Phillips, Carolyn
2014-11-14
Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations, molecular dynamics codes, and LIDAR point clouds are just a few examples. Such computational geometry methods are common in data analysis and visualization; but as the scale of simulations and observations surpasses billions of particles, the existing serial and shared-memory algorithms no longer suffice. A distributed-memory scalable parallel algorithm is the only feasible approach. The primary contribution of this paper is a new parallel Delaunay and Voronoi tessellation algorithm that automatically determines which neighbor points need to be exchanged among the subdomains of a spatial decomposition. Other contributions include periodic and wall boundary conditions, a comparison of our method using two popular serial libraries, and application to numerous science datasets.
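The subdomain-with-ghost-points computation can be sketched with SciPy; note that the fixed halo width below is our simplification, whereas the paper's algorithm determines the needed neighbor points automatically:

    import numpy as np
    from scipy.spatial import Delaunay

    def local_delaunay(all_points, lo, hi, halo):
        """Tessellate one slab [lo, hi) of the domain: own points plus ghost
        points within a fixed halo of the slab faces."""
        x = all_points[:, 0]
        own = all_points[(x >= lo) & (x < hi)]
        ghost = all_points[((x >= lo - halo) & (x < lo)) |
                           ((x >= hi) & (x < hi + halo))]
        cloud = np.vstack([own, ghost])
        tri = Delaunay(cloud)
        # Simplices touching indices < len(own) belong to this subdomain.
        return tri, len(own)

    pts = np.random.default_rng(2).random((1000, 3))
    tri, n_own = local_delaunay(pts, lo=0.0, hi=0.5, halo=0.1)
    print(tri.nsimplex, n_own)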
Zhu, Hao; Sun, Yan; Rajagopal, Gunaretnam; Mondry, Adrian; Dhar, Pawan
2004-01-01
Background Many arrhythmias are triggered by abnormal electrical activity at the ionic channel and cell level, and then evolve spatio-temporally within the heart. To understand arrhythmias better and to diagnose them more precisely by their ECG waveforms, a whole-heart model is required to explore the association between the massively parallel activities at the channel/cell level and the integrative electrophysiological phenomena at organ level. Methods We have developed a method to build large-scale electrophysiological models by using extended cellular automata, and to run such models on a cluster of shared memory machines. We describe here the method, including the extension of a language-based cellular automaton to implement quantitative computing, the building of a whole-heart model with Visible Human Project data, the parallelization of the model on a cluster of shared memory computers with OpenMP and MPI hybrid programming, and a simulation algorithm that links cellular activity with the ECG. Results We demonstrate that electrical activities at channel, cell, and organ levels can be traced and captured conveniently in our extended cellular automaton system. Examples of some ECG waveforms simulated with a 2-D slice are given to support the ECG simulation algorithm. A performance evaluation of the 3-D model on a four-node cluster is also given. Conclusions Quantitative multicellular modeling with extended cellular automata is a highly efficient and widely applicable method to weave experimental data at different levels into computational models. This process can be used to investigate complex and collective biological activities that can be described neither by their governing differential equations nor by discrete parallel computation. Transparent cluster computing is a convenient and effective method to make time-consuming simulation feasible. Arrhythmias, as a typical case, can be effectively simulated with the methods described. PMID:15339335
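The rest/excited/refractory update cycle that such electrophysiological models elaborate is captured, in minimal form, by the classical Greenberg-Hastings excitable-medium automaton; this stand-in is far simpler than the paper's quantitative, anatomy-based model:

    import numpy as np

    # Greenberg-Hastings excitable-medium CA: 0 = rest, 1 = excited,
    # 2..R = refractory. A resting cell fires if any neighbor is excited.
    R, N, STEPS = 4, 64, 32
    grid = np.zeros((N, N), dtype=int)
    grid[N // 2, N // 2] = 1                     # single stimulus site

    for _ in range(STEPS):
        excited = (grid == 1)
        neighbor_fires = (np.roll(excited, 1, 0) | np.roll(excited, -1, 0) |
                          np.roll(excited, 1, 1) | np.roll(excited, -1, 1))
        nxt = np.where((grid >= 1) & (grid < R), grid + 1, 0)  # advance refractory
        nxt[(grid == 0) & neighbor_fires] = 1                  # rest -> excited
        grid = nxt                               # an expanding excitation wave

    print((grid == 1).sum(), "cells excited after", STEPS, "steps")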
Sleep enhances a spatially mediated generalization of learned values
Tolat, Anisha; Spiers, Hugo J.
2015-01-01
Sleep is thought to play an important role in memory consolidation. Here we tested whether sleep alters the subjective value associated with objects located in spatial clusters that were navigated to in a large-scale virtual town. We found that sleep enhances a generalization of the value of high-value objects to the value of locally clustered objects, resulting in an impaired memory for the value of high-valued objects. Our results are consistent with (a) spatial context helping to bind items together in long-term memory and serve as a basis for generalizing across memories and (b) sleep mediating memory effects on salient/reward-related items. PMID:26373834
Shared Memory Parallelization of an Implicit ADI-type CFD Code
NASA Technical Reports Server (NTRS)
Hauser, Th.; Huang, P. G.
1999-01-01
A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described, and performance measurements for the single- and multiprocessor implementations are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of Re_τ = 180 has shown good agreement with existing data.
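The work an ADI sweep hands out to threads is a batch of independent tridiagonal line solves, one per grid line, which is also where cache behavior is decided. A generic vectorized Thomas solver over many lines (a sketch, not LESTool's implementation):

    import numpy as np

    def thomas_batch(a, b, c, d):
        """Solve M independent tridiagonal systems, one per grid line.
        Shapes are (M, N); a and c are sub/super-diagonals (a[:, 0] and
        c[:, -1] unused)."""
        M, N = d.shape
        cp, dp = np.empty((M, N)), np.empty((M, N))
        cp[:, 0] = c[:, 0] / b[:, 0]
        dp[:, 0] = d[:, 0] / b[:, 0]
        for i in range(1, N):                       # forward elimination
            denom = b[:, i] - a[:, i] * cp[:, i - 1]
            cp[:, i] = c[:, i] / denom
            dp[:, i] = (d[:, i] - a[:, i] * dp[:, i - 1]) / denom
        x = np.empty((M, N))
        x[:, -1] = dp[:, -1]
        for i in range(N - 2, -1, -1):              # back substitution
            x[:, i] = dp[:, i] - cp[:, i] * x[:, i + 1]
        return x

    # 4096 lines of length 128, diagonally dominant test system:
    M, N = 4096, 128
    a = np.full((M, N), -1.0); b = np.full((M, N), 4.0); c = np.full((M, N), -1.0)
    x_true = np.random.default_rng(3).random((M, N))
    d = b * x_true
    d[:, 1:] += a[:, 1:] * x_true[:, :-1]
    d[:, :-1] += c[:, :-1] * x_true[:, 1:]
    print(np.allclose(thomas_batch(a, b, c, d), x_true))   # -> True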
Ergül, Özgür
2011-11-01
Fast and accurate solutions of large-scale electromagnetics problems involving homogeneous dielectric objects are considered. Problems are formulated with the electric and magnetic current combined-field integral equation and discretized with the Rao-Wilton-Glisson functions. Solutions are performed iteratively by using the multilevel fast multipole algorithm (MLFMA). For the solution of large-scale problems discretized with millions of unknowns, MLFMA is parallelized on distributed-memory architectures using a rigorous technique, namely, the hierarchical partitioning strategy. Efficiency and accuracy of the developed implementation are demonstrated on very large problems involving as many as 100 million unknowns.
Sze, Sing-Hoi; Parrott, Jonathan J; Tarone, Aaron M
2017-12-06
While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing number of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves accuracy comparable to or better than that of memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable to or slightly lower than that of memory-intensive algorithms that can only be used to construct small assemblies. Our divide-and-conquer strategy thus allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.
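The merging step can be sketched generically; the acceptance rule below (longest-first with a k-mer novelty test) is our stand-in for the paper's actual quality criteria:

    def merge_assemblies(assemblies, k=25):
        """Greedy merge sketch: accept transcripts longest-first, skipping any
        whose k-mers are almost all already covered by accepted transcripts."""
        def kmers(seq):
            return {seq[i:i + k] for i in range(len(seq) - k + 1)}

        merged, seen = [], set()
        for t in sorted((t for asm in assemblies for t in asm),
                        key=len, reverse=True):
            km = kmers(t)
            if not km:
                continue
            if len(km - seen) / len(km) > 0.1:   # >10% novel k-mers: keep it
                merged.append(t)
                seen |= km
        return merged

    asm1 = ["ACGT" * 20, "TTGACCA" * 10]
    asm2 = ["ACGT" * 20 + "A"]          # near-duplicate of the first transcript
    print(len(merge_assemblies([asm1, asm2])))   # -> 2, duplicate collapsed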
Address tracing for parallel machines
NASA Technical Reports Server (NTRS)
Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent
1991-01-01
Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Working, declarative and procedural memory in specific language impairment
Lum, Jarrad A.G.; Conti-Ramsden, Gina; Page, Debra; Ullman, Michael T.
2012-01-01
According to the Procedural Deficit Hypothesis (PDH), abnormalities of brain structures underlying procedural memory largely explain the language deficits in children with specific language impairment (SLI). These abnormalities are posited to result in core deficits of procedural memory, which in turn explain the grammar problems in the disorder. The abnormalities are also likely to lead to problems with other, non-procedural functions, such as working memory, that rely at least partly on the affected brain structures. In contrast, declarative memory is expected to remain largely intact, and should play an important compensatory role for grammar. These claims were tested by examining measures of working, declarative and procedural memory in 51 children with SLI and 51 matched typically-developing (TD) children (mean age 10). Working memory was assessed with the Working Memory Test Battery for Children, declarative memory with the Children’s Memory Scale, and procedural memory with a visuo-spatial Serial Reaction Time task. As compared to the TD children, the children with SLI were impaired at procedural memory, even when holding working memory constant. In contrast, they were spared at declarative memory for visual information, and at declarative memory in the verbal domain after controlling for working memory and language. Visuo-spatial short-term memory was intact, whereas verbal working memory was impaired, even when language deficits were held constant. Correlation analyses showed neither visuo-spatial nor verbal working memory was associated with either lexical or grammatical abilities in either the SLI or TD children. Declarative memory correlated with lexical abilities in both groups of children. Finally, grammatical abilities were associated with procedural memory in the TD children, but with declarative memory in the children with SLI. These findings replicate and extend previous studies of working, declarative and procedural memory in SLI. Overall, we suggest that the evidence largely supports the predictions of the PDH. PMID:21774923
Low-Power Architectures for Large Radio Astronomy Correlators
NASA Technical Reports Server (NTRS)
D'Addario, Larry R.
2011-01-01
The architecture of a cross-correlator for a synthesis radio telescope with N > 1000 antennas is studied with the objective of minimizing power consumption. It is found that the optimum architecture minimizes memory operations, and this implies preference for a matrix structure over a pipeline structure and avoiding the use of memory banks as accumulation registers when sharing multiply-accumulators among baselines. A straw-man design for N = 2000 and bandwidth of 1 GHz, based on ASICs fabricated in a 90 nm CMOS process, is presented. The cross-correlator proper (excluding per-antenna processing) is estimated to consume less than 35 kW.
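A quick arithmetic check (ours, not from the paper) makes the scale concrete; the quadratic growth in baselines is what pushes memory traffic to the fore:

    baselines = N(N-1)/2;  for N = 2000:  2000 x 1999 / 2 = 1,999,000

so roughly two million multiply-accumulate streams per frequency channel must be sustained at the full 1 GHz bandwidth, which is why the design centers on minimizing memory operations per multiply-accumulate.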
Howe, Piers D. L.
2017-01-01
To understand how the visual system represents multiple moving objects and how those representations contribute to tracking, it is essential that we understand how the processes of attention and working memory interact. In the work described here we present an investigation of that interaction via a series of tracking and working memory dual-task experiments. Previously, it has been argued that tracking is resistant to disruption by a concurrent working memory task and that any apparent disruption is in fact due to observers making a response to the working memory task, rather than due to competition for shared resources. Contrary to this, in our experiments we find that when task order and response order confounds are avoided, all participants show a similar decrease in both tracking and working memory performance. However, if task and response order confounds are not adequately controlled for we find substantial individual differences, which could explain the previous conflicting reports on this topic. Our results provide clear evidence that tracking and working memory tasks share processing resources. PMID:28410383
Lapierre, Mark D; Cropper, Simon J; Howe, Piers D L
2017-01-01
To understand how the visual system represents multiple moving objects and how those representations contribute to tracking, it is essential that we understand how the processes of attention and working memory interact. In the work described here we present an investigation of that interaction via a series of tracking and working memory dual-task experiments. Previously, it has been argued that tracking is resistant to disruption by a concurrent working memory task and that any apparent disruption is in fact due to observers making a response to the working memory task, rather than due to competition for shared resources. Contrary to this, in our experiments we find that when task order and response order confounds are avoided, all participants show a similar decrease in both tracking and working memory performance. However, if task and response order confounds are not adequately controlled for we find substantial individual differences, which could explain the previous conflicting reports on this topic. Our results provide clear evidence that tracking and working memory tasks share processing resources.
Vergauwe, Evie; Barrouillet, Pierre; Camos, Valérie
2009-07-01
Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and spatial storage were combined with both visual and spatial on-line processing components in computer-paced working memory span tasks (Experiment 1) and in a selective interference paradigm (Experiment 2). The cognitive load of the processing components was manipulated to investigate its impact on concurrent maintenance for both within-domain and between-domain combinations of processing and storage components. In contrast to both domain- and process-based fractionations of visuo-spatial working memory, the results revealed that recall performance was determined by the cognitive load induced by the processing of items, rather than by the domain to which those items pertained. These findings are interpreted as evidence for a time-based resource-sharing mechanism in visuo-spatial working memory.
Bhatti, A Aziz
2009-12-01
This study proposes an efficient and improved model of a direct-storage bidirectional memory, the improved bidirectional associative memory (IBAM), and emphasises the use of nanotechnology for implementing such large-scale neural network structures at considerably lower cost, reduced complexity, and smaller implementation area. This memory model directly stores the X and Y associated sets of M bipolar binary vectors in the form of (MxN(x)) and (MxN(y)) memory matrices, requires O(N), or about 30%, of the interconnections, with weight strengths ranging between +/-1, and is computationally very efficient compared with sequential, intraconnected and other bidirectional associative memory (BAM) models of the outer-product type, which require O(N(2)) complex interconnections with weight strengths ranging between +/-M. It is shown to be functionally equivalent to, and to possess all the attributes of, a BAM of the outer-product type, while remaining a simple and robust structure that is realisable in very large scale integration (VLSI), optics and nanotechnology; it is a modular and expandable neural network bidirectional associative memory model in which the addition or deletion of a pair of vectors does not require changes to the interconnection strengths of the entire memory matrix. The retrieval process, signal-to-noise ratio, storage capacity and stability of the proposed model, as well as of the traditional BAM, are analysed. Constraints on, and characteristics of, unipolar and bipolar binaries for improved storage and retrieval are discussed. The simulation results show that it has log(e)N times higher storage capacity, superior performance, and faster convergence and retrieval than traditional sequential and intraconnected bidirectional memories.
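For contrast with the direct-storage IBAM, a minimal sketch of the traditional outer-product BAM that the abstract uses as its baseline is given below: the M stored pairs are superimposed into one weight matrix whose entries range over +/-M, so all N(x) x N(y) interconnections are needed. All names and sizes are illustrative, not taken from the paper.

```python
import numpy as np

def bam_store(X, Y):
    """Superimpose M bipolar pairs: W = sum_i x_i y_i^T, weights in [-M, M]."""
    return X.T @ Y

def bam_recall(W, x, steps=10):
    """Bidirectional recall: iterate x -> y -> x until a fixed point is reached."""
    sign = lambda v: np.where(v >= 0, 1, -1)
    y = sign(x @ W)
    for _ in range(steps):
        x_new = sign(y @ W.T)
        y = sign(x_new @ W)
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x, y

rng = np.random.default_rng(0)
X = rng.choice([-1, 1], size=(3, 16))   # M = 3 stored pairs, N_x = 16
Y = rng.choice([-1, 1], size=(3, 12))   # N_y = 12
W = bam_store(X, Y)                     # dense 16 x 12 matrix: O(N^2) interconnections
_, y = bam_recall(W, X[0])
print(np.array_equal(y, Y[0]))          # recall of random pairs may fail near capacity
```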
Muller, Lyle; Piantoni, Giovanni; Koller, Dominik; Cash, Sydney S; Halgren, Eric; Sejnowski, Terrence J
2016-01-01
During sleep, the thalamus generates a characteristic pattern of transient, 11-15 Hz sleep spindle oscillations, which synchronize the cortex through large-scale thalamocortical loops. Spindles have increasingly been demonstrated to be critical for sleep-dependent consolidation of memory, but the specific neural mechanism for this process remains unclear. We show here that cortical spindles are spatiotemporally organized into circular wave-like patterns, organizing neuronal activity over tens of milliseconds, within the timescale for storing memories in large-scale networks across the cortex via spike-timing-dependent plasticity. These circular patterns repeat over hours of sleep with millisecond temporal precision, allowing reinforcement of the activity patterns through hundreds of reverberations. These results provide a novel mechanistic account for how global sleep oscillations and synaptic plasticity could strengthen networks distributed across the cortex to store coherent and integrated memories. DOI: http://dx.doi.org/10.7554/eLife.17267.001 PMID:27855061
Why are you telling me that? A conceptual model of the social function of autobiographical memory.
Alea, Nicole; Bluck, Susan
2003-03-01
In an effort to stimulate and guide empirical work within a functional framework, this paper provides a conceptual model of the social functions of autobiographical memory (AM) across the lifespan. The model delineates the processes and variables involved when AMs are shared to serve social functions. Components of the model include: lifespan contextual influences, the qualitative characteristics of memory (emotionality and level of detail recalled), the speaker's characteristics (age, gender, and personality), the familiarity and similarity of the listener to the speaker, the level of responsiveness during the memory-sharing process, and the nature of the social relationship in which the memory sharing occurs (valence and length of the relationship). These components are shown to influence the type of social function served and/or the extent to which social functions are served. Directions for future empirical work to substantiate the model and hypotheses derived from the model are provided.
NASA Astrophysics Data System (ADS)
Noé, Pierre; Vallée, Christophe; Hippert, Françoise; Fillot, Frédéric; Raty, Jean-Yves
2018-01-01
Chalcogenide phase-change materials (PCMs), such as Ge-Sb-Te alloys, have shown outstanding properties that have long underpinned their successful use in optical memories (DVDs) and, more recently, in non-volatile resistive memories. The latter, known as PCM memories or phase-change random access memories (PCRAMs), are the most promising candidates among emerging non-volatile memory (NVM) technologies to replace the current FLASH memories at CMOS technology nodes below 28 nm. Chalcogenide PCMs exhibit fast and reversible phase transformations between crystalline and amorphous states with very different transport and optical properties, leading to a unique set of features for PCRAMs, such as fast programming, good cyclability, high scalability, multi-level storage capability, and good data retention. Nevertheless, PCM memory technology must overcome several challenges before it can decisively enter the NVM market. In this review paper, we examine the main technological challenges that PCM memory technology faces, and we illustrate how new memory architectures, innovative deposition methods, and optimization of PCM composition can contribute to further improvements of this technology. In particular, we examine how to lower programming currents and increase data retention. Scaling down PCM memories for large-scale integration means incorporating the PCM into increasingly confined structures, and it raises materials-science issues in understanding interface and size effects on crystallization. Other materials-science issues relate to the stability and ageing of the amorphous state of PCMs. The stability of the amorphous phase, which determines data retention in memory devices, can be increased by doping the PCM. Ageing of the amorphous phase leads to a large increase of resistivity with time (resistance drift), which has so far hindered the development of ultra-high multi-level storage devices. A review of the current understanding of all these issues is provided from a materials-science point of view.
Meeting the memory challenges of brain-scale network simulation.
Kunkel, Susanne; Potjans, Tobias C; Eppler, Jochen M; Plesser, Hans Ekkehard; Morrison, Abigail; Diesmann, Markus
2011-01-01
The development of high-performance simulation software is crucial for studying the brain connectome. Using connectome data to generate neurocomputational models requires software capable of coping with models on a variety of scales: from the microscale, investigating plasticity, and dynamics of circuits in local networks, to the macroscale, investigating the interactions between distinct brain regions. Prior to any serious dynamical investigation, the first task of network simulations is to check the consistency of data integrated in the connectome and constrain ranges for yet unknown parameters. Thanks to distributed computing techniques, it is possible today to routinely simulate local cortical networks of around 10(5) neurons with up to 10(9) synapses on clusters and multi-processor shared-memory machines. However, brain-scale networks are orders of magnitude larger than such local networks, in terms of numbers of neurons and synapses as well as in terms of computational load. Such networks have been investigated in individual studies, but the underlying simulation technologies have neither been described in sufficient detail to be reproducible nor made publicly available. Here, we discover that as the network model sizes approach the regime of meso- and macroscale simulations, memory consumption on individual compute nodes becomes a critical bottleneck. This is especially relevant on modern supercomputers such as the Blue Gene/P architecture, where the available working memory per CPU core is rather limited. We develop a simple linear model to analyze the memory consumption of the constituent components of neuronal simulators as a function of network size and the number of cores used. This approach has multiple benefits. The model enables identification of the key components contributing to memory saturation and prediction of the effects of potential improvements to code before any implementation takes place. As a consequence, development cycles can be shorter and less expensive. Applying the model to our freely available Neural Simulation Tool (NEST), we identify the software components dominant at different scales, and develop general strategies for reducing the memory consumption, in particular by using data structures that exploit the sparseness of the local representation of the network. We show that these adaptations enable our simulation software to scale up to the order of 10,000 processors and beyond. As memory consumption issues are likely to be relevant for any software dealing with complex connectome data on such architectures, our approach and our findings should be useful for researchers developing novel neuroinformatics solutions to the challenges posed by the connectome project.
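The "simple linear model" of per-core memory consumption that the abstract describes lends itself to a compact sketch: memory as a fixed overhead plus terms scaling with locally stored neurons and synapses, plus a term for global per-neuron tables (the kind of structure the sparseness-exploiting data structures target). All coefficients below are illustrative placeholders, not NEST's actual values.

```python
# Hedged sketch of a linear per-core memory model, with assumed coefficients.

def memory_per_core(n_neurons, n_synapses, n_cores,
                    m_base=2.0e9,      # fixed simulator overhead (bytes), assumed
                    m_neuron=1.5e3,    # bytes per locally stored neuron, assumed
                    m_synapse=48.0,    # bytes per locally stored synapse, assumed
                    m_node=8.0):       # bytes per neuron in global lookup tables, assumed
    # Neurons and synapses are distributed across cores, but some data
    # structures grow with the *total* network size on every core.
    local = (n_neurons / n_cores) * m_neuron + (n_synapses / n_cores) * m_synapse
    global_tables = n_neurons * m_node   # the term sparse data structures attack
    return m_base + local + global_tables

# A brain-scale example: 1e9 neurons, 1e13 synapses on 10,000 cores.
print(f"{memory_per_core(1e9, 1e13, 1e4) / 2**30:.1f} GiB per core")
```

Even in this toy version, the global-tables term dominates at large network sizes, which mirrors the kind of saturation the paper identifies.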
Destination memory impairment in older people.
Gopie, Nigel; Craik, Fergus I M; Hasher, Lynn
2010-12-01
Older adults are assumed to have poor destination memory-knowing to whom they tell particular information-and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults' destination memory by having participants tell facts (e.g., "A dime has 118 ridges around its edge") to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication.
A review of aspects relating to the improvement of holographic memory technology
NASA Astrophysics Data System (ADS)
Vyukhina, N. N.; Gibin, I. S.; Dombrovsky, V. A.; Dombrovsky, S. A.; Pankov, B. N.; Pen, E. F.; Potapov, A. N.; Sinyukov, A. M.; Tverdokhleb, P. E.; Shelkovnikov, V. V.
1996-06-01
Results of studying a holographic memory for writing/reading digital data pages are presented. The research was carried out in Novosibirsk, Russia. Particular attention was paid to methods of improving recording density and the reliability of data readout, to the development of 'dry' photopolymers that allow recording of superimposed three-dimensional phase holograms, and to the design of parallel-optical-input large-scale integration (LSI) circuits for reading and logical processing of data arriving from the holographic memory.
The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor
1991-06-01
Efficacy of the SU(3) scheme for ab initio large-scale calculations beyond the lightest nuclei
Dytrych, T.; Maris, P.; Launey, K. D.; ...
2016-06-22
We report on the computational characteristics of ab initio nuclear structure calculations in a symmetry-adapted no-core shell model (SA-NCSM) framework. We examine the computational complexity of the current implementation of the SA-NCSM approach, dubbed LSU3shell, by analyzing ab initio results for 6Li and 12C in large harmonic oscillator model spaces and SU(3)-selected subspaces. We demonstrate LSU3shell's strong-scaling properties achieved with highly-parallel methods for computing the many-body matrix elements. Results compare favorably with complete model space calculations, and significant memory savings are achieved in physically important applications. In particular, a well-chosen symmetry-adapted basis affords memory savings in calculations of states with a fixed total angular momentum in large model spaces while exactly preserving translational invariance.
Efficacy of the SU(3) scheme for ab initio large-scale calculations beyond the lightest nuclei
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dytrych, T.; Maris, Pieter; Launey, K. D.
2016-06-09
We report on the computational characteristics of ab initio nuclear structure calculations in a symmetry-adapted no-core shell model (SA-NCSM) framework. We examine the computational complexity of the current implementation of the SA-NCSM approach, dubbed LSU3shell, by analyzing ab initio results for 6Li and 12C in large harmonic oscillator model spaces and SU(3)-selected subspaces. We demonstrate LSU3shell's strong-scaling properties achieved with highly-parallel methods for computing the many-body matrix elements. Results compare favorably with complete model space calculations, and significant memory savings are achieved in physically important applications. In particular, a well-chosen symmetry-adapted basis affords memory savings in calculations of states with a fixed total angular momentum in large model spaces while exactly preserving translational invariance.
ERIC Educational Resources Information Center
Kind, Per Morten
2013-01-01
The paper analyzes conceptualizations in the science frameworks in three large-scale assessments, Trends in Mathematics and Science Study (TIMSS), Programme for International Student Assessment (PISA), and National Assessment of Educational Progress (NAEP). The assessments have a shared history, but have developed different conceptualizations. The…
USDA-ARS?s Scientific Manuscript database
Large-scale crop monitoring and yield estimation are important for both scientific research and practical applications. Satellite remote sensing provides an effective means for regional and global cropland monitoring, particularly in data-sparse regions that lack reliable ground observations and rep...
Quantum teleportation between remote atomic-ensemble quantum memories.
Bao, Xiao-Hui; Xu, Xiao-Fan; Li, Che-Ming; Yuan, Zhen-Sheng; Lu, Chao-Yang; Pan, Jian-Wei
2012-12-11
Quantum teleportation and quantum memory are two crucial elements for large-scale quantum networks. With the help of prior distributed entanglement as a "quantum channel," quantum teleportation provides an intriguing means to faithfully transfer quantum states among distant locations without actual transmission of the physical carriers [Bennett CH, et al. (1993) Phys Rev Lett 70(13):1895-1899]. Quantum memory enables controlled storage and retrieval of fast-flying photonic quantum bits with stationary matter systems, which is essential to achieve the scalability required for large-scale quantum networks. Combining these two capabilities, here we realize quantum teleportation between two remote atomic-ensemble quantum memory nodes, each composed of ∼10(8) rubidium atoms and connected by a 150-m optical fiber. The spin wave state of one atomic ensemble is mapped to a propagating photon and subjected to Bell state measurements with another single photon that is entangled with the spin wave state of the other ensemble. Two-photon detection events herald the success of teleportation with an average fidelity of 88(7)%. Besides its fundamental interest as a teleportation between two remote macroscopic objects, our technique may be useful for quantum information transfer between different nodes in quantum networks and distributed quantum computing.
Vlachos, Ioannis; Herry, Cyril; Lüthi, Andreas; Aertsen, Ad; Kumar, Arvind
2011-01-01
The basal nucleus of the amygdala (BA) is involved in the formation of context-dependent conditioned fear and extinction memories. To understand the underlying neural mechanisms we developed a large-scale neuron network model of the BA, composed of excitatory and inhibitory leaky-integrate-and-fire neurons. Excitatory BA neurons received conditioned stimulus (CS)-related input from the adjacent lateral nucleus (LA) and contextual input from the hippocampus or medial prefrontal cortex (mPFC). We implemented a plasticity mechanism according to which CS and contextual synapses were potentiated if CS and contextual inputs temporally coincided on the afferents of the excitatory neurons. Our simulations revealed a differential recruitment of two distinct subpopulations of BA neurons during conditioning and extinction, mimicking the activation of experimentally observed cell populations. We propose that these two subgroups encode contextual specificity of fear and extinction memories, respectively. Mutual competition between them, mediated by feedback inhibition and driven by contextual inputs, regulates the activity in the central amygdala (CEA) thereby controlling amygdala output and fear behavior. The model makes multiple testable predictions that may advance our understanding of fear and extinction memories. PMID:21437238
An event map of memory space in the hippocampus
Deuker, Lorena; Bellmund, Jacob LS; Navarro Schröder, Tobias; Doeller, Christian F
2016-01-01
The hippocampus has long been implicated in both episodic and spatial memory; however, these mnemonic functions have traditionally been investigated in separate research strands. Theoretical accounts and rodent data suggest a common mechanism for spatial and episodic memory in the hippocampus, based on its provision of an abstract and flexible representation of the external world. Here, we monitor the de novo formation of such a representation of space and time in humans using fMRI. After learning spatio-temporal trajectories in a large-scale virtual city, subject-specific neural similarity in the hippocampus scaled with the remembered proximity of events in space and time. Crucially, the structure of the entire spatio-temporal network was reflected in neural patterns. Our results provide evidence for a common coding mechanism underlying spatial and temporal aspects of episodic memory in the hippocampus and shed new light on its role in interleaving multiple episodes in a neural event map of memory space. DOI: http://dx.doi.org/10.7554/eLife.16534.001 PMID:27710766
NASA Astrophysics Data System (ADS)
Matsui, Chihiro; Kinoshita, Reika; Takeuchi, Ken
2018-04-01
A hybrid of storage class memory (SCM) and NAND flash is a promising technology for high-performance storage. Error correction is inevitable for SCM and NAND flash because their bit error rates (BERs) increase with write/erase (W/E) cycling, data retention, and program/read disturb. In addition, scaling and multi-level cell technologies increase the BER. However, error-correcting codes (ECCs) degrade storage performance because of extra memory reads and encoding/decoding time. Therefore, the applicable ECC strength for SCM and for NAND flash is evaluated independently, by fixing the ECC strength of one memory in the hybrid storage while varying that of the other. As a result, a weak BCH ECC with a small number of correctable bits is recommended for hybrid storage with a large SCM capacity, because the SCM is accessed frequently. In contrast, a strong, long-latency LDPC ECC can be applied to the NAND flash in hybrid storage with a large SCM capacity, because the large-capacity SCM improves overall storage performance.
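The trade-off between weak BCH and strong LDPC codes turns on how a code's correctable-bit strength t maps the raw BER to a codeword failure rate. A minimal sketch of that mapping, using the standard binomial tail and illustrative parameters (not taken from the paper):

```python
from math import comb

# Probability that a codeword of n bits with raw bit error rate p has more
# than t bit errors, i.e. exceeds the ECC's correctable strength:
# P(fail) = 1 - sum_{i=0..t} C(n, i) p^i (1-p)^(n-i).

def codeword_failure(n, t, p):
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1))

# Illustrative 1-KiB sector (data + assumed parity bits); a weak, low-latency
# code vs. a much stronger one at the same raw BER:
n = 8192 + 512
for t in (8, 40):
    print(f"t = {t:2d}: P(fail) = {codeword_failure(n, t, p=1e-4):.3e}")
```

The stronger code buys many orders of magnitude in reliability at a given BER, which is why it can be worth its decoding latency on the slower, higher-BER medium.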
NASA Technical Reports Server (NTRS)
Morgan, R. P.; Singh, J. P.; Rothenberg, D.; Robinson, B. E.
1975-01-01
The needs to be served, the subsectors in which the system might be used, the technology employed, and the prospects for future utilization of an educational telecommunications delivery system are described and analyzed. Educational subsectors are analyzed with emphasis on the current status and trends within each subsector. Issues which affect future development, and prospects for future use of media, technology, and large-scale electronic delivery within each subsector are included. Information on technology utilization is presented. Educational telecommunications services are identified and grouped into categories: public television and radio, instructional television, computer aided instruction, computer resource sharing, and information resource sharing. Technology based services, their current utilization, and factors which affect future development are stressed. The role of communications satellites in providing these services is discussed. Efforts to analyze and estimate future utilization of large-scale educational telecommunications are summarized. Factors which affect future utilization are identified. Conclusions are presented.
NASA Astrophysics Data System (ADS)
Li, Hao; Xie, Lunguo
2013-03-01
The design of cache systems for chip multiprocessors (CMPs) faces many challenges because future CMPs will have more cores and greater on-chip cache capacity. There are two baseline design schemes for the L2 cache: a private scheme, in which each L2 slice is treated as a private L2 cache, and a shared scheme, in which all L2 slices are treated as one large L2 cache shared by all cores. Private caches provide the lowest hit latency but reduce the total effective cache capacity. A shared L2 cache increases the effective cache capacity but has long hit latencies when data resides on a remote tile. This paper presents a new Controlled Replication (CR) policy to reduce the capacity occupied by redundant shared replicas. The new CR policy yields a larger effective capacity than the victim replication scheme and lower hit latency than the shared scheme. We evaluate the various schemes using full-system simulation of parallel applications. Results show that CR reduces the average memory access latency of the shared scheme by an average of 13%, providing better overall performance than the victim replication and shared schemes.
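The details of the CR policy are in the paper; the sketch below only illustrates the general idea of capping replicas of a shared line in requesters' local slices, so that redundant copies stop consuming effective capacity. The cap value, the tile mapping, and the bookkeeping are all assumptions for illustration.

```python
# Toy model of controlled replication in a tiled shared L2, under assumptions:
# each line has a static "home" slice; a remote hit may create a local replica
# only while the line's replica count is under a cap.

class ControlledReplicationL2:
    def __init__(self, n_tiles, replica_cap=1):
        self.n_tiles = n_tiles
        self.replica_cap = replica_cap       # assumed per-line replica budget
        self.replicas = {}                   # line address -> tiles holding a replica

    def home_tile(self, addr):
        return addr % self.n_tiles           # static address interleaving (assumed)

    def access(self, tile, addr):
        holders = self.replicas.setdefault(addr, set())
        if tile == self.home_tile(addr) or tile in holders:
            return "local hit"
        if len(holders) < self.replica_cap:  # replicate only under the cap
            holders.add(tile)
            return "remote hit, replica created"
        return "remote hit, replication suppressed"

l2 = ControlledReplicationL2(n_tiles=16, replica_cap=1)
print(l2.access(tile=3, addr=0x40))   # remote hit, replica created
print(l2.access(tile=5, addr=0x40))   # remote hit, replication suppressed
```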
Face classification using electronic synapses
NASA Astrophysics Data System (ADS)
Yao, Peng; Wu, Huaqiang; Gao, Bin; Eryilmaz, Sukru Burc; Huang, Xueyao; Zhang, Wenqiang; Zhang, Qingtian; Deng, Ning; Shi, Luping; Wong, H.-S. Philip; Qian, He
2017-05-01
Conventional hardware platforms consume a huge amount of energy for cognitive learning because of the data movement between the processor and the off-chip memory. Brain-inspired device technologies using analogue weight storage allow cognitive tasks to be completed more efficiently. Here we present an analogue non-volatile resistive memory (an electronic synapse) made with foundry-friendly materials. The device shows bidirectional, continuous weight-modulation behaviour. Grey-scale face classification is experimentally demonstrated using an integrated 1024-cell array with parallel online training. The energy consumed within the analogue synapses for each iteration is 1,000× (20×) lower than an implementation using an Intel Xeon Phi processor with off-chip memory (with hypothetical on-chip digital resistive random access memory). The accuracy on test sets is close to the result using a central processing unit. These experimental results establish the feasibility of analogue synaptic arrays and pave the way toward building energy-efficient, large-scale neuromorphic systems.
Face classification using electronic synapses.
Yao, Peng; Wu, Huaqiang; Gao, Bin; Eryilmaz, Sukru Burc; Huang, Xueyao; Zhang, Wenqiang; Zhang, Qingtian; Deng, Ning; Shi, Luping; Wong, H-S Philip; Qian, He
2017-05-12
Conventional hardware platforms consume a huge amount of energy for cognitive learning because of the data movement between the processor and the off-chip memory. Brain-inspired device technologies using analogue weight storage allow cognitive tasks to be completed more efficiently. Here we present an analogue non-volatile resistive memory (an electronic synapse) made with foundry-friendly materials. The device shows bidirectional, continuous weight-modulation behaviour. Grey-scale face classification is experimentally demonstrated using an integrated 1024-cell array with parallel online training. The energy consumed within the analogue synapses for each iteration is 1,000× (20×) lower than an implementation using an Intel Xeon Phi processor with off-chip memory (with hypothetical on-chip digital resistive random access memory). The accuracy on test sets is close to the result using a central processing unit. These experimental results establish the feasibility of analogue synaptic arrays and pave the way toward building energy-efficient, large-scale neuromorphic systems.
The Social Life of a Data Base
NASA Technical Reports Server (NTRS)
Linde, Charlotte; Wales, Roxana; Clancy, Dan (Technical Monitor)
2002-01-01
This paper presents the complex social life of a large data base. The topics include: 1) Social Construction of Mechanisms of Memory; 2) Data Bases: The Invisible Memory Mechanism; 3) The Human in the Machine; 4) Data of the Study: A Large-Scale Problem Reporting Data Base; 5) The PRACA Study; 6) Description of PRACA; 7) PRACA and Paper; 8) Multiple Uses of PRACA; 9) The Work of PRACA; 10) Multiple Forms of Invisibility; 11) Such Systems are Everywhere; and 12) Two Morals to the Story. This paper is in viewgraph form.
Interference due to shared features between action plans is influenced by working memory span.
Fournier, Lisa R; Behmer, Lawrence P; Stubblefield, Alexandra M
2014-12-01
In this study, we examined the interactions between the action plans that we hold in memory and the actions that we carry out, asking whether the interference due to shared features between action plans is due to selection demands imposed on working memory. Individuals with low and high working memory spans learned arbitrary motor actions in response to two different visual events (A and B), presented in a serial order. They planned a response to the first event (A) and while maintaining this action plan in memory they then executed a speeded response to the second event (B). Afterward, they executed the action plan for the first event (A) maintained in memory. Speeded responses to the second event (B) were delayed when it shared an action feature (feature overlap) with the first event (A), relative to when it did not (no feature overlap). The size of the feature-overlap delay was greater for low-span than for high-span participants. This indicates that interference due to overlapping action plans is greater when fewer working memory resources are available, suggesting that this interference is due to selection demands imposed on working memory. Thus, working memory plays an important role in managing current and upcoming action plans, at least for newly learned tasks. Also, managing multiple action plans is compromised in individuals who have low versus high working memory spans.
Destination Memory Impairment in Older People
Gopie, Nigel; Craik, Fergus I. M.; Hasher, Lynn
2012-01-01
Older adults are assumed to have poor destination memory— knowing to whom they tell particular information—and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults’ destination memory by having participants tell facts (e.g., “A dime has 118 ridges around its edge”) to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication. PMID:20718537
Helmstaedter, Christoph; Wietzke, Jennifer; Lutz, Martin T
2009-12-01
This study was set up to evaluate the construct validity of three verbal memory tests in epilepsy patients. Sixty-one consecutively evaluated patients with temporal lobe epilepsy (TLE) or extra-temporal epilepsy (E-TLE) underwent testing with the verbal learning and memory test (VLMT, the German equivalent of the Rey auditory verbal learning test, RAVLT); the California verbal learning test (CVLT); the logical memory and digit span subtests of the Wechsler memory scale, revised (WMS-R); and tests of intelligence, attention, speech and executive functions. Factor analysis of the memory tests resulted in test-specific rather than test-spanning factors. Parameters of the CVLT and WMS-R, and to a much lesser degree of the VLMT, were highly correlated with attention, language function and vocabulary. Delayed recall measures of logical memory and the VLMT differentiated TLE from E-TLE. Learning and memory scores of all three tests differentiated mesial temporal sclerosis from other pathologies. Lateralization of the epilepsy was possible only for a subsample of 15 patients with mesial TLE. Although the three tests provide overlapping indicators of temporal lobe epilepsy or a mesial pathology, they can hardly be used interchangeably. The tests make different demands on semantic processing and memory organization, and they appear differentially sensitive to performance in non-memory domains. The tests' capability to lateralize appears to be poor. These findings encourage further discussion of the dependency of memory outcomes on test selection.
Memory Maintenance in Synapses with Calcium-Based Plasticity in the Presence of Background Activity
Higgins, David; Graupner, Michael; Brunel, Nicolas
2014-01-01
Most models of learning and memory assume that memories are maintained in neuronal circuits by persistent synaptic modifications induced by specific patterns of pre- and postsynaptic activity. For this scenario to be viable, synaptic modifications must survive the ubiquitous ongoing activity present in neural circuits in vivo. In this paper, we investigate the time scales of memory maintenance in a calcium-based synaptic plasticity model that has recently been shown to fit different experimental data sets from hippocampal and neocortical preparations. We find that, in the presence of background activity on the order of 1 Hz, parameters that fit layer 5 pyramidal neocortical data lead to a very fast decay of synaptic efficacy, with time scales of minutes. We then identify two ways in which this memory time scale can be extended: (i) the extracellular calcium concentrations in the experiments used to fit the model are larger than estimated concentrations in vivo; lowering the extracellular calcium concentration to in vivo levels increases memory time scales by several orders of magnitude; (ii) adding a bistability mechanism, so that each synapse has two stable states at sufficiently low background activity, leads to a further boost in memory time scale, since memory decay is no longer described by an exponential decay from an initial state but by an escape from a potential well. We argue that both features are expected to be present in synapses in vivo. These results are obtained first for a single synapse connecting two independent Poisson neurons, and then in simulations of a large network of excitatory and inhibitory integrate-and-fire neurons. Our results emphasise the need for studying plasticity at physiological extracellular calcium concentrations, and highlight the role of synaptic bi- or multistability in the stability of learned synaptic structures. PMID:25275319
A kilobyte rewritable atomic memory
NASA Astrophysics Data System (ADS)
Kalff, F. E.; Rebergen, M. P.; Fahrenfort, E.; Girovsky, J.; Toskovic, R.; Lado, J. L.; Fernández-Rossier, J.; Otte, A. F.
2016-11-01
The advent of devices based on single dopants, such as the single-atom transistor, the single-spin magnetometer and the single-atom memory, has motivated the quest for strategies that permit the control of matter with atomic precision. Manipulation of individual atoms by low-temperature scanning tunnelling microscopy provides ways to store data in atoms, encoded either into their charge state, magnetization state or lattice position. A clear challenge now is the controlled integration of these individual functional atoms into extended, scalable atomic circuits. Here, we present a robust digital atomic-scale memory of up to 1 kilobyte (8,000 bits) using an array of individual surface vacancies in a chlorine-terminated Cu(100) surface. The memory can be read and rewritten automatically by means of atomic-scale markers and offers an areal density of 502 terabits per square inch, outperforming state-of-the-art hard disk drives by three orders of magnitude. Furthermore, the chlorine vacancies are found to be stable at temperatures up to 77 K, offering the potential for expanding large-scale atomic assembly towards ambient conditions.
NASA Astrophysics Data System (ADS)
Cardall, Christian Y.; Budiardja, Reuben D.
2017-05-01
GenASiS Basics provides Fortran 2003 classes furnishing extensible object-oriented utilitarian functionality for large-scale physics simulations on distributed-memory supercomputers. This functionality includes physical units and constants; display to the screen or standard output device; message passing; I/O to disk; and runtime parameter management and usage statistics. This revision, Version 2 of Basics, makes mostly minor additions to functionality and includes some simplifying name changes.
Graf, Daniel; Beuerle, Matthias; Schurkus, Henry F; Luenser, Arne; Savasci, Gökcen; Ochsenfeld, Christian
2018-05-08
An efficient algorithm for calculating the random phase approximation (RPA) correlation energy is presented that is as accurate as the canonical molecular orbital resolution-of-the-identity RPA (RI-RPA), with the important advantage of effectively linear-scaling behavior (instead of quartic) for large systems, due to a formulation in the local atomic orbital space. The high accuracy is achieved by utilizing optimized minimax integration schemes and the local Coulomb metric attenuated by the complementary error function for the RI approximation. The memory bottleneck of former atomic orbital (AO)-RI-RPA implementations (Schurkus, H. F.; Ochsenfeld, C. J. Chem. Phys. 2016, 144, 031101; Luenser, A.; Schurkus, H. F.; Ochsenfeld, C. J. Chem. Theory Comput. 2017, 13, 1647-1655) is addressed by precontraction of the large 3-center integral matrix with the Cholesky factors of the ground-state density, reducing the memory requirements of that matrix by a factor of [Formula: see text]. Furthermore, we present a parallel implementation of our method, which not only leads to faster RPA correlation energy calculations but also to a scalable decrease in memory requirements, opening the door for investigations of large molecules even on small- to medium-sized computing clusters. Although it is known that AO methods are highly efficient for extended systems, where sparsity allows the linear-scaling regime to be reached, we show that our work also extends the applicability to highly delocalized systems for which no linear scaling can be achieved. As an example, the interlayer distance of two covalent organic framework pore fragments (comprising 384 atoms in total) is analyzed.
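The precontraction step described above can be pictured in a few lines of linear algebra: contracting one AO index of the 3-center tensor with the Cholesky factor of the ground-state density shrinks that index from the number of AOs to roughly the number of occupied orbitals. Dimensions and names below are illustrative, not from the paper.

```python
import numpy as np

# Sketch of precontracting a 3-center integral tensor B[P, mu, nu]
# (aux x AO x AO) with the Cholesky factor L of the density P = L L^T,
# whose column count tracks the number of occupied orbitals.

n_ao, n_aux, n_occ = 120, 400, 15
B = np.random.rand(n_aux, n_ao, n_ao)     # 3-center integrals (P|mu nu), random stand-in
L = np.random.rand(n_ao, n_occ)           # Cholesky factor of the density, random stand-in

B_half = np.einsum('pmn,ni->pmi', B, L)   # contract one AO index down to n_occ

# The stored tensor shrinks by roughly n_ao / n_occ (here 8x):
print(B.nbytes / B_half.nbytes)
```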
Cost aware cache replacement policy in shared last-level cache for hybrid memory based fog computing
NASA Astrophysics Data System (ADS)
Jia, Gangyong; Han, Guangjie; Wang, Hao; Wang, Feng
2018-04-01
Fog computing requires a large main-memory capacity to decrease latency and increase quality of service (QoS). However, dynamic random access memory (DRAM), the most commonly used random access memory, cannot be included in a fog computing system because of its high power consumption. In recent years, non-volatile memories (NVMs) such as phase-change memory (PCM) and spin-transfer torque RAM (STT-RAM), with their low power consumption, have emerged as replacements for DRAM. Moreover, recently proposed hybrid main memories, consisting of both DRAM and NVM, have shown promising advantages in scalability and power consumption. However, the drawbacks of NVM, such as long read/write latency, give rise to asymmetric cache-miss costs in the hybrid main memory. Current last-level cache (LLC) policies assume a uniform miss cost, and as a result they perform poorly in the LLC and add to the cost of using NVM. To minimize the cache-miss cost in the hybrid main memory, we propose a cost-aware cache replacement policy (CACRP) that reduces the number of cache misses to NVM and improves cache performance for a hybrid memory system. Experimental results show that our CACRP improves LLC performance by up to 43.6% (15.5% on average) compared to LRU.
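A cost-aware replacement decision of the general flavor described above can be sketched as LRU weighted by miss cost, so that "cold" lines backed by slow NVM survive longer than equally cold DRAM-backed lines. The cost values and the scoring rule are illustrative placeholders, not the paper's exact CACRP.

```python
# Toy cost-aware victim selection for one cache set, under assumed penalties.

MISS_COST = {"DRAM": 1.0, "NVM": 4.0}    # relative re-fetch penalties (assumed)

def pick_victim(cache_set):
    """cache_set: list of {addr, backing, last_use}; evict the lowest score."""
    def score(line, now):
        age = now - line["last_use"]
        # Old lines score low (LRU-like), but a high miss cost raises the
        # score and shields the line from eviction.
        return MISS_COST[line["backing"]] / (age + 1)
    now = max(line["last_use"] for line in cache_set)
    return min(cache_set, key=lambda line: score(line, now))

ways = [
    {"addr": 0xA0, "backing": "DRAM", "last_use": 10},
    {"addr": 0xB0, "backing": "NVM",  "last_use": 9},
    {"addr": 0xC0, "backing": "DRAM", "last_use": 30},
]
print(hex(pick_victim(ways)["addr"]))    # 0xa0: the cold DRAM line goes first
```

Note how the cold NVM-backed line at 0xB0 outlives the slightly younger DRAM line, which is the asymmetry a uniform-cost LRU cannot express.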
Social Aspects of Photobooks: Improving Photobook Authoring from Large-Scale Multimedia Analysis
NASA Astrophysics Data System (ADS)
Sandhaus, Philipp; Boll, Susanne
With photo albums we aim to capture personal events such as weddings, vacations, and parties of family and friends. By arranging photo prints, captions, and paper souvenirs such as tickets over the pages of a photobook, we tell a story to capture and share our memories. The photo memories captured in such a photobook tell us much about the content and the relevance of the photos for the user. The way in which we select photos and arrange them in the photo album reveals a lot about the events, persons, and places in the photos: captions describe content, while the closeness and arrangement of photos express relations between photos and their content, and especially the social relations of the author and the persons present in the album. Nowadays the process of photo album authoring has become digital: photos and texts can be arranged and laid out, with the help of authoring tools, in a digital photo album that can be printed as a physical photobook. In this chapter we present results from the analysis of a large repository of digitally mastered photobooks, undertaken to learn about their social aspects. We explore to what degree a social aspect can be identified and how expressive and vivid different classes of photobooks are. The photobooks are anonymized, real-world photobooks from customers of our industry partner CeWe Color. The knowledge gained from this social photobook analysis is meant both to better understand how people author their photobooks and to improve the automatic selection and layout of photos in photobooks.
Location-Unbound Color-Shape Binding Representations in Visual Working Memory.
Saiki, Jun
2016-02-01
The mechanism by which nonspatial features, such as color and shape, are bound in visual working memory, and the role of those features' location in their binding, remains unknown. In the current study, I modified a redundancy-gain paradigm to investigate these issues. A set of features was presented in a two-object memory display, followed by a single object probe. Participants judged whether the probe contained any features of the memory display, regardless of its location. Response time distributions revealed feature coactivation only when both features of a single object in the memory display appeared together in the probe, regardless of the response time benefit from the probe and memory objects sharing the same location. This finding suggests that a shared location is necessary in the formation of bound representations but unnecessary in their maintenance. Electroencephalography data showed that amplitude modulations reflecting location-unbound feature coactivation were different from those reflecting the location-sharing benefit, consistent with the behavioral finding that feature-location binding is unnecessary in the maintenance of color-shape binding.
Echterhoff, Gerald; Kopietz, René; Higgins, E Tory
2017-06-01
Communicators typically tune messages to their audience's attitude. Such audience tuning biases communicators' memory for the topic toward the audience's attitude to the extent that they create a shared reality with the audience. To investigate shared reality in intergroup communication, we first established that a reduced memory bias after tuning messages to an out-group (vs. in-group) audience is a subtle index of communicators' denial of shared reality to that out-group audience (Experiments 1a and 1b). We then examined whether the audience-tuning memory bias might emerge when the out-group audience's epistemic authority is enhanced, either by increasing epistemic expertise concerning the communication topic or by creating epistemic consensus among members of a multiperson out-group audience. In Experiment 2, when Germans communicated to a Turkish audience with an attitude about a Turkish (vs. German) target, the audience-tuning memory bias appeared. In Experiment 3, when the audience of German communicators consisted of 3 Turks who all held the same attitude toward the target, the memory bias again appeared. The association between message valence and memory valence was consistently higher when the audience's epistemic authority was high (vs. low). An integrative analysis across all studies also suggested that the memory bias increases with increasing strength of epistemic inputs (epistemic expertise, epistemic consensus, and audience-tuned message production). The findings suggest novel ways of overcoming intergroup biases in intergroup relations.
Angular default mode network connectivity across working memory load.
Vatansever, D; Manktelow, A E; Sahakian, B J; Menon, D K; Stamatakis, E A
2017-01-01
Although the default mode network (DMN) was initially identified during no-task, baseline conditions, it has now been suggested that it engages during a variety of working memory paradigms through flexible interactions with other large-scale brain networks. Nevertheless, its contribution to whole-brain connectivity dynamics across increasing working memory load has not been explicitly assessed. The aim of our study was to determine which DMN hubs relate to working memory task performance during an fMRI-based n-back paradigm with parametric increases in difficulty. Using a voxel-wise metric, termed the intrinsic connectivity contrast (ICC), we found that the bilateral angular gyri (core DMN hubs) displayed the greatest change in global connectivity across three levels of n-back task load. Subsequent seed-based functional connectivity analysis revealed that the angular DMN regions robustly interact with other large-scale brain networks, suggesting a potential involvement in the global integration of information. Further support for this hypothesis comes from the significant correlations we found between angular gyri connectivity and reaction times to correct responses. The implication from our study is that the DMN is actively involved during the n-back task and thus plays an important role related to working memory, with its core angular regions contributing to the changes in global brain connectivity in response to increasing environmental demands. Hum Brain Mapp 38:41-52, 2017.
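One common formulation of such a voxel-wise global-connectivity metric, in the spirit of the ICC, averages each voxel's squared temporal correlation with every other voxel. A hedged numpy sketch follows; the study's exact preprocessing pipeline is not reproduced here.

```python
import numpy as np

def icc_map(ts):
    """ts: (n_voxels, n_timepoints) array -> per-voxel global connectivity."""
    # z-score each voxel's time series, then build the full correlation matrix.
    z = (ts - ts.mean(1, keepdims=True)) / ts.std(1, keepdims=True)
    r = (z @ z.T) / ts.shape[1]
    np.fill_diagonal(r, 0.0)                 # exclude self-correlation
    # Average squared correlation with all other voxels.
    return (r ** 2).sum(1) / (ts.shape[0] - 1)

ts = np.random.randn(500, 200)               # 500 voxels, 200 volumes (toy data)
print(icc_map(ts).shape)                      # (500,)
```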
Single Sided Messaging v. 0.6.6
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curry, Matthew Leon; Farmer, Matthew Shane; Hassani, Amin
Single-Sided Messaging (SSM) is a portable, multi-transport networking library that enables applications to leverage potential one-sided capabilities of underlying network transports. It also provides desirable semantics that services for high-performance, massively parallel computers can leverage, such as an explicit cancel operation for pending transmissions, as well as enhanced matching semantics favoring large numbers of buffers attached to a single match entry. This release supports TCP/IP, shared memory, and InfiniBand.
The Developmental Influence of Primary Memory Capacity on Working Memory and Academic Achievement
2015-01-01
In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. PMID:26075630
The developmental influence of primary memory capacity on working memory and academic achievement.
Hall, Debbora; Jarrold, Christopher; Towse, John N; Zarandi, Amy L
2015-08-01
In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment.
A Survey Of Architectural Approaches for Managing Embedded DRAM and Non-volatile On-chip Caches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh; Vetter, Jeffrey S; Li, Dong
Recent trends in CMOS scaling and the increasing number of on-chip cores have led to a large increase in the size of on-chip caches. Since SRAM has low density and consumes a large amount of leakage power, its use in designing on-chip caches has become more challenging. To address this issue, researchers are exploring the use of several emerging memory technologies, such as embedded DRAM, spin-transfer torque RAM, resistive RAM, phase-change RAM, and domain wall memory. In this paper, we survey the architectural approaches proposed for designing memory systems and, specifically, caches with these emerging memory technologies. To highlight their similarities and differences, we present a classification of these technologies and architectural approaches based on their key characteristics. We also briefly summarize the challenges in using these technologies for architecting caches. We believe that this survey will help readers gain insights into the emerging memory device technologies and their potential use in designing future computing systems.
Conditional load and store in a shared memory
Blumrich, Matthias A; Ohmacht, Martin
2015-02-03
A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
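A toy software model of the reservation-register scheme makes the mechanism concrete: one reserved address per processor, with a store-conditional succeeding only while the reservation survives. This is an illustration of the idea only, not the patented hardware design.

```python
import threading

class ReservedMemory:
    """Toy shared cache with one reservation register per processor unit."""

    def __init__(self, n_procs):
        self.mem = {}
        self.reservation = [None] * n_procs   # the series of reservation registers
        self.lock = threading.Lock()

    def load_reserve(self, proc, addr):
        with self.lock:
            self.reservation[proc] = addr     # record the reserved address
            return self.mem.get(addr, 0)

    def store_conditional(self, proc, addr, value):
        with self.lock:
            if self.reservation[proc] != addr:
                return False                  # reservation lost: fail, caller retries
            self.mem[addr] = value
            # A successful store clears every matching reservation, so
            # concurrent reservers see their store-conditional fail.
            for p, r in enumerate(self.reservation):
                if r == addr:
                    self.reservation[p] = None
            return True

m = ReservedMemory(n_procs=2)
old = m.load_reserve(0, 0x100)
print(m.store_conditional(0, 0x100, old + 1))   # True: atomic increment succeeded
print(m.store_conditional(0, 0x100, old + 2))   # False: reservation already consumed
```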
Parallel Implementation of MAFFT on CUDA-Enabled Graphics Hardware.
Zhu, Xiangyuan; Li, Kenli; Salah, Ahmad; Shi, Lin; Li, Keqin
2015-01-01
Multiple sequence alignment (MSA) constitutes an extremely powerful tool for many biological applications, including phylogenetic tree estimation, secondary structure prediction, and critical residue identification. However, aligning large biological sequences with popular tools such as MAFFT requires long runtimes on sequential architectures. Due to the ever-increasing sizes of sequence databases, there is increasing demand to accelerate this task. In this paper, we demonstrate how graphics processing units (GPUs), powered by the compute unified device architecture (CUDA), can be used as an efficient computational platform to accelerate the MAFFT algorithm. To fully exploit the GPU's capabilities for accelerating MAFFT, we have optimized the sequence data organization to eliminate the bandwidth bottleneck of memory access, designed a memory allocation and reuse strategy to make full use of the limited memory of GPUs, proposed a new modified run-length encoding (MRLE) scheme to reduce memory consumption, and used high-performance shared memory to speed up I/O operations. Our implementation, tested on three NVIDIA GPUs, achieves a speedup of up to 11.28× on a Tesla K20m GPU compared to sequential MAFFT 7.015.
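The "modified" details of MRLE are specific to the implementation, but the underlying run-length idea is easy to sketch: runs of identical symbols, which are common in gap-heavy alignment columns, collapse into (symbol, count) pairs.

```python
from itertools import groupby

def rle_encode(seq):
    """Collapse runs of identical symbols into (symbol, count) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(seq)]

def rle_decode(pairs):
    """Invert the encoding exactly."""
    return "".join(sym * count for sym, count in pairs)

column = "AAAA----GGTT"
packed = rle_encode(column)
print(packed)                         # [('A', 4), ('-', 4), ('G', 2), ('T', 2)]
assert rle_decode(packed) == column   # lossless round trip
```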
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta J.; Gimenez, J.; Caubet, J.
2003-01-01
Parallel programming paradigms include process-level parallelism, thread-level parallelism, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). The analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of the flow of useful computations.
Measuring Transactive Memory Systems Using Network Analysis
ERIC Educational Resources Information Center
King, Kylie Goodell
2017-01-01
Transactive memory systems (TMSs) describe the structures and processes that teams use to share information, work together, and accomplish shared goals. First introduced over three decades ago, TMSs have been measured in a variety of ways. This dissertation proposes the use of network analysis in measuring TMS. This is accomplished by describing…
Operator Influence of Unexploded Ordnance Sensor Technologies
2007-03-01
chart display ActiveX control; Mscomct2.dll – date/time display ActiveX control; Pnpscr.dll – Systran SCRAMNet replicated shared memory device...response value database; rgm_p2.dll – Phase 2 shared memory API and implementation. Commercial components: StripM.ocx – strip chart display ActiveX
From calls to communities: a model for time-varying social networks
NASA Astrophysics Data System (ADS)
Laurent, Guillaume; Saramäki, Jari; Karsai, Márton
2015-11-01
Social interactions vary in time and appear to be driven by intrinsic mechanisms that shape the emergent structure of social networks. Large-scale empirical observations of social interaction structure have become possible only recently, and modelling their dynamics is an actual challenge. Here we propose a temporal network model which builds on the framework of activity-driven time-varying networks with memory. The model integrates key mechanisms that drive the formation of social ties - social reinforcement, focal closure and cyclic closure, which have been shown to give rise to community structure and small-world connectedness in social networks. We compare the proposed model with a real-world time-varying network of mobile phone communication, and show that they share several characteristics from heterogeneous degrees and weights to rich community structure. Further, the strong and weak ties that emerge from the model follow similar weight-topology correlations as real-world social networks, including the role of weak ties.
Genetic Complexity of Episodic Memory: A Twin Approach to Studies of Aging
Kremen, William S.; Spoon, Kelly M.; Jacobson, Kristen C.; Vasilopoulos, Terrie; McCaffery, Jeanne M.; Panizzon, Matthew S.; Franz, Carol E.; Vuoksimaa, Eero; Xian, Hong; Rana, Brinda K.; Toomey, Rosemary; McKenzie, Ruth; Lyons, Michael J.
2016-01-01
Episodic memory change is a central issue in cognitive aging, and understanding that process will require elucidation of its genetic underpinnings. A key limiting factor in genetically informed research on memory has been lack of attention to genetic and phenotypic complexity, as if “memory is memory” and all well-validated assessments are essentially equivalent. Here we applied multivariate twin models to data from late-middle-aged participants in the Vietnam Era Twin Study of Aging to examine the genetic architecture of 6 measures from 3 standard neuropsychological tests: the California Verbal Learning Test-2, and Wechsler Memory Scale-III Logical Memory (LM) and Visual Reproductions (VR). An advantage of the twin method is that it can estimate the extent to which latent genetic influences are shared or independent across different measures before knowing which specific genes are involved. The best-fitting model was a higher order common pathways model with a heritable higher order general episodic memory factor and three test-specific subfactors. More importantly, substantial genetic variance was accounted for by genetic influences that were specific to the latent LM and VR subfactors (28% and 30%, respectively) and independent of the general factor. Such unique genetic influences could partially account for replication failures. Moreover, if different genes influence different memory phenotypes, they could well have different age-related trajectories. This approach represents an important step toward providing critical information for all types of genetically informative studies of aging and memory. PMID:24956007
Concurrent working memory load can facilitate selective attention: evidence for specialized load.
Park, Soojin; Kim, Min-Shik; Chun, Marvin M
2007-10-01
Load theory predicts that concurrent working memory load impairs selective attention and increases distractor interference (N. Lavie, A. Hirst, J. W. de Fockert, & E. Viding). Here, the authors present new evidence that the type of concurrent working memory load determines whether load impairs selective attention or not. Working memory load was paired with a same/different matching task that required focusing on targets while ignoring distractors. When working memory items shared the same limited-capacity processing mechanisms with targets in the matching task, distractor interference increased. However, when working memory items shared processing with distractors in the matching task, distractor interference decreased, facilitating target selection. A specialized load account is proposed to describe the dissociable effects of working memory load on selective processing depending on whether the load overlaps with targets or with distractors. (c) 2007 APA
Optimizing ROOT’s Performance Using C++ Modules
NASA Astrophysics Data System (ADS)
Vassilev, Vassil
2017-10-01
ROOT comes with a C++-compliant interpreter, cling. Cling needs to understand the content of the libraries in order to interact with them. Exposing the full shared-library descriptors to the interpreter at runtime translates into an increased memory footprint. ROOT's exploratory programming concepts allow implicit and explicit runtime shared-library loading, which requires the interpreter to load the library descriptor. Re-parsing of descriptors' content has a noticeable effect on the runtime performance. The present state-of-the-art lazy parsing technique brings the runtime performance to reasonable levels but proves to be fragile and can introduce correctness issues. An elegant solution is to load information from the descriptor lazily and in a non-recursive way. The LLVM community advances its C++ Modules technology, providing an I/O-efficient, on-disk representation capable of reducing build times and peak memory usage. The feature is standardized as a C++ technical specification. C++ Modules are a flexible concept, which can be employed to match CMS and other experiments' requirements for ROOT: to optimize both runtime memory usage and performance. Cling technically “inherits” the feature; however, tweaking it to ROOT scale and beyond is a complex endeavor. The paper discusses the status of C++ Modules in the context of ROOT, supported by a few preliminary performance results. It shows a step-by-step migration plan and describes potential challenges which could appear.
A fast low-power optical memory based on coupled micro-ring lasers
NASA Astrophysics Data System (ADS)
Hill, Martin T.; Dorren, Harmen J. S.; de Vries, Tjibbe; Leijtens, Xaveer J. M.; den Besten, Jan Hendrik; Smalbrugge, Barry; Oei, Yok-Siang; Binsma, Hans; Khoe, Giok-Djan; Smit, Meint K.
2004-11-01
The increasing speed of fibre-optic-based telecommunications has focused attention on high-speed optical processing of digital information. Complex optical processing requires a high-density, high-speed, low-power optical memory that can be integrated with planar semiconductor technology for buffering of decisions and telecommunication data. Recently, ring lasers with extremely small size and low operating power have been made, and we demonstrate here a memory element constructed by interconnecting these microscopic lasers. Our device occupies an area of 18 × 40 µm² on an InP/InGaAsP photonic integrated circuit, and switches within 20 ps with 5.5 fJ optical switching energy. Simulations show that the element has the potential for much smaller dimensions and switching times. Large numbers of such memory elements can be densely integrated and interconnected on a photonic integrated circuit: fast digital optical information processing systems employing large-scale integration should now be viable.
Flexible Language Constructs for Large Parallel Programs
Rosing, Matt; Schnabel, Robert
1994-01-01
The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. In this article, we give an overview of the language and discuss some of the critical implementation details.
Electronic device aspects of neural network memories
NASA Technical Reports Server (NTRS)
Lambe, J.; Moopenn, A.; Thakoor, A. P.
1985-01-01
The basic issues related to the electronic implementation of the neural network model (NNM) for content addressable memories are examined. A brief introduction to the principles of the NNM is followed by an analysis of the information storage of the neural network in the form of a binary connection matrix and the recall capability of such matrix memories based on a hardware simulation study. In addition, materials and device architecture issues involved in the future realization of such networks in VLSI-compatible ultrahigh-density memories are considered. A possible space application of such devices would be in the area of large-scale information storage without mechanical devices.
Threshold-voltage modulated phase change heterojunction for application of high density memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Baihan; Tong, Hao, E-mail: tonghao@hust.edu.cn; Qian, Hang
2015-09-28
Phase change random access memory is one of the most important candidates for the next generation non-volatile memory technology. However, the ability to reduce its memory size is compromised by the fundamental limitations inherent in the CMOS technology. While 0T1R configuration without any additional access transistor shows great advantages in improving the storage density, the leakage current and small operation window limit its application in large-scale arrays. In this work, phase change heterojunction based on GeTe and n-Si is fabricated to address those problems. The relationship between threshold voltage and doping concentration is investigated, and energy band diagrams and X-ray photoelectron spectroscopy measurements are provided to explain the results. The threshold voltage is modulated to provide a large operational window based on this relationship. The switching performance of the heterojunction is also tested, showing a good reverse characteristic, which could effectively decrease the leakage current. Furthermore, a reliable read-write-erase function is achieved during the tests. Phase change heterojunction is proposed for high-density memory, showing some notable advantages, such as modulated threshold voltage, large operational window, and low leakage current.
Nascimento, Thiago D; DosSantos, Marcos F; Danciu, Theodora; DeBoer, Misty; van Holsbeeck, Hendrik; Lucas, Sarah R; Aiello, Christine; Khatib, Leen; Bender, MaryCatherine A; Zubieta, Jon-Kar; DaSilva, Alexandre F
2014-04-03
Although population studies have greatly improved our understanding of migraine, they have relied on retrospective self-reports that are subject to memory error and experimenter-induced bias. Furthermore, these studies also lack specifics from the actual time that attacks were occurring, and how patients express and share their ongoing suffering. As technology and language constantly evolve, so does the way we share our suffering. We sought to evaluate the infodemiology of self-reported migraine headache suffering on Twitter. Trained observers in an academic setting categorized the meaning of every single "migraine" tweet posted during seven consecutive days. The main outcome measures were prevalence, life-style impact, linguistic, and timeline of actual self-reported migraine headache suffering on Twitter. From a total of 21,741 migraine tweets collected, only 64.52% (14,028/21,741 collected tweets) were from users reporting their migraine headache attacks in real-time. The remainder of the posts were commercial, re-tweets, general discussion or third person's migraine, and metaphor. The gender distribution available for the actual migraine posts was 73.47% female (10,306/14,028), 17.40% males (2441/14,028), and 0.01% transgendered (2/14,028). The personal impact of migraine headache was immediate on mood (43.91%, 6159/14,028), productivity at work (3.46%, 486/14,028), social life (3.45%, 484/14,028), and school (2.78%, 390/14,028). The most common migraine descriptor was "Worst" (14.59%, 201/1378) and profanity, the "F-word" (5.3%, 73/1378). The majority of postings occurred in the United States (58.28%, 3413/5856), peaking on weekdays at 10:00h and then gradually again at 22:00h; the weekend had a later morning peak. Twitter proved to be a powerful source of knowledge for migraine research. The data in this study overlap large-scale epidemiological studies, avoiding memory bias and experimenter-induced error. Furthermore, linguistics of ongoing migraine reports on social media proved to be highly heterogeneous and colloquial in our study, suggesting that current pain questionnaires should undergo constant reformulations to keep up with modernization in the expression of pain suffering in our society. In summary, this study reveals the modern characteristics and broad impact of migraine headache suffering on patients' lives as it is spontaneously shared via social media.
DMA shared byte counters in a parallel computer
Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos
2010-04-06
A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain memory locations of the injection FIFO metadata memory locations including its current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.
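A rough sketch of how such a shared byte counter behaves (our reading of the abstract, not the patented logic): software arms the counter with the total bytes expected, possibly across several messages; the DMA engine decrements it per received packet; zero signals that the whole group has arrived.

    class ReceptionByteCounter:
        def __init__(self):
            self.count = 0

        def arm(self, total_bytes):
            self.count += total_bytes     # several messages may share the counter

        def dma_packet_arrived(self, nbytes):
            self.count -= nbytes          # done by the DMA engine, no CPU involvement

        def all_complete(self):
            return self.count == 0

Sharing one counter across messages trades per-message completion detection for a smaller number of counters that must be polled.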
Cache Coherence Protocols for Large-Scale Multiprocessors
1990-09-01
and is compared with the other protocols for large-scale machines. In later analysis, this coherence method is designated by the acronym OCPD, which... private read misses 2 6 6 (OCPD); private write misses 2 6 6 (Table 4.2: Transaction Types and Costs) ...the performance of the memory system. These... methodologies. Figure 4-2 shows the processor utilizations of the Weather program, with special code in the dynamic post-mortem sched...
Welcoming Nora: a family event.
Walsh, Allison J; Walsh, Paul R; Walsh, Jane M; Walsh, Gavin T
2011-01-01
In this column, Allison and Paul Walsh share the story of the birth of Nora, their third baby and their second child to be born at home. Allison and Paul share their individual memories of labor and birth. But their story is only part of the story of Nora's birth. Nora's birth was a family event, with Allison and Paul's other children very much part of the experience. Jane and Gavin share their own memories of their baby sister's birth.
ERIC Educational Resources Information Center
Newton, Warren P.; Lefebvre, Ann; Donahue, Katrina E.; Bacon, Thomas; Dobson, Allen
2010-01-01
Introduction: Little is known regarding how to accomplish large-scale health care improvement. Our goal is to improve the quality of chronic disease care in all primary care practices throughout North Carolina. Methods: Methods for improvement include (1) common quality measures and shared data system; (2) rapid cycle improvement principles; (3)…
ERIC Educational Resources Information Center
Lockheed, Marlaine E.
2015-01-01
The number of countries that regularly participate in international large-scale assessments has increased sharply over the past 15 years, with the share of countries participating in the Programme for International Student Assessment growing from one-fifth of countries in 2000 to over one-third of countries in 2015. What accounts for this…
Scaling Law of Urban Ride Sharing.
Tachet, R; Sagarra, O; Santi, P; Resta, G; Szell, M; Strogatz, S H; Ratti, C
2017-03-06
Sharing rides could drastically improve the efficiency of car and taxi transportation. Unleashing such potential, however, requires understanding how urban parameters affect the fraction of individual trips that can be shared, a quantity that we call shareability. Using data on millions of taxi trips in New York City, San Francisco, Singapore, and Vienna, we compute the shareability curves for each city, and find that a natural rescaling collapses them onto a single, universal curve. We explain this scaling law theoretically with a simple model that predicts the potential for ride sharing in any city, using a few basic urban quantities and no adjustable parameters. Accurate extrapolations of this type will help planners, transportation companies, and society at large to shape a sustainable path for urban growth.
Cortico-hippocampal systems involved in memory and cognition: the PMAT framework.
Ritchey, Maureen; Libby, Laura A; Ranganath, Charan
2015-01-01
In this chapter, we review evidence that the cortical pathways to the hippocampus appear to extend from two large-scale cortical systems: a posterior medial (PM) system that includes the parahippocampal cortex and retrosplenial cortex, and an anterior temporal (AT) system that includes the perirhinal cortex. This "PMAT" framework accounts for differences in the anatomical and functional connectivity of the medial temporal lobes, which may underpin differences in cognitive function between the systems. The PM and AT systems make distinct contributions to memory and to other cognitive domains, and convergent findings suggest that they are involved in processing information about contexts and items, respectively. In order to support the full complement of memory-guided behavior, the two systems must interact, and the hippocampal and ventromedial prefrontal cortex may serve as sites of integration between the two systems. We conclude that when considering the "connected hippocampus," inquiry should extend beyond the medial temporal lobes to include the large-scale cortical systems of which they are a part. © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Ivkin, N.; Liu, Z.; Yang, L. F.; Kumar, S. S.; Lemson, G.; Neyrinck, M.; Szalay, A. S.; Braverman, V.; Budavari, T.
2018-04-01
Cosmological N-body simulations play a vital role in studying models for the evolution of the Universe. To compare to observations and make scientific inferences, statistical analysis on large simulation datasets, e.g., finding halos, obtaining multi-point correlation functions, is crucial. However, traditional in-memory methods for these tasks do not scale to the datasets that are forbiddingly large in modern simulations. Our prior paper (Liu et al., 2015) proposes memory-efficient streaming algorithms that can find the largest halos in a simulation with up to 10⁹ particles on a small server or desktop. However, this approach fails when directly scaling to larger datasets. This paper presents a robust streaming tool that leverages state-of-the-art techniques on GPU boosting, sampling, and parallel I/O, to significantly improve performance and scalability. Our rigorous analysis of the sketch parameters improves the previous results from finding the centers of the 10³ largest halos (Liu et al., 2015) to ∼10⁴-10⁵, and reveals the trade-offs between memory, running time and number of halos. Our experiments show that our tool can scale to datasets with up to ∼10¹² particles while using less than an hour of running time on a single Nvidia GTX 1080 GPU.
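The heavy-hitter flavor of the problem (which grid cells hold the most particles?) can be illustrated with a count-min sketch. This is a hedged stand-in: the paper's sketch and parameter analysis differ in detail.

    import hashlib

    class CountMin:
        """Approximate per-cell particle counts in fixed memory."""
        def __init__(self, width, depth):
            self.width, self.depth = width, depth
            self.table = [[0] * width for _ in range(depth)]

        def _col(self, row, key):
            h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8)
            return int.from_bytes(h.digest(), "big") % self.width

        def add(self, cell_id):
            for r in range(self.depth):
                self.table[r][self._col(r, cell_id)] += 1

        def estimate(self, cell_id):          # never underestimates
            return min(self.table[r][self._col(r, cell_id)]
                       for r in range(self.depth))

Width trades memory for accuracy and depth trades time for confidence, the same memory/time/accuracy trade-offs the paper maps out for its own sketch parameters.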
Colouring in the Blanks: Memory Drawings of the 1990 Kuwait Invasion
ERIC Educational Resources Information Center
Pepin-Wakefield, Yvonne
2009-01-01
This study used drawing tasks to examine the similarities and differences between females and males who shared a collective traumatic event in early childhood. Could these childhood memories be recorded, measured, and compared for gender differences in drawings by young adults who had shared a similar experience as children? Exploration of this…
ERIC Educational Resources Information Center
Kulkofsky, Sarah; Wang, Qi; Koh, Jessie Bee Kim
2009-01-01
This study examined maternal beliefs about the functions of memory sharing and the relations between these beliefs and mother-child reminiscing behaviors in a cross-cultural context. Sixty-three European American and 47 Chinese mothers completed an open-ended questionnaire concerning their beliefs about the functions of parent-child memory…
LAVA: Large scale Automated Vulnerability Addition
2016-05-23
memory copy, e.g., are reasonable attack points. If the goal is to inject divide-by-zero, then arithmetic operations involving division will be...ways. First, it introduces deterministic record and replay, which can be used for iterated and expensive analyses that cannot be performed online...memory. Since our approach records the correspondence between source lines and program basic block execution, it would be just as easy to figure out
Quantum teleportation between remote atomic-ensemble quantum memories
Bao, Xiao-Hui; Xu, Xiao-Fan; Li, Che-Ming; Yuan, Zhen-Sheng; Lu, Chao-Yang; Pan, Jian-Wei
2012-01-01
Quantum teleportation and quantum memory are two crucial elements for large-scale quantum networks. With the help of prior distributed entanglement as a “quantum channel,” quantum teleportation provides an intriguing means to faithfully transfer quantum states among distant locations without actual transmission of the physical carriers [Bennett CH, et al. (1993) Phys Rev Lett 70(13):1895–1899]. Quantum memory enables controlled storage and retrieval of fast-flying photonic quantum bits with stationary matter systems, which is essential to achieve the scalability required for large-scale quantum networks. Combining these two capabilities, here we realize quantum teleportation between two remote atomic-ensemble quantum memory nodes, each composed of ∼10⁸ rubidium atoms and connected by a 150-m optical fiber. The spin wave state of one atomic ensemble is mapped to a propagating photon and subjected to Bell state measurements with another single photon that is entangled with the spin wave state of the other ensemble. Two-photon detection events herald the success of teleportation with an average fidelity of 88(7)%. Besides its fundamental interest as a teleportation between two remote macroscopic objects, our technique may be useful for quantum information transfer between different nodes in quantum networks and distributed quantum computing. PMID:23144222
An efficient photogrammetric stereo matching method for high-resolution images
NASA Astrophysics Data System (ADS)
Li, Yingsong; Zheng, Shunyi; Wang, Xiaonan; Ma, Hao
2016-12-01
Stereo matching of high-resolution images is a great challenge in photogrammetry. The main difficulty is the enormous processing workload that involves substantial computing time and memory consumption. In recent years, the semi-global matching (SGM) method has been a promising approach for solving stereo problems in different data sets. However, the time complexity and memory demand of SGM are proportional to the scale of the images involved, which leads to very high consumption when dealing with large images. To solve it, this paper presents an efficient hierarchical matching strategy based on the SGM algorithm using single instruction multiple data instructions and structured parallelism in the central processing unit. The proposed method can significantly reduce the computational time and memory required for large scale stereo matching. The three-dimensional (3D) surface is reconstructed by triangulating and fusing redundant reconstruction information from multi-view matching results. Finally, three high-resolution aerial data sets are used to evaluate our improvement. Furthermore, precise airborne laser scanner data of one data set is used to measure the accuracy of our reconstruction. Experimental results demonstrate that our method offers remarkable time and memory savings while maintaining the density and precision of the derived 3D point cloud.
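For readers unfamiliar with SGM, the core of the method is a per-direction cost aggregation. The sketch below shows a single left-to-right pass with NumPy; the paper's contribution layers SIMD vectorization, CPU parallelism, and a hierarchical scheme on top of this recurrence.

    import numpy as np

    def aggregate_left_to_right(cost, P1=10.0, P2=120.0):
        """One SGM path direction. cost: H x W x D matching-cost volume."""
        H, W, D = cost.shape
        L = np.empty((H, W, D))
        L[:, 0] = cost[:, 0]
        for x in range(1, W):
            prev = L[:, x - 1]                              # H x D
            prev_min = prev.min(axis=1, keepdims=True)      # H x 1
            d_minus = np.full_like(prev, np.inf)
            d_plus = np.full_like(prev, np.inf)
            d_minus[:, 1:] = prev[:, :-1] + P1              # disparity d-1 neighbor
            d_plus[:, :-1] = prev[:, 1:] + P1               # disparity d+1 neighbor
            best = np.minimum(np.minimum(prev, prev_min + P2),
                              np.minimum(d_minus, d_plus))
            L[:, x] = cost[:, x] + best - prev_min          # keep values bounded
        return L

Summing the aggregated volumes over several scan directions and taking the argmin over d yields the disparity map.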
Stillbirth and stigma: the spoiling and repair of multiple social identities.
Brierley-Jones, Lyn; Crawley, Rosalind; Lomax, Samantha; Ayers, Susan
This study investigated mothers' experiences surrounding stillbirth in the United Kingdom, their memory making and sharing opportunities, and the effect these opportunities had on them. Qualitative data were generated from free text responses to open-ended questions. Thematic content analysis revealed that "stigma" was experienced by most women and Goffman's (1963) work on stigma was subsequently used as an analytical framework. Results suggest that stillbirth can spoil the identities of "patient," "mother," and "full citizen." Stigma was reported as arising from interactions with professionals, family, friends, work colleagues, and even casual acquaintances. Stillbirth produces common learning experiences often requiring "identity work" (Murphy, 2012). Memory making and sharing may be important in this work and further research is needed. Stigma can reduce the memory sharing opportunities for women after stillbirth and this may explain some of the differential mental health effects of memory making after stillbirth that is documented in the literature.
Cross-indexing of binary SIFT codes for large-scale image search.
Liu, Zhen; Li, Houqiang; Zhang, Liyan; Zhou, Wengang; Tian, Qi
2014-05-01
In recent years, there has been growing interest in mapping visual features into compact binary codes for applications on large-scale image collections. Encoding high-dimensional data as compact binary codes reduces the memory cost for storage. Besides, it benefits the computational efficiency since the computation of similarity can be efficiently measured by Hamming distance. In this paper, we propose a novel flexible scale invariant feature transform (SIFT) binarization (FSB) algorithm for large-scale image search. The FSB algorithm explores the magnitude patterns of SIFT descriptor. It is unsupervised and the generated binary codes are demonstrated to be distance-preserving. Besides, we propose a new searching strategy to find target features based on the cross-indexing in the binary SIFT space and original SIFT space. We evaluate our approach on two publicly released data sets. The experiments on large-scale partial duplicate image retrieval system demonstrate the effectiveness and efficiency of the proposed algorithm.
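The computational advantage mentioned here is easy to make concrete: comparing binary codes reduces to XOR plus popcount. A small sketch (identifiers are ours):

    def hamming(a: int, b: int) -> int:
        return (a ^ b).bit_count()      # int.bit_count requires Python >= 3.10

    query = 0b1011001110001111
    candidates = [0b1011001010001111, 0b0100110001110000]
    nearest = min(candidates, key=lambda c: hamming(query, c))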
O'Shea, Deirdre M; Dotson, Vonetta M; Fieo, Robert A; Tsapanou, Angeliki; Zahodne, Laura; Stern, Yaakov
2016-07-01
To investigate whether self-efficacy moderates the association between self-rated memory and depressive symptoms in a large sample of older adults. The influence of self-efficacy and depressive symptoms on memory performance was also examined in a subsample of individuals who reported poor memory. Non-demented participants (n = 3766) were selected from the 2012 wave of the Health and Retirement Study. Depressive symptomatology was assessed with the 8-item Center for Epidemiologic Studies Depression Scale. A modified version of the Midlife Developmental Inventory Questionnaire was used as the measure of self-efficacy. Participants were asked to rate their memory presently on a five-point scale from Excellent (1) to Poor (5). Immediate memory and delayed memory (after a 5-min interval) were measured by the number of correct words recalled from a 10-item word list. Multiple regression analyses revealed that negative ratings of memory were significantly associated with greater levels of depressive symptoms, with this effect being greatest in those with low levels of self-efficacy. Additionally, greater self-efficacy was associated with optimal objective memory performance, but only when depressive symptoms were low, in individuals who reported poor memory function (n = 1196). Self-efficacy moderates the relationship between self-rated memory function and depressive symptoms. Higher self-efficacy may buffer against the impact of subjective memory difficulty on one's mood, thereby mitigating the effect of depressive symptoms on memory. Interventions should focus on increasing perceived self-efficacy in older adults reporting poor memory function to potentially minimize memory impairment. Copyright © 2015 John Wiley & Sons, Ltd.
Parra, Mario A; Mikulan, Ezequiel; Trujillo, Natalia; Sala, Sergio Della; Lopera, Francisco; Manes, Facundo; Starr, John; Ibanez, Agustin
2017-01-01
Alzheimer's disease (AD) has been described as a disconnection syndrome which disrupts both brain information sharing and memory binding functions. The extent to which these two phenotypic expressions share pathophysiological mechanisms remains unknown. To unveil the electrophysiological correlates of integrative memory impairments in AD towards new memory biomarkers for its prodromal stages. Patients with 100% risk of familial AD (FAD) and healthy controls underwent assessment with the Visual Short-Term Memory binding test (VSTMBT) while we recorded their EEG. We applied a novel brain connectivity method (Weighted Symbolic Mutual Information) to the EEG data. Patients showed significant deficits during the VSTMBT. A reduction of brain connectivity was observed during resting as well as during correct VSTM binding, particularly over frontal and posterior regions. An increase of connectivity was found during VSTM binding performance over central regions. While decreased connectivity was found in cases in more advanced stages of FAD, increased brain connectivity appeared in cases in earlier stages. Such altered patterns of task-related connectivity were found in 89% of the assessed patients. VSTM binding in the prodromal stages of FAD is associated with altered patterns of brain connectivity, thus confirming the link between integrative memory deficits and impaired brain information sharing in prodromal FAD. While significant loss of brain connectivity seems to be a feature of the advanced stages of FAD, increased brain connectivity characterizes its earlier stages. These findings are discussed in the light of recent proposals about the earliest pathophysiological mechanisms of AD and their clinical expression. Copyright © Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Pesce, Lorenzo L.; Lee, Hyong C.; Hereld, Mark; ...
2013-01-01
Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clark, M. A.; Strelchenko, Alexei; Vaquero, Alejandro
Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. Using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.
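The bandwidth argument can be seen in miniature with NumPy: keeping k right-hand sides as an n x k block turns the k² pairwise inner products of the Krylov block into one GEMM, a single pass over memory. This only illustrates the principle; QUDA's GPU implementation is substantially more involved.

    import numpy as np

    n, k = 100_000, 8
    P = np.random.rand(n, k)            # block of k search vectors

    # naive: k*k separate dots, each streaming a length-n vector from memory
    G_naive = np.array([[P[:, i] @ P[:, j] for j in range(k)] for i in range(k)])

    # batched: one GEMM produces the same k x k Gram matrix in one pass
    G_batched = P.T @ P

    assert np.allclose(G_naive, G_batched)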
Stanley, Clayton; Byrne, Michael D
2016-12-01
The growth of social media and user-created content on online sites provides unique opportunities to study models of human declarative memory. By framing the task of choosing a hashtag for a tweet and tagging a post on Stack Overflow as a declarative memory retrieval problem, 2 cognitively plausible declarative memory models were applied to millions of posts and tweets and evaluated on how accurately they predict a user's chosen tags. An ACT-R based Bayesian model and a random permutation vector-based model were tested on the large data sets. The results show that past user behavior of tag use is a strong predictor of future behavior. Furthermore, past behavior was successfully incorporated into the random permutation model that previously used only context. Also, ACT-R's attentional weight term was linked to an entropy-weighting natural language processing method used to attenuate high-frequency words (e.g., articles and prepositions). Word order was not found to be a strong predictor of tag use, and the random permutation model performed comparably to the Bayesian model without including word order. This shows that the strength of the random permutation model is not in the ability to represent word order, but rather in the way in which context information is successfully compressed. The results of the large-scale exploration show how the architecture of the 2 memory models can be modified to significantly improve accuracy, and may suggest task-independent general modifications that can help improve model fit to human data in a much wider range of domains. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
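The recency-and-frequency signal that makes past tag use predictive is captured by ACT-R's base-level activation, B = ln(Σ_j t_j^-d). A minimal sketch with the conventional decay d = 0.5 (the history data is invented):

    import math

    def base_level_activation(ages_seconds, d=0.5):
        return math.log(sum(t ** -d for t in ages_seconds))

    # times since each prior use of a tag, in seconds
    history = {"python": [60, 3600, 86400], "numpy": [7200], "cuda": [604800]}
    predicted = max(history, key=lambda tag: base_level_activation(history[tag]))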
Temporal pattern and memory in sediment transport in an experimental step-pool channel
NASA Astrophysics Data System (ADS)
Saletti, Matteo; Molnar, Peter; Zimmermann, André; Hassan, Marwan A.; Church, Michael; Burlando, Paolo
2015-04-01
In this work we study the complex dynamics of sediment transport and bed morphology in steep streams, using a dataset of experiments performed in a steep flume with natural sediment. High-resolution (1 sec) time series of sediment transport were measured for individual size classes at the outlet of the flume for different combinations of sediment input rates, discharges, and flume slopes. The data show that the relation between instantaneous discharge and sediment transport exhibits large variability on different levels. After dividing the time series into segments of constant water discharge, we quantify the statistical properties of transport rates by fitting the data with a Generalized Extreme Value distribution, whose 3 parameters are related to the average sediment flux. We analyze separately extreme events of transport rate in terms of their fractional composition; if only events of high magnitude are considered, coarse grains become the predominant component of the total sediment yield. We quantify the memory in grain size dependent sediment transport with variance scaling and autocorrelation analyses; more specifically, we study how the variance changes with different aggregation scales and how the autocorrelation coefficient changes with different time lags. Our results show that there is a tendency to an infinite memory regime in transport rate signals, which is limited by the intermittency of the largest fractions. Moreover, the structure of memory is both grain size-dependent and magnitude-dependent: temporal autocorrelation is stronger for small grain size fractions and when the average sediment transport rate is large. The short-term memory in coarse grain transport increases with temporal aggregation and this reveals the importance of the sampling frequency of bedload transport rates in natural streams, especially for large fractions.
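Fitting a three-parameter GEV to a constant-discharge segment, as the study describes, looks roughly like this with SciPy (synthetic data stands in for the flume series):

    import numpy as np
    from scipy import stats

    # synthetic transport-rate segment; the real series come from the flume
    rates = stats.genextreme.rvs(c=-0.1, loc=5.0, scale=2.0,
                                 size=2000, random_state=1)

    shape, loc, scale = stats.genextreme.fit(rates)

Relating the fitted (shape, loc, scale) triple to each segment's mean flux is then what ties the distribution parameters to the average sediment transport rate.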
Tactical Operations Analysis Support Facility.
1981-05-01
Punch/Reader (2); DMC-11AR DDCMP microprocessor (2); DMC-11DA network link line unit (2); DL-11E async serial line interface (4); Intel IN-1670 448K-word MOS memory... 5.3 Virtual Processors - VAX-11/750; 5.4 A Relational Data Management System - ORACLE... The Central Processing Unit (CPU) is a 16-bit processor for high-speed, real-time applications, and for large multi-user, multi-task, time-shared
Cooperative system and method using mobile robots for testing a cooperative search controller
Byrne, Raymond H.; Harrington, John J.; Eskridge, Steven E.; Hurtado, John E.
2002-01-01
A test system for testing a controller provides a way to use large numbers of miniature mobile robots to test a cooperative search controller in a test area, where each mobile robot has a sensor, a communication device, a processor, and a memory. A method of using a test system provides a way for testing a cooperative search controller using multiple robots sharing information and communicating over a communication network.
General Recommendations on Fatigue Risk Management for the Canadian Forces
2010-04-01
missions performed in aviation require an individual(s) to process a large amount of information in a short period of time and to do this on a continuous...information processing required during sustained operations can deteriorate an individual's ability to perform a task. Given the high operational tempo...memory, which, in turn, is utilized to perform human thought processes (Baddeley, 2003). While various versions of this theory exist, they all share
Axiope tools for data management and data sharing.
Goddard, Nigel H; Cannon, Robert C; Howell, Fred W
2003-01-01
Many areas of biological research generate large volumes of very diverse data. Managing this data can be a difficult and time-consuming process, particularly in an academic environment where there are very limited resources for IT support staff such as database administrators. The most economical and efficient solutions are those that enable scientists with minimal IT expertise to control and operate their own desktop systems. Axiope provides one such solution, Catalyzer, which acts as a flexible cataloging system for creating structured records describing digital resources. The user is able to specify both the content and structure of the information included in the catalog. Information and resources can be shared by a variety of means, including automatically generated sets of web pages. Federation and integration of this information, where needed, is handled by Axiope's Mercat server. Where there is a need for standardization or compatibility of the structures used by different researchers, this can be achieved later by applying user-defined mappings in Mercat. In this way, large-scale data sharing can be achieved without imposing unnecessary constraints or interfering with the way in which individual scientists choose to record and catalog their work. We summarize the key technical issues involved in scientific data management and data sharing, describe the main features and functionality of Axiope Catalyzer and Axiope Mercat, and discuss future directions and requirements for an information infrastructure to support large-scale data sharing and scientific collaboration.
Li, Ji-Qing; Zhang, Yu-Shan; Ji, Chang-Ming; Wang, Ai-Jing; Lund, Jay R
2013-01-01
This paper examines long-term optimal operation using dynamic programming for a large hydropower system of 10 reservoirs in Northeast China. Besides considering flow and hydraulic head, the optimization explicitly includes time-varying electricity market prices to maximize benefit. Two techniques are used to reduce the 'curse of dimensionality' of dynamic programming with many reservoirs. Discrete differential dynamic programming (DDDP) reduces the search space and computer memory needed. Object-oriented programming (OOP) and the ability to dynamically allocate and release memory with the C++ language greatly reduces the cumulative effect of computer memory for solving multi-dimensional dynamic programming models. The case study shows that the model can reduce the 'curse of dimensionality' and achieve satisfactory results.
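A one-reservoir caricature of DDDP shows the corridor idea (all numbers and the linear benefit term are illustrative, not from the paper): each pass optimizes only over storage states within ±delta of the trial trajectory, re-centers the trajectory on the result, and shrinks the corridor.

    import numpy as np

    T = 12
    inflow = np.array([5, 6, 8, 10, 9, 7, 6, 5, 4, 4, 5, 6.0])
    price  = np.array([1, 1, 2, 3, 3, 2, 2, 1, 1, 2, 3, 2.0])
    trial  = np.full(T + 1, 50.0)          # trial storage trajectory
    delta  = 5.0                           # corridor half-width

    for _ in range(20):
        offsets = np.array([-delta, 0.0, delta])
        value = np.zeros(3)                # terminal value-to-go
        choice = np.zeros((T, 3), dtype=int)
        for t in range(T - 1, -1, -1):     # backward recursion over the corridor
            new_value = np.full(3, -np.inf)
            for i, s in enumerate(trial[t] + offsets):
                for j, s_next in enumerate(trial[t + 1] + offsets):
                    release = s + inflow[t] - s_next
                    if release < 0:
                        continue           # infeasible transition
                    v = price[t] * release + value[j]   # stand-in benefit term
                    if v > new_value[i]:                # (head effects would enter here)
                        new_value[i], choice[t, i] = v, j
            value = new_value
        i = 1                              # initial storage is fixed at trial[0]
        path = [trial[0]]
        for t in range(T):                 # trace best path, re-center trajectory
            i = choice[t, i]
            path.append(trial[t + 1] + offsets[i])
        trial = np.array(path)
        delta *= 0.7                       # shrink the corridor as it converges

Each iteration evaluates only 3 x 3 transitions per period instead of a full storage grid, which is the memory and time saving the paper combines with OOP-based dynamic memory management.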
Harding, Ian H; Yücel, Murat; Harrison, Ben J; Pantelis, Christos; Breakspear, Michael
2015-02-01
Cognitive control and working memory rely upon a common fronto-parietal network that includes the inferior frontal junction (IFJ), dorsolateral prefrontal cortex (dlPFC), pre-supplementary motor area/dorsal anterior cingulate cortex (pSMA/dACC), and intraparietal sulcus (IPS). This network is able to flexibly adapt its function in response to changing behavioral goals, mediating a wide range of cognitive demands. Here we apply dynamic causal modeling to functional magnetic resonance imaging data to characterize task-related alterations in the strength of network interactions across distinct cognitive processes. Evidence in favor of task-related connectivity dynamics was accrued across a very large space of possible network structures. Cognitive control and working memory demands were manipulated using a factorial combination of the multi-source interference task and a verbal 2-back working memory task, respectively. Both were found to alter the sensitivity of the IFJ to perceptual information, and to increase IFJ-to-pSMA/dACC connectivity. In contrast, increased connectivity from the pSMA/dACC to the IPS, as well as from the dlPFC to the IFJ, was uniquely driven by cognitive control demands; a task-induced negative influence of the dlPFC on the pSMA/dACC was specific to working memory demands. These results reflect a system of both shared and unique context-dependent dynamics within the fronto-parietal network. Mechanisms supporting cognitive engagement, response selection, and action evaluation may be shared across cognitive domains, while dynamic updating of task and context representations within this network are potentially specific to changing demands on cognitive control. Copyright © 2014 Elsevier Inc. All rights reserved.
Tolerant (parallel) Programming
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Bailey, David H. (Technical Monitor)
1997-01-01
In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2³ is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
A depth-first search algorithm to compute elementary flux modes by linear programming.
Quek, Lake-Ee; Nielsen, Lars K
2014-07-30
The decomposition of complex metabolic networks into elementary flux modes (EFMs) provides a useful framework for exploring reaction interactions systematically. Generating a complete set of EFMs for large-scale models, however, is near impossible. Even for moderately-sized models (<400 reactions), existing approaches based on the Double Description method must iterate through a large number of combinatorial candidates, thus imposing an immense processor and memory demand. Based on an alternative elementarity test, we developed a depth-first search algorithm using linear programming (LP) to enumerate EFMs in an exhaustive fashion. Constraints can be introduced to directly generate a subset of EFMs satisfying the set of constraints. The depth-first search algorithm has a constant memory overhead. Using flux constraints, a large LP problem can be massively divided and parallelized into independent sub-jobs for deployment into computing clusters. Since the sub-jobs do not overlap, the approach scales to utilize all available computing nodes with minimal coordination overhead or memory limitations. The speed of the algorithm was comparable to efmtool, a mainstream Double Description method, when enumerating all EFMs; the attrition power gained from performing flux feasibility tests offsets the increased computational demand of running an LP solver. Unlike the Double Description method, the algorithm enables accelerated enumeration of all EFMs satisfying a set of constraints.
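The feasibility test that drives the depth-first search can be phrased as a small LP. The sketch below (toy network and notation are ours) asks whether a steady-state flux with v >= 0 and S v = 0 exists that keeps one reaction active while others are constrained to zero:

    import numpy as np
    from scipy.optimize import linprog

    # toy network: metabolite A fed by r0, drained by r1;
    # metabolite B fed by r1, drained by r2 and r3
    S = np.array([[1.0, -1.0,  0.0,  0.0],
                  [0.0,  1.0, -1.0, -1.0]])

    def feasible(active, zeroed):
        n = S.shape[1]
        bounds = [(0, None)] * n
        for r in zeroed:
            bounds[r] = (0, 0)            # reaction forced off
        bounds[active] = (1, None)        # fix the scale so v = 0 is excluded
        res = linprog(c=np.zeros(n), A_eq=S, b_eq=np.zeros(S.shape[0]),
                      bounds=bounds, method="highs")
        return res.status == 0

    print(feasible(active=0, zeroed=[2]))   # True: route r0 -> r1 -> r3 survives

Because each such sub-problem is independent once the zeroed set is fixed, the search tree can be split across cluster nodes with no shared state, which is the parallelization property the abstract highlights.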
A shared resource between declarative memory and motor memory.
Keisler, Aysha; Shadmehr, Reza
2010-11-03
The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and nondeclarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/nondeclarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system.
A polymer/semiconductor write-once read-many-times memory
NASA Astrophysics Data System (ADS)
Möller, Sven; Perlov, Craig; Jackson, Warren; Taussig, Carl; Forrest, Stephen R.
2003-11-01
Organic devices promise to revolutionize the extent of, and access to, electronics by providing extremely inexpensive, lightweight and capable ubiquitous components that are printed onto plastic, glass or metal foils. One key component of an electronic circuit that has thus far received surprisingly little attention is an organic electronic memory. Here we report an architecture for a write-once read-many-times (WORM) memory, based on the hybrid integration of an electrochromic polymer with a thin-film silicon diode deposited onto a flexible metal foil substrate. WORM memories are desirable for ultralow-cost permanent storage of digital images, eliminating the need for slow, bulky and expensive mechanical drives used in conventional magnetic and optical memories. Our results indicate that the hybrid organic/inorganic memory device is a reliable means for achieving rapid, large-scale archival data storage. The WORM memory pixel exploits a mechanism of current-controlled, thermally activated un-doping of a two-component electrochromic conducting polymer.
Discrete-Slots Models of Visual Working-Memory Response Times
Donkin, Christopher; Nosofsky, Robert M.; Gold, Jason M.; Shiffrin, Richard M.
2014-01-01
Much recent research has aimed to establish whether visual working memory (WM) is better characterized by a limited number of discrete all-or-none slots or by a continuous sharing of memory resources. To date, however, researchers have not considered the response-time (RT) predictions of discrete-slots versus shared-resources models. To complement the past research in this field, we formalize a family of mixed-state, discrete-slots models for explaining choice and RTs in tasks of visual WM change detection. In the tasks under investigation, a small set of visual items is presented, followed by a test item in 1 of the studied positions for which a change judgment must be made. According to the models, if the studied item in that position is retained in 1 of the discrete slots, then a memory-based evidence-accumulation process determines the choice and the RT; if the studied item in that position is missing, then a guessing-based accumulation process operates. Observed RT distributions are therefore theorized to arise as probabilistic mixtures of the memory-based and guessing distributions. We formalize an analogous set of continuous shared-resources models. The model classes are tested on individual subjects with both qualitative contrasts and quantitative fits to RT-distribution data. The discrete-slots models provide much better qualitative and quantitative accounts of the RT and choice data than do the shared-resources models, although there is some evidence for “slots plus resources” when memory set size is very small. PMID:24015956
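The mixed-state idea can be sketched as a two-branch simulator: with probability min(K/N, 1) the probed item occupies one of K slots and a memory-driven random walk produces the response; otherwise a zero-drift guessing walk does. The parameters below are illustrative, not fitted values from the paper.

    import random

    def walk_rt(drift, threshold=20.0, step_ms=10.0):
        """Random-walk accumulator; returns (RT, responded 'change')."""
        x, t = 0.0, 0.0
        while abs(x) < threshold:
            x += drift + random.gauss(0, 1)
            t += step_ms
        return t, x > 0

    def trial(K=3, N=6, drift=0.6):
        if random.random() < min(K / N, 1.0):
            return walk_rt(drift)        # item held in a slot: memory state
        return walk_rt(0.0)              # item missing: guessing state

    rts = [trial()[0] for _ in range(1000)]   # a probabilistic mixture of RTs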
CERN data services for LHC computing
NASA Astrophysics Data System (ADS)
Espinal, X.; Bocchi, E.; Chan, B.; Fiorot, A.; Iven, J.; Lo Presti, G.; Lopez, J.; Gonzalez, H.; Lamanna, M.; Mascetti, L.; Moscicki, J.; Pace, A.; Peters, A.; Ponce, S.; Rousseau, H.; van der Ster, D.
2017-10-01
Dependability, resilience, adaptability and efficiency: growing requirements call for tailored storage services and novel solutions. Unprecedented volumes of data coming from the broad number of experiments at CERN need to be quickly available in a highly scalable way for large-scale processing and data distribution, while in parallel they are routed to tape for long-term archival. These activities are critical for the success of HEP experiments. Nowadays we operate at high incoming throughput (14 GB/s during the 2015 LHC Pb-Pb run and 11 PB in July 2016) and with concurrent complex production workloads. In parallel, our systems provide the platform for the continuous user- and experiment-driven workloads for large-scale data analysis, including end-user access and sharing. The storage services at CERN cover the needs of our community: EOS and CASTOR as large-scale storage; CERNBox for end-user access and sharing; Ceph as data back-end for the CERN OpenStack infrastructure, NFS services and S3 functionality; AFS for legacy distributed-file-system services. In this paper we will summarise the experience in supporting LHC experiments and the transition of our infrastructure from static monolithic systems to flexible components providing a more coherent environment with pluggable protocols, tuneable QoS, sharing capabilities and fine-grained ACL management, while continuing to guarantee dependable and robust services.
ERIC Educational Resources Information Center
Schweppe, Judith; Rummer, Ralf
2007-01-01
The general idea of language-based accounts of short-term memory is that retention of linguistic materials is based on representations within the language processing system. In the present sentence recall study, we address the question whether the assumption of shared representations holds for morphosyntactic information (here: grammatical gender…
The Precategorical Nature of Visual Short-Term Memory
ERIC Educational Resources Information Center
Quinlan, Philip T.; Cohen, Dale J.
2016-01-01
We conducted a series of recognition experiments that assessed whether visual short-term memory (VSTM) is sensitive to shared category membership of to-be-remembered (tbr) images of common objects. In Experiment 1 some of the tbr items shared the same basic level category (e.g., hand axe): Such items were no better retained than others. In the…
NASA Technical Reports Server (NTRS)
Shalkhauser, Mary JO; Quintana, Jorge A.; Soni, Nitin J.
1994-01-01
The NASA Lewis Research Center is developing a multichannel communication signal processing satellite (MCSPS) system which will provide low data rate, direct to user, commercial communications services. The focus of current space segment developments is a flexible, high-throughput, fault tolerant onboard information switching processor. This information switching processor (ISP) is a destination-directed packet switch which performs both space and time switching to route user information among numerous user ground terminals. Through both industry study contracts and in-house investigations, several packet switching architectures were examined. A contention-free approach, the shared memory per beam architecture, was selected for implementation. The shared memory per beam architecture, fault tolerance insertion, implementation, and demonstration plans are described.
The performance of disk arrays in shared-memory database machines
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Hong, Wei
1993-01-01
In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small form-factor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
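The data temperature metric is, in essence, sustainable access rate relative to capacity; we assume the common accesses-per-second-per-gigabyte form here, and the drive numbers are purely illustrative, not figures from the paper.

    def data_temperature(ios_per_second, capacity_gb):
        return ios_per_second / capacity_gb

    # one large drive versus an array of eight small form-factor drives
    single = data_temperature(80, 7.2)            # ~11 accesses/s per GB
    array  = data_temperature(8 * 60, 8 * 0.9)    # ~67 accesses/s per GB

The array sustains a much higher temperature for similar total capacity, which is the intuition behind configuring arrays from many small drives.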
Formation of large-scale structure from cosmic-string loops and cold dark matter
NASA Technical Reports Server (NTRS)
Melott, Adrian L.; Scherrer, Robert J.
1987-01-01
Some results from a numerical simulation of the formation of large-scale structure from cosmic-string loops are presented. It is found that even though Gμ is required to be lower than 2 × 10⁻⁶ (where μ is the mass per unit length of the string) to give a low enough autocorrelation amplitude, there is excessive power on smaller scales, so that galaxies would be more dense than observed. The large-scale structure does not exhibit a filamentary or connected appearance, and it shares with more conventional models based on Gaussian perturbations the lack of cluster-cluster correlation at the mean cluster separation scale, as well as excessively small bulk velocities at these scales.
Taking Teacher Learning to Scale: Sharing Knowledge and Spreading Ideas across Geographies
ERIC Educational Resources Information Center
Klein, Emily J.; Jaffe-Walter, Reva; Riordan, Megan
2016-01-01
This research reports data from case studies of three intermediary organizations facing the challenge of scaling up teacher learning. The turn of the century launched scaling-up efforts of all three intermediaries, growing from intimate groups, where founding teachers and staff were key supports for teacher learning, to large multistate…
Hawkins, Keith A; Tulsky, David S
2004-06-01
In discrepancy analysis, differences between scores are examined for abnormality. Although larger differences are generally associated with rising impairment probabilities, the relationship between discrepancy size and abnormality varies across score pairs in relation to the correlation between the contrasted scores in normal subjects. Examinee ability level also affects the size of discrepancies observed normally. Wechsler Memory Scale-Third Edition (WMS-III) visual index scores correlate only modestly with other Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) and WMS-III index scores; consequently, differences between these scores and others have to be very large before they become unusual, especially for subjects of higher intelligence. Substituting Visual Reproductions for the Faces subtest within visual memory indexes formed by combining WMS-III visual subtests (creating immediate recall, delayed recall, and combined immediate and delayed index scores) results in higher correlation coefficients and a decline in the discrepancy size required to surpass base-rate thresholds for probable impairment. This gain appears not to occur at the cost of diminished sensitivity to diverse pathologies. New WMS-III discrepancy base-rate data are supplied to complement those currently available to clinicians.
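(Background note, assuming the standard psychometric treatment rather than anything specific to this paper: for two scores with standard deviations \(s_1, s_2\) and correlation \(r\), the difference score has standard deviation

\[ s_D = \sqrt{s_1^2 + s_2^2 - 2\,r\,s_1 s_2}, \]

which for Wechsler indexes with \(s_1 = s_2 = 15\) reduces to \(s_D = 15\sqrt{2(1-r)}\). The lower the correlation, the larger \(s_D\), and hence the larger a difference must be before it becomes unusual, which is the relationship the abstract describes.)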
SmallTool - a toolkit for realizing shared virtual environments on the Internet
NASA Astrophysics Data System (ADS)
Broll, Wolfgang
1998-09-01
With increasing graphics capabilities of computers and higher network communication speed, networked virtual environments have become available to a large number of people. While the virtual reality modelling language (VRML) provides users with the ability to exchange 3D data, there is still a lack of appropriate support to realize large-scale multi-user applications on the Internet. In this paper we will present SmallTool, a toolkit to support shared virtual environments on the Internet. The toolkit consists of a VRML-based parsing and rendering library, a device library, and a network library. This paper will focus on the networking architecture, provided by the network library - the distributed worlds transfer and communication protocol (DWTP). DWTP provides an application-independent network architecture to support large-scale multi-user environments on the Internet.
Collaborative Working for Large Digitisation Projects
ERIC Educational Resources Information Center
Yeates, Robin; Guy, Damon
2006-01-01
Purpose: To explore the effectiveness of large-scale consortia for disseminating local heritage via the web. To describe the creation of a large geographically based cultural heritage consortium in the South East of England and management lessons resulting from a major web site digitisation project. To encourage the improved sharing of experience…
Webinar July 28: H2@Scale - A Potential Opportunity
NREL News
…role of hydrogen at the grid scale and the efforts of a large, national lab team assembled to evaluate the potential of hydrogen to play a critical role in our energy future. Presenters will share facts…
Lee, Sylvia E.; Kibby, Michelle Y.; Cohen, Morris J.; Stanford, Lisa; Park, Yong; Strickland, Suzanne
2016-01-01
Prior research has shown that attention-deficit/hyperactivity disorder (ADHD) and epilepsy are frequently comorbid and that both disorders are associated with various attention and memory problems. Nonetheless, limited research has been conducted comparing the two disorders in one sample to determine unique versus shared deficits. Hence, we investigated differences in working memory and short-term and delayed recall between children with ADHD, focal epilepsy of mixed foci, comorbid ADHD/epilepsy and controls. Participants were compared on the Core subtests and the Picture Locations subtest of the Children’s Memory Scale (CMS). Results indicated that children with ADHD displayed intact verbal working memory and long-term memory (LTM), as well as intact performance on most aspects of short-term memory (STM). They performed worse than controls on Numbers Forward and Picture Locations, suggesting problems with focused attention and simple span for visual-spatial material. Conversely, children with epilepsy displayed poor focused attention and STM regardless of modality assessed, which affected encoding into LTM. The only loss over time was found for passages (Stories). Working memory was intact. Children with comorbid ADHD/epilepsy displayed focused attention and STM/LTM problems consistent with both disorders, having the lowest scores across the four groups. Hence, focused attention and visual-spatial span appear to be affected in both disorders, whereas additional STM/encoding problems are specific to epilepsy. Children with comorbid ADHD/epilepsy have deficits consistent with both disorders, with slight additive effects. This study suggests that attention and memory testing should be a regular part of the evaluation of children with epilepsy and ADHD. PMID:26156331
Volk, Carol J; Lucero, Yasmin; Barnas, Katie
2014-05-01
Increasingly, research and management in natural resource science rely on very large datasets compiled from multiple sources. While it is generally good to have more data, utilizing large, complex datasets has introduced challenges in data sharing, especially for collaborating researchers in disparate locations ("distributed research teams"). We surveyed natural resource scientists about common data-sharing problems. The major issue identified by our survey respondents (n = 118) when providing data was a lack of clarity in the data request (including the format of the data requested). When receiving data, survey respondents reported various insufficiencies in the documentation describing the data (e.g., no description of data collection, no protocol, or data aggregated or summarized without explanation). Since metadata, or "information about the data," is a central obstacle to efficient data handling, we suggest that documenting metadata through data dictionaries, protocols, read-me files, explicit null-value documentation, and process metadata is essential to any large-scale research program. We advocate that all researchers, but especially those on distributed teams, alleviate these problems with several readily available communication strategies, including organizational charts to define roles, data flow diagrams to outline procedures and timelines, and data update cycles to guide data-handling expectations. In particular, we argue that distributed research teams magnify data-sharing challenges, making data management training even more crucial for natural resource scientists. If natural resource scientists fail to overcome communication and metadata documentation issues, then negative data-sharing experiences will likely continue to undermine the success of many large-scale collaborative projects.
Optical memories in digital computing
NASA Technical Reports Server (NTRS)
Alford, C. O.; Gaylord, T. K.
1979-01-01
High-capacity optical memories with relatively high data-transfer rates and multiport simultaneous-access capability may serve as the basis for new computer architectures. Several computer structures that might profitably use such memories are: (a) a simultaneous record-access system, (b) a simultaneously-shared memory computer system, and (c) a parallel digital processing structure.
Crystallographic and general use programs for the XDS Sigma 5 computer
NASA Technical Reports Server (NTRS)
Snyder, R. L.
1973-01-01
Programs in basic FORTRAN IV are described, which fall into three categories: (1) interactive programs to be executed under time sharing (BTM); (2) non-interactive programs which are executed in batch processing mode (BPM); and (3) large non-interactive programs which require more memory than is available in the normal BPM/BTM operating system and must be run overnight on a special system called XRAY, which releases about 45,000 words of memory to the user. Programs in categories (1) and (2) are stored as FORTRAN source files in the account FSNYDER. Programs in category (3) are stored in the XRAY system as load modules. The type of file in account FSNYDER is identified by the first two letters in the name.
A Large-scale Distributed Indexed Learning Framework for Data that Cannot Fit into Memory
2015-03-27
…learn a classifier. Integrating three learning techniques (online, semi-supervised, and active learning) together with selective sampling, with minimum communication between the server and the clients, solved this problem.
Roudi, Yasser; Latham, Peter E
2007-09-01
A fundamental problem in neuroscience is understanding how working memory--the ability to store information at intermediate timescales, like tens of seconds--is implemented in realistic neuronal networks. The most likely candidate mechanism is the attractor network, and a great deal of effort has gone toward investigating it theoretically. Yet, despite almost a quarter century of intense work, attractor networks are not fully understood. In particular, there are still two unanswered questions. First, how is it that attractor networks exhibit irregular firing, as is observed experimentally during working memory tasks? And second, how many memories can be stored under biologically realistic conditions? Here we answer both questions by studying an attractor neural network in which inhibition and excitation balance each other. Using mean-field analysis, we derive a three-variable description of attractor networks. From this description it follows that irregular firing can exist only if the number of neurons involved in a memory is large. The same mean-field analysis also shows that the number of memories that can be stored in a network scales with the number of excitatory connections, a result that has been suggested for simple models but never shown for realistic ones. Both of these predictions are verified using simulations with large networks of spiking neurons.
Reader set encoding for directory of shared cache memory in multiprocessor system
Ahn, Daniel; Ceze, Luis H.; Gara, Alan; Ohmacht, Martin; Zhuang, Xiaotong
2014-06-10
In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
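A minimal software sketch of the reader-set idea (assuming a simple one-bit-per-thread encoding for up to 64 threads; the patent's dynamic encoding is more compact, and the hardware details are not reproduced here):

```python
class DirectoryEntry:
    """Per-cache-line directory state with a reader-set bitmask."""
    def __init__(self):
        self.readers = 0                    # bit i set => speculative thread i read this line

    def record_read(self, thread_id):
        self.readers |= 1 << thread_id

    def conflicts_with_write(self, writer_id):
        # a speculative write conflicts with any reader other than the writer itself
        return (self.readers & ~(1 << writer_id)) != 0

    def clear(self):
        self.readers = 0                    # e.g., on commit or squash
```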
Insights on consciousness from taste memory research.
Gallo, Milagros
2016-01-01
Taste research in rodents supports the relevance of memory in determining the content of consciousness, by modifying both taste perception and later action. Related to this issue is the fact that the taste and visual modalities share anatomical circuits traditionally related to conscious memory. This challenges the view of taste memory as a type of non-declarative, unconscious memory.
NASA Astrophysics Data System (ADS)
Most, S.; Dentz, M.; Bolster, D.; Bijeljic, B.; Nowak, W.
2017-12-01
Transport in real porous media shows non-Fickian characteristics. In the Lagrangian perspective, this leads to skewed distributions of particle arrival times. The skewness is triggered by the particles' memory of velocity, which persists over a characteristic length. Capturing process memory is essential to represent non-Fickianity thoroughly. Classical non-Fickian models (e.g., CTRW models) simulate the effects of memory but not the mechanisms leading to process memory. CTRWs have been applied successfully in many studies, but they nonetheless have drawbacks. In classical CTRWs, each particle makes a spatial transition for which it draws a random transit time. Consecutive transit times are drawn independently of each other, which is only valid for sufficiently large spatial transitions. If we want to apply a finer numerical resolution than that, we have to implement memory into the simulation. Recent CTRW methods use transition matrices to simulate correlated transit times. However, deriving such transition matrices requires transport data from a fine-scale transport simulation, and the obtained transition matrix is valid only for that single Péclet regime. The CTRW method we propose overcomes all three drawbacks: 1) we simulate transport without restrictions on transition length; 2) we parameterize our CTRW without requiring a transport simulation; 3) our parameterization scales across Péclet regimes. We do so by sampling the pore-scale velocity distribution to generate correlated transit times as a Lévy flight on the CDF axis of velocities, with reflection at 0 and 1. The Lévy flight is parametrized only by the correlation length. We explicitly model memory, including the evolution and decay of non-Fickianity, so the model extends from local via pre-asymptotic to asymptotic scales.
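A minimal sketch of the sampling scheme as described in the abstract: transit times come from a heavy-tailed random walk on the CDF axis of the velocity distribution, reflected back into [0, 1]. The Cauchy step law and the mapping from correlation length to step scale are illustrative assumptions, and velocity_cdf_inv is a user-supplied inverse CDF:

```python
import numpy as np

def correlated_transit_times(n_steps, velocity_cdf_inv, corr_len, dx=1.0, seed=None):
    """Correlated transit times via a Levy-like flight on the CDF axis of
    the pore-scale velocity distribution, reflected at 0 and 1."""
    rng = np.random.default_rng(seed)
    u = rng.uniform()                              # start at a random CDF position
    step_scale = dx / max(corr_len, 1e-12)         # illustrative: longer memory -> smaller jumps
    times = np.empty(n_steps)
    for i in range(n_steps):
        u += step_scale * rng.standard_cauchy()    # heavy-tailed step (assumption)
        u = abs(u) % 2.0                           # fold onto [0, 2) ...
        if u > 1.0:
            u = 2.0 - u                            # ... then reflect back into [0, 1]
        u = min(max(u, 1e-9), 1.0 - 1e-9)          # keep away from zero velocity
        times[i] = dx / velocity_cdf_inv(u)        # transit time over one step of length dx
    return times
```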
A Simulation System Based on the Actor Paradigm
1988-02-01
…of the protocol. Shared-memory communication requires the programmer to wait on and signal semaphores explicitly to synchronize the communicating parties… wide range of possibilities within the same basic protocol. The simplicity of the primitive operation set affords those creating new operations more flexibility (Ada has a large and complicated primitive set). [Figure residue removed: timing diagram of processes A and B over time steps 0-5.]
Wang, Kun; Yu, Chunshui; Xu, Lijuan; Qin, Wen; Li, Kuncheng; Xu, Lin; Jiang, Tianzi
2009-01-01
Spontaneous thought processes (STPs), also called daydreaming or mind-wandering, occur ubiquitously in daily life. However, the functional significance of STPs remains largely unknown. Using functional magnetic resonance imaging (fMRI), we first identified an STPs-network whose activity was positively correlated with the subjects' tendency to have STPs during a task-free state. The STPs-network was then found to be strongly associated with the default network, which has previously been established as being active during the task-free state. Interestingly, we found that offline reprocessing of previously memorized information further increased the activity of the STPs-network regions, although during a state with fewer STPs. In addition, we found that the STPs-network kept a dynamic balance between functional integration and functional separation among its component regions to execute offline memory reprocessing in STPs. These findings strengthen the view that offline memory reprocessing and STPs share the brain's default network, and thus suggest that offline memory reprocessing may be a predetermined function of STPs. This supports the perspective that memory can be consolidated and modified during STPs, giving rise to dynamic behavior dependent on both previous external and internal experiences.
Benjamin, Aaron S.
2011-01-01
It is widely assumed that older adults suffer a deficit in the psychological processes that underlie remembering of contextual or source information. This conclusion is based in large part on empirical interactions, including disordinal ones, that reveal differential effects of manipulations of memory strength on recognition in young and old subjects. This paper lays out an alternative theory that takes as a starting point the overwhelming evidence from the psychometric literature that the effects of age on memory share a single mediating influence. Thus, the theory assumes no differences between younger and older subjects other than a global difference in memory fidelity—that is, the older subjects are presumed to have less valid representations of events and objects than are young subjects. The theory is articulated through three major assumptions and implemented in a computational model, DRYAD, in order to simulate fundamental results in the literature on aging and recognition, including the very interactions taken to imply selective impairment in older people. The theoretical perspective presented here allows for a critical examination of the widely held belief that aging entails the selective disruption of particular memory processes. PMID:20822289
Investigating Ground Swarm Robotics Using Agent Based Simulation
2006-12-01
Incorporation of virtual pheromones as a shared memory map is modeled as an additional capability that is found to enhance the robustness and reliability of the swarm.
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier
1992-01-01
Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
Category-specific semantic memory: converging evidence from BOLD fMRI and Alzheimer's disease.
Grossman, Murray; Peelle, Jonathan E; Smith, Edward E; McMillan, Corey T; Cook, Philip; Powers, John; Dreyfuss, Michael; Bonner, Michael F; Richmond, Lauren; Boller, Ashley; Camp, Emily; Burkholder, Lisa
2013-03-01
Patients with Alzheimer's disease have category-specific semantic memory difficulty for natural relative to manufactured objects. We assessed the basis for this deficit by asking healthy adults and patients to judge whether pairs of words share a feature (e.g. "banana:lemon-COLOR"). In an fMRI study, healthy adults showed gray matter (GM) activation of temporal-occipital cortex (TOC) where visual-perceptual features may be represented, and prefrontal cortex (PFC) which may contribute to feature selection. Tractography revealed dorsal and ventral stream white matter (WM) projections between PFC and TOC. Patients had greater difficulty with natural than manufactured objects. This was associated with greater overlap between diseased GM areas correlated with natural kinds in patients and fMRI activation in healthy adults for natural kinds. The dorsal WM projection between PFC and TOC in patients correlated only with judgments of natural kinds. Patients thus remained dependent on the same neural network as controls during judgments of natural kinds, despite disease in these areas. For manufactured objects, patients' judgments showed limited correlations with PFC and TOC GM areas activated by controls, and did not correlate with the PFC-TOC dorsal WM tract. Regions outside of the PFC-TOC network thus may help support patients' judgments of manufactured objects. We conclude that a large-scale neural network for semantic memory implicates both feature knowledge representations in modality-specific association cortex and heteromodal regions important for accessing this knowledge, and that patients' relative deficit for natural kinds is due in part to their dependence on this network despite disease in these areas. Copyright © 2012 Elsevier Inc. All rights reserved.
A biased competition account of attention and memory in Alzheimer's disease
Finke, Kathrin; Myers, Nicholas; Bublak, Peter; Sorg, Christian
2013-01-01
The common view of Alzheimer's disease (AD) is that of an age-related memory disorder, i.e. declarative memory deficits are the first signs of the disease and associated with progressive brain changes in the medial temporal lobes and the default mode network. However, two findings challenge this view. First, new model-based tools of attention research have revealed that impaired selective attention accompanies memory deficits from early pre-dementia AD stages on. Second, very early distributed lesions of lateral parietal networks may cause these attention deficits by disrupting brain mechanisms underlying attentional biased competition. We suggest that memory and attention impairments might indicate disturbances of a common underlying neurocognitive mechanism. We propose a unifying account of impaired neural interactions within and across brain networks involved in attention and memory inspired by the biased competition principle. We specify this account at two levels of analysis: at the computational level, the selective competition of representations during both perception and memory is biased by AD-induced lesions; at the large-scale brain level, integration within and across intrinsic brain networks, which overlap in parietal and temporal lobes, is disrupted. This account integrates a large amount of previously unrelated findings of changed behaviour and brain networks and favours a brain mechanism-centred view on AD. PMID:24018724
The effect of environmental harshness on neurogenesis: a large-scale comparison.
Chancellor, Leia V; Roth, Timothy C; LaDage, Lara D; Pravosudov, Vladimir V
2011-03-01
Harsh environmental conditions may produce strong selection pressure on traits, such as memory, that may enhance fitness. Enhanced memory may be crucial for survival in animals that use memory to find food and, thus, particularly important in environments where food sources may be unpredictable. For example, animals that cache and later retrieve their food may exhibit enhanced spatial memory in harsh environments compared with those in mild environments. One way that selection may enhance memory is via the hippocampus, a brain region involved in spatial memory. In a previous study, we established a positive relationship between environmental severity and hippocampal morphology in food-caching black-capped chickadees (Poecile atricapillus). Here, we expanded upon this previous work to investigate the relationship between environmental harshness and neurogenesis, a process that may support hippocampal cytoarchitecture. We report a significant and positive relationship between the degree of environmental harshness across several populations over a large geographic area and (1) the total number of immature hippocampal neurons, (2) the number of immature neurons relative to the hippocampal volume, and (3) the number of immature neurons relative to the total number of hippocampal neurons. Our results suggest that hippocampal neurogenesis may play an important role in environments where increased reliance on memory for cache recovery is critical. Copyright © 2010 Wiley Periodicals, Inc.
Christopher, Micaela E.; Miyake, Akira; Keenan, Janice M.; Pennington, Bruce; DeFries, John C.; Wadsworth, Sally J.; Willcutt, Erik; Olson, Richard K.
2012-01-01
The present study explored whether different executive control and speed measures (working memory, inhibition, processing speed, and naming speed) independently predict individual differences in word reading and reading comprehension. Although previous studies suggest these cognitive constructs are important for reading, we analyze the constructs simultaneously to test whether each is a unique predictor. We used latent variables from 483 participants (ages 8 to 16) to partition each cognitive and reading construct into its unique and shared variance. In these models we address two specific issues: (a) given that our wide age range may span the theoretical transition from “learning to read” to “reading to learn,” we first test whether the relation between word reading and reading comprehension is stable across two age groups (ages 8 to 10 and 11 to 16); and (b) the main theoretical question of interest: whether what is shared and what is separable for word reading and reading comprehension are associated with individual differences in working memory, inhibition, and measures of processing and naming speed. The results indicated that: (a) the relation between word reading and reading comprehension is largely invariant across the age groups; (b) working memory and general processing speed, but not inhibition or the speeded naming of non-alphanumeric stimuli, are unique predictors of both word reading and comprehension, with working memory equally important for both reading abilities and processing speed more important for word reading. These results have implications for understanding why reading comprehension and word reading are highly correlated yet separable. PMID:22352396
Kinetic Inductance Memory Cell and Architecture for Superconducting Computers
NASA Astrophysics Data System (ADS)
Chen, George J.
Josephson memory devices typically use a superconducting loop containing one or more Josephson junctions to store information. The magnetic inductance of the loop, in conjunction with the Josephson junctions, provides multiple states in which to store data. This thesis shows that replacing the magnetic inductor in a memory cell with a kinetic inductor can lead to a smaller cell size. However, magnetic control of the cells is lost. Thus, a current-injection based architecture for a memory array has been designed to work around this problem. The isolation between memory cells that magnetic control provides is instead provided through resistors in this new architecture. However, these resistors allow leakage current to flow, which ultimately limits the size of the array due to power considerations. A kinetic inductance memory array will be limited to 4 Kbit with a read access time of 320 ps for a 1-μm linewidth technology. If a power decoder could be developed, the memory architecture could serve as the blueprint for a fast (<1 ns), large-scale (>1 Mbit) superconducting memory array.
Samaha, Jason; Postle, Bradley R
2017-11-29
Adaptive behaviour depends on the ability to introspect accurately about one's own performance. Whether this metacognitive ability is supported by the same mechanisms across different tasks is unclear. We investigated the relationship between metacognition of visual perception and metacognition of visual short-term memory (VSTM). Experiments 1 and 2 required subjects to estimate the perceived or remembered orientation of a grating stimulus and rate their confidence. We observed strong positive correlations between individual differences in metacognitive accuracy between the two tasks. This relationship was not accounted for by individual differences in task performance or average confidence, and was present across two different metrics of metacognition and in both experiments. A model-based analysis of data from a third experiment showed that a cross-domain correlation only emerged when both tasks shared the same task-relevant stimulus feature. That is, metacognition for perception and VSTM were correlated when both tasks required orientation judgements, but not when the perceptual task was switched to require contrast judgements. In contrast with previous results comparing perception and long-term memory, which have largely provided evidence for domain-specific metacognitive processes, the current findings suggest that metacognition of visual perception and VSTM is supported by a domain-general metacognitive architecture, but only when both domains share the same task-relevant stimulus feature. © 2017 The Author(s).
Breaking barriers through collaboration: the example of the Cell Migration Consortium.
Horwitz, Alan Rick; Watson, Nikki; Parsons, J Thomas
2002-10-15
Understanding complex integrated biological processes, such as cell migration, requires interdisciplinary approaches. The Cell Migration Consortium, funded by a Large-Scale Collaborative Project Award from the National Institute of General Medical Science, develops and disseminates new technologies, data, reagents, and shared information to a wide audience. The development and operation of this Consortium may provide useful insights for those who plan similarly large-scale, interdisciplinary approaches.
Principles for problem aggregation and assignment in medium scale multiprocessors
NASA Technical Reports Server (NTRS)
Nicol, David M.; Saltz, Joel H.
1987-01-01
One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
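A toy sketch of parameterized aggregation and uniform assignment (the granularity parameter g and the cyclic dealing of blocks are illustrative of the scheme's spirit, not the paper's exact algorithm):

```python
def aggregate_and_assign(work_items, g, n_procs):
    """Group fine-grained work items into blocks of size g, then deal the
    blocks cyclically to processors. Smaller g balances an unknown,
    locally varying workload better; larger g reduces the
    communication/synchronization overhead per unit of work."""
    blocks = [work_items[i:i + g] for i in range(0, len(work_items), g)]
    assignment = {p: [] for p in range(n_procs)}
    for k, block in enumerate(blocks):
        assignment[k % n_procs].append(block)
    return assignment
```

Tuning g is exactly the tradeoff the abstract describes: a finer granularity improves balance at the price of increased communication and synchronization costs.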
Efficient partitioning and assignment on programs for multiprocessor execution
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1993-01-01
The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics, developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.
Centrally managed unified shared virtual address space
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilkes, John
Systems, apparatuses, and methods for managing a unified shared virtual address space. A host may execute system software and manage a plurality of nodes coupled to the host. The host may send work tasks to the nodes, and for each node, the host may externally manage the node's view of the system's virtual address space. Each node may have a central processing unit (CPU) style memory management unit (MMU) with an internal translation lookaside buffer (TLB). In one embodiment, the host may be coupled to a given node via an input/output memory management unit (IOMMU) interface, where the IOMMU frontend interface shares the TLB with the given node's MMU. In another embodiment, the host may control the given node's view of virtual address space via memory-mapped control registers.
Physical principles and current status of emerging non-volatile solid state memories
NASA Astrophysics Data System (ADS)
Wang, L.; Yang, C.-H.; Wen, J.
2015-07-01
Today the influence of non-volatile solid-state memories on people's lives has become more prominent because of their non-volatility, low data latency, and high robustness. As a pioneering technology representative of non-volatile solid-state memories, flash memory has recently seen widespread application in many areas, ranging from electronic appliances, such as cell phones and digital cameras, to external storage devices such as universal serial bus (USB) memory. Moreover, owing to its large storage capacity, it is expected that in the near future flash memory will replace hard-disk drives as a dominant technology in the mass-storage market, especially because of recently emerging solid-state drives. However, the rapid growth of global digital data has led to the need for flash memories to have larger storage capacity, thus requiring a further downscaling of the cell size. Such miniaturization is expected to be extremely difficult because of the well-known scaling limit of flash memories. It is therefore necessary either to explore innovative technologies that can extend the areal density of flash memories beyond the scaling limits, or to vigorously develop alternative non-volatile solid-state memories, including ferroelectric random-access memory, magnetoresistive random-access memory, phase-change random-access memory, and resistive random-access memory. In this paper, we review the physical principles of flash memories and the technical challenges that affect our ability to enhance the storage capacity. We then present a detailed discussion of novel technologies that can extend the storage density of flash memories beyond the commonly accepted limits. In each case, we subsequently discuss the physical principles of these new types of non-volatile solid-state memories as well as their respective merits and weaknesses when utilized for data-storage applications. Finally, we predict the future prospects of the aforementioned solid-state memories for the next generation of data-storage devices based on a comparison of their performance.
An FPGA-Based Massively Parallel Neuromorphic Cortex Simulator
Wang, Runchun M.; Thakur, Chetan S.; van Schaik, André
2018-01-01
This paper presents a massively parallel and scalable neuromorphic cortex simulator designed for simulating large and structurally connected spiking neural networks, such as complex models of various areas of the cortex. The main novelty of this work is the abstraction of a neuromorphic architecture into clusters represented by minicolumns and hypercolumns, analogously to the fundamental structural units observed in neurobiology. Without this approach, simulating large-scale fully connected networks needs prohibitively large memory to store look-up tables for point-to-point connections. Instead, we use a novel architecture, based on the structural connectivity in the neocortex, such that all the required parameters and connections can be stored in on-chip memory. The cortex simulator can be easily reconfigured for simulating different neural networks without any change in hardware structure by programming the memory. A hierarchical communication scheme allows one neuron to have a fan-out of up to 200 k neurons. As a proof-of-concept, an implementation on one Altera Stratix V FPGA was able to simulate 20 million to 2.6 billion leaky-integrate-and-fire (LIF) neurons in real time. We verified the system by emulating a simplified auditory cortex (with 100 million neurons). This cortex simulator achieved a low power dissipation of 1.62 μW per neuron. With the advent of commercially available FPGA boards, our system offers an accessible and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks. PMID:29692702
Magnetic Memory of Two Lunar Samples, 15405 and 15445
NASA Astrophysics Data System (ADS)
Kletetschka, G.; Kamenikova, K.; Fuller, M.; Cizkova, K.
2016-08-01
We reanalyzed Apollo literature records using empirical scaling-law methods to search for the presence of lunar iron carriers capable of recording reliable magnetic paleofields. Lunar rocks exhibit a large spectrum of paleofields (1-100 μT).
Memory effect in M ≥ 6 earthquakes of South-North Seismic Belt, Mainland China
NASA Astrophysics Data System (ADS)
Wang, Jeen-Hwa
2013-07-01
The M ≥ 6 earthquakes that occurred in the South-North Seismic Belt, Mainland China, during 1901-2008 are taken to study the possible existence of a memory effect in large earthquakes. The fluctuation analysis technique is applied to analyze the sequences of earthquake magnitude and inter-event time represented in the natural time domain. Calculated results show that the exponents of the scaling law of fluctuation versus window length are less than 0.5 for the sequences of earthquake magnitude and inter-event time. The migration of the earthquakes under study is considered to discuss the possible correlation between events. The phase portraits of two consecutive magnitudes and two consecutive inter-event times are also examined to explore whether large (or small) earthquakes are followed by large (or small) events. Together with all of the available information, we conclude that the earthquake sequence under study is short-term correlated, and thus a short-term memory effect would be operative.
Tough times call for bigger brains
Pravosudov, Vladimir V
2009-01-01
Memory is crucial for survival in many animals. Spatial memory in particular is important for food-caching species and may be influenced by selective pressures such as climate. The influence of climate on memory may be facilitated through the hippocampus (Hp), the part of the brain responsible in part for spatial memory. In a recent paper, we conducted the first large-scale test of the relationship between memory, the climate and the brain in a single food-caching species, the black-capped chickadee (Poecile atricapillus). We found that birds from more harsh northern climates had significantly larger hippocampal volumes and more neurons than those from more mild southern latitudes. This work suggests that environmental pressures are capable of influencing specific brain regions, which may result in enhanced memory, and hence survival, in harsh climates. This work gives us a better understanding of how the brain responds to different environments and how animals can adapt to their environment in general. PMID:19641741
Langley, Hillary A.; Coffman, Jennifer L.; Ornstein, Peter A.
2017-01-01
Data from a large-scale, longitudinal research study with an ethnically and socioeconomically diverse sample were utilized to explore linkages between maternal elaborative conversational style and the development of children’s autobiographical and deliberate memory. Assessments were made when the children were 3, 5, and 6 years of age, and the results reveal concurrent and longitudinal linkages between maternal conversational style in a mother-child reminiscing task and children’s autobiographical memory performance. Maternal conversational style while reminiscing was also significantly related to children’s strategic behaviors and recall in two deliberate memory tasks, both concurrently and longitudinally. Results from this examination replicate and extend what is known about the linkages between maternal conversational style, children’s abilities to talk about previous experiences, and children’s deliberate memory skills as they transition from the preschool to early elementary school years. PMID:29270083
Maguire, Elizabeth M; Bokhour, Barbara G; Wagner, Todd H; Asch, Steven M; Gifford, Allen L; Gallagher, Thomas H; Durfee, Janet M; Martinello, Richard A; Elwy, A Rani
2016-11-11
Many healthcare organizations have developed disclosure policies for large-scale adverse events, including the Veterans Health Administration (VA). This study evaluated VA's national large-scale disclosure policy and identified gaps and successes in its implementation. Semi-structured qualitative interviews were conducted with leaders, hospital employees, and patients at nine sites to elicit their perceptions of recent large-scale adverse event notifications and the national disclosure policy. Data were coded using the constructs of the Consolidated Framework for Implementation Research (CFIR). We conducted 97 interviews. Insights included how to handle the communication of large-scale disclosures through multiple levels of a large healthcare organization and how to manage ongoing communications about the event with employees. Of the 5 CFIR constructs and 26 sub-constructs assessed, seven were prominent in interviews. Leaders and employees specifically mentioned key problem areas involving 1) networks and communications during disclosure, 2) organizational culture, 3) engagement of external change agents during disclosure, and 4) a need for reflecting on and evaluating the policy implementation and the disclosure itself. Patients shared 5) preferences for personal outreach by phone in place of the current use of certified letters. All interviewees discussed 6) issues with execution and 7) costs of the disclosure. The CFIR analysis reveals key problem areas that need to be addressed during disclosure, including: timely communication patterns throughout the organization; establishing a supportive culture prior to implementation; using patient-approved, effective communication strategies during disclosures; providing follow-up support for employees and patients; and sharing lessons learned.
Extreme Quantum Memory Advantage for Rare-Event Sampling
NASA Astrophysics Data System (ADS)
Aghamohammadi, Cina; Loomis, Samuel P.; Mahoney, John R.; Crutchfield, James P.
2018-02-01
We introduce a quantum algorithm for memory-efficient biased sampling of rare events generated by classical memoryful stochastic processes. Two efficiency metrics are used to compare quantum and classical resources for rare-event sampling. For a fixed stochastic process, the first is the classical-to-quantum ratio of required memory. We show for two example processes that there exists an infinite number of rare-event classes for which the memory ratio for sampling is larger than r, for any large real number r. Then, for a sequence of processes each labeled by an integer size N, we compare how the classical and quantum required memories scale with N. In this setting, since both memories can diverge as N → ∞, the efficiency metric tracks how fast they diverge. An extreme quantum memory advantage exists when the classical memory diverges in the limit N → ∞, but the quantum memory has a finite bound. We then show that finite-state Markov processes and spin chains exhibit a memory advantage for sampling of almost all of their rare-event classes.
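(In symbols, using our own notation rather than the paper's: write \(C(N)\) and \(Q(N)\) for the classical and quantum memory required to sample a given rare-event class of the size-\(N\) process. The two metrics are then the ratio \(C(N)/Q(N)\) at fixed \(N\), and the growth of each with \(N\); an extreme quantum memory advantage is the situation

\[ \lim_{N \to \infty} C(N) = \infty \quad\text{while}\quad \sup_{N} Q(N) < \infty. \])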
RAID-2: Design and implementation of a large scale disk array controller
NASA Technical Reports Server (NTRS)
Katz, R. H.; Chen, P. M.; Drapeau, A. L.; Lee, E. K.; Lutz, K.; Miller, E. L.; Seshan, S.; Patterson, D. A.
1992-01-01
We describe the implementation of a large scale disk array controller and subsystem incorporating over 100 high performance 3.5 inch disk drives. It is designed to provide 40 MB/s sustained performance and 40 GB capacity in three 19 inch racks. The array controller forms an integral part of a file server that attaches to a Gb/s local area network. The controller implements a high bandwidth interconnect between an interleaved memory, an XOR calculation engine, the network interface (HIPPI), and the disk interfaces (SCSI). The system is now functionally operational, and we are tuning its performance. We review the design decisions, history, and lessons learned from this three year university implementation effort to construct a truly large scale system assembly.
Hodgetts, Carl J; Postans, Mark; Warne, Naomi; Varnava, Alice; Lawrence, Andrew D; Graham, Kim S
2017-09-01
Autobiographical memory (AM) is multifaceted, incorporating the vivid retrieval of contextual detail (episodic AM), together with semantic knowledge that infuses meaning and coherence into past events (semantic AM). While neuropsychological evidence highlights a role for the hippocampus and anterior temporal lobe (ATL) in episodic and semantic AM, respectively, it is unclear whether these constitute dissociable large-scale AM networks. We used high angular resolution diffusion-weighted imaging and constrained spherical deconvolution-based tractography to assess white matter microstructure in 27 healthy young adult participants who were asked to recall past experiences using word cues. Inter-individual variation in the microstructure of the fornix (the main hippocampal input/output pathway) related to the amount of episodic, but not semantic, detail in AMs - independent of memory age. Conversely, microstructure of the inferior longitudinal fasciculus, linking occipitotemporal regions with ATL, correlated with semantic, but not episodic, AMs. Further, these significant correlations remained when controlling for hippocampal and ATL grey matter volume, respectively. This striking correlational double dissociation supports the view that distinct, large-scale distributed brain circuits underpin context and concepts in AM. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.
System and method for memory allocation in a multiclass memory system
Loh, Gabriel; Meswani, Mitesh; Ignatowski, Michael; Nutter, Mark
2016-06-28
A system for memory allocation in a multiclass memory system includes a processor coupleable to a plurality of memories sharing a unified memory address space, and a library store to store a library of software functions. The processor identifies a type of a data structure in response to a memory allocation function call to the library for allocating memory to the data structure. Using the library, the processor allocates portions of the data structure among multiple memories of the multiclass memory system based on the type of the data structure.
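A toy sketch of the flow the abstract describes, with the allocation policy keyed to the data-structure type (the policy table, memory-class names, and split fractions are invented for illustration, not the patented implementation):

```python
class Backend:
    """Stand-in for one memory class (e.g., fast stacked DRAM vs. off-chip DRAM)."""
    def __init__(self, name):
        self.name, self.used = name, 0

    def alloc(self, nbytes):
        self.used += nbytes
        return (self.name, nbytes)   # stand-in for a pointer into this memory

FAST, CAPACITY = "fast", "capacity"

# Hypothetical per-type placement policy: list of (fraction, memory class).
POLICY = {
    "hash_index":  [(1.0, FAST)],                   # latency-critical: all fast memory
    "large_array": [(0.1, FAST), (0.9, CAPACITY)],  # hot head fast, bulk in capacity
}

def allocate(kind, nbytes, backends):
    """Library-style allocation call: split the request among memory classes
    of the unified address space according to the data structure's type."""
    default = [(1.0, CAPACITY)]
    return [backends[cls].alloc(int(frac * nbytes))
            for frac, cls in POLICY.get(kind, default)]

backends = {FAST: Backend(FAST), CAPACITY: Backend(CAPACITY)}
chunks = allocate("large_array", 1 << 20, backends)   # example call
```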
NASA Astrophysics Data System (ADS)
Nakamura, Kazuyuki; Sasao, Tsutomu; Matsuura, Munehiro; Tanaka, Katsumasa; Yoshizumi, Kenichi; Nakahara, Hiroki; Iguchi, Yukihiro
2006-04-01
A large-scale memory-technology-based programmable logic device (PLD) using a look-up table (LUT) cascade is developed in the 0.35-μm standard complementary metal oxide semiconductor (CMOS) logic process. Eight 64 K-bit synchronous SRAMs are connected to form an LUT cascade with a few additional circuits. The features of the LUT cascade include: 1) a flexible cascade connection structure, 2) multi-phase pseudo-asynchronous operations with synchronous static random access memory (SRAM) cores, and 3) LUT-bypass redundancy. The chip operates at 33 MHz in 8-LUT cascades at 122 mW. Benchmark results show that it achieves performance comparable to field-programmable gate arrays (FPGAs).
Electronic shift register memory based on molecular electron-transfer reactions
NASA Technical Reports Server (NTRS)
Hopfield, J. J.; Onuchic, Jose Nelson; Beratan, David N.
1989-01-01
The design of a shift register memory at the molecular level is described in detail. The memory elements are based on a chain of electron-transfer molecules incorporated on a very large scale integrated (VLSI) substrate, and the information is shifted by photoinduced electron-transfer reactions. The design requirements for such a system are discussed, and several realistic strategies for synthesizing these systems are presented. The immediate advantage of such a hybrid molecular/VLSI device would arise from the possible information storage density. The prospect of considerable savings of energy per bit processed also exists. This molecular shift register memory element design solves the conceptual problems associated with integrating molecular size components with larger (micron) size features on a chip.
A Formal Model of Capacity Limits in Working Memory
ERIC Educational Resources Information Center
Oberauer, Klaus; Kliegl, Reinhold
2006-01-01
A mathematical model of working-memory capacity limits is proposed on the key assumption of mutual interference between items in working memory. Interference is assumed to arise from overwriting of features shared by these items. The model was fit to time-accuracy data of memory-updating tasks from four experiments using nonlinear mixed effect…
Ordering of guarded and unguarded stores for no-sync I/O
Gara, Alan; Ohmacht, Martin
2013-06-25
A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. A third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushes only the first queue.
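To make the queue discipline concrete, here is a minimal software model in C of the claimed flow: three queues, invalidation of other cores' copies, and a flush of only the first queue. The structure and every identifier are our own simplification of the patent language, not the hardware design.

```c
/* Simplified model of the patented store flow: each store is recorded
 * in a per-core queue, a local-cache queue, and a shared-cache queue;
 * other cores' copies are invalidated; for no-sync I/O only the
 * per-core (first) queue is flushed. All names are illustrative. */
#include <stdbool.h>

#define NCORES 4
#define NLINES 64
#define QLEN   16

typedef struct { int addr, val; } store_t;
typedef struct { store_t entry[QLEN]; int n; } queue_t;

static queue_t core_q[NCORES];    /* first queue: per processor core */
static queue_t local_q[NCORES];   /* second queue: per local cache   */
static queue_t shared_q;          /* third queue: shared cache       */
static bool    valid[NCORES][NLINES];

static void push(queue_t *q, store_t s) { if (q->n < QLEN) q->entry[q->n++] = s; }

void issue_store(int core, int addr, int val)
{
    store_t s = { addr, val };
    push(&core_q[core], s);                  /* record in all three queues */
    push(&local_q[core], s);
    push(&shared_q, s);
    valid[core][addr % NLINES] = true;       /* update first data locally  */
    for (int c = 0; c < NCORES; c++)         /* invalidate other copies    */
        if (c != core) valid[c][addr % NLINES] = false;
}

void flush_for_io(int core)
{
    core_q[core].n = 0;                      /* flush only the first queue */
}
```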
NASA Technical Reports Server (NTRS)
Gilbertsen, Noreen D.; Belytschko, Ted
1990-01-01
The implementation of a nonlinear explicit program on a vectorized, concurrent computer with shared memory is described and studied. The conflict between vectorization and concurrency is described and some guidelines are given for optimal block sizes. Several example problems are summarized to illustrate the types of speed-ups which can be achieved by reprogramming as compared to compiler optimization.
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kurzak, Jakub; Luszczek, Piotr; Faverge, Mathieu
2012-03-01
LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
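For readers who want the baseline algorithm in view, a minimal unblocked LU factorization with partial pivoting in C follows. The paper's contribution is a blocked, hybrid CPU/GPU version of this kernel; the sketch does not attempt to reproduce that.

```c
/* Textbook unblocked LU with partial pivoting. A is n x n, row-major,
 * overwritten with L (unit lower triangle) and U; piv records the row
 * swap chosen at each step. */
#include <math.h>

void lu_pp(double *A, int *piv, int n)
{
    for (int k = 0; k < n; k++) {
        int p = k;                               /* find pivot row */
        for (int i = k + 1; i < n; i++)
            if (fabs(A[i*n + k]) > fabs(A[p*n + k])) p = i;
        piv[k] = p;
        if (p != k)                              /* swap rows k and p */
            for (int j = 0; j < n; j++) {
                double t = A[k*n + j];
                A[k*n + j] = A[p*n + j];
                A[p*n + j] = t;
            }
        for (int i = k + 1; i < n; i++) {        /* eliminate below pivot */
            A[i*n + k] /= A[k*n + k];
            for (int j = k + 1; j < n; j++)
                A[i*n + j] -= A[i*n + k] * A[k*n + j];
        }
    }
}
```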
ERIC Educational Resources Information Center
Burgess, Gregory C.; Gray, Jeremy R.; Conway, Andrew R. A.; Braver, Todd S.
2011-01-01
Fluid intelligence (gF) and working memory (WM) span predict success in demanding cognitive situations. Recent studies show that much of the variance in gF and WM span is shared, suggesting common neural mechanisms. This study provides a direct investigation of the degree to which shared variance in gF and WM span can be explained by neural…
Improved Cognitive Development in Preterm Infants with Shared Book Reading.
Braid, Susan; Bernstein, Jenny
2015-01-01
To examine the effect of shared book reading on the cognitive development of children born preterm and to determine what factors influence shared book reading in this population. Secondary analysis using the Early Childhood Longitudinal Study-Birth Cohort, a large, nationally representative survey of children born in the United States in 2001. One thousand four hundred singleton preterm infants (22-36 weeks gestation). Cognitive development measured using the Bayley Mental Scale score from the Bayley Scales of Infant Development Research Edition. Adjusting for neonatal, maternal, and socioeconomic characteristics, reading aloud more than two times a week is associated with higher cognitive development scores in two-year-old children born preterm (p < .001). Race/ethnicity and maternal education affect how often parents read to their children. Shared book reading holds potential as an early developmental intervention for this population.
Long-distance quantum communication over noisy networks without long-time quantum memory
NASA Astrophysics Data System (ADS)
Mazurek, Paweł; Grudka, Andrzej; Horodecki, Michał; Horodecki, Paweł; Łodyga, Justyna; Pankowski, Łukasz; Przysiężna, Anna
2014-12-01
The problem of sharing entanglement over large distances is crucial for implementations of quantum cryptography. A possible scheme for long-distance entanglement sharing and quantum communication exploits networks whose nodes share Einstein-Podolsky-Rosen (EPR) pairs. In Perseguers et al. [Phys. Rev. A 78, 062324 (2008), 10.1103/PhysRevA.78.062324] the authors put forward an important isomorphism between storing quantum information in a dimension D and transmission of quantum information in a (D+1)-dimensional network. We show that it is possible to obtain long-distance entanglement in a noisy two-dimensional (2D) network, even when taking into account that encoding and decoding of a state is exposed to an error. For 3D networks we propose a simple encoding and decoding scheme based solely on syndrome measurements on 2D Kitaev topological quantum memory. Our procedure constitutes an alternative scheme of state injection that can be used for universal quantum computation on the 2D Kitaev code. It is shown that the encoding scheme is equivalent to teleporting the state from a specific node into the whole two-dimensional network through a virtual EPR pair existing within the rest of the network qubits. We present an analytic lower bound on the fidelity of the encoding and decoding procedure, using as our main tool a modified metric on the space-time lattice, deviating from a taxicab metric at the first and last time slices.
Social networks and environmental outcomes.
Barnes, Michele L; Lynham, John; Kalberg, Kolter; Leung, PingSun
2016-06-07
Social networks can profoundly affect human behavior, which is the primary force driving environmental change. However, empirical evidence linking microlevel social interactions to large-scale environmental outcomes has remained scarce. Here, we leverage comprehensive data on information-sharing networks among large-scale commercial tuna fishers to examine how social networks relate to shark bycatch, a global environmental issue. We demonstrate that the tendency for fishers to primarily share information within their ethnic group creates segregated networks that are strongly correlated with shark bycatch. However, some fishers share information across ethnic lines, and examinations of their bycatch rates show that network contacts are more strongly related to fishing behaviors than ethnicity. Our findings indicate that social networks are tied to actions that can directly impact marine ecosystems, and that biases toward within-group ties may impede the diffusion of sustainable behaviors. Importantly, our analysis suggests that enhanced communication channels across segregated fisher groups could have prevented the incidental catch of over 46,000 sharks between 2008 and 2012 in a single commercial fishery.
Clarification of the memory artefact in the assessment of suggestibility.
Willner, P
2008-04-01
The Gudjonsson Suggestibility Scale (GSS) assesses suggestibility by asking respondents to recall a short story, followed by exposure to leading questions and pressure to change their responses. Suggestibility, as assessed by the GSS, appears to be elevated in people with intellectual disabilities (ID). This has been shown to reflect to some extent the fact that people with ID have poor recall of the story; however, there are discrepancies in this relationship. The aim of the present study was to investigate whether a closer match between memory and suggestibility would be found using a measure of recognition memory rather than free recall. Three modifications to the procedure were presented to users of a learning disabilities day service. In all three experiments, a measure of forced-choice recognition memory was built into the suggestibility test. In experiments 1 and 2, the GSS was presented using either divided presentation (splitting the story into two halves, with memory and suggestibility tests after each half) or multiple presentation (the story was presented three times before presentation of the memory and suggestibility tests). Participants were tested twice, once with the standard version of the test and once with one of the modified versions. In experiment 3, an alternative suggestibility scale (ASS3) was created, based on real events in a learning disabilities day service. The ASS3 was presented to one group of participants who had been present at the events, and a second group who attended a different day service, to whom the events were unfamiliar. As observed previously, suggestibility was not closely related to free recall performance: recall was increased equally by all three manipulations, but they produced, respectively, no effect, a modest effect and a large effect on suggestibility. However, the effects on suggestibility were closely related to performance on the forced-choice recognition memory task: divided presentation of the GSS2 had no effect on either of these measures; multiple presentation of the GSS2 produced a modest increase in recognition memory and a modest decrease in suggestibility; and replacing the GSS with the ASS3 produced a large increase in recognition memory and a large decrease in suggestibility. The results support earlier findings that the GSS is likely to overestimate how suggestible a person will be in relation to a personally significant event. This reflects poor recognition memory for the material being tested, rather than increased suggestibility per se. People with ID may in fact be relatively non-suggestible for well-remembered events, which would include personally significant events, particularly those witnessed recently.
A depth-first search algorithm to compute elementary flux modes by linear programming
2014-01-01
Background The decomposition of complex metabolic networks into elementary flux modes (EFMs) provides a useful framework for exploring reaction interactions systematically. Generating a complete set of EFMs for large-scale models, however, is nearly impossible. Even for moderately sized models (<400 reactions), existing approaches based on the Double Description method must iterate through a large number of combinatorial candidates, thus imposing an immense processor and memory demand. Results Based on an alternative elementarity test, we developed a depth-first search algorithm using linear programming (LP) to enumerate EFMs in an exhaustive fashion. Constraints can be introduced to directly generate a subset of EFMs satisfying the set of constraints. The depth-first search algorithm has a constant memory overhead. Using flux constraints, a large LP problem can be massively divided and parallelized into independent sub-jobs for deployment into computing clusters. Since the sub-jobs do not overlap, the approach scales to utilize all available computing nodes with minimal coordination overhead or memory limitations. Conclusions The speed of the algorithm was comparable to efmtool, a mainstream Double Description method, when enumerating all EFMs; the attrition power gained from performing flux feasibility tests offsets the increased computational demand of running an LP solver. Unlike the Double Description method, the algorithm enables accelerated enumeration of all EFMs satisfying a set of constraints. PMID:25074068
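The abstract's key structural claim, a constant-memory depth-first enumeration pruned by an LP-based test, can be sketched in C as follows. The lp_feasible stub is a placeholder we invented; the real algorithm solves an LP elementarity/feasibility test at that point, and the reaction include/exclude encoding is likewise our simplification.

```c
/* Control structure of the depth-first enumeration: each reaction is
 * tentatively excluded or included, and a feasibility test prunes the
 * branch. Memory use is one path of length <= NRXN, hence the constant
 * overhead the paper reports. */
#include <stdbool.h>
#include <stdio.h>

#define NRXN 8

static bool lp_feasible(const int *state, int depth)
{
    (void)state; (void)depth;
    return true;  /* placeholder: solve the LP elementarity test here */
}

static void dfs(int *state, int depth)
{
    if (!lp_feasible(state, depth)) return;   /* prune infeasible branch */
    if (depth == NRXN) {                      /* candidate flux mode */
        for (int i = 0; i < NRXN; i++) printf("%d", state[i]);
        printf("\n");
        return;
    }
    for (int v = 0; v <= 1; v++) {            /* exclude / include reaction */
        state[depth] = v;
        dfs(state, depth + 1);
    }
}

int main(void)
{
    int state[NRXN] = {0};
    dfs(state, 0);
    return 0;
}
```

With the stub always returning true, the sketch enumerates all 2^8 inclusion patterns; the LP test is what turns this exhaustive walk into a tractable, prunable search.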
Memory-Scalable GPU Spatial Hierarchy Construction.
Qiming Hou; Xin Sun; Kun Zhou; Lauterbach, C; Manocha, D
2011-04-01
Recent GPU algorithms for constructing spatial hierarchies have achieved promising performance for moderately complex models by using the breadth-first search (BFS) construction order. While being able to exploit the massive parallelism on the GPU, the BFS order also consumes excessive GPU memory, which becomes a serious issue for interactive applications involving very complex models with more than a few million triangles. In this paper, we propose to use the partial breadth-first search (PBFS) construction order to control memory consumption while maximizing performance. We apply the PBFS order to two hierarchy construction algorithms. The first algorithm is for kd-trees that automatically balances between the level of parallelism and intermediate memory usage. With PBFS, peak memory consumption during construction can be efficiently controlled without costly CPU-GPU data transfer. We also develop memory allocation strategies to effectively limit memory fragmentation. The resulting algorithm scales well with GPU memory and constructs kd-trees of models with millions of triangles at interactive rates on GPUs with 1 GB memory. Compared with existing algorithms, our algorithm is an order of magnitude more scalable for a given GPU memory bound. The second algorithm is for out-of-core bounding volume hierarchy (BVH) construction for very large scenes based on the PBFS construction order. At each iteration, all constructed nodes are dumped to the CPU memory, and the GPU memory is freed for the next iteration's use. In this way, the algorithm is able to build trees that are too large to be stored in the GPU memory. Experiments show that our algorithm can construct BVHs for scenes with up to 20 M triangles, several times larger than previous GPU algorithms.
Memory reduction through higher level language hardware
NASA Technical Reports Server (NTRS)
Kerner, H.; Gellman, L.
1972-01-01
Application of large scale integration in computers to reduce size and manufacturing costs and to produce improvements in logic function is discussed. Use of FORTRAN 4 as computer language for this purpose is described. Effectiveness of method in storing information is illustrated.
Memory: Ironing Out a Wrinkle in Time.
Miller, Adam M P; Frankland, Paul W; Josselyn, Sheena A
2018-05-21
Individual hippocampal neurons encode time over seconds, whereas large-scale changes in population activity of hippocampal neurons encode time over minutes and days. New research shows how the hippocampus represents these multiple timescales simultaneously. Copyright © 2018 Elsevier Ltd. All rights reserved.
Audience-tuning effects on memory: the role of shared reality.
Echterhoff, Gerald; Higgins, E Tory; Groll, Stephan
2005-09-01
After tuning to an audience, communicators' own memories for the topic often reflect the biased view expressed in their messages. Three studies examined explanations for this bias. Memories for a target person were biased when feedback signaled the audience's successful identification of the target but not after failed identification (Experiment 1). Whereas communicators tuning to an in-group audience exhibited the bias, communicators tuning to an out-group audience did not (Experiment 2). These differences did not depend on communicators' mood but were mediated by communicators' trust in their audience's judgment about other people (Experiments 2 and 3). Message and memory were more closely associated for high than for low trusters. Apparently, audience-tuning effects depend on the communicators' experience of a shared reality.
Transition from lognormal to χ²-superstatistics for financial time series
NASA Astrophysics Data System (ADS)
Xu, Dan; Beck, Christian
2016-07-01
Share price returns on different time scales can be well modelled by a superstatistical dynamics. Here we provide an investigation of which type of superstatistics is most suitable to properly describe share price dynamics on various time scales. It is shown that while χ²-superstatistics works well on a time scale of days, on a much smaller time scale of minutes the price changes are better described by lognormal superstatistics. The system dynamics thus exhibits a transition from lognormal to χ² superstatistics as a function of time scale. We discuss a more general model interpolating between both statistics which fits the observed data very well. We also present results on correlation functions of the extracted superstatistical volatility parameter, which exhibit exponential decay for returns on large time scales, whereas for returns on small time scales there are long-range correlations and power-law decay.
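For orientation, the standard superstatistical forms involved are sketched below in our own notation, following the general Beck-Cohen framework rather than the paper's specific equations.

```latex
% Marginal return distribution as a mixture of Gaussians whose inverse
% variance (volatility parameter) beta fluctuates with density f(beta):
\[
  p(u) \;=\; \int_0^\infty f(\beta)\,
      \sqrt{\frac{\beta}{2\pi}}\, e^{-\beta u^2/2}\, d\beta .
\]
% Chi-squared (gamma) superstatistics with n degrees of freedom and
% mean beta_0:
\[
  f(\beta) \;=\; \frac{1}{\Gamma(n/2)}
      \left(\frac{n}{2\beta_0}\right)^{n/2} \beta^{n/2-1}
      e^{-n\beta/(2\beta_0)} ,
\]
% Lognormal superstatistics with scale mu and shape s:
\[
  f(\beta) \;=\; \frac{1}{\beta s \sqrt{2\pi}}
      \exp\!\left(-\frac{\ln^2(\beta/\mu)}{2 s^2}\right).
\]
```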
Buchy, L.; Czechowska, Y.; Chochol, C.; Malla, A.; Joober, R.; Pruessner, J.; Lepage, M.
2010-01-01
Our previous work has linked verbal learning and memory with cognitive insight, but not clinical insight, in individuals with a first-episode psychosis (FEP). The current study reassessed the neurocognitive basis of cognitive and clinical insight and explored their neural basis in 61 FEP patients. Cognitive insight was measured with the Beck Cognitive Insight Scale (BCIS) and clinical insight with the Scale to assess Unawareness of Mental Disorder (SUMD). Global measures for 7 domains of cognition were examined. Hippocampi were manually segmented into 3 parts: the body, head, and tail. Verbal learning and memory significantly correlated with the BCIS composite index. Composite index scores were significantly associated with total left hippocampal (HC) volume; partial correlations, however, revealed that this relationship was attributable largely to verbal memory performance. The BCIS self-certainty subscale significantly and inversely correlated with bilateral HC volumes, and these associations were independent of verbal learning and memory performance. The BCIS self-reflectiveness subscale significantly correlated with verbal learning and memory but not with HC volume. No significant correlations emerged between the SUMD and verbal memory or HC volume. These results strengthen our previous assertion that in individuals with an FEP cognitive insight may rely on memory whereby current experiences are appraised based on previous ones. The HC may be a viable location among others for the brain system that underlies aspects of cognitive insight in individuals with an FEP. PMID:19346315
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel
2017-03-08
Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMM's to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
Large-scale inverse model analyses employing fast randomized data reduction
NASA Astrophysics Data System (ADS)
Lin, Youzuo; Le, Ellen B.; O'Malley, Daniel; Vesselinov, Velimir V.; Bui-Thanh, Tan
2017-08-01
When the number of observations is large, it is computationally challenging to apply classical inverse modeling techniques. We have developed a new computationally efficient technique for solving inverse problems with a large number of observations (e.g., on the order of 10⁷ or greater). Our method, which we call the randomized geostatistical approach (RGA), is built upon the principal component geostatistical approach (PCGA). We employ a data reduction technique combined with the PCGA to improve the computational efficiency and reduce the memory usage. Specifically, we employ a randomized numerical linear algebra technique based on a so-called "sketching" matrix to effectively reduce the dimension of the observations without losing the information content needed for the inverse analysis. In this way, the computational and memory costs for RGA scale with the information content rather than the size of the calibration data. Our algorithm is coded in Julia and implemented in the MADS open-source high-performance computational framework (http://mads.lanl.gov). We apply our new inverse modeling method to invert for a synthetic transmissivity field. Compared to a standard geostatistical approach (GA), our method is more efficient when the number of observations is large. Most importantly, our method is capable of solving larger inverse problems than the standard GA and PCGA approaches. Therefore, our new model inversion method is a powerful tool for solving large-scale inverse problems. The method can be applied in any field and is not limited to hydrogeological applications such as the characterization of aquifer heterogeneity.
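The data-reduction step at the heart of the method can be illustrated with a toy random-projection sketch in C. The actual RGA applies randomized numerical linear algebra within the PCGA machinery, which this fragment does not reproduce, and the paper's sketching operator may differ from the random-sign matrix assumed here.

```c
/* Toy "sketching" step: a k x m random sign matrix S compresses m
 * observations y into k << m sketched observations ys = S y / sqrt(k),
 * preserving enough information content for the inverse analysis. */
#include <math.h>
#include <stdlib.h>

void sketch_observations(const double *y, double *ys,
                         int m, int k, unsigned int seed)
{
    srand(seed);
    for (int i = 0; i < k; i++) {
        double acc = 0.0;
        for (int j = 0; j < m; j++) {
            double s = (rand() & 1) ? 1.0 : -1.0;  /* random +-1 entry */
            acc += s * y[j];
        }
        ys[i] = acc / sqrt((double)k);             /* illustrative scaling */
    }
}
```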
Olderbak, Sally; Hildebrandt, Andrea; Wilhelm, Oliver
2015-01-01
The shared decline in cognitive abilities, sensory functions (e.g., vision and hearing), and physical health with increasing age is well documented, with some research attributing this shared age-related decline to a single common cause (e.g., the aging brain). We evaluate the extent to which the common cause hypothesis predicts associations of vision and physical health with social cognition abilities, specifically face perception and face memory. Based on a sample of 443 adults (17–88 years old), we test a series of structural equation models, including Multiple Indicator Multiple Cause (MIMIC) models, and estimate the extent to which vision and self-reported physical health are related to face perception and face memory through a common factor, before and after controlling for their fluid cognitive component and the linear effects of age. Results suggest significant shared variance amongst these constructs, with a common factor explaining some, but not all, of the shared age-related variance. Also, we found that the relations of face perception, but not face memory, with vision and physical health could be completely explained by fluid cognition. Overall, results suggest that a single common cause explains most, but not all, age-related shared variance, with domain-specific aging mechanisms evident. PMID:26321998
Solutions and debugging for data consistency in multiprocessors with noncoherent caches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernstein, D.; Mendelson, B.; Breternitz, M. Jr.
1995-02-01
We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on software methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.
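The false-sharing pattern the authors describe is easy to reproduce. The C fragment below shows the classic case, two threads updating different words in the same cache block, next to the usual software fix of padding each per-thread counter to a full block. The 64-byte line size and OpenMP threading are our assumptions, not details from the paper.

```c
/* Two threads increment adjacent counters. shared_block[] puts both
 * counters in one cache block (falsely shared); fixed[] pads each
 * counter to its own (assumed 64-byte) block. */
#include <omp.h>

#define LINE 64

struct padded { long v; char pad[LINE - sizeof(long)]; };

long shared_block[2];        /* falsely shared: same cache block    */
struct padded fixed[2];      /* one block per counter: no sharing   */

int main(void)
{
    #pragma omp parallel num_threads(2)
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < 10000000; i++) {
            shared_block[id]++;   /* updates collide within one block */
            fixed[id].v++;        /* updates land in separate blocks  */
        }
    }
    return 0;
}
```

On a coherent machine the first array makes the line ping-pong between caches; on the noncoherent caches discussed in the abstract, it is exactly the case where software must combine the two processors' partial writes correctly.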
Fast distributed large-pixel-count hologram computation using a GPU cluster.
Pan, Yuechao; Xu, Xuewu; Liang, Xinan
2013-09-10
Large-pixel-count holograms are an essential component of large-size holographic three-dimensional (3D) display, but the generation of such holograms is computationally demanding. In order to address this issue, we have built a graphics processing unit (GPU) cluster with 32.5 Tflop/s computing power and implemented distributed hologram computation on it with speed improvement techniques, such as shared memory on GPU, GPU-level adaptive load balancing, and node-level load distribution. Using these speed improvement techniques on the GPU cluster, we have achieved a 71.4-times computation speed increase for 186M-pixel holograms. Furthermore, we have used the approaches of diffraction limits and subdivision of holograms to overcome the GPU memory limit in computing large-pixel-count holograms. 745M-pixel and 1.80G-pixel holograms were computed in 343 and 3326 s, respectively, for more than 2 million object points with RGB colors. Color 3D objects with 1.02M points were successfully reconstructed from a 186M-pixel hologram computed in 8.82 s with all the above three speed improvement techniques. It is shown that distributed hologram computation using a GPU cluster is a promising approach to increase the computation speed of large-pixel-count holograms for large-size holographic display.
Static Memory Deduplication for Performance Optimization in Cloud Computing.
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-04-27
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.
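The core of the SMD idea, as we read the abstract, is an offline pass that finds identical code-segment pages so that no online comparisons are needed. The C sketch below shows that offline detection step only; the hash choice, page list, and function names are illustrative stand-ins, not the paper's implementation.

```c
/* Offline duplicate detection over code-segment pages: hash each page,
 * then verify hash matches byte-for-byte. Matching pages could then be
 * backed by one shared physical copy. */
#include <stdint.h>
#include <string.h>

#define PAGE 4096

static uint64_t page_hash(const unsigned char *p)
{
    uint64_t h = 0xcbf29ce484222325ULL;       /* FNV-1a over the page */
    for (int i = 0; i < PAGE; i++) { h ^= p[i]; h *= 0x100000001b3ULL; }
    return h;
}

/* Return the index of an earlier identical page, or -1 if unique.
 * pages[0..n-1] were processed before; pages[n] is the new page. */
int find_duplicate(unsigned char pages[][PAGE], uint64_t *hashes, int n)
{
    hashes[n] = page_hash(pages[n]);
    for (int i = 0; i < n; i++)
        if (hashes[i] == hashes[n] &&
            memcmp(pages[i], pages[n], PAGE) == 0)
            return i;                          /* share this physical page */
    return -1;
}
```

Restricting the scan to code segments, as the paper proposes, keeps this pass small while targeting the pages most likely to be identical across VMs.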
Dissociation of immediate and delayed effects of emotional arousal on episodic memory.
Schümann, Dirk; Bayer, Janine; Talmi, Deborah; Sommer, Tobias
2018-02-01
Emotionally arousing events are usually better remembered than neutral ones. This phenomenon is in humans mostly studied by presenting mixed lists of neutral and emotional items. An emotional enhancement of memory is observed in these studies often already immediately after encoding and increases with longer delays and consolidation. A large body of animal research showed that the more efficient consolidation of emotionally arousing events is based on an activation of the central noradrenergic system and the amygdala (Modulation Hypothesis; Roozendaal & McGaugh, 2011). The immediately superior recognition of emotional items is attributed primarily to their attraction of attention during encoding, which is also thought to be based on the amygdala and the central noradrenergic system. To investigate whether the amygdala and noradrenergic system support memory encoding and consolidation via shared neural substrates and processes, a large sample of participants (n = 690) encoded neutral and arousing pictures. Their memory was tested immediately and after a consolidation delay. In addition, they were genotyped for two relevant polymorphisms (α2B-adrenergic receptor and serotonin transporter). Memory for negative and positive emotional pictures was enhanced at both time points, and these enhancements were correlated (immediate test r = 0.60; delayed test r = 0.46). Critically, the effects of emotional arousal on encoding and consolidation correlated only weakly (r = 0.14 for negative and r = 0.03 for positive pictures), suggesting partly distinct underlying processes consistent with a functional heterogeneity of the central noradrenergic system. No effect of genotype on either effect was observed. Copyright © 2017 Elsevier Inc. All rights reserved.
A molecular shift register based on electron transfer
NASA Technical Reports Server (NTRS)
Hopfield, J. J.; Onuchic, Josenelson; Beratan, David N.
1988-01-01
An electronic shift-register memory at the molecular level is described. The memory elements are based on a chain of electron-transfer molecules and the information is shifted by photoinduced electron-transfer reactions. This device integrates designed electronic molecules onto a very large scale integrated (silicon microelectronic) substrate, providing an example of a 'molecular electronic device' that could actually be made. The design requirements for such a device and possible synthetic strategies are discussed. Devices along these lines should have lower energy usage and enhanced storage density.
Mahjouri, Najmeh; Ardestani, Mojtaba
2011-01-01
In this paper, two methodologies, one cooperative and one non-cooperative, are developed for a large-scale water allocation problem in Southern Iran. The water shares of the water users and their net benefits are determined using optimization models having economic objectives with respect to the physical and environmental constraints of the system. The results of the two methodologies are compared based on the total obtained economic benefit, and the role of cooperation in utilizing a shared water resource is demonstrated. In both cases, the water quality in rivers satisfies the standards. Comparing the results of the two approaches shows the importance of acting cooperatively to achieve maximum revenue in utilizing a surface water resource while the river water quantity and quality issues are addressed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marrinan, Thomas; Leigh, Jason; Renambot, Luc
Mixed presence collaboration involves remote collaboration between multiple collocated groups. This paper presents the design and results of a user study that focused on mixed presence collaboration using large-scale tiled display walls. The research was conducted in order to compare data synchronization schemes for multi-user visualization applications. Our study compared three techniques for sharing data between display spaces with varying constraints and affordances. The results provide empirical evidence that using data sharing techniques with continuous synchronization between the sites leads to improved collaboration for a search and analysis task between remotely located groups. We have also identified aspects of synchronized sessions that result in increased remote collaborator awareness and parallel task coordination. It is believed that this research will lead to better utilization of large-scale tiled display walls for distributed group work.
NASA Astrophysics Data System (ADS)
Döveling, Katrin
2015-04-01
In an age of rising impact of online communication in social network sites (SNS), emotional interaction is neither limited nor restricted by time or space. Bereavement extends to the anonymity of cyberspace. What role does virtual interaction play in SNS in dealing with the basic human emotion of grief caused by the loss of a beloved person? The analysis laid out in this article provides answers in light of an interdisciplinary perspective on online bereavement. Relevant lines of research are scrutinized. After laying out the theoretical spectrum for the study, hypotheses based on a prior in-depth qualitative content analysis of 179 postings in three different German online bereavement platforms are proposed and scrutinized in a quantitative content analysis (2127 postings from 318 users). Emotion regulation patterns in SNS and similarities as well as differences in online bereavement of children, adolescents and adults are revealed. Large-scale quantitative findings into central motives, patterns, and restorative effects of online shared bereavement in regulating distress, fostering personal empowerment, and engendering meaning are presented. The article closes with implications for further analysis in memorialization practices.
Fan, Chong; Chen, Xushuai; Zhong, Lei; Zhou, Min; Shi, Yun; Duan, Yulin
2017-03-18
A sub-block algorithm is usually applied in the super-resolution (SR) reconstruction of images because of limitations in computer memory. However, the sub-block SR images can hardly achieve seamless image mosaicking because of the uneven distribution of brightness and contrast among these sub-blocks. An improved weighted Wallis dodging algorithm is proposed, exploiting the fact that the SR-reconstructed sub-blocks are gray-scale images of the same size with overlapping regions. This algorithm can achieve consistency of image brightness and contrast. Meanwhile, a weighted adjustment sequence is presented to avoid the spatial propagation and accumulation of errors and the loss of image information caused by excessive computation. A seam-line elimination method distributes the partial dislocation at the seam line across the entire overlapping region with a smooth transition effect. Subsequently, the improved method is employed to remove the uneven illumination for 900 SR-reconstructed images of ZY-3. Then, the overlapping image mosaic method is adopted to accomplish a seamless image mosaic based on the optimal seam line.
ERIC Educational Resources Information Center
Burt, S. Alexandra
2010-01-01
A recent large-scale meta-analysis of twin and adoption studies indicated that shared environmental influences make important contributions to most forms of child and adolescent psychopathology (Burt, 2009b). The sole exception to this robust pattern of results was observed for attention-deficit/hyperactivity disorder (ADHD), which appeared to be…
Callie Jo Schweitzer; John A. Stanturf
1997-01-01
There has been a tremendous upsurge in interest in reforestation of bottomland hardwoods. In the lower Mississippi alluvial valley, reforestation projects are occurring on a large scale on abandoned agricultural fields, often in conjunction with state or federal cost-share programs. This paper describes some of the cost share programs used to establish bottomland...
Development of the Wechsler Memory Scale--Revised.
ERIC Educational Resources Information Center
Herman, David O.; Young, Laura
A study involving a sample of people selected to represent the nonimpaired American population, aged 16 to 74 years, was undertaken to determine the effectiveness of the Wechsler Memory Scale--Revised. The scale's subtests were designed to assess memory of personal and general knowledge, logical memory, verbal paired association, figural memory,…
Scheduling for Locality in Shared-Memory Multiprocessors
1993-05-01
Doctoral dissertation (110 pages) examining the impact of architecture trends on parallel program performance, the implications of this trend for popular parallel programming models, and proposed system software for program decomposition and scheduling. Subject terms: shared-memory multiprocessors; architecture trends; loop scheduling.
Advanced Development of Certified OS Kernels
2015-06-01
It provides an infrastructure to map a physical page into multiple processes’ page maps in different address spaces. Their ownership mechanism ensures...of their shared memory infrastructure . Trap module The trap module specifies the behaviors of exception handlers and mCertiKOS system calls. In...layers), 1 pm for the shared memory infrastructure (3 layers), 3.5 pm for the thread management (10 layers), 1 pm for the process management (4 layers
6 DOF Nonlinear AUV Simulation Toolbox
1997-01-01
The purpose of this toolbox is to supply a flexible 3D-simulation platform for motion visualization, in-lab debugging and testing of mission-specific strategies as well as those... Explorer are modularly designed [Smith] in order to cut time and cost for vehicle reconfiguration. A flexible 3D-simulation platform is desired to... 3D models. Currently implemented modules include a nonlinear dynamic model for the OEX, shared memory and semaphore manager tools, and a shared memory monitor.
A cache-aided multiprocessor rollback recovery scheme
NASA Technical Reports Server (NTRS)
Wu, Kun-Lung; Fuchs, W. Kent
1989-01-01
This paper demonstrates how previous uniprocessor cache-aided recovery schemes can be applied to multiprocessor architectures, for recovering from transient processor failures, utilizing private caches and a global shared memory. As with cache-aided uniprocessor recovery, the multiprocessor cache-aided recovery scheme of this paper can be easily integrated into standard bus-based snoopy cache coherence protocols. A consistent shared memory state is maintained without the necessity of global check-pointing.
Oyarzún, Javiera P; Morís, Joaquín; Luque, David; de Diego-Balaguer, Ruth; Fuentemilla, Lluís
2017-08-09
System memory consolidation is conceptualized as an active process whereby newly encoded memory representations are strengthened through selective memory reactivation during sleep. However, our learning experience is highly overlapping in content (i.e., shares common elements), and memories of these events are organized in an intricate network of overlapping associated events. It remains to be explored whether and how selective memory reactivation during sleep has an impact on these overlapping memories acquired during awake time. Here, we test in a group of adult women and men the prediction that selective memory reactivation during sleep entails the reactivation of associated events and that this may lead the brain to adaptively regulate whether these associated memories are strengthened or pruned from memory networks on the basis of their relative associative strength with the shared element. Our findings demonstrate the existence of efficient regulatory neural mechanisms governing how complex memory networks are shaped during sleep as a function of their associative memory strength. SIGNIFICANCE STATEMENT Numerous studies have demonstrated that system memory consolidation is an active, selective, and sleep-dependent process in which only subsets of new memories become stabilized through their reactivation. However, the learning experience is highly overlapping in content and thus events are encoded in an intricate network of related memories. It remains to be explored whether and how memory reactivation has an impact on overlapping memories acquired during awake time. Here, we show that sleep memory reactivation promotes strengthening and weakening of overlapping memories based on their associative memory strength. These results suggest the existence of an efficient regulatory neural mechanism that avoids the formation of cluttered memory representation of multiple events and promotes stabilization of complex memory networks. Copyright © 2017 the authors.
NASA Astrophysics Data System (ADS)
Howard, E. A.; Coleman, K. J.; Barford, C. L.; Kucharik, C.; Foley, J. A.
2005-12-01
Understanding environmental problems that cross physical and disciplinary boundaries requires a more holistic view of the world - a "systems" approach. Yet it is a challenge for many learners to start thinking this way, particularly when the problems are large in scale and not easily visible. We will describe our online university course, "Humans and the Changing Biosphere," which takes a whole-systems perspective for teaching regional to global-scale environmental science concepts, including climate, hydrology, ecology, and human demographics. We will share our syllabus and learning objectives and summarize our efforts to incorporate "best" practices for online teaching. We will describe challenges we have faced, and our efforts to reach different learner types. Our goals for this presentation are: (1) to communicate how a systems approach ties together environmental sciences (including climate, hydrology, ecology, biogeochemistry, and demography) that are often taught as separate disciplines; (2) to generate discussion about challenges of teaching large-scale environmental processes; (3) to share our experiences in teaching these topics online; (4) to receive ideas and feedback on future teaching strategies. We will explain why we developed this course online, and share our experiences about benefits and challenges of teaching over the web - including some suggestions about how to use technology to supplement face-to-face learning experiences (and vice versa). We will summarize assessment data about what students learned during the course, and discuss key misconceptions and barriers to learning. We will highlight the role of an online discussion board in creating classroom community, identifying misconceptions, and engaging different types of learners.
Behavioral profiles in frontal lobe epilepsy: Autobiographic memory versus mood impairment.
Rayner, Genevieve; Jackson, Graeme D; Wilson, Sarah J
2015-02-01
Autobiographic memory encompasses the encoding and retrieval of episodes, people, and places encountered in everyday life. It can be impaired in both epilepsy and frontal lobe damage. Here, we performed an initial investigation of how autobiographic memory is impacted by chronic frontal lobe epilepsy (FLE) together with its underlying pathology. We prospectively studied a series of nine consecutive patients with medically refractory FLE, relative to 24 matched healthy controls. Seven of the nine patients had frontal lobe structural abnormalities. Episodic and semantic autobiographic memory functioning was profiled, and factors associated with impaired autobiographic memory were identified among epileptologic, neuroimaging, neuropsychiatric, and cognitive variables including auditory-verbal and visual memory, and the executive function of cognitive control. Results showed that the FLE group experienced significantly higher rates of autobiographic memory and mood disturbance (p < 0.001), with detailed assessment of individual patients revealing two profiles of impairment, primarily characterized by cognitive or mood disturbance. Five of the patients (56%) exhibited significant episodic autobiographic memory deficits, whereas in three of these, knowledge of semantic autobiographic facts was preserved. Four of them also had reduced cognitive control. Mood disorder was largely unrelated to poor autobiographic memory. In contrast, the four cases with preserved autobiographic memory were notable for their past or current depressive symptoms. These findings provide preliminary data that frontal lobe seizure activity with its underlying pathology may selectively disrupt large-scale cognitive or affective networks, giving rise to different neurobehavioral profiles that may be used to inform clinical management. Wiley Periodicals, Inc. © 2015 International League Against Epilepsy.
Cheung, Wing-Yee; Wildschut, Tim; Sedikides, Constantine
2018-02-01
We compared and contrasted nostalgia with rumination and counterfactual thinking in terms of their autobiographical memory functions. Specifically, we assessed individual differences in nostalgia, rumination, and counterfactual thinking, which we then linked to self-reported functions or uses of autobiographical memory (Self-Regard, Boredom Reduction, Death Preparation, Intimacy Maintenance, Conversation, Teach/Inform, and Bitterness Revival). We tested which memory functions are shared and which are uniquely linked to nostalgia. The commonality among nostalgia, rumination, and counterfactual thinking resides in their shared positive associations with all memory functions: individuals who evinced a stronger propensity towards past-oriented thought (as manifested in nostalgia, rumination, and counterfactual thinking) reported greater overall recruitment of memories in the service of present functioning. The uniqueness of nostalgia resides in its comparatively strong positive associations with Intimacy Maintenance, Teach/Inform, and Self-Regard and weak association with Bitterness Revival. In all, nostalgia possesses a more positive functional signature than do rumination and counterfactual thinking.
Moreau, Noémie; Viallet, François; Champagne-Lavau, Maud
2013-09-01
Theory of mind (TOM) refers to the ability to infer one's own and others' mental states. Growing evidence has highlighted the presence of impairment on the most complex TOM tasks in Alzheimer disease (AD). However, how TOM deficit is related to other cognitive dysfunctions, and more specifically to episodic memory impairment - the prominent feature of this disease - is still under debate. Recent neuroanatomical findings have shown that remembering past events and inferring others' states of mind share the same cerebral network, suggesting that the two abilities share a common process. This paper proposes to review emergent evidence of TOM impairment in AD patients and to discuss the evidence of a relationship between TOM and episodic memory. We discuss the possibility that AD patients' deficit in TOM is related to their difficulties in recollecting memories of past social interactions. Copyright © 2013 Elsevier B.V. All rights reserved.
Mental time travel and the shaping of the human mind
Suddendorf, Thomas; Addis, Donna Rose; Corballis, Michael C.
2009-01-01
Episodic memory, enabling conscious recollection of past episodes, can be distinguished from semantic memory, which stores enduring facts about the world. Episodic memory shares a core neural network with the simulation of future episodes, enabling mental time travel into both the past and the future. The notion that there might be something distinctly human about mental time travel has provoked ingenious attempts to demonstrate episodic memory or future simulation in non-human animals, but we argue that they have not yet established a capacity comparable to the human faculty. The evolution of the capacity to simulate possible future events, based on episodic memory, enhanced fitness by enabling action in preparation of different possible scenarios that increased present or future survival and reproduction chances. Human language may have evolved in the first instance for the sharing of past and planned future events, and, indeed, fictional ones, further enhancing fitness in social settings. PMID:19528013
Shape Memory Alloy Research and Development at NASA Glenn - Current and Future Progress
NASA Technical Reports Server (NTRS)
Benafan, Othmane
2015-01-01
Shape memory alloys (SMAs) are a unique class of multifunctional materials that have the ability to recover large deformations or generate high stresses in response to thermal, mechanical and or electromagnetic stimuli. These abilities have made them a viable option for actuation systems in aerospace, medical, and automotive applications, amongst others. However, despite many advantages and the fact that SMA actuators have been developed and used for many years, so far they have only found service in a limited range of applications. In order to expand their applications, further developments are needed to increase their reliability and stability and to address processing, testing and qualification needed for large-scale commercial application of SMA actuators.
GraphReduce: Processing Large-Scale Graphs on Accelerator-Based Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sengupta, Dipanjan; Song, Shuaiwen; Agarwal, Kapil
2015-11-15
Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device's internal memory capacity. GraphReduce adopts a combination of edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the host and device.
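As a reference point for the programming model named in the abstract, here is the Gather-Apply-Scatter skeleton in plain C over a CSR graph, with a PageRank-style update chosen by us only to make the phases concrete. GraphReduce's contribution is streaming such phases through limited GPU memory, which this CPU sketch does not attempt.

```c
/* One Gather-Apply-Scatter step on a CSR graph: gather contributions
 * along edges into per-vertex accumulators, then apply a per-vertex
 * update. rowptr has nverts+1 entries; colidx holds edge targets. */
#include <stdlib.h>

void gas_step(const int *rowptr, const int *colidx, int nverts,
              const double *val_in, double *val_out)
{
    double *acc = calloc(nverts, sizeof *acc);
    for (int u = 0; u < nverts; u++)          /* gather/scatter over edges */
        for (int e = rowptr[u]; e < rowptr[u + 1]; e++)
            acc[colidx[e]] += val_in[u] /
                (double)(rowptr[u + 1] - rowptr[u]);
    for (int v = 0; v < nverts; v++)          /* apply per vertex */
        val_out[v] = 0.15 + 0.85 * acc[v];
    free(acc);
}
```

An out-of-core framework like the one described would partition the edge list and stream such partitions through device memory rather than hold the whole graph resident.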
Nascimento, Thiago D; DosSantos, Marcos F; Danciu, Theodora; DeBoer, Misty; van Holsbeeck, Hendrik; Lucas, Sarah R; Aiello, Christine; Khatib, Leen; Bender, MaryCatherine A; Zubieta, Jon-Kar
2014-01-01
Background Although population studies have greatly improved our understanding of migraine, they have relied on retrospective self-reports that are subject to memory error and experimenter-induced bias. Furthermore, these studies also lack specifics from the actual time that attacks were occurring, and how patients express and share their ongoing suffering. Objective As technology and language constantly evolve, so does the way we share our suffering. We sought to evaluate the infodemiology of self-reported migraine headache suffering on Twitter. Methods Trained observers in an academic setting categorized the meaning of every single “migraine” tweet posted during seven consecutive days. The main outcome measures were prevalence, life-style impact, linguistic, and timeline of actual self-reported migraine headache suffering on Twitter. Results From a total of 21,741 migraine tweets collected, only 64.52% (14,028/21,741 collected tweets) were from users reporting their migraine headache attacks in real-time. The remainder of the posts were commercial, re-tweets, general discussion or third person’s migraine, and metaphor. The gender distribution available for the actual migraine posts was 73.47% female (10,306/14,028), 17.40% males (2441/14,028), and 0.01% transgendered (2/14,028). The personal impact of migraine headache was immediate on mood (43.91%, 6159/14,028), productivity at work (3.46%, 486/14,028), social life (3.45%, 484/14,028), and school (2.78%, 390/14,028). The most common migraine descriptor was “Worst” (14.59%, 201/1378) and profanity, the “F-word” (5.3%, 73/1378). The majority of postings occurred in the United States (58.28%, 3413/5856), peaking on weekdays at 10:00h and then gradually again at 22:00h; the weekend had a later morning peak. Conclusions Twitter proved to be a powerful source of knowledge for migraine research. The data in this study overlap large-scale epidemiological studies, avoiding memory bias and experimenter-induced error. Furthermore, linguistics of ongoing migraine reports on social media proved to be highly heterogeneous and colloquial in our study, suggesting that current pain questionnaires should undergo constant reformulations to keep up with modernization in the expression of pain suffering in our society. In summary, this study reveals the modern characteristics and broad impact of migraine headache suffering on patients’ lives as it is spontaneously shared via social media. PMID:24698747
A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoemmen, Mark
2010-11-01
Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches for orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.
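As a concrete illustration of the TSQR idea (not Hoemmen's Trilinos implementation), here is a minimal two-level NumPy sketch: row blocks are factored independently, which is the communication-avoiding step, and the stacked small R factors are reduced with one more QR.

```python
import numpy as np

def tsqr(A, nblocks=4):
    """Two-level TSQR sketch for a tall-skinny A (each block needs >= n rows)."""
    blocks = np.array_split(A, nblocks, axis=0)
    Qs, Rs = zip(*(np.linalg.qr(B) for B in blocks))  # independent local QRs
    Q2, R = np.linalg.qr(np.vstack(Rs))               # reduce the R factors
    n = A.shape[1]
    # Fold the combining Q back into each local Q block.
    Q = np.vstack([Qi @ Q2[i * n:(i + 1) * n, :] for i, Qi in enumerate(Qs)])
    return Q, R

A = np.random.default_rng(0).standard_normal((1000, 8))
Q, R = tsqr(A)
assert np.allclose(Q @ R, A) and np.allclose(Q.T @ Q, np.eye(8))
```

In a distributed setting each local QR runs on its own processor and only the small n-by-n R factors are communicated, which is where the asymptotic reduction in messages comes from.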
Shared prefetching to reduce execution skew in multi-threaded systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eichenberger, Alexandre E; Gunnels, John A
Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
NASA Astrophysics Data System (ADS)
Itter, M.; D'Orangeville, L.; Dawson, A.; Kneeshaw, D.; Finley, A. O.
2017-12-01
Drought and insect defoliation have lasting impacts on the dynamics of the boreal forest. Impacts are expected to worsen under global climate change as hotter, drier conditions forecast for much of the boreal increase the frequency and severity of drought and defoliation events. Contemporary ecological theory predicts physiological feedbacks in tree responses to drought and defoliation amplify impacts potentially causing large-scale productivity losses and forest mortality. Quantifying the interactive impacts of drought and insect defoliation on regional forest health is difficult given delayed and persistent responses to disturbance events. We developed a Bayesian hierarchical model to estimate forest growth responses to interactions between drought and insect defoliation by species and size class. Delayed and persistent responses to past drought and defoliation were quantified using empirical memory functions allowing for improved detection of interactions. The model was applied to tree-ring data from stands in Western (Alberta) and Eastern (Québec) regions of the Canadian boreal forest with different species compositions, disturbance regimes, and regional climates. Western stands experience chronic water deficit and forest tent caterpillar (FTC) defoliation; Eastern stands experience irregular water deficit and spruce budworm (SBW) defoliation. Ecosystem memory to past water deficit peaked in the year previous to growth and decayed to zero within 5 (West) to 8 (East) years; memory to past defoliation ranged from 8 (West) to 12 (East) years. The drier regional climate and faster FTC defoliation dynamics (compared to SBW) likely contribute to shorter ecosystem memory in the West. Drought and defoliation had the largest negative impact on large-diameter, host tree growth. Surprisingly, a positive interaction was observed between drought and defoliation for large-diameter, non-host trees likely due to reduced stand-level competition for water. Results highlight the temporal persistence of drought and defoliation stress on boreal forest growth dynamics and provide an empirical estimate of their interactive effects with explicit uncertainty.
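To make the memory-function idea concrete, here is a toy Python stand-in (not the authors' Bayesian hierarchical model): past climate anomalies enter growth through lag weights that decay over a fixed memory length. All names and numbers are illustrative.

```python
import numpy as np

def memory_weights(q, decay):
    """Exponentially decaying lag weights over q years, normalized to sum to 1.
    Lag 0 is the year previous to growth, where the paper found memory to peak."""
    w = np.exp(-decay * np.arange(q))
    return w / w.sum()

def lagged_effect(anomaly, weights):
    """Memory-weighted stress entering growth in each year t."""
    q = len(weights)
    return np.array([weights @ anomaly[t - q:t][::-1]
                     for t in range(q, len(anomaly))])

rng = np.random.default_rng(1)
deficit = rng.standard_normal(60)       # 60 years of water-deficit anomalies (toy)
w = memory_weights(q=5, decay=0.8)      # ~5-year memory, as estimated in the West
effect = lagged_effect(deficit, w)      # covariate a growth model would use
```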
Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji
2015-07-01
GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2012-09-25
The Megatux platform enables the emulation of large-scale (multi-million node) distributed systems. In particular, it allows for the emulation of large-scale networks interconnecting a very large number of emulated computer systems. It does this by leveraging virtualization and associated technologies to allow hundreds of virtual computers to be hosted on a single moderately sized server or workstation. Virtualization technology provided by modern processors allows for multiple guest OSs to run at the same time, sharing the hardware resources. The Megatux platform can be deployed on a single PC, a small cluster of a few boxes, or a large cluster of computers. With a modest cluster, the Megatux platform can emulate complex organizational networks. By using virtualization, we emulate the hardware but run actual software, enabling large scale without sacrificing fidelity.
A shared neural ensemble links distinct contextual memories encoded close in time
NASA Astrophysics Data System (ADS)
Cai, Denise J.; Aharoni, Daniel; Shuman, Tristan; Shobe, Justin; Biane, Jeremy; Song, Weilin; Wei, Brandon; Veshkini, Michael; La-Vu, Mimi; Lou, Jerry; Flores, Sergio E.; Kim, Isaac; Sano, Yoshitake; Zhou, Miou; Baumgaertel, Karsten; Lavi, Ayal; Kamata, Masakazu; Tuszynski, Mark; Mayford, Mark; Golshani, Peyman; Silva, Alcino J.
2016-06-01
Recent studies suggest that a shared neural ensemble may link distinct memories encoded close in time. According to the memory allocation hypothesis, learning triggers a temporary increase in neuronal excitability that biases the representation of a subsequent memory to the neuronal ensemble encoding the first memory, such that recall of one memory increases the likelihood of recalling the other memory. Here we show in mice that the overlap between the hippocampal CA1 ensembles activated by two distinct contexts acquired within a day is higher than when they are separated by a week. Several findings indicate that this overlap of neuronal ensembles links two contextual memories. First, fear paired with one context is transferred to a neutral context when the two contexts are acquired within a day but not across a week. Second, the first memory strengthens the second memory within a day but not across a week. Older mice, known to have lower CA1 excitability, do not show the overlap between ensembles, the transfer of fear between contexts, or the strengthening of the second memory. Finally, in aged mice, increasing cellular excitability and activating a common ensemble of CA1 neurons during two distinct context exposures rescued the deficit in linking memories. Taken together, these findings demonstrate that contextual memories encoded close in time are linked by directing storage into overlapping ensembles. Alteration of these processes by ageing could affect the temporal structure of memories, thus impairing efficient recall of related information.
Morey, Rajendra A.; Dolcos, Florin; Petty, Christopher M.; Cooper, Debra A.; Hayes, Jasmeet Pannu; LaBar, Kevin S.; McCarthy, Gregory
2009-01-01
The relevance of emotional stimuli to threat and survival confers a privileged role in their processing. In PTSD, the ability of trauma-related information to divert attention is especially pronounced. Information unrelated to the trauma may also be highly distracting when it shares perceptual features with trauma material. Our goal was to study how trauma-related environmental cues modulate working memory networks in PTSD. We examined neural activity in participants performing a visual working memory task while distracted by task-irrelevant trauma and non-trauma material. Recent post-9/11 veterans were divided into a PTSD group (n = 22) and a trauma-exposed control group (n = 20) based on the Davidson trauma scale. Using fMRI, we measured hemodynamic change in response to emotional (trauma-related) and neutral distraction presented during the active maintenance period of a delayed-response working memory task. The goal was to examine differences in functional networks associated with working memory (dorsolateral prefrontal cortex and lateral parietal cortex) and emotion processing (amygdala, ventrolateral prefrontal cortex, and fusiform gyrus). The PTSD group showed markedly different neural activity compared to the trauma-exposed control group in response to task-irrelevant visual distractors. Enhanced activity in ventral emotion processing regions was associated with trauma distractors in the PTSD group, whereas activity in brain regions associated with working memory and attention regions was disrupted by distractor stimuli independent of trauma content. Neural evidence for the impact of distraction on working memory is consistent with PTSD symptoms of hypervigilance and general distractibility during goal-directed cognitive processing. PMID:19091328
Lee, Sylvia E; Kibby, Michelle Y; Cohen, Morris J; Stanford, Lisa; Park, Yong; Strickland, Suzanne
2016-01-01
Prior research has shown that attention-deficit/hyperactivity disorder (ADHD) and epilepsy are frequently comorbid and that both disorders are associated with various attention and memory problems. Nonetheless, limited research has been conducted comparing the two disorders in one sample to determine unique versus shared deficits. Hence, we investigated differences in working memory (WM) and short-term and delayed recall between children with ADHD, focal epilepsy of mixed foci, comorbid ADHD/epilepsy and controls. Participants were compared on the Core subtests and the Picture Locations subtest of the Children's Memory Scale (CMS). Results indicated that children with ADHD displayed intact verbal WM and long-term memory (LTM), as well as intact performance on most aspects of short-term memory (STM). They performed worse than controls on Numbers Forward and Picture Locations, suggesting problems with focused attention and simple span for visual-spatial material. Conversely, children with epilepsy displayed poor focused attention and STM regardless of the modality assessed, which affected encoding into LTM. The only loss over time was found for passages (Stories). WM was intact. Children with comorbid ADHD/epilepsy displayed focused attention and STM/LTM problems consistent with both disorders, having the lowest scores across the four groups. Hence, focused attention and visual-spatial span appear to be affected in both disorders, whereas additional STM/encoding problems are specific to epilepsy. Children with comorbid ADHD/epilepsy have deficits consistent with both disorders, with slight additive effects. This study suggests that attention and memory testing should be a regular part of the evaluation of children with epilepsy and ADHD.
Efficient Approximation Algorithms for Weighted $b$-Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khan, Arif; Pothen, Alex; Mostofa Ali Patwary, Md.
2016-01-01
We describe a half-approximation algorithm, b-Suitor, for computing a b-Matching of maximum weight in a graph with weights on the edges. b-Matching is a generalization of the well-known Matching problem in graphs, where the objective is to choose a subset M of edges in the graph such that at most a specified number b(v) of edges in M are incident on each vertex v. Subject to this restriction we maximize the sum of the weights of the edges in M. We prove that the b-Suitor algorithm computes the same b-Matching as the one obtained by the greedy algorithm for the problem. We implement the algorithm on serial and shared-memory parallel processors, and compare its performance against a collection of approximation algorithms that have been proposed for the Matching problem. Our results show that the b-Suitor algorithm outperforms the Greedy and Locally Dominant edge algorithms by one to two orders of magnitude on a serial processor. The b-Suitor algorithm has a high degree of concurrency, and it scales well up to 240 threads on a shared memory multiprocessor. The b-Suitor algorithm outperforms the Locally Dominant edge algorithm by a factor of fourteen on 16 cores of an Intel Xeon multiprocessor.
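The greedy algorithm that the paper proves b-Suitor equivalent to is easy to state; here is a minimal serial Python sketch of it (a half-approximation, like b-Suitor, but without b-Suitor's concurrency): scan edges in non-increasing weight order and keep an edge whenever both endpoints still have capacity.

```python
def greedy_b_matching(edges, b):
    """Serial greedy b-Matching: edges as (weight, u, v); b maps vertex -> b(v)."""
    remaining = dict(b)                  # how many more matched edges v may have
    matching, total = [], 0.0
    for w, u, v in sorted(edges, reverse=True):   # heaviest edges first
        if remaining[u] > 0 and remaining[v] > 0:
            matching.append((u, v))
            total += w
            remaining[u] -= 1
            remaining[v] -= 1
    return matching, total

edges = [(5.0, 'a', 'b'), (4.0, 'b', 'c'), (3.0, 'a', 'c'), (1.0, 'c', 'd')]
print(greedy_b_matching(edges, {'a': 1, 'b': 2, 'c': 2, 'd': 1}))
# -> ([('a', 'b'), ('b', 'c'), ('c', 'd')], 10.0)
```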
Rasmussen, Anne S; Habermas, Tilmann
2011-08-01
According to theory, autobiographical memory serves three broad functions of overall usage: directive, self, and social. However, there is evidence to suggest that the tripartite model may be better conceptualised in terms of a four-factor model with two social functions. In the present study we examined the two models in Danish and German samples, using the Thinking About Life Experiences Questionnaire (TALE; Bluck, Alea, Habermas, & Rubin, 2005), which measures the overall usage of the three functions generalised across concrete memories. Confirmatory factor analysis supported the four-factor model and rejected the theoretical three-factor model in both samples. The results are discussed in relation to cultural differences in overall autobiographical memory usage as well as sharing versus non-sharing aspects of social remembering.
Advances in Parallelization for Large Scale Oct-Tree Mesh Generation
NASA Technical Reports Server (NTRS)
O'Connell, Matthew; Karman, Steve L.
2015-01-01
Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large-scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large-scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results include a one billion cell mesh around a Predator Unmanned Aerial Vehicle geometry, generated on 64 processors in under 45 minutes.
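The "top down" half of such an oct-tree scheme can be sketched in a few lines. This serial Python toy refines any cube that intersects the geometry (a sphere here) down to a target depth; it illustrates only the refinement logic, not the paper's parallel generator.

```python
def refine(cube, depth, max_depth, intersects, leaves):
    """Top-down oct-tree refinement: split intersecting cubes, keep the rest coarse."""
    x, y, z, size = cube
    if depth == max_depth or not intersects(cube):
        leaves.append(cube)
        return
    h = size / 2.0
    for dx in (0, h):
        for dy in (0, h):
            for dz in (0, h):
                refine((x + dx, y + dy, z + dz, h), depth + 1,
                       max_depth, intersects, leaves)

def sphere_intersects(cube, c=(0.5, 0.5, 0.5), r=0.3):
    x, y, z, s = cube
    # Squared distance from the sphere center to the axis-aligned cube.
    d2 = sum(max(lo - ci, 0.0, ci - (lo + s)) ** 2
             for lo, ci in zip((x, y, z), c))
    return d2 <= r * r

leaves = []
refine((0.0, 0.0, 0.0, 1.0), 0, max_depth=4,
       intersects=sphere_intersects, leaves=leaves)
print(len(leaves), "leaf cells")   # fine cells cluster near the sphere surface
```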
Whorfian effects on colour memory are not reliable.
Wright, Oliver; Davies, Ian R L; Franklin, Anna
2015-01-01
The Whorfian hypothesis suggests that differences between languages cause differences in cognitive processes. Support for this idea comes from studies that find that patterns of colour memory errors made by speakers of different languages align with differences in colour lexicons. The current study provides a large-scale investigation of the relationship between colour language and colour memory, adopting a cross-linguistic and developmental approach. Colour memory on a delayed matching-to-sample (XAB) task was investigated in 2 language groups with differing colour lexicons, for 3 developmental stages and 2 regions of colour space. Analyses used a Bayesian technique to provide simultaneous assessment of two competing hypotheses (H1-Whorfian effect present, H0-Whorfian effect absent). Results of the analyses consistently favoured H0. The findings suggest that Whorfian effects on colour memory are not reliable and that the importance of such effects should not be overestimated.
Elephant random walks and their connection to Pólya-type urns
NASA Astrophysics Data System (ADS)
Baur, Erich; Bertoin, Jean
2016-11-01
In this paper, we explain the connection between the elephant random walk (ERW) and an urn model à la Pólya and derive functional limit theorems for the former. The ERW model was introduced in [Phys. Rev. E 70, 045101 (2004), 10.1103/PhysRevE.70.045101] to study memory effects in a highly non-Markovian setting. More specifically, the ERW is a one-dimensional discrete-time random walk with a complete memory of its past. The influence of the memory is measured in terms of a memory parameter p between zero and one. In the past years, a considerable effort has been undertaken to understand the large-scale behavior of the ERW, depending on the choice of p . Here, we use known results on urns to explicitly solve the ERW in all memory regimes. The method works as well for ERWs in higher dimensions and is widely applicable to related models.
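The ERW is simple to simulate, which makes its regimes easy to see empirically. The sketch below follows the standard definition (repeat a uniformly recalled past step with probability p, reverse it otherwise); p > 3/4 is the known superdiffusive regime.

```python
import numpy as np

def elephant_walk(n, p, rng):
    """One ERW trajectory of n steps with memory parameter p."""
    steps = np.empty(n, dtype=int)
    steps[0] = 1 if rng.random() < 0.5 else -1        # first step: fair coin
    for t in range(1, n):
        past = steps[rng.integers(t)]                 # recall a uniform past step
        steps[t] = past if rng.random() < p else -past
    return np.cumsum(steps)

rng = np.random.default_rng(42)
for p in (0.5, 0.9):                                  # diffusive vs superdiffusive
    finals = [elephant_walk(2000, p, rng)[-1] for _ in range(200)]
    print(f"p={p}: endpoint variance ~ {np.var(finals):.0f}")
```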
NASA Astrophysics Data System (ADS)
González, Diego; Botella, Guillermo; García, Carlos; Prieto, Manuel; Tirado, Francisco
2013-12-01
This contribution focuses on the optimization of matching-based motion estimation algorithms widely used for video coding standards, using an Altera custom-instruction-based paradigm and a combination of synchronous dynamic random access memory (SDRAM) with on-chip memory in Nios II processors. A complete profile of the algorithms is obtained before the optimization, which locates code leaks; afterward, a custom instruction set is created and added to the specific design, enhancing the original system. Every possible memory combination between on-chip memory and SDRAM has also been tested to achieve the best performance. The final throughput of the complete designs is reported. This manuscript outlines a low-cost system, mapped using very large scale integration technology, which accelerates software algorithms by converting them into custom hardware logic blocks, and shows the best combination between on-chip memory and SDRAM for the Nios II processor.
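For context, the kernel such matching-based motion estimators spend their time in is typically a sum-of-absolute-differences (SAD) block search; the Python sketch below shows a full-search version of that kernel. It is illustrative only, not the authors' Nios II design.

```python
import numpy as np

def best_motion_vector(ref, cur, bx, by, bsize=8, search=4):
    """Full-search block matching: minimize SAD over a +/-search window."""
    block = cur[by:by + bsize, bx:bx + bsize].astype(np.int32)
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and 0 <= x and y + bsize <= ref.shape[0] \
                    and x + bsize <= ref.shape[1]:
                cand = ref[y:y + bsize, x:x + bsize].astype(np.int32)
                sad = np.abs(cand - block).sum()
                if best is None or sad < best:
                    best, best_mv = sad, (dx, dy)
    return best_mv, best

rng = np.random.default_rng(3)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(2, -1), axis=(0, 1))     # frame moved down 2, left 1
print(best_motion_vector(ref, cur, bx=24, by=24))  # -> ((1, -2), 0)
```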
Automated quantitative muscle biopsy analysis system
NASA Technical Reports Server (NTRS)
Castleman, Kenneth R. (Inventor)
1980-01-01
An automated system to aid the diagnosis of neuromuscular diseases by producing fiber size histograms utilizing histochemically stained muscle biopsy tissue. Televised images of the microscopic fibers are processed electronically by a multi-microprocessor computer, which isolates, measures, and classifies the fibers and displays the fiber size distribution. The architecture of the multi-microprocessor computer, which is iterated to any required degree of complexity, features a series of individual microprocessors P_n, each receiving data from a shared memory M_(n-1) and outputting processed data to a separate shared memory M_(n+1), under control of a program stored in dedicated memory M_n.
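The processing chain in this architecture is essentially a software pipeline; the toy Python rendering below uses thread-safe queues as stand-ins for the shared memories M_(n-1), M_(n+1) between stages. The stage functions and buffer names are hypothetical.

```python
import threading
import queue

def make_stage(inbox, outbox, fn):
    """A pipeline stage: read from one shared buffer, write to the next."""
    def run():
        while True:
            item = inbox.get()
            if item is None:          # poison pill shuts the pipeline down
                outbox.put(None)
                return
            outbox.put(fn(item))
    return threading.Thread(target=run)

buffers = [queue.Queue() for _ in range(4)]           # M_0 .. M_3
stages = [make_stage(buffers[i], buffers[i + 1], fn)  # P_1 .. P_3
          for i, fn in enumerate([lambda x: x + 1, lambda x: x * 2,
                                  lambda x: -x])]
for s in stages:
    s.start()
for item in [1, 2, 3, None]:
    buffers[0].put(item)
while (out := buffers[-1].get()) is not None:
    print(out)                                        # -4, -6, -8
```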
Parallel performance investigations of an unstructured mesh Navier-Stokes solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
2000-01-01
A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
Experimental evaluation of multiprocessor cache-based error recovery
NASA Technical Reports Server (NTRS)
Janssens, Bob; Fuchs, W. K.
1991-01-01
Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have recently been developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, differing in the manner in which they avoid rollback propagation, are evaluated. By simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, the performance effect of integrating the recovery schemes in the cache coherence protocol is evaluated. The results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead, but with uncontrollably high variability in the checkpoint interval.
Performing a local reduction operation on a parallel computer
Blocksome, Michael A; Faraj, Daniel A
2013-06-04
A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
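The interleaved-copy step in this claim is easier to see in code. The NumPy sketch below is a serial stand-in for what the two reduction cores do concurrently, ending in an elementwise sum across the four buffers; the buffer names are illustrative.

```python
import numpy as np

def local_reduce(red0, red1, net_write, net_read, chunk=4):
    """Copy two buffers into shared memory in interleaved chunks, then reduce.
    Serial stand-in; on the node, the two reduction cores do this in parallel."""
    n = red0.size                       # assumes n is a multiple of chunk
    interleaved = np.empty(2 * n, dtype=red0.dtype)
    for s in range(0, n, chunk):
        interleaved[2 * s: 2 * s + chunk] = red0[s:s + chunk]              # core 0
        interleaved[2 * s + chunk: 2 * s + 2 * chunk] = red1[s:s + chunk]  # core 1
    chunks = interleaved.reshape(-1, chunk)
    restored0, restored1 = chunks[0::2].ravel(), chunks[1::2].ravel()
    return restored0 + restored1 + net_write + net_read

a, b, c, d = (np.arange(8) for _ in range(4))
assert np.array_equal(local_reduce(a, b, c, d), 4 * np.arange(8))
```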
Cognitive Correlates of Metamemory in Alzheimer's Disease
Shaked, Danielle; Farrell, Meagan; Huey, Edward; Metcalfe, Janet; Cines, Sarah; Karlawish, Jason; Sullo, Elisabeth; Cosentino, Stephanie
2014-01-01
Objective: Metamemory, or knowledge of one's memory abilities, is often impaired in individuals with Alzheimer's disease (AD), although the basis of this metacognitive deficit has not been fully articulated. Behavioral and imaging studies have produced conflicting evidence regarding the extent to which specific cognitive domains (i.e., executive functioning (EF), memory) and brain regions contribute to memory awareness. The primary aim of this study was to disentangle the cognitive correlates of metamemory in AD by examining the relatedness of objective metamemory performance to cognitive tasks grouped by domain (EF or memory) as well as by preferential hemispheric reliance defined by task modality (verbal or nonverbal). Method: 89 participants with mild AD recruited at Columbia University Medical Center and the University of Pennsylvania underwent objective metamemory and cognitive testing. Partial correlations were used to assess the relationship between metamemory and four cognitive variables, adjusted for recruitment site. Results: The significant correlates of metamemory included nonverbal fluency (r = .27, p = .02) and nonverbal memory (r = .24, p = .04). Conclusions: Our findings suggest that objectively measured metamemory in a large sample of individuals with mild AD is selectively related to a set of inter-domain nonverbal tasks. The association between metamemory and the nonverbal tasks may implicate a shared reliance on a right-sided cognitive network that spans frontal and temporal regions. PMID:24819066
Wierzba, M; Riegel, M; Wypych, M; Jednoróg, K; Grabowska, A; Marchewka, A
2018-02-28
It is widely accepted that people differ in memory performance. The ability to control one's memory depends on multiple factors, including the emotional properties of the memorized material. While it was widely demonstrated that emotion can facilitate memory, it is unclear how emotion modifies our ability to suppress memory. One of the reasons for the lack of consensus among researchers is that individual differences in memory performance were largely neglected in previous studies. We used the directed forgetting paradigm in an fMRI study, in which subjects viewed neutral and emotional words, which they were instructed to remember or to forget. Subsequently, subjects' memory of these words was tested. Finally, they assessed the words on scales of valence, arousal, sadness and fear. We found that memory performance depended on instruction as reflected in the engagement of the lateral prefrontal cortex (lateral PFC), irrespective of emotional properties of words. While the lateral PFC engagement did not differ between neutral and emotional conditions, it correlated with behavioural performance when emotional - as opposed to neutral - words were presented. A deeper understanding of the underlying brain mechanisms is likely to require a study of individual differences in cognitive abilities to suppress memory.
A Voxel-Based Metadata Structure for Change Detection in Point Clouds of Large-Scale Urban Areas
NASA Astrophysics Data System (ADS)
Gehrung, J.; Hebel, M.; Arens, M.; Stilla, U.
2018-05-01
Mobile laser scanning has the potential not only to create detailed representations of urban environments, but also to determine changes up to a very detailed level. An environment representation for change detection in large-scale urban environments based on point clouds has drawbacks in terms of memory scalability. Volumes, however, are a promising building block for memory-efficient change detection methods. The challenge of working with 3D occupancy grids is that the usual raycasting-based methods applied for their generation lead to artifacts caused by the traversal of unfavorably discretized space. These artifacts have the potential to distort the state of voxels in close proximity to planar structures. In this work we propose a raycasting approach that utilizes knowledge about planar surfaces to completely prevent this kind of artifact. To demonstrate the capabilities of our approach, a method for the iterative volumetric approximation of point clouds that allows the raycasting to be sped up by 36 percent is proposed.
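As a point of reference for the artifact problem described above, here is a naive occupancy-grid raycasting baseline in Python (the kind of sub-voxel stepping whose discretization artifacts near planar surfaces the proposed surface-aware traversal avoids); the map layout and parameters are illustrative.

```python
import numpy as np

def cast_ray(grid, origin, endpoint, voxel=0.5):
    """Mark voxels along a sensor ray as free and the endpoint as occupied."""
    origin = np.asarray(origin, dtype=float)
    endpoint = np.asarray(endpoint, dtype=float)
    length = np.linalg.norm(endpoint - origin)
    for t in np.arange(0.0, length, voxel / 4.0):    # naive sub-voxel stepping
        p = origin + (endpoint - origin) * (t / length)
        idx = tuple((p // voxel).astype(int))
        grid[idx] = min(grid.get(idx, 0), -1)        # free-space evidence
    grid[tuple((endpoint // voxel).astype(int))] = 1 # occupied endpoint

grid = {}                                            # sparse voxel map
cast_ray(grid, origin=(0, 0, 0), endpoint=(4.0, 1.0, 0.2))
print(sum(v == -1 for v in grid.values()), "free voxels marked")
```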
Violante, Ines R; Li, Lucia M; Carmichael, David W; Lorenz, Romy; Leech, Robert; Hampshire, Adam; Rothwell, John C; Sharp, David J
2017-03-14
Cognitive functions such as working memory (WM) are emergent properties of large-scale network interactions. Synchronisation of oscillatory activity might contribute to WM by enabling the coordination of long-range processes. However, causal evidence for the way oscillatory activity shapes network dynamics and behavior in humans is limited. Here we applied transcranial alternating current stimulation (tACS) to exogenously modulate oscillatory activity in a right frontoparietal network that supports WM. Externally induced synchronization improved performance when cognitive demands were high. Simultaneously collected fMRI data reveals tACS effects dependent on the relative phase of the stimulation and the internal cognitive processing state. Specifically, synchronous tACS during the verbal WM task increased parietal activity, which correlated with behavioral performance. Furthermore, functional connectivity results indicate that the relative phase of frontoparietal stimulation influences information flow within the WM network. Overall, our findings demonstrate a link between behavioral performance in a demanding WM task and large-scale brain synchronization.
Memory loss (MedlinePlus Medical Encyclopedia): //medlineplus.gov/ency/article/003257.htm
Rudolph, Marc D; Graham, Alice M; Feczko, Eric; Miranda-Dominguez, Oscar; Rasmussen, Jerod M; Nardos, Rahel; Entringer, Sonja; Wadhwa, Pathik D; Buss, Claudia; Fair, Damien A
2018-05-01
Several lines of evidence support the link between maternal inflammation during pregnancy and increased likelihood of neurodevelopmental and psychiatric disorders in offspring. This longitudinal study seeks to advance understanding regarding implications of systemic maternal inflammation during pregnancy, indexed by plasma interleukin-6 (IL-6) concentrations, for large-scale brain system development and emerging executive function skills in offspring. We assessed maternal IL-6 during pregnancy, functional magnetic resonance imaging acquired in neonates, and working memory (an important component of executive function) at 2 years of age. Functional connectivity within and between multiple neonatal brain networks can be modeled to estimate maternal IL-6 concentrations during pregnancy. Brain regions heavily weighted in these models overlap substantially with those supporting working memory in a large meta-analysis. Maternal IL-6 also directly accounts for a portion of the variance of working memory at 2 years of age. Findings highlight the association of maternal inflammation during pregnancy with the developing functional architecture of the brain and emerging executive function.
Evaluating the performance of the particle finite element method in parallel architectures
NASA Astrophysics Data System (ADS)
Gimenez, Juan M.; Nigro, Norberto M.; Idelsohn, Sergio R.
2014-05-01
This paper presents a high-performance implementation of the particle-mesh based method called the particle finite element method two (PFEM-2). It consists of a material-derivative based formulation of the equations with a hybrid spatial discretization which uses an Eulerian mesh and Lagrangian particles. The main aim of PFEM-2 is to solve transport equations as fast as possible while keeping some level of accuracy. The method was found to be competitive with classical Eulerian alternatives for these targets, even in their range of optimal application. To evaluate the method on large simulations, it is imperative to use parallel environments. Parallel strategies for the Finite Element Method have been widely studied and many libraries can be used to solve the Eulerian stages of PFEM-2. However, Lagrangian stages, such as streamline integration, must be developed with the selected parallel strategy in mind. The main drawback of PFEM-2 is the large amount of memory needed, which limits the size of problems that can be solved on a single computer. Therefore, a distributed-memory implementation is urgently needed. Unlike a shared-memory approach, with domain decomposition the memory is automatically isolated, thus avoiding race conditions; however, new issues appear due to data distribution over the processes. Thus, a domain decomposition strategy for both particles and mesh is adopted, which minimizes the communication between processes. Finally, performance analyses running over multicore and multinode architectures are presented. The Courant-Friedrichs-Lewy number used influences the efficiency of the parallelization and, in some cases, a weighted partitioning can be used to improve the speed-up. However, the total CPU time for the cases presented is lower than that obtained with classical Eulerian strategies.
Wu, Junjie; Waxman, David J
2015-01-01
Cancer chemotherapy using cytotoxic drugs can induce immunogenic tumor cell death; however, dosing regimens and schedules that enable single-agent chemotherapy to induce adaptive immune-dependent ablation of large, established tumors with activation of long-term immune memory have not been identified. Here, we investigate this issue in a syngeneic, implanted GL261 glioma model in immune-competent mice given cyclophosphamide on a 6-day repeating metronomic schedule. Two cycles of metronomic cyclophosphamide treatment induced sustained upregulation of tumor-associated CD8+ cytotoxic T lymphocyte (CTL) cells, natural killer (NK) cells, macrophages, and other immune cells. Expression of CTL- and NK–cell-shared effectors peaked on Day 6, and then declined by Day 9 after the second cyclophosphamide injection and correlated inversely with the expression of the regulatory T cell (Treg) marker Foxp3. Sustained tumor regression leading to tumor ablation was achieved after several cyclophosphamide treatment cycles. Tumor ablation required CD8+ T cells, as shown by immunodepletion studies, and was associated with immunity to re-challenge with GL261 glioma cells, but not B16-F10 melanoma or Lewis lung carcinoma cells. Rejection of GL261 tumor re-challenge was associated with elevated CTLs in blood and increased CTL infiltration in tumors, consistent with the induction of long-term, specific CD8+ T-cell anti-GL261 tumor memory. Co-depletion of CD8+ T cells and NK cells did not inhibit tumor regression beyond CD8+ T-cell depletion alone, suggesting that the metronomic cyclophosphamide-activated NK cells function via CD8a+ T cells. Taken together, these findings provide proof-of-concept that single-agent chemotherapy delivered on an optimized metronomic schedule can eradicate large, established tumors and induce long-term immune memory. PMID:26137402
We Remember, We Forget: Collaborative Remembering in Older Couples
ERIC Educational Resources Information Center
Harris, Celia B.; Keil, Paul G.; Sutton, John; Barnier, Amanda J.; McIlwain, Doris J. F.
2011-01-01
Transactive memory theory describes the processes by which benefits for memory can occur when remembering is shared in dyads or groups. In contrast, cognitive psychology experiments demonstrate that social influences on memory disrupt and inhibit individual recall. However, most research in cognitive psychology has focused on groups of strangers…
76 FR 12821 - 150th Anniversary of the Inauguration of Abraham Lincoln
Federal Register 2010, 2011, 2012, 2013, 2014
2011-03-09
... together by shared memories and common hopes. As we observe the 150th anniversary of his Inauguration, we... his memory enabled America to move beyond a young collection of States to become a free and unified... memory and uphold the principles he so nobly advanced.
Expert Systems on Multiprocessor Architectures. Volume 2. Technical Reports
1991-06-01
Report RC 12936 (#58037). IBM T. J. Watson Research Center. July 1987. Alan Jay Smith. Cache memories. Computing Surveys, 14(3): 473-530... basic-shared is an instrument for a shared memory design. The component panels are processor-qload-scrolling-bar-panel, memory-qload-scrolling-bar-panel
ERIC Educational Resources Information Center
Veaner, Allen B.
Project BALLOTS is a large-scale library automation development project of the Stanford University Libraries which has demonstrated the feasibility of conducting on-line interactive searches of complex bibliographic files, with a large number of users working simultaneously in the same or different files. This report documents the continuing…
Blanket Gate Would Address Blocks Of Memory
NASA Technical Reports Server (NTRS)
Lambe, John; Moopenn, Alexander; Thakoor, Anilkumar P.
1988-01-01
Circuit-chip area used more efficiently. Proposed gate structure selectively allows and restricts access to blocks of memory in electronic neural-type network. By breaking memory into independent blocks, gate greatly simplifies problem of reading from and writing to memory. Since blocks not used simultaneously, share operational amplifiers that prompt and read information stored in memory cells. Fewer operational amplifiers needed, and chip area occupied reduced correspondingly. Cost per bit drops as result.
The potential of multi-port optical memories in digital computing
NASA Technical Reports Server (NTRS)
Alford, C. O.; Gaylord, T. K.
1975-01-01
A high-capacity memory with a relatively high data transfer rate and multi-port simultaneous access capability may serve as the basis for new computer architectures. The implementation of a multi-port optical memory is discussed. Several computer structures are presented that might profitably use such a memory. These structures include (1) a simultaneous record access system, (2) a simultaneously shared memory computer system, and (3) a parallel digital processing structure.
How much a galaxy knows about its large-scale environment?: An information theoretic perspective
NASA Astrophysics Data System (ADS)
Pandey, Biswajit; Sarkar, Suman
2017-05-01
The small-scale environment characterized by the local density is known to play a crucial role in deciding galaxy properties, but the role of the large-scale environment in galaxy formation and evolution remains less clear. We propose an information-theoretic framework to investigate the influence of large-scale environment on galaxy properties and apply it to data from the Galaxy Zoo project, which provides visual morphological classifications of ~1 million galaxies from the Sloan Digital Sky Survey. We find a non-zero mutual information between morphology and environment that decreases with increasing length-scale but persists across the entire range of length-scales probed. We estimate the conditional mutual information and the interaction information between morphology and environment by conditioning the environment on different length-scales and find a synergic interaction between them that operates up to a length-scale of at least ~30 h⁻¹ Mpc. Our analysis indicates that these interactions largely arise due to the mutual information shared between the environments on different length-scales.
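The quantity being tracked here is ordinary mutual information between a morphology label and a binned environment variable. The following toy Python estimate from joint counts shows the computation; the counts are made up, not SDSS data.

```python
import numpy as np

def mutual_information(counts):
    """I(X;Y) in bits from a joint count table (rows: morphology, cols: bins)."""
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)        # marginal over morphology
    py = p.sum(axis=0, keepdims=True)        # marginal over environment bins
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

counts = np.array([[30, 10, 5],              # e.g. spirals across 3 density bins
                   [5, 10, 40]])             # e.g. ellipticals across the same bins
print(f"I(morphology; environment) = {mutual_information(counts):.3f} bits")
```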
[Artificial intelligence meeting neuropsychology. Semantic memory in normal and pathological aging].
Aimé, Xavier; Charlet, Jean; Maillet, Didier; Belin, Catherine
2015-03-01
Artificial intelligence (AI) is the subject of much research, but also of many fantasies. It aims to reproduce human intelligence in its learning capacity, knowledge storage, and computation. In 2014, the Defense Advanced Research Projects Agency (DARPA) started the Restoring Active Memory (RAM) program, which attempts to develop implantable technology to bridge gaps in the injured brain and restore normal memory function to people with memory loss caused by injury or disease. In another field of AI, computational ontologies (formal and shared conceptualizations) try to model knowledge in order to represent a structured and unambiguous meaning of the concepts of a target domain. The aim of these structures is to ensure a consensual understanding of their meaning and a univariant use (the same concept is used by all to categorize the same individuals). The first knowledge representations in the AI domain were largely based on models and tests of semantic memory. Semantic memory, as a component of long-term memory, is the memory of words, ideas, and concepts. It is the only declarative memory system that resists so remarkably the effects of age. In contrast, non-specific cognitive changes may decrease the performance of elderly people on various tests, reflecting difficulties of access to semantic representations rather than degradation of the semantic store itself. Some dementias, like semantic dementia and Alzheimer's disease, are linked to alteration of semantic memory. We propose in this paper, using the computational ontologies model, a formal and relatively fine-grained modeling in the service of neuropsychology: 1) for the practitioner, with decision support systems; 2) for the patient, as an outsourced cognitive prosthesis; and 3) for the researcher, to study semantic memory.
Portable parallel stochastic optimization for the design of aeropropulsion components
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Rhodes, G. S.
1994-01-01
This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as of portable, parallel programming environments. The second effort was to implement the MSO methodology for an example problem using the portable parallel programming library Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology is well suited to large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications to which MSO can be applied, including NASA's High-Speed Civil Transport and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
Network selection, Information filtering and Scalable computation
NASA Astrophysics Data System (ADS)
Ye, Changqing
This dissertation explores two application scenarios of the sparsity pursuit method on large-scale data sets. The first scenario is classification and regression in analyzing high-dimensional structured data, where predictors correspond to nodes of a given directed graph. This arises in, for instance, identification of disease genes for Parkinson's disease from a network of candidate genes. In such a situation, the directed graph describes dependencies among the genes, where the direction of edges represents certain causal effects. Key to high-dimensional structured classification and regression is how to utilize dependencies among predictors as specified by directions of the graph. In this dissertation, we develop a novel method that fully takes into account such dependencies formulated through certain nonlinear constraints. We apply the proposed method to two applications, feature selection in large-margin binary classification and in linear regression. We implement the proposed method through difference convex programming for the cost function and constraints. Finally, theoretical and numerical analyses suggest that the proposed method achieves the desired objectives. An application to disease gene identification is presented. The second application scenario is personalized information filtering, which extracts the information specifically relevant to a user, predicting his/her preference over a large number of items based on the opinions of users who think alike or on item content. This problem is cast into the framework of regression and classification, where we introduce novel partial latent models to integrate additional user-specific and content-specific predictors for higher predictive accuracy. In particular, we factorize a user-over-item preference matrix into a product of two matrices, each representing a user's preference and an item preference by users. Then we propose a likelihood method to seek the sparsest latent factorization, from a class of over-complete factorizations, possibly with a high percentage of missing values. This promotes additional sparsity beyond rank reduction. Computationally, we design methods based on a "decomposition and combination" strategy, to break large-scale optimization into many small subproblems to solve in a recursive and parallel manner. On this basis, we implement the proposed methods through multi-platform shared-memory parallel programming, and through Mahout, a library for scalable machine learning and data mining, for MapReduce computation. For example, our methods are scalable to a dataset consisting of three billion observations on a single machine with sufficient memory, with good timings. Both theoretical and numerical investigations show that the proposed methods exhibit significant improvement in accuracy over state-of-the-art scalable methods.
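A minimal sketch of the factorization component described above, assuming a small explicit-ratings triple list: the user-by-item preference matrix is approximated as U V^T, fit by stochastic gradient descent on observed entries, with a sign-based penalty as a crude stand-in for the sparsity pursuit. Nothing here reproduces the dissertation's partial latent models or parallel implementation.

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=4, lr=0.05, lam=0.02, epochs=200):
    """SGD matrix factorization over observed (user, item, rating) triples."""
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:                              # observed entries only
            err = r - U[u] @ V[i]
            U[u] += lr * (err * V[i] - lam * np.sign(U[u]))  # L1-ish shrinkage
            V[i] += lr * (err * U[u] - lam * np.sign(V[i]))
    return U, V

ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.5), (1, 2, 2.0), (2, 1, 1.5)]
U, V = factorize(ratings, n_users=3, n_items=3)
print("predicted r(2, 0) =", round(float(U[2] @ V[0]), 2))   # an unseen entry
```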
Nieder, Andreas; Miller, Earl K
2003-01-09
Whether cognitive representations are better conceived as language-based, symbolic representations or perceptually related, analog representations is a subject of debate. If cognitive processes parallel perceptual processes, then fundamental psychophysical laws should hold for each. To test this, we analyzed both behavioral and neuronal representations of numerosity in the prefrontal cortex of rhesus monkeys. The data were best described by a nonlinearly compressed scaling of numerical information, as postulated by the Weber-Fechner law or Stevens' law for psychophysical/sensory magnitudes. This nonlinear compression was observed on the neural level during the acquisition phase of the task and maintained through the memory phase with no further compression. These results suggest that certain cognitive and perceptual/sensory representations share the same fundamental mechanisms and neural coding schemes.
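The compressed-scaling result has a simple computational reading: tuning curves that are Gaussian on a logarithmic number axis are asymmetric on a linear axis, with wider tuning for larger numerosities. The short sketch below illustrates this signature; the parameters are illustrative, not fits to the monkey data.

```python
import numpy as np

def tuning(n, preferred, sigma=0.25):
    """Gaussian tuning on a log (Weber-Fechner) numerosity axis."""
    return np.exp(-(np.log(n) - np.log(preferred)) ** 2 / (2 * sigma ** 2))

numbers = np.arange(1, 11, dtype=float)
for pref in (2, 4):
    resp = tuning(numbers, pref)
    above = numbers[resp > 0.5].astype(int)     # ">50% response" band
    print(f"preferred {pref}: >50% response for n in {above.min()}..{above.max()}")
# The band widens with the preferred numerosity: the linear-scale
# signature of logarithmic compression.
```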
Emerging Applications for High K Materials in VLSI Technology
Clark, Robert D.
2014-01-01
The current status of High K dielectrics in Very Large Scale Integrated circuit (VLSI) manufacturing for leading edge Dynamic Random Access Memory (DRAM) and Complementary Metal Oxide Semiconductor (CMOS) applications is summarized along with the deposition methods and general equipment types employed. Emerging applications for High K dielectrics in future CMOS are described as well for implementations in 10 nm and beyond nodes. Additional emerging applications for High K dielectrics include Resistive RAM memories, Metal-Insulator-Metal (MIM) diodes, Ferroelectric logic and memory devices, and as mask layers for patterning. Atomic Layer Deposition (ALD) is a common and proven deposition method for all of the applications discussed for use in future VLSI manufacturing. PMID:28788599
Sison, Margarette; Gerlai, Robert
2011-01-01
The zebrafish is gaining popularity in behavioral neuroscience perhaps because of a promise of efficient large scale mutagenesis and drug screens that could identify a substantial number of yet undiscovered molecular players involved in complex traits. Learning and memory are complex functions of the brain and the analysis of their mechanisms may benefit from such large scale zebrafish screens. One bottleneck in this research is the paucity of appropriate behavioral screening paradigms, which may be due to the relatively uncharacterized nature of the behavior of this species. Here we show that zebrafish exhibit good learning performance in a task adapted from the mammalian literature, a plus maze in which zebrafish are required to associate a neutral visual stimulus with the presence of conspecifics, the rewarding unconditioned stimulus. Furthermore, we show that MK-801, a non-competitive NMDA-R antagonist, impairs memory performance in this maze when administered right after training or just before recall but not when given before training at a dose that does not impair motor function, perception or motivation. These results suggest that the plus maze associative learning paradigm has face and construct validity and that zebrafish may become an appropriate and translationally relevant study species for the analysis of the mechanisms of vertebrate, including mammalian, learning and memory. PMID:21596149
HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation
NASA Technical Reports Server (NTRS)
Sterling, Thomas; Bergman, Larry
2000-01-01
Computational Aero Sciences and other numerically intensive computation disciplines demand computing throughputs substantially greater than the Teraflops-scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that, in combination with sufficient resolution and advanced adaptive techniques, may force performance requirements towards Petaflops. This will be especially true for compute-intensive models such as Navier-Stokes, or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithm techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA-led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops-scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) at one percent of the power required by conventional semiconductor logic. Wave Division Multiplexing optical communications can approach a peak per-fiber bandwidth of 1 Tbps, and the new Data Vortex network topology employing this technology can connect tens of thousands of ports, providing a bi-section bandwidth on the order of a Petabyte per second with latencies well below 100 nanoseconds, even under heavy loads. Processor-in-Memory (PIM) technology combines logic and memory on the same chip, exposing the internal bandwidth of the memory row buffers at low latency. Holographic photorefractive storage technologies provide high-density memory with access times a thousand times faster than conventional disk technologies. Together these technologies enable a new class of shared-memory system architecture with a peak performance in the Petaflops range but size and power requirements comparable to today's largest Teraflops-scale systems. To achieve high sustained performance, HTMT combines an advanced multithreading processor architecture with a memory-driven, coarse-grained latency management strategy called "percolation", yielding high efficiency while reducing much of the parallel programming burden. This paper will present the basic system architecture characteristics made possible through this series of advanced technologies and then give a detailed description of the new percolation approach to runtime latency management.
Methodology for fast detection of false sharing in threaded scientific codes
Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang
2014-11-25
A profiling tool identifies a code region with false-sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. The false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. Based on the performed run-time analysis, the false sharing detection library determines whether two different portions of the same cache memory line are accessed by the generated binary code.
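As background for what such a tool flags: false sharing arises when logically independent variables land on the same cache line and are updated by different threads, so the line ping-pongs between cores even though no data is logically shared. The following minimal C++ sketch (our illustration, not taken from the patent) shows the at-risk pattern a run-time detector looks for, alongside the conventional padding remedy:

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// Two logically independent counters that share one 64-byte cache line:
// writes from different threads invalidate each other's cached copy.
struct SharedLine {
    std::atomic<long> a{0};
    std::atomic<long> b{0};  // almost certainly on the same line as 'a'
};

// Conventional remedy: align each counter to its own cache line.
struct PaddedLine {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

template <typename T>
double run_ms(T& s) {
    auto t0 = std::chrono::steady_clock::now();
    std::thread t1([&] { for (int i = 0; i < 10000000; ++i) s.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (int i = 0; i < 10000000; ++i) s.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join(); t2.join();
    return std::chrono::duration<double, std::milli>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    SharedLine bad;
    PaddedLine good;
    std::printf("false sharing: %.1f ms\n", run_ms(bad));
    std::printf("padded:        %.1f ms\n", run_ms(good));
}
```

On a typical multicore machine the padded variant completes the same logical work several times faster; that performance gap, traced back to two accesses landing on one cache line, is precisely the signal the described detection libraries report.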
Protein-Based Three-Dimensional Memories and Associative Processors
NASA Astrophysics Data System (ADS)
Birge, Robert
2008-03-01
The field of bioelectronics has benefited from the fact that nature has often solved problems similar to those which must be solved to create molecular electronic or photonic devices that operate with efficiency and reliability. Retinal proteins show great promise in bioelectronic devices because they operate with high efficiency (~0.65 quantum yield) and high cyclicity (>10^7), function over an extended wavelength range (360–630 nm), and can convert light into changes in voltage, pH, absorption or refractive index. This talk will focus on a retinal protein called bacteriorhodopsin, the proton pump of the organism Halobacterium salinarum. Two memories based on this protein will be described. The first is an optical three-dimensional memory. This memory stores information using volume elements (voxels), and provides as much as a thousand-fold improvement in effective capacity over current technology. A unique branching reaction of a variant of bacteriorhodopsin is used to turn each protein into an optically addressed latched AND gate. Although three working prototypes have been developed, a number of cost/performance and architectural issues must be resolved prior to commercialization. The major issue is that the native protein provides a very inefficient branching reaction. Genetic engineering has improved performance by nearly 500-fold, but a further order-of-magnitude improvement is needed. Protein-based holographic associative memories will also be discussed. The human brain stores and retrieves information via association, and human intelligence is intimately connected to the nature and enormous capacity of this associative search and retrieval process. To a first-order approximation, creativity can be viewed as the association of two seemingly disparate concepts to form a totally new construct. Thus, artificial intelligence requires large-scale associative memories. Current computer hardware does not provide an optimal environment for creating artificial intelligence due to the serial nature of random access memories. Software cannot provide a satisfactory work-around that does not introduce unacceptable latency. Holographic associative memories provide a useful approach to large-scale associative recall. Bacteriorhodopsin has long been recognized for its outstanding holographic properties, and when utilized in the Paek and Psaltis design, provides a high-speed real-time associative memory with variable thresholding and feedback. What remains is to make an associative memory capable of high-speed association and long-term data storage. The use of directed evolution to create a protein with the necessary unique properties will be discussed.
Comparing soil moisture memory in satellite observations and models
NASA Astrophysics Data System (ADS)
Stacke, Tobias; Hagemann, Stefan; Loew, Alexander
2013-04-01
A major obstacle to a correct parametrization of soil processes in large-scale global land surface models is the lack of long-term soil moisture observations for large parts of the globe. Currently, a compilation of soil moisture data derived from a range of satellites is being released by the ESA Climate Change Initiative (ECV_SM). Comprising the period from 1978 until 2010, it provides the opportunity to compute climatologically relevant statistics on a quasi-global scale and to compare these to the output of climate models. Our study focuses on the investigation of soil moisture memory in satellite observations and models. As a proxy for memory we compute the autocorrelation length (ACL) of the available satellite data and of the uppermost soil layer of the models. In addition to the ECV_SM data, AMSR-E soil moisture is used as an observational estimate. Simulated soil moisture fields are taken from the ERA-Interim reanalysis and generated with the land surface model JSBACH, which was driven with quasi-observational meteorological forcing data. The satellite data show ACLs between one week and one month for the greater part of the land surface, while the models simulate a longer memory of up to two months. Some patterns are similar in models and observations, e.g. a longer memory in the Sahel Zone and on the Arabian Peninsula, but the models are not able to reproduce regions with a very short ACL of just a few days. If the long-term seasonality is subtracted from the data, the memory is strongly shortened, indicating the importance of seasonal variations for the memory in most regions. Furthermore, we analyze the change of soil moisture memory across the different soil layers of the models to investigate to what extent the surface soil moisture carries information about the whole soil column. A first analysis reveals that the ACL increases for deeper layers. However, its increase is stronger in the soil moisture anomaly than in the absolute values, and the former even exceeds the latter in the deepest layer. From this we conclude that seasonal soil moisture variations dominate the memory close to the surface, but these are dampened in lower layers, where the memory is mainly affected by longer-term variations.
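For concreteness, here is a minimal C++ sketch of the memory proxy: the lag-k autocorrelation of a per-grid-cell soil moisture series, with the ACL taken as the first lag at which the autocorrelation falls below 1/e. The e-folding definition and the helper names are our assumptions for illustration; the study's exact ACL definition may differ.

```cpp
#include <cmath>
#include <vector>

// Lag-k autocorrelation of a (possibly deseasonalized) soil moisture series.
double autocorr(const std::vector<double>& x, int k) {
    const int n = static_cast<int>(x.size());
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= n;
    double num = 0.0, den = 0.0;
    for (int t = 0; t < n; ++t) {
        den += (x[t] - mean) * (x[t] - mean);
        if (t + k < n) num += (x[t] - mean) * (x[t + k] - mean);
    }
    return num / den;
}

// ACL as the first lag (in time steps, e.g. days) at which the
// autocorrelation drops below 1/e.
int acl_steps(const std::vector<double>& x) {
    for (int k = 1; k < static_cast<int>(x.size()); ++k)
        if (autocorr(x, k) < std::exp(-1.0)) return k;
    return -1;  // series never decorrelates within the record
}
```

Applied per grid cell to the raw series and again after subtracting the mean seasonal cycle, this kind of computation yields the two memory maps the abstract compares.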
DOE Office of Scientific and Technical Information (OSTI.GOV)
Katti, Amogh; Di Fatta, Giuseppe; Naughton, Thomas
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a failure detection and consensus algorithm. This paper presents three novel failure detection and consensus algorithms using gossiping. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in all algorithms the number of gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and perfect synchronization in achieving global consensus. The third approach is a three-phase distributed failure detection and consensus algorithm that provides consistency guarantees even in very large and extreme-scale systems while at the same time being memory and bandwidth efficient.
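The paper's three algorithms are not reproduced here, but the gossip substrate they share is easy to sketch: each alive process advances its own heartbeat every cycle, exchanges its heartbeat vector with one random peer, and suspects any process whose heartbeat has not advanced within a timeout. The single-program C++ simulation below is a minimal illustration under assumed parameters (N, T_FAIL, lossless merges); it models only detection, not the consensus phases.

```cpp
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int N = 32;        // number of processes
    const int T_FAIL = 10;   // cycles without heartbeat progress => suspected
    const int CYCLES = 60;

    std::vector<bool> alive(N, true);
    alive[3] = alive[17] = false;  // inject two failures at cycle 0

    // hb[i][j]: process i's latest known heartbeat of process j
    // seen[i][j]: cycle at which hb[i][j] last increased
    std::vector<std::vector<long>> hb(N, std::vector<long>(N, 0));
    std::vector<std::vector<long>> seen(N, std::vector<long>(N, 0));

    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pick(0, N - 1);

    for (long c = 1; c <= CYCLES; ++c) {
        for (int i = 0; i < N; ++i) {
            if (!alive[i]) continue;
            ++hb[i][i]; seen[i][i] = c;         // advance own heartbeat
            int p = pick(rng);                  // one random gossip partner
            if (p == i || !alive[p]) continue;  // messages to dead peers are lost
            for (int j = 0; j < N; ++j) {       // symmetric merge of fresher entries
                long m = std::max(hb[i][j], hb[p][j]);
                if (m > hb[i][j]) seen[i][j] = c;
                if (m > hb[p][j]) seen[p][j] = c;
                hb[i][j] = hb[p][j] = m;
            }
        }
    }

    // Each live process suspects peers whose heartbeat stopped advancing.
    for (int j = 0; j < N; ++j)
        if (CYCLES - seen[0][j] > T_FAIL)
            std::printf("process 0 suspects %d has failed\n", j);
}
```

Because each gossip exchange roughly doubles the set of processes holding a fresh heartbeat, information spreads in O(log N) cycles, which is the logarithmic scaling the simulation results above report.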
Mniszewski, S M; Cawkwell, M J; Wall, M E; Mohd-Yusof, J; Bock, N; Germann, T C; Niklasson, A M N
2015-10-13
We present an algorithm for the calculation of the density matrix that for insulators scales linearly with system size and parallelizes efficiently on multicore, shared memory platforms with small and controllable numerical errors. The algorithm is based on an implementation of the second-order spectral projection (SP2) algorithm [ Niklasson, A. M. N. Phys. Rev. B 2002 , 66 , 155115 ] in sparse matrix algebra with the ELLPACK-R data format. We illustrate the performance of the algorithm within self-consistent tight binding theory by total energy calculations of gas phase poly(ethylene) molecules and periodic liquid water systems containing up to 15,000 atoms on up to 16 CPU cores. We consider algorithm-specific performance aspects, such as local vs nonlocal memory access and the degree of matrix sparsity. Comparisons to sparse matrix algebra implementations using off-the-shelf libraries on multicore CPUs, graphics processing units (GPUs), and the Intel many integrated core (MIC) architecture are also presented. The accuracy and stability of the algorithm are illustrated with long duration Born-Oppenheimer molecular dynamics simulations of 1000 water molecules and a 303 atom Trp cage protein solvated by 2682 water molecules.
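For readers unfamiliar with SP2, the following dense, serial C++ sketch shows the trace-correcting recursion at the heart of the method; the paper's implementation instead uses sparse ELLPACK-R matrix algebra and shared-memory threading. Here eps_min/eps_max are assumed bounds on the spectrum of H and nocc is the number of occupied states; all names are illustrative.

```cpp
#include <cmath>
#include <utility>
#include <vector>

using Mat = std::vector<double>;  // row-major n x n matrix

// C = A * B (dense, serial; the paper uses sparse threaded kernels instead)
Mat matmul(const Mat& A, const Mat& B, int n) {
    Mat C(n * n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) {
            double a = A[i * n + k];
            for (int j = 0; j < n; ++j) C[i * n + j] += a * B[k * n + j];
        }
    return C;
}

double trace(const Mat& A, int n) {
    double t = 0.0;
    for (int i = 0; i < n; ++i) t += A[i * n + i];
    return t;
}

// Trace-correcting SP2: starting from X0 = (eps_max*I - H)/(eps_max - eps_min),
// which maps the spectrum of H into [0,1] with occupied states near 1,
// purify X toward the density matrix D with Tr(D) = nocc.
Mat sp2(const Mat& H, int n, double eps_min, double eps_max, double nocc) {
    const double denom = eps_max - eps_min;
    Mat X(n * n);
    for (int i = 0; i < n * n; ++i) X[i] = -H[i] / denom;
    for (int i = 0; i < n; ++i) X[i * n + i] += eps_max / denom;

    for (int iter = 0; iter < 100; ++iter) {
        Mat X2 = matmul(X, X, n);
        double t = trace(X, n), t2 = trace(X2, n);
        // Pick the polynomial (x^2 or 2x - x^2) that drives Tr(X) toward nocc.
        if (std::fabs(t2 - nocc) <= std::fabs(2.0 * t - t2 - nocc)) {
            X = std::move(X2);  // X <- X^2 pushes eigenvalues toward 0
        } else {
            for (int i = 0; i < n * n; ++i)
                X[i] = 2.0 * X[i] - X2[i];  // X <- 2X - X^2 pushes toward 1
        }
        if (std::fabs(trace(X, n) - nocc) < 1e-12) break;
    }
    return X;
}
```

Each iteration needs only one matrix square, and for insulators the matrices stay sparse under thresholding, which is what makes the linear-scaling, shared-memory parallel implementation described in the abstract possible.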
Ultralow-power switching via defect engineering in germanium telluride phase-change memory devices.
Nukala, Pavan; Lin, Chia-Chun; Composto, Russell; Agarwal, Ritesh
2016-01-25
Crystal-amorphous transformation achieved via the melt-quench pathway in phase-change memory involves fundamentally inefficient energy conversion events, and this translates to large switching current densities, responsible for chemical segregation and device degradation. Alternatively, introducing defects in the crystalline phase can engineer carrier localization effects enhancing carrier-lattice coupling, which can efficiently extract the work required to introduce the bond distortions necessary for amorphization from the input electrical energy. Here, by pre-inducing extended defects, and thus carrier localization effects, in crystalline GeTe via high-energy ion irradiation, we show tremendous improvement in amorphization current densities (0.13–0.6 MA cm−2) compared with the melt-quench strategy (∼50 MA cm−2). We show scaling behaviour and good reversibility in these devices, and explore several intermediate resistance states that are accessible during both amorphization and recrystallization pathways. The existence of multiple resistance states, along with ultralow-power switching and scaling capabilities, makes this approach promising in the context of low-power memory and neuromorphic computation.