Mehdi, Niaz; Rehan, Muhammad; Malik, Fahad Mumtaz; Bhatti, Aamer Iqbal; Tufail, Muhammad
2014-05-01
This paper describes anti-windup compensator (AWC) design methodologies for stable and unstable cascade plants with cascade controllers facing actuator saturation. Two novel full-order decoupling AWC architectures, based on equivalence of the overall closed-loop system, are developed to deal with windup effects. The decoupled architectures have been developed to formulate the AWC synthesis problem by assuring equivalence of the coupled and the decoupled architectures, instead of relying on an analogy, for cascade control systems. A comparison of both AWC architectures from an application point of view is provided to consolidate their utilities: one of the architectures is better in terms of computational complexity for implementation, while the other is more suitable for unstable cascade systems. On the basis of these architectures, global AWC design methodologies utilizing linear matrix inequalities (LMIs) are developed for cascade systems facing stability and performance degradation in the event of actuator saturation. These LMIs are synthesized by application of the Lyapunov theory, the global sector condition and the ℒ2 gain reduction of the uncertain decoupled nonlinear component of the decoupled architecture. Further, an LMI-based local AWC design methodology is derived by utilizing a local sector condition by means of a quadratic Lyapunov function to resolve the windup problem for unstable cascade plants under saturation. To demonstrate the effectiveness of the proposed AWC schemes, an underactuated mechanical system, the ball-and-beam system, is considered, and details of the simulation and practical implementation results are described. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
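For context, the standard building blocks referenced in this abstract can be written compactly. The LaTeX block below shows a generic dead-zone sector condition and a quadratic-Lyapunov ℒ2-gain inequality of the kind used to set up such LMIs; these are textbook forms, not the paper's exact conditions.

```latex
% Generic ingredients only (not the paper's exact LMIs):
% dead-zone nonlinearity and its global sector condition
\[
\psi(u) = u - \mathrm{sat}(u), \qquad
\psi(u)^{\top} W \bigl(u - \psi(u)\bigr) \ge 0 \quad \forall u,\;
W = W^{\top} > 0 \ \text{diagonal}.
\]
% quadratic Lyapunov function and L2-gain condition from which LMIs are derived
\[
V(x) = x^{\top} P x,\ P \succ 0, \qquad
\dot{V}(x) + z^{\top} z - \gamma^{2} w^{\top} w < 0
\;\Rightarrow\; \lVert z \rVert_{2} \le \gamma \lVert w \rVert_{2}.
\]
```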
Hadwiger, M; Beyer, J; Jeong, Won-Ki; Pfister, H
2012-12-01
This paper presents the first volume visualization system that scales to petascale volumes imaged as a continuous stream of high-resolution electron microscopy images. Our architecture scales to dense, anisotropic petascale volumes because it: (1) decouples construction of the 3D multi-resolution representation required for visualization from data acquisition, and (2) decouples sample access time during ray-casting from the size of the multi-resolution hierarchy. Our system is designed around a scalable multi-resolution virtual memory architecture that handles missing data naturally, does not pre-compute any 3D multi-resolution representation such as an octree, and can accept a constant stream of 2D image tiles from the microscopes. A novelty of our system design is that it is visualization-driven: we restrict most computations to the visible volume data. Leveraging the virtual memory architecture, missing data are detected during volume ray-casting as cache misses, which are propagated backwards for on-demand out-of-core processing. 3D blocks of volume data are only constructed from 2D microscope image tiles when they have actually been accessed during ray-casting. We extensively evaluate our system design choices with respect to scalability and performance, compare to previous best-of-breed systems, and illustrate the effectiveness of our system for real microscopy data from neuroscience.
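The visualization-driven, cache-miss-based design described above can be sketched roughly as follows; all names and data structures here are hypothetical illustrations, not the authors' implementation.

```cpp
// Hypothetical sketch of visualization-driven, on-demand block construction:
// during ray-casting, a missing 3D block is detected as a cache miss and
// queued for later out-of-core construction from 2D microscope tiles.
#include <cstdint>
#include <optional>
#include <queue>
#include <unordered_map>

struct BlockId { int level, x, y, z; };
struct Block   { /* 3D brick of voxels, omitted */ };

inline bool operator==(const BlockId& a, const BlockId& b) {
    return a.level == b.level && a.x == b.x && a.y == b.y && a.z == b.z;
}
struct BlockIdHash {
    size_t operator()(const BlockId& b) const {
        return (size_t(b.level) * 73856093u) ^ (size_t(b.x) * 19349663u) ^
               (size_t(b.y) * 83492791u) ^ (size_t(b.z) * 2654435761u);
    }
};

class VirtualVolume {
public:
    // Called from the ray-caster for every sample: returns the resident block
    // or records a miss so the block can be built later from 2D tiles.
    std::optional<Block> lookup(const BlockId& id) {
        auto it = resident_.find(id);
        if (it != resident_.end()) return it->second;   // cache hit
        misses_.push(id);                               // cache miss: defer work
        return std::nullopt;                            // caller falls back to coarser level
    }

    // Called asynchronously after the frame: builds only blocks that were
    // actually touched by visible rays.
    void processMisses() {
        while (!misses_.empty()) {
            BlockId id = misses_.front(); misses_.pop();
            resident_[id] = buildFrom2DTiles(id);
        }
    }

private:
    Block buildFrom2DTiles(const BlockId&) { return Block{}; } // placeholder
    std::unordered_map<BlockId, Block, BlockIdHash> resident_;
    std::queue<BlockId> misses_;
};
```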
A Down-to-Earth Educational Operating System for Up-in-the-Cloud Many-Core Architectures
ERIC Educational Resources Information Center
Ziwisky, Michael; Persohn, Kyle; Brylow, Dennis
2013-01-01
We present "Xipx," the first port of a major educational operating system to a processor in the emerging class of many-core architectures. Through extensions to the proven Embedded Xinu operating system, Xipx gives students hands-on experience with system programming in a distributed message-passing environment. We expose the software primitives…
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.-L.
2015-05-01
Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture, and it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of the Xeon Phi requires some novel optimization techniques, which are discussed in this paper. The results show that the optimizations improved the performance of the original code on the Xeon Phi 7120P by a factor of 1.3x.
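As a rough illustration of the kind of restructuring such MIC ports rely on, the sketch below shows a generic column-physics loop threaded with OpenMP and vectorized with a simd pragma; it is not the WRF/Goddard code, and the loop body is a stand-in.

```cpp
// Illustrative only: thread- and vector-level restructuring typically applied
// when porting a column-physics loop to the Xeon Phi (MIC). Not the WRF code.
#include <vector>

void column_physics(std::vector<float>& flux, const std::vector<float>& state,
                    int ncolumns, int nlevels) {
    // Columns are independent, so they map to threads; the inner level loop is
    // vectorized for the wide SIMD units.
    #pragma omp parallel for schedule(static)
    for (int c = 0; c < ncolumns; ++c) {
        #pragma omp simd
        for (int k = 0; k < nlevels; ++k) {
            const int i = c * nlevels + k;
            flux[i] = 0.5f * state[i] + 0.25f * state[i] * state[i]; // stand-in physics
        }
    }
}
```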
Fischbach, Martin; Wiebusch, Dennis; Latoschik, Marc Erich
2017-04-01
Modularity, modifiability, reusability, and API usability are important software qualities that determine the maintainability of software architectures. Virtual, Augmented, and Mixed Reality (VR, AR, MR) systems, modern computer games, as well as interactive human-robot systems often include various dedicated input-, output-, and processing subsystems. These subsystems collectively maintain a real-time simulation of a coherent application state. The resulting interdependencies between individual state representations, mutual state access, overall synchronization, and flow of control imply a conceptually close coupling, whereas software quality asks for decoupling to develop maintainable solutions. This article presents five semantics-based software techniques that address this contradiction: semantic grounding, code from semantics, grounded actions, semantic queries, and decoupling by semantics. These techniques are applied to extend the well-established entity-component-system (ECS) pattern to overcome some of this pattern's deficits with respect to the implied state access. A walk-through of central implementation aspects of a multimodal (speech and gesture) VR interface is used to highlight the techniques' benefits. This use case is chosen as a prototypical example of the complex architectures with multiple interacting subsystems found in many VR, AR and MR systems. Finally, implementation hints are given, lessons learned regarding maintainability are pointed out, and performance implications are discussed.
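To make the ECS pattern referenced above concrete, here is a minimal, generic sketch of a component store and a system that joins two component types; it is an illustration only and does not reproduce the authors' semantics-based API.

```cpp
// Minimal entity-component-system (ECS) sketch: entities are ids, components
// live in per-type stores, and a system is a function over a component join.
#include <cstdint>
#include <unordered_map>

using Entity = std::uint32_t;

struct Position { float x, y, z; };
struct Velocity { float dx, dy, dz; };

template <typename C>
class ComponentStore {
public:
    void set(Entity e, const C& c) { data_[e] = c; }
    C* get(Entity e) {
        auto it = data_.find(e);
        return it == data_.end() ? nullptr : &it->second;
    }
    std::unordered_map<Entity, C>& all() { return data_; }
private:
    std::unordered_map<Entity, C> data_;
};

// A "query" visits every entity that owns both component types.
void moveSystem(ComponentStore<Position>& pos, ComponentStore<Velocity>& vel, float dt) {
    for (auto& [e, p] : pos.all()) {
        if (Velocity* v = vel.get(e)) {          // join on shared entity id
            p.x += v->dx * dt; p.y += v->dy * dt; p.z += v->dz * dt;
        }
    }
}
```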
Li, Hao; Chen, Guang; Das, Siddhartha
2016-11-01
Understanding the behavior and properties of spherical polyelectrolyte brushes (SPEBs), which are polyelectrolyte brushes grafted to a spherical core, is fundamental to many applications in biomedical, chemical and petroleum engineering as well as in pharmaceutics. In this paper, we study the pH-responsive electrostatics of such SPEBs in the decoupled regime. In the first part of the paper, we derive the scaling conditions in terms of the grafting density of the PEs on the spherical core that ensure that the analysis can be performed in the decoupled regime. In such a regime the elastic and the excluded volume effects of the polyelectrolyte brushes (PEBs) can be decoupled from the electrostatic effects associated with the PE charge and the induced EDL. As a consequence, the PE brush height, assumed to be dictated by the balance of the elastic and excluded volume effects, can be independent of the electrostatic effects. In the second part, we quantify the pH-responsive electrostatics of the SPEBs - we pinpoint that the radial monomer distribution for a given brush molecule exhibits a non-unique cubic distribution that decays away from the spherical core. Such a monomer distribution ensures that the hydrogen ion concentration is appropriately accounted for in the description of the SPEB thermodynamics. We anticipate that the present analysis, which provides possibly one of the first models for probing the electrostatics of pH-responsive SPEBs in a thermodynamically consistent framework, will be vital for understanding the behavior of a large number of entities ranging from PE-coated NPs and stealth liposomes to biomolecules like bacteria and viruses. Copyright © 2016 Elsevier B.V. All rights reserved.
Coupled and decoupled on-chip solenoid inductors with nanogranular magnetic cores
NASA Astrophysics Data System (ADS)
He, Yuhan; Wang, Luo; Wang, Yicheng; Zhang, Huaiwu; Peng, Dongliang; Bai, Feiming
2017-12-01
On-chip integrated solenoid inductors with multilayered nanogranular magnetic cores have been designed and fabricated on silicon wafers. Both decoupled and coupled inductors with multilayered magnetic cores were studied. For the decoupled inductor, an inductance of 14.2 nH, or an equivalent inductance area density greater than 100 nH/mm2, was obtained, which is about 14 times that of the air-core inductor, and the quality factor is 7.5 at 130 MHz. For the coupled inductor, an even higher peak quality factor of 17 was achieved at 300 MHz; however, the inductance area density decreased to 34 nH/mm2. The enhanced peak quality factor was attributed to fewer spike domains on the edge of the closure-loop shaped magnetic core, and therefore higher permeability and more uniform uniaxial anisotropy.
Silicon Nanophotonics for Many-Core On-Chip Networks
NASA Astrophysics Data System (ADS)
Mohamed, Moustafa
The number of cores in many-core architectures is scaling to unprecedented levels, requiring ever-increasing communication capacity. Traditionally, architects follow the path of higher throughput at the expense of latency. This trend has become problematic for performance in many-core architectures. Moreover, power consumption is increasing with system scaling, mandating nontraditional solutions. Nanophotonics can address these problems, offering benefits on the three frontiers of many-core processor design: latency, bandwidth, and power. Nanophotonic links leverage circuit-switched flow control, allowing low latency; in addition, the power consumption of optical links is significantly lower compared to their electrical counterparts for intermediate and long links. Finally, through wavelength division multiplexing, we can keep up with the high bandwidth trends without sacrificing throughput. This thesis focuses on realizing nanophotonics for communication in many-core architectures at different design levels, considering reliability challenges that our fabrication and measurements reveal. First, we study how to design on-chip networks for low latency, low power, and high bandwidth by exploiting the full potential of nanophotonics. The design process considers device-level limitations and capabilities on one hand, and system-level demands in terms of power and performance on the other hand. The design involves the choice of devices, designing the optical link, the topology, the arbitration technique, and the routing mechanism. Next, we address the problem of reliability in on-chip networks. Reliability issues not only degrade performance but can block communication. Hence, we propose a reliability-aware design flow and present a reliability management technique based on this flow to address reliability in the system. In the proposed flow, reliability is modeled and analyzed at the device, architecture, and system levels. Our reliability management technique is superior to existing solutions in terms of power and performance. In fact, our solution can scale to a thousand cores with low overhead.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harrison, Cyrus; Larsen, Matt; Brugger, Eric
Strawman is a system designed to explore the in situ visualization and analysis needs of simulation code teams running multi-physics calculations on many-core HPC architectures. It provides rendering pipelines that can leverage both many-core CPUs and GPUs to render images of simulation meshes.
Production Level CFD Code Acceleration for Hybrid Many-Core Architectures
NASA Technical Reports Server (NTRS)
Duffy, Austen C.; Hammond, Dana P.; Nielsen, Eric J.
2012-01-01
In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation of graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. For codes to scale and fully use resources on these and next-generation machines, they will need to employ some type of GPU sharing model, such as the one presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future-generation architectures. This work was completed by the author in August 2010, and reflects the analysis and results of that time.
Parallel heterogeneous architectures for efficient OMP compressive sensing reconstruction
NASA Astrophysics Data System (ADS)
Kulkarni, Amey; Stanislaus, Jerome L.; Mohsenin, Tinoosh
2014-05-01
Compressive Sensing (CS) is a novel scheme in which a signal that is sparse in a known transform domain can be reconstructed using fewer samples. The signal reconstruction techniques are computationally intensive and have sluggish performance, which makes them impractical for real-time processing applications. This paper presents novel architectures for the Orthogonal Matching Pursuit (OMP) algorithm, one of the popular CS reconstruction algorithms. We show the implementation results of the proposed architectures on FPGA, ASIC and on a custom many-core platform. For the FPGA and ASIC implementations, a novel thresholding method is used to reduce the processing time for the optimization problem by at least 25%. For the custom many-core platform, efficient parallelization techniques are applied to reconstruct signals with varying signal length N and sparsity m. The algorithm is divided into three kernels. Each kernel is parallelized to reduce execution time, whereas efficient reuse of the matrix operators allows us to reduce area. Matrix operations are efficiently parallelized by taking advantage of blocked algorithms. For demonstration purposes, all architectures reconstruct a 256-length signal with maximum sparsity of 8 using 64 measurements. The implementation on a Xilinx Virtex-5 FPGA requires 27.14 μs to reconstruct the signal using basic OMP, whereas with the thresholding method it requires 18 μs. The ASIC implementation reconstructs the signal in 13 μs. However, our custom many-core, operating at 1.18 GHz, takes 18.28 μs to complete. Our results show that, compared to previously published work on the same algorithm and matrix size, the proposed architectures for FPGA and ASIC implementations perform 1.3x and 1.8x faster, respectively. Also, the proposed many-core implementation performs 3000x faster than the CPU and 2000x faster than the GPU.
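For reference, one iteration of textbook OMP can be summarized as below; this is the generic algorithm, not the thresholded or blocked hardware variants evaluated in the paper.

```latex
% One iteration of standard Orthogonal Matching Pursuit (OMP) for y = A x:
\begin{align*}
k &\leftarrow \arg\max_{j} \lvert a_j^{\top} r^{(t)} \rvert
  && \text{(select the atom most correlated with the residual)}\\
S^{(t+1)} &\leftarrow S^{(t)} \cup \{k\}
  && \text{(grow the support set)}\\
x_{S}^{(t+1)} &\leftarrow \arg\min_{z} \lVert y - A_{S^{(t+1)}} z \rVert_2
  && \text{(least-squares fit on the support)}\\
r^{(t+1)} &\leftarrow y - A_{S^{(t+1)}} x_{S}^{(t+1)}
  && \text{(update residual; stop after } m \text{ iterations)}
\end{align*}
```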
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Song, Shuaiwen; Fu, Haohuan
2014-08-16
Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach an increasing importance to the analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. To address the challenges above, we designed and implemented MICSVM, a highly efficient parallel SVM for x86 based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi coprocessor (MIC).
Selecting a Benchmark Suite to Profile High-Performance Computing (HPC) Machines
2014-11-01
architectures. Machines now contain central processing units (CPUs), graphics processing units (GPUs), and many integrated core (MIC) architecture all...evaluate the feasibility and applicability of a new architecture just released to the market. Researchers are often unsure how available resources will...architectures. Having a suite of programs running on different architectures, such as GPUs, MICs, and CPUs, adds complexity and technical challenges
Díaz, David; Esteban, Francisco J.; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio
2014-01-01
We have developed MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences on architectures with many cores. It must be stressed that in such a process, the detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal key points to exploit and optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach has focused on the most time-consuming stages of this algorithm. In particular, the so-called progressive alignment has drastically improved in performance, due to a fine-grained approach where the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy on many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. Thus, MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm, and more than 7x faster than the best x86 parallel implementation to date, and is publicly available through a web service. Besides, these developments have been deployed in cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies and for the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and for the protection of the intellectual property (identification/traceability), including the protected designation of origin, among other applications. PMID:24710354
High-performance 3D compressive sensing MRI reconstruction.
Kim, Daehyun; Trzasko, Joshua D; Smelyanskiy, Mikhail; Haider, Clifton R; Manduca, Armando; Dubey, Pradeep
2010-01-01
Compressive Sensing (CS) is a nascent sampling and reconstruction paradigm that describes how sparse or compressible signals can be accurately approximated using many fewer samples than traditionally believed. In magnetic resonance imaging (MRI), where scan duration is directly proportional to the number of acquired samples, CS has the potential to dramatically decrease scan time. However, the computationally expensive nature of CS reconstructions has so far precluded their use in routine clinical practice - instead, more easily generated but lower-quality images continue to be used. We investigate the development and optimization of a proven inexact quasi-Newton CS reconstruction algorithm on several modern parallel architectures, including CPUs, GPUs, and Intel's Many Integrated Core (MIC) architecture. Our (optimized) baseline implementation on a quad-core Core i7 is able to reconstruct a 256 × 160 × 80 volume of the neurovasculature from an 8-channel, 10× undersampled data set within 56 seconds, which is already a significant improvement over existing implementations. The latest six-core Core i7 reduces the reconstruction time further to 32 seconds. Moreover, we show that the CS algorithm benefits from modern throughput-oriented architectures. Specifically, our CUDA-based implementation on an NVIDIA GTX480 reconstructs the same dataset in 16 seconds, while Intel's Knights Ferry (KNF) of the MIC architecture reduces the time even further to 12 seconds. Such a level of performance allows the neurovascular dataset to be reconstructed within a clinically viable time.
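As background, a typical regularized CS-MRI reconstruction problem of the kind such quasi-Newton solvers target can be written as follows; the exact functional and operators used in this work may differ.

```latex
% Generic regularized CS-MRI reconstruction problem (illustrative form):
\[
\hat{x} \;=\; \arg\min_{x}\; \tfrac{1}{2}\,\lVert \Phi F x - y \rVert_2^2
\;+\; \lambda\, \lVert \Psi x \rVert_1 ,
\]
% where F is the Fourier encoding operator, \Phi the undersampling mask and
% coil sensitivities, \Psi a sparsifying transform, and \lambda > 0 a weight.
```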
NASA Astrophysics Data System (ADS)
Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.
2017-11-01
The current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created huge demand for data processing activities, and such throughput-intensive applications inherently contain data-level parallelism, which is well suited to SIMD-architecture-based GPUs. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Different case studies are used to compare the performance of throughput-computing applications using shared-memory programming in OpenMP and CUDA API based programming.
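A minimal shared-memory example of the kind compared in such case studies is sketched below using OpenMP; the CUDA counterpart would express the same loop body as a kernel launched over a grid of threads. This is illustrative only, not code from the paper.

```cpp
// Minimal data-parallel example (OpenMP). In CUDA, the loop body would become
// a __global__ kernel with one thread per element.
#include <cstddef>
#include <vector>

void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    #pragma omp parallel for
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];   // each iteration is independent (data parallel)
    }
}
```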
Event Reconstruction for Many-core Architectures using Java
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graf, Norman A.; /SLAC
Although Moore's Law remains technically valid, the performance enhancements in computing which traditionally resulted from increased CPU speeds ended years ago. Chip manufacturers have chosen to increase the number of core CPUs per chip instead of increasing clock speed. Unfortunately, these extra CPUs do not automatically result in improvements in simulation or reconstruction times. To take advantage of this extra computing power requires changing how software is written. Event reconstruction is globally serial, in the sense that raw data has to be unpacked first, channels have to be clustered to produce hits before those hits are identified as belonging to a track or shower, tracks have to be found and fit before they are vertexed, etc. However, many of the individual procedures along the reconstruction chain are intrinsically independent and are perfect candidates for optimization using multi-core architecture. Threading is perhaps the simplest approach to parallelizing a program and Java includes a powerful threading facility built into the language. We have developed a fast and flexible reconstruction package (org.lcsim) written in Java that has been used for numerous physics and detector optimization studies. In this paper we present the results of our studies on optimizing the performance of this toolkit using multiple threads on many-core architectures.
A novel method for calculating relative free energy of similar molecules in two environments
NASA Astrophysics Data System (ADS)
Farhi, Asaf; Singh, Bipin
2017-03-01
Calculating relative free energies is a topic of substantial interest and has many applications, including solvation and binding free energies, which are used in computational drug discovery. However, there remain the challenges of accuracy, simple implementation, robustness and efficiency, which prevent the calculations from being automated and limit their use. Here we present an exact and complete decoupling analysis in which the partition functions of the compared systems decompose into the partition functions of the common and different subsystems. This decoupling analysis is applicable to submolecules with coupled degrees of freedom, such as the methyl group, and to any potential function (including the typical dihedral potentials), making it possible to remove fewer terms in the transformation, which results in a more efficient calculation. Then we show mathematically, in the context of partition function decoupling, that the two compared systems can be simulated separately, eliminating the need to design a composite system. We demonstrate the decoupling analysis and the separate transformations in a relative free energy calculation using MD simulations for a general force field, and compare to another calculation and to experimental results. We present a unified soft-core technique that ensures the monotonicity of the numerically integrated function (analytical proof), which is important for the selection of intermediates. We show mathematically that in this soft-core technique the numerically integrated function can be non-steep only when we transform the systems separately, which can simplify the numerical integration. Finally, we show that when the systems have a rugged energy landscape they can be equilibrated without introducing another sampling dimension, which also makes it possible to use the simulation results for other free energy calculations.
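For orientation, the standard relations underlying such relative free energy calculations are shown below (partition-function ratio and thermodynamic integration over a coupling parameter); these are textbook forms, not the paper's specific decoupling or soft-core expressions.

```latex
% Standard background relations only:
% relative free energy as a ratio of partition functions, and its evaluation
% by thermodynamic integration over a coupling parameter \lambda.
\[
\Delta A = A_1 - A_0 = -k_B T \,\ln\frac{Z_1}{Z_0},
\qquad
\Delta A = \int_0^1 \Bigl\langle \frac{\partial U(\lambda)}{\partial \lambda} \Bigr\rangle_{\lambda}\, d\lambda .
\]
```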
Quantifying loopy network architectures.
Katifori, Eleni; Magnasco, Marcelo O
2012-01-01
Biology presents many examples of planar distribution and structural networks having dense sets of closed loops. An archetype of this form of network organization is the vasculature of dicotyledonous leaves, which showcases a hierarchically-nested architecture containing closed loops at many different levels. Although a number of approaches have been proposed to measure aspects of the structure of such networks, a robust metric to quantify their hierarchical organization is still lacking. We present an algorithmic framework, the hierarchical loop decomposition, that allows mapping loopy networks to binary trees, preserving in the connectivity of the trees the architecture of the original graph. We apply this framework to investigate computer generated graphs, such as artificial models and optimal distribution networks, as well as natural graphs extracted from digitized images of dicotyledonous leaves and vasculature of rat cerebral neocortex. We calculate various metrics based on the asymmetry, the cumulative size distribution and the Strahler bifurcation ratios of the corresponding trees and discuss the relationship of these quantities to the architectural organization of the original graphs. This algorithmic framework decouples the geometric information (exact location of edges and nodes) from the metric topology (connectivity and edge weight) and it ultimately allows us to perform a quantitative statistical comparison between predictions of theoretical models and naturally occurring loopy graphs.
Performance evaluation of OpenFOAM on many-core architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brzobohatý, Tomáš; Říha, Lubomír; Karásek, Tomáš, E-mail: tomas.karasek@vsb.cz
In this article, the application of the Open Source Field Operation and Manipulation (OpenFOAM) C++ libraries for solving engineering problems on many-core architectures is presented. The objective of this article is to present the scalability of OpenFOAM on parallel platforms for solving real engineering problems in fluid dynamics. Scalability tests of OpenFOAM are performed using various hardware and different implementations of the standard PCG and PBiCG Krylov iterative methods. Speedups of various implementations of linear solvers using GPU and MIC accelerators are presented in this paper. Numerical experiments on 3D lid-driven cavity flow for several cases with various numbers of cells are presented.
Loops in hierarchical channel networks
NASA Astrophysics Data System (ADS)
Katifori, Eleni; Magnasco, Marcelo
2012-02-01
Nature provides us with many examples of planar distribution and structural networks having dense sets of closed loops. An archetype of this form of network organization is the vasculature of dicotyledonous leaves, which showcases a hierarchically-nested architecture. Although a number of methods have been proposed to measure aspects of the structure of such networks, a robust metric to quantify their hierarchical organization is still lacking. We present an algorithmic framework that allows mapping loopy networks to binary trees, preserving in the connectivity of the trees the architecture of the original graph. We apply this framework to investigate computer generated and natural graphs extracted from digitized images of dicotyledonous leaves and animal vasculature. We calculate various metrics on the corresponding trees and discuss the relationship of these quantities to the architectural organization of the original graphs. This algorithmic framework decouples the geometric information from the metric topology (connectivity and edge weight) and it ultimately allows us to perform a quantitative statistical comparison between predictions of theoretical models and naturally occurring loopy graphs.
NASA Astrophysics Data System (ADS)
Leggett, C.; Binet, S.; Jackson, K.; Levinthal, D.; Tatarkhanov, M.; Yao, Y.
2011-12-01
Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance to producing chip designs with multi- and many-core architectures. Further, the cores themselves can run multiple threads with a zero-overhead context switch, allowing low-level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non-uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the ATLAS event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores by means of event-based parallelism and final-stage I/O synchronization. However, initial studies on 8 and 16 core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware-based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level, and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which, due to its size, places huge burdens on the memory infrastructure of today's processors.
Determination of the core promoter regions of the Saccharomyces cerevisiae RPS3 gene.
Joo, Yoo Jin; Kim, Jin-Ha; Baek, Joung Hee; Seong, Ki Moon; Lee, Jae Yung; Kim, Joon
2009-01-01
Ribosomal protein genes (RPG), which are scattered throughout the genomes of all eukaryotes, are subjected to coordinated expression. In yeast, the expression of RPGs is highly regulated, mainly at the transcriptional level. Recent research has found that many ribosomal proteins (RPs) function in multiple processes in addition to protein synthesis. Therefore, detailed knowledge of promoter architecture as well as gene regulation is important in understanding the multiple cellular processes mediated by RPGs. In this study, we investigated the functional architecture of the yeast RPS3 promoter and identified many putative cis-elements. Using beta-galactosidase reporter analysis and EMSA, the core promoter of RPS3 containing UASrpg and T-rich regions was corroborated. Moreover, the promoter occupancy of RPS3 by three transcription factors was confirmed. Taken together, our results further the current understanding of the promoter architecture and trans-elements of the Saccharomyces cerevisiae RPS3 gene.
Topical perspective on massive threading and parallelism.
Farber, Robert M
2011-09-01
Unquestionably, computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General Purpose Graphics Processor Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases, three orders of magnitude increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive threading and massive parallelism, such as the 1.3 million threads of the Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts--be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world--is how to capitalize on these new massively threaded computational architectures, especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelism with some insight into the future. Published by Elsevier Inc.
The science of visual analysis at extreme scale
NASA Astrophysics Data System (ADS)
Nowell, Lucy T.
2011-01-01
Driven by market forces and spanning the full spectrum of computational devices, computer architectures are changing in ways that present tremendous opportunities and challenges for data analysis and visual analytic technologies. Leadership-class high performance computing systems will have as many as a million cores by 2020 and support 10 billion-way concurrency, while laptop computers are expected to have as many as 1,000 cores by 2015. At the same time, data of all types are increasing exponentially and automated analytic methods are essential for all disciplines. Many existing analytic technologies do not scale to make full use of current platforms, and fewer still are likely to scale to the systems that will be operational by the end of this decade. Furthermore, on the new architectures and for data at extreme scales, validating the accuracy and effectiveness of analytic methods, including visual analysis, will be increasingly important.
Eye-hand coordination during a double-step task: evidence for a common stochastic accumulator
Gopal, Atul
2015-01-01
Many studies of reaching and pointing have shown significant spatial and temporal correlations between eye and hand movements. Nevertheless, it remains unclear whether these correlations are incidental, arising from common inputs (independent model); whether these correlations represent an interaction between otherwise independent eye and hand systems (interactive model); or whether these correlations arise from a single dedicated eye-hand system (common command model). Subjects were instructed to redirect gaze and pointing movements in a double-step task in an attempt to decouple eye-hand movements and causally distinguish between the three architectures. We used a drift-diffusion framework in the context of a race model, which has previously been used to explain redirect behavior for eye and hand movements separately, to predict the pattern of eye-hand decoupling. We found that the common command architecture could best explain the observed frequency of different eye and hand response patterns to the target step. A common stochastic accumulator for eye-hand coordination also predicts comparable variances, despite a significant difference in the means of the eye and hand reaction time (RT) distributions, which we tested. Consistent with this prediction, we observed that the variances of the eye and hand RTs were similar, despite much larger hand RTs (∼90 ms). Moreover, changes in mean eye RTs, which also increased eye RT variance, produced a similar increase in the mean and variance of the associated hand RT. Taken together, these data suggest that a dedicated circuit underlies coordinated eye-hand planning. PMID:26084906
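As background, a generic stochastic-accumulator (drift-diffusion) race model of the kind referred to above can be written as follows; the specific parameterization fitted in the study may differ.

```latex
% Generic drift-diffusion accumulator race model (illustrative form):
\[
dA_i(t) = \mu_i \, dt + \sigma \, dW_i(t), \qquad
\mathrm{RT}_i = \min \{ t : A_i(t) \ge \theta_i \} + t_{0},
\]
% where each accumulator i races toward its threshold \theta_i. In a
% common-command architecture the eye and hand share a single accumulator,
% which predicts comparable RT variances for the two effectors.
```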
Pu, Weidan; Luo, Qiang; Jiang, Yali; Gao, Yidian; Ming, Qingsen; Yao, Shuqiao
2017-09-12
Psychopathic traits of conduct disorder (CD) have a core callous-unemotional (CU) component and an impulsive-antisocial component. Previous task-driven fMRI studies have suggested that psychopathic traits are associated with dysfunction of several brain areas involved in different cognitive functions (e.g., empathy, reward, and response inhibition etc.), but the relationship between psychopathic traits and intrinsic brain functional architecture has not yet been explored in CD. Using a holistic brain-wide functional connectivity analysis, this study delineated the alterations in brain functional networks in patients with conduct disorder. Compared with matched healthy controls, we found decreased anti-synchronization between the fronto-parietal network (FPN) and default mode network (DMN), and increased intra-network synchronization within the frontothalamic-basal ganglia, right frontoparietal, and temporal/limbic/visual networks in CD patients. Correlation analysis showed that the weakened FPN-DMN interaction was associated with CU traits, while the heightened intra-network functional connectivity was related to impulsivity traits in CD patients. Our findings suggest that decoupling of cognitive control (FPN) with social understanding of others (DMN) is associated with the CU traits, and hyper-functions of the reward and motor inhibition systems elevate impulsiveness in CD.
A path to practical Solar Pumped Lasers via Radiative Energy Transfer
Reusswig, Philip D.; Nechayev, Sergey; Scherer, Jennifer M.; ...
2015-10-05
The optical conversion of incoherent solar radiation into a bright, coherent laser beam enables the application of nonlinear optics to solar energy conversion and storage. Here, we present an architecture for solar pumped lasers that uses a luminescent solar concentrator to decouple the conventional trade-off between solar absorption efficiency and the mode volume of the optical gain material. We report a 750-μm-thick Nd3+-doped YAG planar waveguide sensitized by a luminescent CdSe/CdZnS (core/shell) colloidal nanocrystal, yielding a peak cascade energy transfer of 14%, a broad spectral response in the visible portion of the solar spectrum, and an equivalent quasi-CW solar lasing threshold of 23 W-cm−2, or approximately 230 suns. The efficient coupling of incoherent, spectrally broad sunlight in small gain volumes should allow the generation of coherent laser light from intensities of less than 100 suns.
A path to practical Solar Pumped Lasers via Radiative Energy Transfer
Reusswig, Philip D.; Nechayev, Sergey; Scherer, Jennifer M.; Hwang, Gyu Weon; Bawendi, Moungi G.; Baldo, Marc. A.; Rotschild, Carmel
2015-01-01
The optical conversion of incoherent solar radiation into a bright, coherent laser beam enables the application of nonlinear optics to solar energy conversion and storage. Here, we present an architecture for solar pumped lasers that uses a luminescent solar concentrator to decouple the conventional trade-off between solar absorption efficiency and the mode volume of the optical gain material. We report a 750-μm-thick Nd3+-doped YAG planar waveguide sensitized by a luminescent CdSe/CdZnS (core/shell) colloidal nanocrystal, yielding a peak cascade energy transfer of 14%, a broad spectral response in the visible portion of the solar spectrum, and an equivalent quasi-CW solar lasing threshold of 23 W-cm−2, or approximately 230 suns. The efficient coupling of incoherent, spectrally broad sunlight in small gain volumes should allow the generation of coherent laser light from intensities of less than 100 suns. PMID:26434400
A path to practical Solar Pumped Lasers via Radiative Energy Transfer.
Reusswig, Philip D; Nechayev, Sergey; Scherer, Jennifer M; Hwang, Gyu Weon; Bawendi, Moungi G; Baldo, Marc A; Rotschild, Carmel
2015-10-05
The optical conversion of incoherent solar radiation into a bright, coherent laser beam enables the application of nonlinear optics to solar energy conversion and storage. Here, we present an architecture for solar pumped lasers that uses a luminescent solar concentrator to decouple the conventional trade-off between solar absorption efficiency and the mode volume of the optical gain material. We report a 750-μm-thick Nd(3+)-doped YAG planar waveguide sensitized by a luminescent CdSe/CdZnS (core/shell) colloidal nanocrystal, yielding a peak cascade energy transfer of 14%, a broad spectral response in the visible portion of the solar spectrum, and an equivalent quasi-CW solar lasing threshold of 23 W-cm(-2), or approximately 230 suns. The efficient coupling of incoherent, spectrally broad sunlight in small gain volumes should allow the generation of coherent laser light from intensities of less than 100 suns.
NASA Astrophysics Data System (ADS)
Ginosar, Ran; Aviely, Peleg; Liran, Tuvia; Alon, Dov; Dobkin, Reuven; Goldberg, Michael
2013-08-01
RC64, a novel 64-core many-core signal processing chip, targets DSP performance of 12.8 GIPS, 100 GOPS and 12.8 single-precision GFLOPS while dissipating only 3 Watts. RC64 employs advanced DSP cores, a multi-bank shared memory and a hardware scheduler, supports DDR2 memory and communicates over five proprietary 6.4 Gbps channels. The programming model employs sequential fine-grain tasks and a separate task map to define task dependencies. RC64 is implemented as a 200 MHz ASIC in Tower 130nm CMOS technology, assembled in a hermetically sealed ceramic QFP package and qualified to the highest space standards.
HACC: Extreme Scaling and Performance Across Diverse Architectures
NASA Astrophysics Data System (ADS)
Habib, Salman; Morozov, Vitali; Frontiere, Nicholas; Finkel, Hal; Pope, Adrian; Heitmann, Katrin
2013-11-01
Supercomputing is evolving towards hybrid and accelerator-based architectures with millions of cores. The HACC (Hardware/Hybrid Accelerated Cosmology Code) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. We demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining unprecedented levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panyala, Ajay; Chavarría-Miranda, Daniel; Manzano, Joseph B.
High performance, parallel applications with irregular data accesses are becoming a critical workload class for modern systems. In particular, the execution of such workloads on emerging many-core systems is expected to be a significant component of applications in data mining, machine learning, scientific computing and graph analytics. However, power and energy constraints limit the capabilities of individual cores, memory hierarchy and on-chip interconnect of such systems, thus leading to architectural and software trade-offs that must be understood in the context of the intended application's behavior. Irregular applications are notoriously hard to optimize given their data-dependent access patterns, lack of structured locality and complex data structures and code patterns. We have ported two irregular applications, graph community detection using the Louvain method (Grappolo) and high-performance conjugate gradient (HPCCG), to the Tilera many-core system and have conducted a detailed study of platform-independent and platform-specific optimizations that improve their performance as well as reduce their overall energy consumption. To conduct this study, we employ an auto-tuning based approach that explores the optimization design space along three dimensions - memory layout schemes, GCC compiler flag choices and OpenMP loop scheduling options. We leverage MIT's OpenTuner auto-tuning framework to explore and recommend energy optimal choices for different combinations of parameters. We then conduct an in-depth architectural characterization to understand the memory behavior of the selected workloads. Finally, we perform a correlation study to demonstrate the interplay between the hardware behavior and application characteristics. Using auto-tuning, we demonstrate whole-node energy savings and performance improvements of up to 49.6% and 60% relative to a baseline instantiation, and up to 31% and 45.4% relative to manually optimized variants.
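One of the three tuning dimensions mentioned above, OpenMP loop scheduling, can be exposed to an auto-tuner roughly as sketched below; the kernel and names are hypothetical, not code from Grappolo or HPCCG.

```cpp
// Hypothetical irregular kernel whose OpenMP schedule is left to the runtime,
// so a tuner (e.g. OpenTuner via OMP_SCHEDULE) can sweep policy and chunk size.
#include <vector>

void relax(const std::vector<double>& x, std::vector<double>& x_new,
           const std::vector<int>& neighbors_flat, const std::vector<int>& offsets,
           int n /* number of vertices; offsets has n+1 entries (CSR layout) */) {
    // schedule(runtime) defers the static/dynamic/guided + chunk choice to the
    // environment, making it an auto-tunable knob without recompilation.
    #pragma omp parallel for schedule(runtime)
    for (int v = 0; v < n; ++v) {
        const int deg = offsets[v + 1] - offsets[v];
        double acc = 0.0;
        for (int e = offsets[v]; e < offsets[v + 1]; ++e)   // irregular, data-dependent reads
            acc += x[neighbors_flat[e]];
        x_new[v] = (deg > 0) ? acc / deg : x[v];            // simple Jacobi-style update
    }
}
```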
Scaling Support Vector Machines On Modern HPC Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Fu, Haohuan; Song, Shuaiwen
2015-02-01
We designed and implemented MIC-SVM, a highly efficient parallel SVM for x86 based multicore and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures and serve as general optimization methods for other machine learning tools.
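For context, the standard SVM dual problem, whose kernel evaluations typically dominate training time and are the natural target of such multilevel parallelism, has the following form.

```latex
% Standard soft-margin SVM dual problem (textbook form):
\[
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
- \tfrac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{s.t.}\quad 0 \le \alpha_i \le C,\;\; \sum_{i=1}^{n} \alpha_i y_i = 0 .
\]
```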
NASA Astrophysics Data System (ADS)
Huang, Melin; Huang, Bormin; Huang, Allen H.-L.
2015-10-01
The schemes of cumulus parameterization are responsible for the sub-grid-scale effects of convective and/or shallow clouds, and are intended to represent vertical fluxes due to unresolved updrafts and downdrafts and compensating motion outside the clouds. Some schemes additionally provide cloud and precipitation field tendencies in the convective column, and momentum tendencies due to convective transport of momentum. The schemes all provide the convective component of surface rainfall. Betts-Miller-Janjic (BMJ) is one scheme that fulfills such purposes in the Weather Research and Forecasting (WRF) model. The National Centers for Environmental Prediction (NCEP) has tried to optimize the BMJ scheme for operational application. As there are no interactions among horizontal grid points, this scheme is very suitable for parallel computation. The Intel Xeon Phi Many Integrated Core (MIC) architecture, with its efficient parallelization and vectorization essentials, allows us to optimize the BMJ scheme. Compared to the original code running on one CPU socket (eight cores) and on one CPU core of an Intel Xeon E5-2670, respectively, the MIC-based optimization of this scheme running on a Xeon Phi 7120P coprocessor improves the performance by 2.4x and 17.0x.
NASA Astrophysics Data System (ADS)
Erez, Mattan; Dally, William J.
Stream processors, like other multi-core architectures, partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather-compute-scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities, with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors have minimal control logic, consisting of fetching medium- and coarse-grained instructions and executing them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.
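The gather-compute-scatter form mentioned above can be illustrated with a small sketch; on an actual stream processor these phases would be staged through software-managed on-chip memory under explicit program control, rather than ordinary heap buffers.

```cpp
// Toy illustration of the gather-compute-scatter form (generic sketch only).
#include <cstddef>
#include <vector>

void gather_compute_scatter(const std::vector<float>& src,
                            const std::vector<int>&   gather_idx,
                            const std::vector<int>&   scatter_idx,
                            std::vector<float>&       dst) {
    // Gather: pull an irregular working set into a contiguous local buffer.
    std::vector<float> local(gather_idx.size());
    for (std::size_t i = 0; i < gather_idx.size(); ++i)
        local[i] = src[gather_idx[i]];

    // Compute: dense, predictable work over the local buffer (kernel phase).
    for (float& v : local)
        v = v * v + 1.0f;

    // Scatter: write results back to their home locations.
    for (std::size_t i = 0; i < scatter_idx.size(); ++i)
        dst[scatter_idx[i]] = local[i];
}
```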
Electromagnetic Physics Models for Parallel Computing Architectures
NASA Astrophysics Data System (ADS)
Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.
2016-10-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
Investigating the impact of the Cielo Cray XE6 architecture on scientific application codes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajan, Mahesh; Barrett, Richard; Pedretti, Kevin Thomas Tauke
2010-12-01
Cielo, a Cray XE6, is the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign's newest capability machine. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket oct-core AMD Magny-Cours compute nodes, linked using Cray's Gemini interconnect. Its primary mission objective is to enable a suite of the ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement to a successful architecture previously available to many of our codes, thus enabling a basis for understanding the capabilities of this new architecture. Using three codes strategically important to the ASC campaign, and supplemented with some micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo.
High-performance, scalable optical network-on-chip architectures
NASA Astrophysics Data System (ADS)
Tan, Xianfang
The rapid advance of technology enables a large number of processing cores to be integrated into a single chip, which is called a Chip Multiprocessor (CMP) or a Multiprocessor System-on-Chip (MPSoC) design. The on-chip interconnection network, which is the communication infrastructure for these processing cores, plays a central role in a many-core system. With the continuously increasing complexity of many-core systems, traditional metallic wired electronic networks-on-chip (NoC) have become a bottleneck because of the unbearable latency in data transmission and extremely high energy consumption on chip. Optical networks-on-chip (ONoC) have been proposed as a promising alternative paradigm to electronic NoC, with the benefits of optical signaling such as extremely high bandwidth, negligible latency, and low power consumption. This dissertation focuses on the design of high-performance and scalable ONoC architectures, and the contributions are highlighted as follows: 1. A micro-ring resonator (MRR)-based Generic Wavelength-routed Optical Router (GWOR) is proposed, along with a method for developing a GWOR of any size. GWOR is a scalable non-blocking ONoC architecture with simple structure, low cost and high power efficiency compared to existing ONoC designs. 2. To expand the bandwidth and improve the fault tolerance of the GWOR, a redundant GWOR architecture is designed by cascading different types of GWORs into one network. 3. A redundant GWOR built with MRR-based comb switches is proposed. Comb switches can expand the bandwidth while keeping the topology of the GWOR unchanged by replacing the general MRRs with comb switches. 4. A butterfly fat tree (BFT)-based hybrid optoelectronic NoC (HONoC) architecture is developed, in which GWORs are used for global communication and electronic routers are used for local communication. The proposed HONoC uses fewer electronic routers and links than its counterpart electronic BFT-based NoC. It takes advantage of GWOR for optical communication and of BFT for non-uniform traffic communication and three-dimensional (3D) implementation. 5. A cycle-accurate NoC simulator is developed to evaluate the performance of the proposed HONoC architectures. It is a comprehensive platform that can simulate both electronic and optical NoCs. HONoC architectures of different sizes are evaluated in terms of throughput, latency and energy dissipation. Simulation results confirm that HONoC achieves good network performance with lower power consumption.
Gregarious Data Re-structuring in a Many Core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres
In this paper, we have developed a new methodology that takes into consideration the access patterns of a single parallel actor (e.g. a thread), as well as the access patterns of “grouped” parallel actors that share a resource (e.g. a distributed Level 3 cache). We start with a hierarchical tile code for our target machine and apply a series of transformations at the tile level to improve data residence in a given memory hierarchy level. The contributions of this paper include (a) collaborative data restructuring for group reuse and (b) a low-overhead transformation technique to improve access patterns and bring closely connected data elements together. Preliminary results on a many-core architecture, the Tilera TileGX, show promising improvements over optimized OpenMP code (up to 31% increase in GFLOPS) and over our own previous work on fine-grained runtimes (up to 16%) for selected kernels.
NASA Astrophysics Data System (ADS)
Pruhs, Kirk
A particularly important emergent technology is heterogeneous processors (or cores), which many computer architects believe will be the dominant architectural design in the future. The main advantage of a heterogeneous architecture, relative to an architecture of identical processors, is that it allows for the inclusion of processors whose design is specialized for particular types of jobs, and for jobs to be assigned to a processor best suited for that job. Most notably, it is envisioned that these heterogeneous architectures will consist of a small number of high-power high-performance processors for critical jobs, and a larger number of lower-power lower-performance processors for less critical jobs. Naturally, the lower-power processors would be more energy efficient in terms of the computation performed per unit of energy expended, and would generate less heat per unit of computation. For a given area and power budget, heterogeneous designs can give significantly better performance for standard workloads. Moreover, even processors that were designed to be homogeneous, are increasingly likely to be heterogeneous at run time: the dominant underlying cause is the increasing variability in the fabrication process as the feature size is scaled down (although run time faults will also play a role). Since manufacturing yields would be unacceptably low if every processor/core was required to be perfect, and since there would be significant performance loss from derating the entire chip to the functioning of the least functional processor (which is what would be required in order to attain processor homogeneity), some processor heterogeneity seems inevitable in chips with many processors/cores.
Options for Parallelizing a Planning and Scheduling Algorithm
NASA Technical Reports Server (NTRS)
Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.
2011-01-01
Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions, limited processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Fu, Haohuan; Song, Shuaiwen
2014-07-18
Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits the application's performance and power efficiency. In this paper, we accelerate the forward modeling technique on the latest multi-core and many-core architectures such as Intel Sandy Bridge CPUs, the NVIDIA Fermi C2070 GPU, the NVIDIA Kepler K20x GPU, and the Intel Xeon Phi coprocessor. For the GPU platforms, we propose two parallel strategies to explore the performance optimization opportunities for our stencil kernels. For Sandy Bridge CPUs and the MIC, we also employ various optimization techniques in order to achieve the best performance.
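To make the kernel class concrete, the following C sketch shows a generic 7-point acoustic wave stencil with OpenMP threading and SIMD hints; the grid sizes, leapfrog update and pragmas are illustrative assumptions, not the authors' optimized kernels.

#include <stddef.h>

#define NX 256
#define NY 256
#define NZ 256
#define IDX(i, j, k) ((size_t)(i) * NY * NZ + (size_t)(j) * NZ + (size_t)(k))

void wave_step(const float *restrict p0, const float *restrict p1,
               float *restrict p2, const float *restrict v2dt2)
{
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 1; i < NX - 1; i++)
        for (int j = 1; j < NY - 1; j++) {
            #pragma omp simd
            for (int k = 1; k < NZ - 1; k++) {
                /* 7-point Laplacian (unit grid spacing for simplicity) */
                float lap = p1[IDX(i + 1, j, k)] + p1[IDX(i - 1, j, k)]
                          + p1[IDX(i, j + 1, k)] + p1[IDX(i, j - 1, k)]
                          + p1[IDX(i, j, k + 1)] + p1[IDX(i, j, k - 1)]
                          - 6.0f * p1[IDX(i, j, k)];
                /* leapfrog in time: p(t+dt) = 2 p(t) - p(t-dt) + v^2 dt^2 lap */
                p2[IDX(i, j, k)] = 2.0f * p1[IDX(i, j, k)] - p0[IDX(i, j, k)]
                                 + v2dt2[IDX(i, j, k)] * lap;
            }
        }
}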
MILC Code Performance on High End CPU and GPU Supercomputer Clusters
NASA Astrophysics Data System (ADS)
DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug
2018-03-01
With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Earl, Christopher; Might, Matthew; Bagusetty, Abhishek
2016-01-26
This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.
Electromagnetic physics models for parallel computing architectures
Amadio, G.; Ananya, A.; Apostolakis, J.; ...
2016-11-21
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVIDIA GPUs and the Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe the implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Finally, the results of preliminary performance evaluation and physics validation are presented as well.
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.
2014-10-01
The Purdue-Lin scheme is a relatively sophisticated microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme includes six classes of hydrometeors: water vapor, cloud water, rain, cloud ice, snow and graupel. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. In this paper, we accelerate the Purdue-Lin scheme using Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi is a high-performance coprocessor consisting of up to 61 cores. The Xeon Phi is connected to a CPU via the PCI Express (PCIe) bus. In this paper, we discuss in detail the code optimization issues encountered while tuning the Purdue-Lin microphysics Fortran code for the Xeon Phi. In particular, achieving good performance required utilizing multiple cores, exploiting the wide vector units, and making efficient use of memory. The results show that the optimizations improved performance of the original code on the Xeon Phi 5110P by a factor of 4.2x. Furthermore, the same optimizations improved performance on the Intel Xeon E5-2603 CPU by a factor of 1.2x compared to the original code.
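The optimization pattern described above (threads across independent horizontal columns, vectorization within a column, contiguous storage) can be sketched as follows; this is a toy saturation-adjustment loop written in C rather than the scheme's Fortran, and all names and the physics are placeholders, not the WRF/Purdue-Lin code.

#include <stddef.h>

void saturation_adjust(float *restrict qv, float *restrict qc,
                       const float *restrict qsat, int ncols, int nlev)
{
    /* columns are independent, so a simple static schedule suffices */
    #pragma omp parallel for schedule(static)
    for (int c = 0; c < ncols; c++) {
        float *qv_c = &qv[(size_t)c * nlev];
        float *qc_c = &qc[(size_t)c * nlev];
        const float *qs_c = &qsat[(size_t)c * nlev];
        /* vectorize over the vertical levels of this column */
        #pragma omp simd
        for (int k = 0; k < nlev; k++) {
            /* toy physics: move any vapor above saturation into cloud water */
            float excess = qv_c[k] - qs_c[k];
            if (excess > 0.0f) {
                qv_c[k] -= excess;
                qc_c[k] += excess;
            }
        }
    }
}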
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Assembly of metals and nanoparticles into novel nanocomposite superstructures
Xu, Jiaquan; Chen, Lianyi; Choi, Hongseok; Konish, Hiromi; Li, Xiaochun
2013-01-01
Controlled assembly of nanoscale objects into superstructures is of tremendous interest. Many approaches have been developed to fabricate organic-nanoparticle superstructures. However, effective fabrication of inorganic-nanoparticle superstructures (such as nanoparticles linked by metals) remains a difficult challenge. Here we show a novel, general method to assemble metals and nanoparticles rationally into nanocomposite superstructures. Novel metal-nanoparticle superstructures are achieved by self-assembly of liquid metals and nanoparticles in immiscible liquids driven by reduction of free energy. Superstructures with various architectures, such as metal-core/nanoparticle-shell, nanocomposite-core/nanoparticle-shell, network of metal-linked core/shell nanostructures, and network of metal-linked nanoparticles, were successfully fabricated by simply tuning the volume ratio between nanoparticles and liquid metals. Our approach provides a simple, general way for fabrication of numerous metal-nanoparticle superstructures and enables a rational design of these novel superstructures with desired architectures for exciting applications.
Fushing, Hsieh; Jordà, Òscar; Beisner, Brianne; McCowan, Brenda
2015-01-01
What do the behavior of monkeys in captivity and the financial system have in common? The nodes in such social systems relate to each other through multiple and keystone networks, not just one network. Each network in the system has its own topology, and the interactions among the system’s networks change over time. In such systems, the lead into a crisis appears to be characterized by a decoupling of the networks from the keystone network. This decoupling can also be seen in the crumbling of the keystone’s power structure toward a more horizontal hierarchy. This paper develops nonparametric methods for describing the joint model of the latent architecture of interconnected networks in order to describe this process of decoupling, and hence provide an early warning system of an impending crisis. PMID:26056422
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kozubal, E.; Woods, J.; Burch, J.
2011-01-01
NREL has developed the novel concept of a desiccant enhanced evaporative air conditioner (DEVap) with the objective of combining the benefits of liquid desiccant and evaporative cooling technologies into an innovative 'cooling core.' Liquid desiccant technologies have extraordinary dehumidification potential, but require an efficient cooling sink. DEVap's thermodynamic potential overcomes many shortcomings of standard refrigeration-based direct expansion cooling. DEVap decouples cooling and dehumidification performance, which results in independent temperature and humidity control. The energy input is largely switched away from electricity to low-grade thermal energy that can be sourced from fuels such as natural gas, waste heat, solar, or biofuels.
Schiffbauer, James D.; Huntley, John Warren; Fike, David A.; Jeffrey, Matthew Jarrell; Gregg, Jay M.; Shelton, Kevin L.
2017-01-01
Several positive carbon isotope excursions in Lower Paleozoic rocks, including the prominent Upper Cambrian Steptoean Positive Carbon Isotope Excursion (SPICE), are thought to reflect intermittent perturbations in the hydrosphere-biosphere system. Models explaining these secular changes are abundant, but the synchronicity and regional variation of the isotope signals are not well understood. Examination of cores across a paleodepth gradient in the Upper Cambrian central Missouri intrashelf basin (United States) reveals a time-transgressive, facies-dependent nature of the SPICE. Although the SPICE event may be a global signal, the manner in which it is recorded in rocks should and does vary as a function of facies and carbonate platform geometry. We call for a paradigm shift to better constrain facies, stratigraphic, and biostratigraphic architecture and to apply these observations to the variability in magnitude, stratigraphic extent, and timing of the SPICE signal, as well as other biogeochemical perturbations, to elucidate the complex processes driving the ocean-carbonate system. PMID:28275734
Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations
NASA Astrophysics Data System (ADS)
Bernaschi, M.; Bisson, M.; Salvadore, F.
2014-10-01
We present and compare the performance of two many-core architectures, the NVIDIA Kepler and the Intel MIC, both in a single system and in a cluster configuration, for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the over-relaxation algorithm. We also present data for a traditional high-end multi-core architecture, the Intel Sandy Bridge. The results show that although on the two Intel architectures it is possible to use basically the same code, the performance of the Intel MIC changes dramatically depending on (apparently) minor details. Another issue is that to obtain reasonable scalability with the Intel Phi coprocessor (Phi is the coprocessor that implements the MIC architecture) in a cluster configuration it is necessary to use the so-called offload mode, which reduces the performance of the single system. As for the GPU, the Kepler architecture offers a clear advantage with respect to the previous Fermi architecture while maintaining exactly the same source code. Scalability of the multi-GPU implementation remains very good when using the CPU as a communication co-processor for the GPU. All source codes are provided for inspection and for double-checking the results.
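For context, the over-relaxation move being benchmarked can be written in a few lines: it reflects a classical Heisenberg spin about its local field, s' = 2 (s.h)/(h.h) h - s, which preserves the spin length and the local energy. The C sketch below is a generic illustration; the lattice layout and local-field computation of the paper's codes are not shown.

/* Over-relaxation update for one classical Heisenberg spin: reflect the
 * spin s about its local field h.  Generic sketch only. */
typedef struct { double x, y, z; } vec3;

static vec3 overrelax(vec3 s, vec3 h)
{
    double sh = s.x * h.x + s.y * h.y + s.z * h.z;   /* s . h */
    double hh = h.x * h.x + h.y * h.y + h.z * h.z;   /* h . h */
    double f  = 2.0 * sh / hh;
    vec3 r = { f * h.x - s.x, f * h.y - s.y, f * h.z - s.z };
    return r;   /* |r| == |s|, energy unchanged */
}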
Reference Avionics Architecture for Lunar Surface Systems
NASA Technical Reports Server (NTRS)
Somervill, Kevin M.; Lapin, Jonathan C.; Schmidt, Oron L.
2010-01-01
Developing and delivering infrastructure capable of supporting long-term manned operations on the lunar surface has been a primary objective of the Constellation Program in the Exploration Systems Mission Directorate. Several concepts have been developed related to the development and deployment of lunar exploration vehicles and assets that provide critical functionality such as transportation, habitation, and communication, to name a few. Together, these systems perform complex safety-critical functions, largely dependent on avionics for control and behavior of system functions. These functions are implemented using interchangeable, modular avionics designed for lunar transit and lunar surface deployment. Systems are optimized towards reuse and commonality of form and interface and can be configured via software or component integration for special-purpose applications. There are two core concepts in the reference avionics architecture described in this report. The first concept uses distributed, smart systems to manage complexity, simplify integration, and facilitate commonality. The second core concept is to employ extensive commonality between elements and subsystems. These two concepts are used in the context of developing reference designs for many lunar surface exploration vehicles and elements, and they recur as architectural patterns in a conceptual architectural framework. This report describes the use of these architectural patterns in a reference avionics architecture for lunar surface system elements.
LUMA: A many-core, Fluid-Structure Interaction solver based on the Lattice-Boltzmann Method
NASA Astrophysics Data System (ADS)
Harwood, Adrian R. G.; O'Connor, Joseph; Sanchez Muñoz, Jonathan; Camps Santasmasas, Marta; Revell, Alistair J.
2018-01-01
The Lattice-Boltzmann Method at the University of Manchester (LUMA) project was commissioned to build a collaborative research environment in which researchers of all abilities can study fluid-structure interaction (FSI) problems in engineering applications from aerodynamics to medicine. It is built on the principles of accessibility, simplicity and flexibility. The LUMA software at the core of the project is a capable FSI solver with turbulence modelling and many-core scalability as well as a wealth of input/output and pre- and post-processing facilities. The software has been validated and several major releases benchmarked on supercomputing facilities internationally. The software architecture is modular and arranged logically using a minimal amount of object-orientation to maintain a simple and accessible software.
Portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Calore, Enrico; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Fabio Schifano, Sebastiano; Silvi, Giorgio; Tripiccione, Raffaele
2018-03-01
The present scenario of HPC architectures is extremely heterogeneous, varying from multi-core CPU processors to many-core GPUs. In this context, code portability is increasingly important for easy maintainability of applications; this is relevant in scientific computing, where code changes are numerous and frequent. In this talk we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the OpenACC directives model. OpenACC aims to abstract parallel programming to a descriptive level, where programmers do not need to specify the mapping of the code on the target machine. We describe the OpenACC implementation and show that the same code is able to target different architectures, including state-of-the-art CPUs and GPUs.
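As a reminder of what the descriptive OpenACC style looks like, the toy C kernel below marks a loop as parallel and declares its data movement, leaving the mapping to the compiler; it is a generic axpy over lattice sites, not the actual LQCD code.

/* Descriptive OpenACC example: the directive says the loop is parallel
 * and which arrays must be present on the device; the compiler decides
 * how to map it to the target (CPU, GPU, ...).  Toy kernel only. */
void site_axpy(double *restrict y, const double *restrict x,
               double a, long nsites)
{
    #pragma acc parallel loop copyin(x[0:nsites]) copy(y[0:nsites])
    for (long i = 0; i < nsites; i++)
        y[i] += a * x[i];
}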
GW Calculations of Materials on the Intel Xeon-Phi Architecture
NASA Astrophysics Data System (ADS)
Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek; Biller, Ariel; Chelikowsky, James R.; Louie, Steven G.
Intel Xeon-Phi processors are expected to power a large number of High-Performance Computing (HPC) systems around the United States and the world in the near future. We evaluate the ability of GW and prerequisite Density Functional Theory (DFT) calculations for materials to utilize the Xeon-Phi architecture. We describe the optimization process and the performance improvements achieved. We find that the GW method, like other higher-level Many-Body methods beyond standard local/semilocal approximations to Kohn-Sham DFT, is particularly well suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-waves, band-pairs and frequencies. Support provided by the SCIDAC program, Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences. Grant Numbers DE-SC0008877 (Austin) and DE-AC02-05CH11231 (LBNL).
NASA Technical Reports Server (NTRS)
Cox, D. E.; Groom, N. J.
1994-01-01
An implementation of a decoupled, single-input/single-output control approach for a large angle magnetic suspension test fixture is described. Numerical and experimental results are presented. The experimental system is a laboratory model large gap magnetic suspension system which provides five degree-of-freedom control of a cylindrical suspended element. The suspended element contains a core composed of permanent magnet material and is levitated above five electromagnets mounted in a planar array.
Efficient parallelization for AMR MHD multiphysics calculations; implementation in AstroBEAR
NASA Astrophysics Data System (ADS)
Carroll-Nellenback, Jonathan J.; Shroyer, Brandon; Frank, Adam; Ding, Chen
2013-03-01
Current adaptive mesh refinement (AMR) simulations require algorithms that are highly parallelized and manage memory efficiently. As compute engines grow larger, AMR simulations will require algorithms that achieve new levels of efficient parallelization and memory management. We have attempted to employ new techniques to achieve both of these goals. Patch- or grid-based AMR often employs ghost cells to decouple the hyperbolic advances of each grid on a given refinement level. This decoupling allows each grid to be advanced independently. In AstroBEAR we utilize this independence by threading the grid advances on each level with preference going to the finer level grids. This allows for global load balancing instead of level-by-level load balancing and allows for greater parallelization across both physical space and AMR level. Threading of level advances can also improve performance by interleaving communication with computation, especially in deep simulations with many levels of refinement. While we see improvements of up to 30% on deep simulations run on a few cores, the speedup is typically more modest (5-20%) for larger scale simulations. To improve memory management we have employed a distributed tree algorithm that requires processors to only store and communicate local sections of the AMR tree structure with neighboring processors. Using this distributed approach we are able to get reasonable scaling efficiency (>80%) out to 12288 cores and up to 8 levels of AMR, independent of the use of threading.
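The threaded level-advance idea can be pictured with OpenMP tasks: because ghost cells decouple the grids, each grid advance is an independent task, and finer levels can be given a higher scheduling priority. The C sketch below is only a schematic with placeholder types, not AstroBEAR's implementation, and it assumes OpenMP 4.5 or later for the priority clause.

/* Schematic: advance a set of AMR grids as independent tasks, hinting
 * that finer-level grids should be scheduled first.  grid_t and
 * advance_grid() are placeholders. */
typedef struct grid { int level; /* ... patch data ... */ } grid_t;

void advance_grid(grid_t *g);   /* advances one grid by one step (placeholder) */

void advance_grids(grid_t **grids, int ngrids)
{
    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < ngrids; i++) {
        grid_t *g = grids[i];
        /* ghost cells decouple the grids, so each advance is an independent
         * task; the priority clause is a hint favoring finer levels */
        #pragma omp task firstprivate(g) priority(g->level)
        advance_grid(g);
    }
    /* all tasks complete at the implicit barrier that ends the region */
}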
Nucleation and Crystal Growth in the Formation of Hierarchical Three-Dimensional Nanoarchitecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Xudong
This project aims to obtain a fundamental understanding of the operation of the Ostwald-Lussac (OL) Law and the oriented attachment (OA) mechanism in the nucleation and growth of TiO2 nanorods (NRs) via the surface-reaction-limited pulsed chemical vapor deposition (SPCVD) process. Three-dimensional (3D) NW networks are a unique type of mesoporous architecture that offers extraordinary surface area density and superior transport properties of electrons, photons, and phonons. It is exceptionally promising for advancing the design and application of functional materials for photovoltaic devices, catalyst beds, hydrogen storage systems, sensors, and battery electrodes. Our group has developed the SPCVD technique by mimicking the mechanism of atomic layer deposition (ALD), which effectively decoupled the crystal growth from precursor concentration while retaining anisotropic 1D growth. For the first time, this technique realized a 3D NW architecture with ultrahigh density and achieved a ~4-5 times enhancement in photo-conversion efficiency. Through the support of our current DOE award, we revealed the governing role of the OL Law in the nucleation stage of SPCVD. The formation of NR morphology in SPCVD was identified as following the OA mechanism. We also discovered a unique vapor-phase Kirkendall effect in the evolution of tubular or core-shell NR structures. These understandings opened many new opportunities in designing 3D NW architectures with improved properties or new functionalities. Specifically, our accomplishments from this project include five aspects: (1) Observation of the Ostwald-Lussac Law in high-temperature ALD. (2) Observation of the vapor-solid Kirkendall effect in ZnO-to-TiO2 nanostructure conversion. (3) Development of highly efficient capillary photoelectrochemical (PEC) solar-fuel generation. (4) Development of efficient and stable electrochemical protections for black silicon PEC electrodes. (5) Development of doped polymers with tunable electrical properties. This project brings a new level of transformative knowledge on nucleation and crystal growth in the SPCVD NR growth processes. Specifically, quantification of the activation energy landscape guided by the OL Law will allow us to establish a critical knowledge base of nucleation kinetics for SPCVD synthesis of NR branches on different material surfaces. Studying the OA kinetics will establish a transformative knowledge base to support this new crystal growth mechanism that can be applied to many functional material systems. This research will pave the road toward a capable and versatile synthesis technology for creating 3D hierarchical mesoscale structures.
Virtual decoupling flight control via real-time trajectory synthesis and tracking
NASA Astrophysics Data System (ADS)
Zhang, Xuefu
The production of the General Aviation industry has declined in the past 25 years. Ironically, however, the increasing demand for air travel as a fast, safe, and high-quality mode of transportation has been far from satisfied. Addressing this demand shortfall with personal air transportation necessitates advanced systems for navigation, guidance, control, flight management, and flight traffic control. Among them, an effective decoupling flight control system will not only improve flight quality, safety, and simplicity, and increase air space usage, but also reduce expenses on initial and recurrent pilot training, and thus expand the current market and explore new markets. Because of the formidable difficulties encountered in the actual decoupling of non-linear, time-variant, and highly coupled flight control systems through traditional approaches, a new approach, which essentially converts the decoupling problem into a real-time trajectory synthesis and tracking problem, is employed. Then, the converted problem is solved and a virtual decoupling effect is achieved. In this approach, a trajectory in inertial space can be predefined and dynamically modified based on the flight mission and the pilot's commands. A feedforward-feedback control architecture is constructed to guide the airplane along the trajectory as precisely as possible. Through this approach, the pilot has much simpler, virtually decoupled control of the airplane in terms of speed, flight path angle and horizontal radius of curvature. To verify and evaluate this approach, extensive computer simulation is performed. A large number of test cases are designed for flight control under different flight conditions. The simulation results show that our decoupling strategy is satisfactory and promising, and therefore the research can serve as a consolidated foundation for future practical applications.
Tiled architecture of a CNN-mostly IP system
NASA Astrophysics Data System (ADS)
Spaanenburg, Lambert; Malki, Suleyman
2009-05-01
Multi-core architectures have been popularized with the advent of the IBM CELL. On a finer grain, the problems of scheduling multi-cores already existed in tiled architectures such as the EPIC and Da Vinci. It is not easy to evaluate the performance of a schedule on such an architecture, as historical data are not available. One solution is to compile algorithms for which an optimal schedule is known by analysis. A typical example is an algorithm that is already defined in terms of many collaborating simple nodes, such as a Cellular Neural Network (CNN). A simple node with a local register stack together with a 'rotating wheel' internal communication mechanism has been proposed. Though the basic CNN allows for a tiled implementation of a tiled algorithm on a tiled structure, a practical CNN system will have to disturb this regularity by the additional need for arithmetical and logical operations. Arithmetic operations are needed, for instance, to accommodate low-level image processing, while logical operations are needed to fork and merge different data streams without use of external memory. It is found that the 'rotating wheel' internal communication mechanism still handles such mechanisms without the need for global control. Overall the CNN system provides for a practical network size as implemented on an FPGA, can easily be used as embedded IP and provides a clear benchmark for a multi-core compiler.
A portable platform for accelerated PIC codes and its application to GPUs using OpenACC
NASA Astrophysics Data System (ADS)
Hariri, F.; Tran, T. M.; Jocksch, A.; Lanti, E.; Progsch, J.; Messmer, P.; Brunner, S.; Gheller, C.; Villard, L.
2016-10-01
We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphics Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the same code on an Intel Sandy Bridge 8-core CPU by a factor of 3.4.
Genten: Software for Generalized Tensor Decompositions v. 1.0.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Phipps, Eric T.; Kolda, Tamara G.; Dunlavy, Daniel
Tensors, or multidimensional arrays, are a powerful mathematical means of describing multiway data. This software provides computational means for decomposing or approximating a given tensor in terms of smaller tensors of lower dimension, focusing on decomposition of large, sparse tensors. These techniques have applications in many scientific areas, including signal processing, linear algebra, computer vision, numerical analysis, data mining, graph analysis, neuroscience and more. The software is designed to take advantage of the parallelism present in emerging computer architectures such as multi-core CPUs, many-core accelerators such as the Intel Xeon Phi, and computation-oriented GPUs to enable efficient processing of large tensors.
Theoretical constraints in the design of multivariable control systems
NASA Technical Reports Server (NTRS)
Rynaski, E. G.; Mook, D. J.
1993-01-01
The theoretical constraints inherent in the design of multivariable control systems were defined and investigated. These constraints are manifested by the system transmission zeros that limit or bound the areas in which closed loop poles and individual transfer function zeros may be placed. These constraints were investigated primarily in the context of system decoupling or non-interaction. It was proven that decoupling requires the placement of closed loop poles at the system transmission zeros. Therefore, the system transmission zeros must be minimum phase to guarantee a stable decoupled system. Once decoupling has been accomplished, the remaining part of the system exhibits transmission zeros at infinity, so nearly complete design freedom is possible in terms of placing both poles and zeros of individual closed loop transfer functions. A general, dynamic inversion model following system architecture was developed that encompasses both the implicit and explicit configuration. Robustness properties are developed along with other attributes of this type of system. Finally, a direct design is developed for the longitudinal-vertical degrees of freedom of aircraft motion to show how a direct lift flap can be used to improve the pitch-heave maneuvering coordination for enhanced flying qualities.
Multi-threaded ATLAS simulation on Intel Knights Landing processors
NASA Astrophysics Data System (ADS)
Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration
2017-10-01
The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.
Hunt, Sean T; Román-Leshkov, Yuriy
2018-05-15
Conspectus: Commercial and emerging renewable energy technologies are underpinned by precious metal catalysts, which enable the transformation of reactants into useful products. However, the noble metals (NMs) comprise the least abundant elements in the lithosphere, making them prohibitively scarce and expensive for future global-scale technologies. As such, intense research efforts have been devoted to eliminating or substantially reducing the loadings of NMs in various catalytic applications. These efforts have resulted in a plethora of heterogeneous NM catalyst morphologies beyond the traditional supported spherical nanoparticle. In many of these new architectures, such as shaped, high index, and bimetallic particles, less than 20% of the loaded NMs are available to perform catalytic turnovers. The majority of NM atoms are subsurface, providing only a secondary catalytic role through geometric and ligand effects with the active surface NM atoms. A handful of architectures can approach 100% NM utilization, but severe drawbacks limit general applicability. For example, in addition to problems with stability and leaching, single atom and ultrasmall cluster catalysts have extreme metal-support interactions, discretized d-bands, and a lack of adjacent NM surface sites. While monolayer thin films do not possess these features, they exhibit such low surface areas that they are not commercially relevant, serving predominantly as model catalysts. This Account champions core-shell nanoparticles (CS NPs) as a vehicle to design highly active, stable, and low-cost materials with high NM utilization for both thermo- and electrocatalysis. The unique benefits of the many emerging NM architectures could be preserved while their fundamental limitations could be overcome through reformulation via a core-shell morphology. However, the commercial realization of CS NPs remains challenging, requiring concerted advances in theory and manufacturing. We begin by formulating seven constraints governing proper core material design, which naturally point to early transition metal ceramics as suitable core candidates. Two constraints prove extremely challenging. The first relates to the core modifying the shell work function and d-band. To properly investigate materials that could satisfy this constraint, we discuss our development of a new heat, quench, and exfoliation (HQE) density functional theory (DFT) technique to model heterometallic interfaces. This technique is used to predict how transition metal carbides can favorably tune the catalytic properties of various NM monolayer shell configurations. The second challenging constraint relates to the scalable manufacturing of CS NP architectures with independent synthetic control of the thickness and composition of the shell and the size and composition of the core. We discuss our development of a synthetic method that enables high temperature self-assembly of tunable CS NP configurations. Finally, we discuss how these principles and methods were used to design catalysts for a variety of applications. These include the design of a thermally stable sub-monolayer CS catalyst, a highly active methanol electrooxidation catalyst, CO-tolerant Pt catalysts, and a hydrogen evolution catalyst that is less expensive than state-of-the-art NM-free catalysts. Such core-shell architectures offer the promise of ultralow precious metal loadings while ceramic cores hold the promise of thermodynamic stability and access to unique catalytic activity/tunability.
SecureCore Security Architecture: Authority Mode and Emergency Management
2007-10-16
...can shield first responders from social vultures (e.g., “ambulance chasers”) or malicious parties who could intentionally interfere with emergency... During many crises, first-responder access to sensitive, restricted emergency information is...
A Comparison of Methods for Decoupling Tongue and Lower Lip from Jaw Movements in 3D Articulography
ERIC Educational Resources Information Center
Henriques, Rafael Neto; van Lieshout, Pascal
2013-01-01
Purpose: One popular method to study the motion of oral articulators is 3D electromagnetic articulography. For many studies, it is important to use an algorithm to decouple the motion of the tongue and the lower lip from the motion of the mandible. In this article, the authors describe and compare 4 methods for decoupling jaw motion by using 3D…
Performance of VPIC on Sequoia
NASA Astrophysics Data System (ADS)
Nystrom, William
2014-10-01
Sequoia is a major DOE computing resource that is characteristic of future resources in that it has many threads per compute node (64), and its individual processor cores are simpler and less powerful than cores on previous processors like Intel's Sandy Bridge or AMD's Opteron. An effort is in progress to port VPIC to the Blue Gene/Q architecture of Sequoia and evaluate its performance. Results of this work will be presented on the single-node performance of VPIC as well as multi-node scaling.
ERIC Educational Resources Information Center
O'Neill, Edward T.; Lavoie, Brian F.; Bennett, Rick; Staples, Thornton; Wayland, Ross; Payette, Sandra; Dekkers, Makx; Weibel, Stuart; Searle, Sam; Thompson, Dave; Rudner, Lawrence M.
2003-01-01
Includes five articles that examine key trends in the development of the public Web: size and growth, internationalization, and metadata usage; Flexible Extensible Digital Object and Repository Architecture (Fedora) for use in digital libraries; developments in the Dublin Core Metadata Initiative (DCMI); the National Library of New Zealand Te Puna…
Ma, Yuanyuan; Dong, Xiaoli; Wang, Yonggang; Xia, Yongyao
2018-03-05
Hydrogen production through water splitting is considered a promising approach for solar energy harvesting. However, the variable and intermittent nature of solar energy and the co-production of H2 and O2 significantly reduce the flexibility of this approach, increasing the costs of its use in practical applications. Herein, using the reversible n-type doping/de-doping reaction of the solid-state polytriphenylamine-based battery electrode, we decouple the H2 and O2 production in acid water electrolysis. In this architecture, the H2 and O2 production occur at different times, which eliminates the issue of gas mixing and adapts to the variable and intermittent nature of solar energy, facilitating the conversion of solar energy to hydrogen (STH). Furthermore, for the first time, we demonstrate a membrane-free solar water splitting through commercial photovoltaics and the decoupled acid water electrolysis, which potentially paves the way for a new approach for solar water splitting. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bylaska, Eric J.; Jacquelin, Mathias; De Jong, Wibe A.
2017-10-20
Ab-initio Molecular Dynamics (AIMD) methods are an important class of algorithms, as they enable scientists to understand the chemistry and dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. Many-core architectures such as the Intel® Xeon Phi™ processor are an interesting and promising target for these algorithms, as they can provide the computational power that is needed to solve interesting problems in chemistry. In this paper, we describe our efforts to refactor the existing AIMD plane-wave method of NWChem from an MPI-only implementation into a scalable, hybrid code that employs MPI and OpenMP to exploit the capabilities of current and future many-core architectures. We describe the optimizations required to get close to optimal performance for the multiplication of the tall-and-skinny matrices that form the core of the computational algorithm. We present strong scaling results on the complete AIMD simulation for a test case that simulates 256 water molecules and strong-scales well on a cluster of 1024 nodes of Intel Xeon Phi processors. We compare the performance obtained with a cluster of dual-socket Intel® Xeon® E5-2698v3 processors.
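To illustrate the kind of tall-and-skinny product involved, the C sketch below forms a local m x m block of A^T B (n_loc rows per rank, m small) with OpenMP threads and sums the blocks across MPI ranks. The flat column-count-m layout and function name are illustrative assumptions, not NWChem's data structures.

#include <stddef.h>
#include <mpi.h>

void ts_matmul(const double *A, const double *B, double *C,
               int n_loc, int m, MPI_Comm comm)
{
    /* local m x m block of A^T B from this rank's n_loc rows */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 0; i < m; i++)
        for (int j = 0; j < m; j++) {
            double s = 0.0;
            for (int k = 0; k < n_loc; k++)   /* the long (tall) dimension */
                s += A[(size_t)k * m + i] * B[(size_t)k * m + j];
            C[(size_t)i * m + j] = s;
        }

    /* sum the partial m x m blocks across all ranks */
    MPI_Allreduce(MPI_IN_PLACE, C, m * m, MPI_DOUBLE, MPI_SUM, comm);
}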
Eisenbach, Markus
2017-01-01
A major impediment to deploying next-generation high-performance computational systems is the required electrical power, often measured in units of megawatts. The solution to this problem is driving the introduction of novel machine architectures, such as those employing many-core processors and specialized accelerators. In this article, we describe the use of a hybrid accelerated architecture to achieve both reduced time to solution and the associated reduction in the electrical cost for a state-of-the-art materials science computation.
Exploring Gigabyte Datasets in Real Time: Architectures, Interfaces and Time-Critical Design
NASA Technical Reports Server (NTRS)
Bryson, Steve; Gerald-Yamasaki, Michael (Technical Monitor)
1998-01-01
Architectures and Interfaces: The implications of real-time interaction on software architecture design: decoupling of interaction/graphics and computation into asynchronous processes. The performance requirements of graphics and computation for interaction. Time management in such an architecture. Examples of how visualization algorithms must be modified for high performance. Brief survey of interaction techniques and design, including direct manipulation and manipulation via widgets. The talk discusses how human factors considerations drove the design and implementation of the virtual wind tunnel. Time-Critical Design: A survey of time-critical techniques for both computation and rendering. Emphasis on the assignment of a time budget to both the overall visualization environment and to each individual visualization technique in the environment. The estimation of the benefit and cost of an individual technique. Examples of the modification of visualization algorithms to allow time-critical control.
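One common way to realize the interaction/computation decoupling mentioned above is a double-buffered hand-off between a compute thread and the rendering loop; the C sketch below is a generic single-producer/single-consumer illustration with placeholder sizes and names, not the virtual wind tunnel's actual architecture.

#include <pthread.h>

typedef struct {
    float buf[2][1024];          /* two result buffers (size is illustrative) */
    int   front;                 /* index of the buffer the renderer may read */
    pthread_mutex_t lock;
} channel_t;

/* called by the computation thread when a new result is complete */
void publish(channel_t *ch, const float *result, int n)
{
    int back = 1 - ch->front;    /* only the producer ever changes front */
    for (int i = 0; i < n; i++)
        ch->buf[back][i] = result[i];
    pthread_mutex_lock(&ch->lock);
    ch->front = back;            /* swap what the renderer sees */
    pthread_mutex_unlock(&ch->lock);
}

/* called by the interaction/graphics loop each frame; never waits on compute */
void snapshot(channel_t *ch, float *dst, int n)
{
    pthread_mutex_lock(&ch->lock);
    for (int i = 0; i < n; i++)
        dst[i] = ch->buf[ch->front][i];
    pthread_mutex_unlock(&ch->lock);
}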
Solving the Software Legacy Problem with RISA
NASA Astrophysics Data System (ADS)
Ibarra, A.; Gabriel, C.
2012-09-01
Nowadays hardware and system infrastructure evolve on time scales much shorter than the typical duration of space astronomy missions. Data processing software capabilities have to evolve to preserve the scientific return during the entire experiment life time. Software preservation is a key issue that has to be tackled before the end of the project to keep the data usable over many years. We present RISA (Remote Interface to Science Analysis) as a solution to decouple data processing software and infrastructure life-cycles, using JAVA applications and web-services wrappers to existing software. This architecture employs embedded SAS in virtual machines assuring a homogeneous job execution environment. We will also present the first studies to reactivate the data processing software of the EXOSAT mission, the first ESA X-ray astronomy mission launched in 1983, using the generic RISA approach.
Federal Register 2010, 2011, 2012, 2013, 2014
2011-06-13
... Architecture Proposal Review Meetings and Webinars; Notice of Public Meeting AGENCY: Research and Innovative... webinars to discuss the Vehicle to Infrastructure (V2I) Core System Requirements and Architecture Proposal... review of System Requirements Specification and Architecture Proposal. The second meeting will be a...
NASA Astrophysics Data System (ADS)
Christou, Michalis; Christoudias, Theodoros; Morillo, Julián; Alvarez, Damian; Merx, Hendrik
2016-09-01
We examine an alternative approach to heterogeneous cluster-computing in the many-core era for Earth system models, using the European Centre for Medium-Range Weather Forecasts Hamburg (ECHAM)/Modular Earth Submodel System (MESSy) Atmospheric Chemistry (EMAC) model as a pilot application on the Dynamical Exascale Entry Platform (DEEP). A set of interconnected autonomous coprocessors, called the Booster, complements a conventional HPC Cluster and increases its computing performance, offering extra flexibility to expose multiple levels of parallelism and achieve better scalability. The EMAC model atmospheric chemistry code (Module Efficiently Calculating the Chemistry of the Atmosphere (MECCA)) was taskified with an offload mechanism implemented using OmpSs directives. The model was ported to the MareNostrum 3 supercomputer to allow testing with Intel Xeon Phi accelerators on a production-size machine. The changes proposed in this paper are expected to contribute to the eventual adoption of the Cluster-Booster division and Many Integrated Core (MIC) accelerated architectures in presently available implementations of Earth system models, towards exploiting the potential of a fully Exascale-capable platform.
Zircon ages in granulite facies rocks: decoupling from geochemistry above 850 °C?
NASA Astrophysics Data System (ADS)
Kunz, Barbara E.; Regis, Daniele; Engi, Martin
2018-03-01
Granulite facies rocks frequently show a large spread in their zircon ages, the interpretation of which raises questions: Has the isotopic system been disturbed? By what process(es) and conditions did the alteration occur? Can the dates be regarded as real ages, reflecting several growth episodes? Furthermore, under some circumstances of (ultra-)high-temperature metamorphism, decoupling of zircon U-Pb dates from their trace element geochemistry has been reported. Understanding these processes is crucial to help interpret such dates in the context of the P-T history. Our study presents evidence for decoupling in zircon from the highest grade metapelites (> 850 °C) taken along a continuous high-temperature metamorphic field gradient in the Ivrea Zone (NW Italy). These rocks represent a well-characterised segment of Permian lower continental crust with a protracted high-temperature history. Cathodoluminescence images reveal that zircons in the mid-amphibolite facies preserve mainly detrital cores with narrow overgrowths. In the upper amphibolite and granulite facies, preserved detrital cores decrease and metamorphic zircon increases in quantity. Across all samples we document a sequence of four rim generations based on textures. U-Pb dates, Th/U ratios and Ti-in-zircon concentrations show an essentially continuous evolution with increasing metamorphic grade, except in the samples from the granulite facies, which display significant scatter in age and chemistry. We associate the observed decoupling of zircon systematics in high-grade non-metamict zircon with disturbance processes related to differences in behaviour of non-formula elements (i.e. Pb, Th, U, Ti) at high-temperature conditions, notably differences in compatibility within the crystal structure.
Roofline model toolkit: A practical tool for architectural and program analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian
We present preliminary results of the Roofline Toolkit for multicore, many-core, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro-benchmarks implemented with the Message Passing Interface (MPI) and OpenMP, which is used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory management mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
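For reference, the Roofline bound that these measurements feed into can be stated compactly; the numbers in the example below are illustrative, not measurements from the paper.

    P(I) = \min\bigl(P_{\mathrm{peak}},\; I \cdot B_{\mathrm{peak}}\bigr)

where I is a kernel's arithmetic intensity in flop/byte, P_peak the peak compute rate, and B_peak the sustained bandwidth of the memory level under consideration. For example, a kernel with I = 0.25 flop/byte on a hypothetical node with B_peak = 200 GB/s and P_peak = 500 Gflop/s is bandwidth-bound at min(500, 0.25 * 200) = 50 Gflop/s.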
Meta-awareness, perceptual decoupling and the wandering mind.
Schooler, Jonathan W; Smallwood, Jonathan; Christoff, Kalina; Handy, Todd C; Reichle, Erik D; Sayette, Michael A
2011-07-01
Mind wandering (i.e. engaging in cognitions unrelated to the current demands of the external environment) reflects the cyclic activity of two core processes: the capacity to disengage attention from perception (known as perceptual decoupling) and the ability to take explicit note of the current contents of consciousness (known as meta-awareness). Research on perceptual decoupling demonstrates that mental events that arise without any external precedent (known as stimulus independent thoughts) often interfere with the online processing of sensory information. Findings regarding meta-awareness reveal that the mind is only intermittently aware of engaging in mind wandering. These basic aspects of mind wandering are considered with respect to the activity of the default network, the role of executive processes, the contributions of meta-awareness and the functionality of mind wandering. Copyright © 2011 Elsevier Ltd. All rights reserved.
Ion Thermal Decoupling and Species Separation in Shock-Driven Implosions
Rinderknecht, Hans G.; Rosenberg, M. J.; Li, C. K.; ...
2015-01-14
Here, anomalous reduction of the fusion yields by 50% and anomalous scaling of the burn-averaged ion temperatures with the ion-species fraction have been observed for the first time in D3He-filled shock-driven inertial confinement fusion implosions. Two ion kinetic mechanisms are used to explain the anomalous observations: thermal decoupling of the D and 3He populations and diffusive species separation. The observed insensitivity of ion temperature to a varying deuterium fraction is shown to be a signature of ion thermal decoupling in shock-heated plasmas. The burn-averaged deuterium fraction calculated from the experimental data demonstrates a reduction in the average core deuterium density, as predicted by simulations that use a diffusion model. Accounting for each of these effects in simulations reproduces the observed yield trends.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cobb, Corie L.; Solberg, Scott E.
3-dimensional (3D) electrode architectures have been explored as a means to decouple power and energy trade-offs in thick battery electrodes. Limited work has been published which systematically examines the impact of these architectures at the pouch cell level. This paper conducts an analysis on the potential capacity gains that can be realized with thick co-extruded electrodes in a pouch cell. Moreover, our findings show that despite lower active material composition for each cathode layer, the effective gain in thickness and active material loading enables pouch cell capacity gains greater than 10% with a Lithium Nickel Manganese Cobalt Oxide (NMC) materials system.
The deployment of routing protocols in distributed control plane of SDN.
Jingjing, Zhou; Di, Cheng; Weiming, Wang; Rong, Jin; Xiaochun, Wu
2014-01-01
Software defined network (SDN) provides a programmable network through decoupling the data plane, control plane, and application plane from the original closed system, thus revolutionizing the existing network architecture to improve performance and scalability. In this paper, we studied the distributed characteristics of the Kandoo architecture and improved and optimized Kandoo's two levels of controllers, drawing on ideas from the routing control platform (RCP). Finally, we analyzed the deployment strategies of the BGP and OSPF protocols in a distributed control plane of SDN. The simulation results show that our deployment strategies are superior to the traditional routing strategies.
Extreme-Scale Stochastic Particle Tracing for Uncertain Unsteady Flow Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guo, Hanqi; He, Wenbin; Seo, Sangmin
2016-11-13
We present an efficient and scalable solution to estimate uncertain transport behaviors using stochastic flow maps (SFMs) for visualizing and analyzing uncertain unsteady flows. SFM computation is extremely expensive because it requires many Monte Carlo runs to trace densely seeded particles in the flow. We alleviate the computational cost by decoupling the time dependencies in SFMs so that we can process adjacent time steps independently and then compose them together for longer time periods. Adaptive refinement is also used to reduce the number of runs for each location. We then parallelize over tasks (packets of particles in our design) to achieve high efficiency in MPI/thread hybrid programming. Such a task model also enables CPU/GPU coprocessing. We show the scalability on two supercomputers, Mira (up to 1M Blue Gene/Q cores) and Titan (up to 128K Opteron cores and 8K GPUs), tracing billions of particles in seconds.
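The decoupling-and-composition idea can be illustrated with a 1D toy in C: flow maps for adjacent time intervals are stored as sampled end positions on a regular grid and chained by lookup. The names, the nearest-sample interpolation, and the 1D setting are simplifications, not the paper's implementation.

/* Toy 1D flow-map composition: F_{0->2}(x) = F_{1->2}(F_{0->1}(x)). */
static double sample_map(const double *map, int n, double x0, double dx, double x)
{
    int i = (int)((x - x0) / dx + 0.5);        /* nearest grid sample */
    if (i < 0) i = 0;
    if (i > n - 1) i = n - 1;
    return map[i];
}

/* map01[i] is the end position of a particle seeded at x0 + i*dx over the
 * first interval; map12 covers the second interval on the same grid */
void compose_maps(const double *map01, const double *map12, double *map02,
                  int n, double x0, double dx)
{
    for (int i = 0; i < n; i++)
        map02[i] = sample_map(map12, n, x0, dx, map01[i]);
}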
DOE Office of Scientific and Technical Information (OSTI.GOV)
Armstrong,N.; Jasti, J.; Beich-Frandsen, M.
2006-01-01
The canonical conformational states occupied by most ligand-gated ion channels, and many cell-surface receptors, are the resting, activated, and desensitized states. While the resting and activated states of multiple receptors are well characterized, elaboration of the structural properties of the desensitized state, a state that is by definition inactive, has proven difficult. Here we use electrical, chemical, and crystallographic experiments on the AMPA-sensitive GluR2 receptor, defining the conformational rearrangements of the agonist binding cores that occur upon desensitization of this ligand-gated ion channel. These studies demonstrate that desensitization involves the rupture of an extensive interface between domain 1 of 2-fold related glutamate-binding core subunits, compensating for the ca. 21° of domain closure induced by glutamate binding. The rupture of the domain 1 interface allows the ion channel to close and thereby provides a simple explanation to the long-standing question of how agonist binding is decoupled from ion channel gating upon receptor desensitization.
Information Management for Unmanned Systems: Combining DL-Reasoning with Publish/Subscribe
NASA Astrophysics Data System (ADS)
Moser, Herwig; Reichelt, Toni; Oswald, Norbert; Förster, Stefan
Sharing capabilities and information between collaborating entities by using modern information and communication technology is a core principle in complex distributed civil or military mission scenarios. Previous work proved the suitability of Service-oriented Architectures for modelling and sharing the participating entities' capabilities. Albeit providing a satisfactory model for capability sharing, pure service-orientation curtails expressiveness for information exchange as opposed to dedicated data-centric communication principles. In this paper we introduce an Information Management System which combines OWL ontologies and automated reasoning with Publish/Subscribe systems, providing for a shared but decoupled data model. While confirming existing related research results, we emphasise the novel application and lack of practical experience of using Semantic Web technologies in areas other than originally intended, that is, aiding decision support and software design in the context of a mission scenario for an unmanned system. Experiments within a complex simulation environment show the immediate benefits of a semantic information-management and -dissemination platform: clear separation of concerns in code and data model, increased service re-usability and extensibility, as well as regulation of data flow and respective system behaviour through declarative rules.
Catching the electron in action in real space inside a Ge-Si core-shell nanowire transistor.
Jaishi, Meghnath; Pati, Ranjit
2017-09-21
Catching the electron in action in real space inside a semiconductor Ge-Si core-shell nanowire field effect transistor (FET), which has been demonstrated (J. Xiang, W. Lu, Y. Hu, Y. Wu, H. Yan and C. M. Lieber, Nature, 2006, 441, 489) to outperform the state-of-the-art metal oxide semiconductor FET, is central to gaining insight into the origin of its functionality. Here, using a quantum transport approach that makes no assumptions about the electronic structure, charge, and potential profile of the device, we unravel the most probable tunneling pathway for electrons in a Ge-Si core-shell nanowire FET with orbital-level spatial resolution, which demonstrates gate bias induced decoupling of electron transport between the core and the shell region. Our calculation yields excellent transistor characteristics, as noticed in the experiment. Upon increasing the gate bias beyond a threshold value, we observe a rapid drop in drain current, resulting in a gate bias driven negative differential resistance behavior and switching in the sign of the trans-conductance. We attribute this anomalous behavior in drain current to the gate bias induced modification of the carrier transport pathway from the Ge core to the Si shell region of the nanowire channel. A new experiment involving a four-probe junction is proposed to confirm our prediction of gate bias induced decoupling.
NASA Astrophysics Data System (ADS)
Cederman, Daniel; Hellstrom, Daniel
2016-08-01
The VxWorks operating system together with the Cobham Gaisler LEON architectural port provides an efficient platform for the development of software for space applications. It supports both uni- and multiprocessor modes (SMP or AMP) and comes with an integrated development environment with several debugging and analysis tools. The LEON architectural port from Cobham Gaisler supports LEON2/3/4 systems and includes drivers for all standard on-chip peripherals, as well as support for RASTA boards. In this paper we highlight some of the many features of VxWorks and the LEON architectural port. The latest version of the architectural port now supports VxWorks 6.9 (the previous version was for VxWorks 6.7) and has support for the GR740, the commercially available quad-core LEON system designed as the European Space Agency's Next Generation Microprocessor (NGMP).
A Review of Lightweight Thread Approaches for High Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castello, Adrian; Pena, Antonio J.; Seo, Sangmin
High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.
2014-10-01
The Goddard cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. WRF is a widely used weather prediction system whose development is a collaborative effort around the globe. The Goddard microphysics scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Goddard scheme incorporates a large number of improvements. Thus, we have optimized the code of this important part of WRF. In this paper, we present our results of optimizing the Goddard microphysics scheme on Intel Many Integrated Core (MIC) architecture hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The Intel MIC is capable of executing a full operating system and entire programs, rather than just kernels as GPUs do. The MIC coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, obtaining maximum performance from the MIC requires some novel optimization techniques, which are discussed in this paper. The results show that the optimizations improved performance of the original code on Xeon Phi 7120P by a factor of 4.7x. Furthermore, the same optimizations improved performance on a dual socket Intel Xeon E5-2670 system by a factor of 2.8x compared to the original code.
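The thread-over-columns, vectorize-over-levels pattern typically applied in such ports can be sketched as follows; this is a hypothetical, greatly simplified column-independent kernel (the function, variable names, and the saturation-adjustment logic are illustrative assumptions, not the Goddard scheme or the paper's optimizations).

#include <vector>
#include <cstddef>
#include <cstdio>

// Hypothetical column-independent kernel: because such microphysics schemes have
// no horizontal coupling, each column can go to a different thread and the inner
// vertical loop can be vectorized -- the pattern commonly used on MIC / Xeon Phi.
void saturation_adjustment(std::vector<double>& qv,        // water vapour mixing ratio
                           std::vector<double>& qc,        // cloud water mixing ratio
                           const std::vector<double>& qvs, // saturation mixing ratio
                           std::size_t ncols, std::size_t nlev) {
    #pragma omp parallel for                    // one horizontal column per thread
    for (std::size_t c = 0; c < ncols; ++c) {
        #pragma omp simd                        // vectorize over vertical levels
        for (std::size_t k = 0; k < nlev; ++k) {
            const std::size_t i = c * nlev + k;
            const double excess = qv[i] - qvs[i];
            if (excess > 0.0) {                 // condense any supersaturation
                qv[i] -= excess;
                qc[i] += excess;
            }
        }
    }
}

int main() {
    const std::size_t ncols = 1024, nlev = 64;
    std::vector<double> qv(ncols * nlev, 0.012), qc(ncols * nlev, 0.0), qvs(ncols * nlev, 0.010);
    saturation_adjustment(qv, qc, qvs, ncols, nlev);
    std::printf("qc[0] = %f\n", qc[0]);
    return 0;
}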
Study on establishment of Body of Knowledge of Taiwan's Traditional Wooden Structure Technology
NASA Astrophysics Data System (ADS)
Huang, M. T.; Chiou, S. C.; Hsu, T. W.; Su, P. C.
2015-08-01
The timber technology of Taiwan's traditional architecture was brought by early immigrants from southern Fujian, China, and has been handed down for more than a hundred years. These traditional timber skills were formerly taught through apprenticeship; however, as society changed, the construction of new traditional architecture faded away and was gradually replaced by repair work on existing traditional buildings. As a result, the construction methods, the use of tools, and other factors now differ considerably from earlier practice, and the core technology faces the risk of being lost. There are many studies on architectural style, construction methods, schools of craftsmen, and craftsmen's technical capabilities, and technology preservation has been pursued through oral historical records and skill studies; yet for the timber craftsmen repairing traditional architecture on the front line, it remains open to discussion whether the original construction methods and the required repair quality of the core technology are being maintained. This paper classified timber technology knowledge using document analysis and expert interviews, analyzed the structure of the knowledge hierarchy, and built a preliminary framework for the timber technology knowledge system of Taiwan's traditional architecture. On the basis of this knowledge system, standards for craftsman training and skills certification can be formulated, so that changes in how the knowledge is taught do not erode craftsmen's technical capability and, in turn, the repair quality of traditional architecture. In addition, a database system can be derived from the knowledge structure to keep the content of the core technical capability consistent. The knowledge system can serve as interpretation material; by standardizing the knowledge and establishing an authority file as a technical specification, the technology is standardized and its loss or distortion is avoided.
First experience of vectorizing electromagnetic physics models for detector simulation
NASA Astrophysics Data System (ADS)
Amadio, G.; Apostolakis, J.; Bandieramonte, M.; Bianchini, C.; Bitzes, G.; Brun, R.; Canal, P.; Carminati, F.; de Fine Licht, J.; Duhem, L.; Elvira, D.; Gheata, A.; Jun, S. Y.; Lima, G.; Novak, M.; Presbyterian, M.; Shadura, O.; Seghal, R.; Wenzel, S.
2015-12-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and the multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of vectorization depth, the parallelization needed to achieve optimal performance, and memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project.
NASA Astrophysics Data System (ADS)
Huang, Melin; Huang, Bormin; Huang, Allen H.
2014-10-01
The Weather Research and Forecasting (WRF) model provides operational services worldwide in many areas and is linked to our daily activities, in particular during severe weather events. The Yonsei University (YSU) scheme is one of the planetary boundary layer (PBL) models in WRF. The PBL is responsible for vertical sub-grid-scale fluxes due to eddy transports in the whole atmospheric column, determines the flux profiles within the well-mixed boundary layer and the stable layer, and thus provides atmospheric tendencies of temperature, moisture (including clouds), and horizontal momentum in the entire atmospheric column. The YSU scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. To accelerate the computation of the YSU scheme, we employ the Intel Many Integrated Core (MIC) architecture, a many-core processor design whose strengths are efficient parallelization and vectorization. Our results show that the MIC-based optimization improved the performance of the first version of multi-threaded code on Xeon Phi 5110P by a factor of 2.4x. Furthermore, the same CPU-based optimizations improved the performance on Intel Xeon E5-2603 by a factor of 1.6x as compared to the first version of multi-threaded code.
NASA Technical Reports Server (NTRS)
Seasly, Elaine
2015-01-01
To combat contamination of physical assets and provide reliable data to decision makers in the space and missile defense community, a modular open system architecture for creation of contamination models and standards is proposed. Predictive tools for quantifying the effects of contamination can be calibrated from NASA data of long-term orbiting assets. This data can then be extrapolated to missile defense predictive models. By utilizing a modular open system architecture, sensitive data can be de-coupled and protected while benefitting from open source data of calibrated models. This system architecture will include modules that will allow the designer to trade the effects of baseline performance against the lifecycle degradation due to contamination while modeling the lifecycle costs of alternative designs. In this way, each member of the supply chain becomes an informed and active participant in managing contamination risk early in the system lifecycle.
A research on the application of software defined networking in satellite network architecture
NASA Astrophysics Data System (ADS)
Song, Huan; Chen, Jinqiang; Cao, Suzhi; Cui, Dandan; Li, Tong; Su, Yuxing
2017-10-01
Software defined networking (SDN) is a new type of network architecture that decouples the control plane from the data plane of the traditional network, offers flexible configuration, and is a direction for next-generation terrestrial Internet development. The satellite network is an important part of the space-ground integrated information network, but traditional satellite networks suffer from difficult topology maintenance and slow configuration. Applying SDN technology to satellite networks can solve these problems. At present, research on applying SDN to satellite networks is still at a preliminary stage. In this paper, we first introduce SDN technology and satellite network architecture. We then focus on software defined satellite network architectures, compare different software defined satellite network architectures, and discuss satellite network virtualization. Finally, the current research status and development trends of SDN technology in satellite networks are analyzed.
Resolution enhancement in 13C and 15N magic-angle turning experiments with TPPM decoupling.
McGeorge, G; Alderman, D W; Grant, D M
1999-03-01
Many solid-state spectra have been shown to have problems related to the poor proton decoupling of carbon nuclei in methylene groups under conditions of slow magic-angle turning. Two-pulse phase-modulation (TPPM) decoupling during the 2D PHORMAT chemical shift separation experiment is shown to be more effective in comparison to that obtainable at much higher spin rates using conventional CW decoupling. TPPM decoupling can also alleviate similar inadequacies when observing the 15N nucleus, particularly with NH2 groups. This is demonstrated in the 15N resonances of fully labeled l-arginine hydrochloride, where a line narrowing of about a factor of two was observed at moderate rotation rates. This significant advantage was also obtained at turning frequencies as low as 500 Hz. Copyright 1999 Academic Press.
Toward performance portability of the Albany finite element analysis code using the Kokkos library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.
Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA Graphics Processing Units (GPUs), Intel Xeon Phis, and multicore CPUs.
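As a generic illustration of the single-source kernels that the Kokkos library makes possible (a simple axpy/dot example, not code from Albany or Aeras; the view names and sizes are assumptions), the same parallel_for/parallel_reduce source below compiles to CUDA, OpenMP, or serial back-ends, which is how one implementation can run on GPUs, Xeon Phi, and multicore CPUs.

#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1 << 20;
        Kokkos::View<double*> x("x", n), y("y", n);   // memory lives in the back-end's space
        Kokkos::deep_copy(x, 1.0);
        Kokkos::deep_copy(y, 2.0);

        // Elementwise update, executed by whichever back-end Kokkos was built with.
        Kokkos::parallel_for("axpy", n, KOKKOS_LAMBDA(const int i) {
            y(i) = 2.5 * x(i) + y(i);
        });

        // Reduction over the same index range.
        double dot = 0.0;
        Kokkos::parallel_reduce("dot", n, KOKKOS_LAMBDA(const int i, double& partial) {
            partial += x(i) * y(i);
        }, dot);

        std::printf("dot = %e\n", dot);
    }
    Kokkos::finalize();
    return 0;
}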
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Calore, Enrico; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core Graphics Processing Units (GPUs), exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing, where code changes are very frequent, making it tedious and prone to error to keep different code versions aligned. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.
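A minimal sketch of the directive-based style follows, assuming an OpenACC-capable compiler; the axpy-like loop is illustrative only and is not part of the LQCD application. The directive describes the parallelism and data movement, and the compiler maps it onto the target (multicore CPU or GPU), so the same source can be built for several architectures.

#include <vector>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    float* px = x.data();
    float* py = y.data();

    // Copy the arrays to the device (if any), run the loop in parallel, copy y back.
    #pragma acc parallel loop copyin(px[0:n]) copy(py[0:n])
    for (int i = 0; i < n; ++i) {
        py[i] = 2.5f * px[i] + py[i];   // simple axpy-style update
    }

    std::printf("y[0] = %f\n", py[0]);
    return 0;
}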
Large Scale GW Calculations on the Cori System
NASA Astrophysics Data System (ADS)
Deslippe, Jack; Del Ben, Mauro; da Jornada, Felipe; Canning, Andrew; Louie, Steven
The NERSC Cori system, powered by 9000+ Intel Xeon-Phi processors, represents one of the largest HPC systems for open-science in the United States and the world. We discuss the optimization of the GW methodology for this system, including both node level and system-scale optimizations. We highlight multiple large scale (thousands of atoms) case studies and discuss both absolute application performance and comparison to calculations on more traditional HPC architectures. We find that the GW method is particularly well suited for many-core architectures due to the ability to exploit a large amount of parallelism across many layers of the system. This work was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division, as part of the Computational Materials Sciences Program.
A high-order Lagrangian-decoupling method for the incompressible Navier-Stokes equations
NASA Technical Reports Server (NTRS)
Ho, Lee-Wing; Maday, Yvon; Patera, Anthony T.; Ronquist, Einar M.
1989-01-01
A high-order Lagrangian-decoupling method is presented for the unsteady convection-diffusion and incompressible Navier-Stokes equations. The method is based upon: (1) Lagrangian variational forms that reduce the convection-diffusion equation to a symmetric initial value problem; (2) implicit high-order backward-differentiation finite-difference schemes for integration along characteristics; (3) finite element or spectral element spatial discretizations; and (4) mesh-invariance procedures and high-order explicit time-stepping schemes for deducing function values at convected space-time points. The method improves upon previous finite element characteristic methods through the systematic and efficient extension to high order accuracy, and the introduction of a simple structure-preserving characteristic-foot calculation procedure which is readily implemented on modern architectures. The new method is significantly more efficient than explicit-convection schemes for the Navier-Stokes equations due to the decoupling of the convection and Stokes operators and the attendant increase in temporal stability. Numerous numerical examples are given for the convection-diffusion and Navier-Stokes equations for the particular case of a spectral element spatial discretization.
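A hedged sketch of the characteristic treatment the abstract describes, written here for a scalar convection-diffusion problem with generic k-step backward-differentiation coefficients \alpha_j (the notation is assumed, not taken from the paper): along the characteristic X(t) defined by dX/dt = u(X,t) with X(t^{n+1}) = x, the material derivative absorbs the convection term, Dc/Dt = \nu \nabla^2 c, and the implicit backward-differentiation discretization reads

\[
  \frac{1}{\Delta t}\sum_{j=0}^{k} \alpha_j\, c^{\,n+1-j}\!\bigl(X(t^{\,n+1-j})\bigr)
  \;=\; \nu\, \nabla^{2} c^{\,n+1}(x),
\]

leaving a symmetric, Stokes-like problem for c^{n+1} at each step; the characteristic feet X(t^{n+1-j}) are obtained by explicit high-order time stepping, as the abstract notes.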
Thin-film decoupling capacitors for multi-chip modules
NASA Astrophysics Data System (ADS)
Dimos, D.; Lockwood, S. J.; Schwartz, R. W.; Rogers, M. S.
Thin-film decoupling capacitors based on ferroelectric lead lanthanum zirconate titanate (PLZT) films are being developed for use in advanced packages, such as multi-chip modules. These thin-film decoupling capacitors are intended to replace multi-layer ceramic capacitors for certain applications, since they can be more fully integrated into the packaging architecture. The increased integration that can be achieved should lead to decreased package volume and improved high-speed performance, due to a decrease in interconnect inductance. PLZT films are fabricated by spin coating using metal carboxylate/alkoxide solutions. These films exhibit very high dielectric constants (ε ≥ 900), low dielectric losses (tan δ = 0.01), excellent insulation resistances (ρ > 10^13 Ω·cm at 125 °C), and good breakdown field strengths (E_B = 900 kV/cm). For integrated circuit applications, the PLZT dielectric is less than 1 micron thick, which results in a large capacitance per area (8-9 nF/mm²). The thin-film geometry and processing conditions also make these capacitors suitable for direct incorporation onto integrated circuits and for packages that require embedded components.
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As NSF indicated, "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and the geosciences. With the exponential growth of geodata, the challenge of scalable, high-performance computing for big data analytics has become urgent, because many research activities are constrained by software and tools that cannot even complete the computation. Heterogeneous geodata integration and analytics obviously magnify the complexity and the operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) architecture and the Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions that employ massive parallelism and hardware resources to achieve scalability and high performance for data-intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environments to achieve scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise to achieve scalability and high performance by exploiting task- and data-level parallelism that is not supported by conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data, as shown by our prior work, while the potential of such advanced infrastructure remains underexplored in this domain. In this presentation, our prior and on-going initiatives are summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs, and MICs, to accelerate geocomputation in different applications.
Early Dynamics of the Moon's Core
NASA Astrophysics Data System (ADS)
Cuk, Matija; Hamilton, Douglas; Stewart, Sarah T.
2018-04-01
The Moon has a small molten iron core (Williams et al. 2006). Remanent magnetization in lunar rocks likely derives from a past lunar dynamo (Wieczorek 2018 and references therein), which may have been powered by differential precession between the mantle and the core. The rotations of the lunar mantle and core were largely decoupled for much of lunar history, with a large mutual offset during the Cassini State Transition (Meyer and Wisdom, 2011). It is likely that past work underestimated lunar obliquities, and therefore core offsets, during early lunar history (Cuk et al. 2016). Here we investigate the dynamics of the lunar core and mantle using a Lie-Poisson numerical integrator (Touma and Wisdom 2001) which includes interactions between a triaxial core and mantle, as well as all gravitational and tidal effects included in the model of Cuk et al. (2016). Since we assume a rigid triaxial mantle, this model is applicable to the Moon only once it had acquired its current shape, which probably happened before the Moon reached 25 Earth radii. While some details of the core dynamics depend on our assumptions about the shape of the lunar core-mantle boundary, we can report some robust preliminary findings. The presence of the core does not significantly change the evolutionary scenario of Cuk et al. (2016). The core and mantle are indeed decoupled, with the core having a much smaller obliquity to the ecliptic than the mantle for almost all of lunar history. The core was largely in an equivalent of Cassini State 2, with the vernal equinoxes (with respect to the ecliptic) of the core and the mantle being anti-aligned. The core-mantle spin axis offset was very large during the Moon's first billion years (this is true in both canonical and high-inclination tidal evolution), causing the lunar core to be sub-synchronous. If the ancient lunar magnetic dipole was rotating around a core axis that was inclined to the Moon's spin axis, then the magnetic poles would move across the lunar surface as the mantle rotates independently. This relative motion would dilute the average dipole field over much of the lunar surface and would restrict meaningful average fields to low lunar latitudes.
NASA Astrophysics Data System (ADS)
Buterakos, Donovan; Throckmorton, Robert E.; Das Sarma, S.
2018-01-01
In addition to magnetic field and electric charge noise adversely affecting spin-qubit operations, performing single-qubit gates on one of multiple coupled singlet-triplet qubits presents a new challenge: crosstalk, which is inevitable (and must be minimized) in any multiqubit quantum computing architecture. We develop a set of dynamically corrected pulse sequences that are designed to cancel the effects of both types of noise (i.e., field and charge) as well as crosstalk to leading order, and provide parameters for these corrected sequences for all 24 of the single-qubit Clifford gates. We then provide an estimate of the error as a function of the noise and capacitive coupling to compare the fidelity of our corrected gates to their uncorrected versions. Dynamical error correction protocols presented in this work are important for the next generation of singlet-triplet qubit devices where coupling among many qubits will become relevant.
Deploy Nalu/Kokkos algorithmic infrastructure with performance benchmarking.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Domino, Stefan P.; Ananthan, Shreyas; Knaus, Robert C.
The former Nalu interior heterogeneous algorithm design, which was originally designed to manage matrix assembly operations over all elemental topology types, has been modified to operate over homogeneous collections of mesh entities. This newly templated kernel design allows for removal of workset variable resize operations that were formerly required at each loop over a Sierra ToolKit (STK) bucket (nominally, 512 entities in size). Extensive usage of the Standard Template Library (STL) std::vector has been removed in favor of intrinsic Kokkos memory views. In this milestone effort, the transition to Kokkos as the underlying infrastructure to support performance and portability on many-core architectures has been deployed for key matrix algorithmic kernels. A unit-test driven design effort has developed a homogeneous entity algorithm that employs a team-based thread parallelism construct. The STK Single Instruction Multiple Data (SIMD) infrastructure is used to interleave data for improved vectorization. The collective algorithm design, which allows for concurrent threading and SIMD management, has been deployed for the core low-Mach element-based algorithm. Several tests to ascertain SIMD performance on Intel KNL and Haswell architectures have been carried out. The performance test matrix includes evaluation of both low- and higher-order methods. The higher-order low-Mach methodology builds on polynomial promotion of the core low-order control volume finite element method (CVFEM). Performance testing of the Kokkos-view/SIMD design indicates low-order matrix assembly kernel speed-up ranging between two and four times depending on mesh loading and node count. Better speedups are observed for higher-order meshes (currently only P=2 has been tested), especially on KNL. The increased workload per element on higher-order meshes benefits from the wide SIMD width on KNL machines. Combining multiple threads with SIMD on KNL achieves a 4.6x speedup over the baseline, with assembly timings faster than those observed on the Haswell architecture. The computational workload of higher-order meshes, therefore, seems ideally suited for the many-core architecture and justifies further exploration of higher-order on NGP platforms. A Trilinos/Tpetra-based multi-threaded GMRES preconditioned by symmetric Gauss-Seidel (SGS) represents the core solver infrastructure for the low-Mach advection/diffusion implicit solves. The threaded solver stack has been tested on small problems on NREL's Peregrine system using the newly developed and deployed Kokkos-view/SIMD kernels. Efforts are underway to deploy the Tpetra-based solver stack on the NERSC Cori system to benchmark its performance at scale on KNL machines.
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen
2015-10-01
The Thompson cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Thompson scheme incorporates a large number of improvements. Thus, we have optimized the speed of this important part of WRF. Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the Thompson microphysics scheme on Intel Many Integrated Core (MIC) architecture hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, obtaining maximum performance from MICs requires some novel optimization techniques. New optimizations for an updated Thompson scheme are discussed in this paper. The optimizations improved the performance of the original Thompson code on Xeon Phi 7120P by a factor of 1.8x. Furthermore, the same optimizations improved the performance of the Thompson scheme on a dual-socket configuration of eight-core Intel Xeon E5-2670 CPUs by a factor of 1.8x compared to the original Thompson code.
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures
Manolakos, Elias S.
2015-01-01
Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332
The Deployment of Routing Protocols in Distributed Control Plane of SDN
Jingjing, Zhou; Di, Cheng; Weiming, Wang; Rong, Jin; Xiaochun, Wu
2014-01-01
Software defined network (SDN) provides a programmable network through decoupling the data plane, control plane, and application plane from the original closed system, thus revolutionizing the existing network architecture to improve the performance and scalability. In this paper, we learned about the distributed characteristics of Kandoo architecture and, meanwhile, improved and optimized Kandoo's two levels of controllers based on ideological inspiration of RCP (routing control platform). Finally, we analyzed the deployment strategies of BGP and OSPF protocol in a distributed control plane of SDN. The simulation results show that our deployment strategies are superior to the traditional routing strategies. PMID:25250395
Drilling and Caching Architecture for the Mars2020 Mission
NASA Astrophysics Data System (ADS)
Zacny, K.
2013-12-01
We present a Sample Acquisition and Caching (SAC) architecture for the Mars2020 mission and detail how the architecture meets the sampling requirements described in the Mars2020 Science Definition Team (SDT) report. The architecture uses a 'One Bit per Core' approach. Having a dedicated bit for each rock core reduces the number of core transfer steps and actuators, and this reduces overall mission risk. It also alleviates the bit life problem, eliminates cross contamination, and aids in hermetic sealing. Added advantages are faster drilling time, lower power, lower energy, and lower Weight on Bit (which reduces Arm preload requirements). To enable replacement of core samples, the drill bits are based on the BigTooth bit design. The BigTooth bit cuts a core diameter slightly smaller than the imaginary hole inscribed by the inner surfaces of the bits. Hence the rock core can be ejected much more easily along the gravity vector. The architecture also has three additional types of bits that allow analysis of rocks. The Rock Abrasion and Brushing Bit (RABBit) allows brushing and grinding of rocks in the same way as the Rock Abrasion Tool does on MER. The PreView bit allows viewing and analysis of rock core surfaces. The Powder and Regolith Acquisition Bit (PRABit) captures regolith and rock powder either for in situ analysis or sample return. PRABit also provides sieving capabilities. The architecture can be viewed here: http://www.youtube.com/watch?v=_-hOO4-zDtE
Heterogeneous computing architecture for fast detection of SNP-SNP interactions.
Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros
2014-06-25
The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, the MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems.
Sandwich-Architectured Poly(lactic acid)-Graphene Composite Food Packaging Films.
Goh, Kunli; Heising, Jenneke K; Yuan, Yang; Karahan, Huseyin E; Wei, Li; Zhai, Shengli; Koh, Jia-Xuan; Htin, Nanda M; Zhang, Feimo; Wang, Rong; Fane, Anthony G; Dekker, Matthijs; Dehghani, Fariba; Chen, Yuan
2016-04-20
Biodegradable food packaging promises a more sustainable future. Among the many different biopolymers used, poly(lactic acid) (PLA) possesses the good mechanical properties and cost-effectiveness necessary for biodegradable food packaging. However, PLA food packaging suffers from poor water vapor and oxygen barrier properties compared to many petroleum-derived ones. A key challenge is, therefore, to simultaneously enhance both the water vapor and oxygen barrier properties of PLA food packaging. To address this issue, we design a sandwich-architectured PLA-graphene composite film, which utilizes an impermeable reduced graphene oxide (rGO) layer as the core barrier and commercial PLA films as the outer protective encapsulation. The synergy between the barrier and the protective encapsulation results in a significant 87.6% reduction in water vapor permeability. At the same time, the oxygen permeability is reduced by two orders of magnitude when evaluated under both dry and humid conditions. The excellent barrier properties can be attributed to the compact lamellar microstructure and the hydrophobicity of the rGO core barrier. Mechanistic analysis shows that the large rGO lateral dimension and the small interlayer spacing between the rGO sheets create an extensive and tortuous diffusion pathway, up to 1450 times the thickness of the rGO barrier. In addition, the sandwiched architecture gives the PLA-rGO composite film good processability, which makes the film easier to handle and tailor. Simulations using the PLA-rGO composite food packaging film for edible oil and potato chips also show at least an eight-fold extension in the shelf life of these oxygen- and moisture-sensitive food products. Overall, these qualities demonstrate the high potential of the sandwich-architectured PLA-graphene composite film for food packaging applications.
Souris, Kevin; Lee, John Aldo; Sterpin, Edmond
2016-04-01
Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. A new Monte Carlo code, called MCsquare (many-core Monte Carlo), has been designed and optimized for the latest generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable for MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually, while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked against the gate/geant4 Monte Carlo application for homogeneous and heterogeneous geometries. Comparisons with gate/geant4 for various geometries show deviations within 2%-1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. The optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus also be used for in vivo range verification.
Merolla, Paul A; Arthur, John V; Alvarez-Icaza, Rodrigo; Cassidy, Andrew S; Sawada, Jun; Akopyan, Filipp; Jackson, Bryan L; Imam, Nabil; Guo, Chen; Nakamura, Yutaka; Brezzo, Bernard; Vo, Ivan; Esser, Steven K; Appuswamy, Rathinakumar; Taba, Brian; Amir, Arnon; Flickner, Myron D; Risk, William P; Manohar, Rajit; Modha, Dharmendra S
2014-08-08
Inspired by the brain's structure, we have developed an efficient, scalable, and flexible non-von Neumann architecture that leverages contemporary silicon technology. To demonstrate, we built a 5.4-billion-transistor chip with 4096 neurosynaptic cores interconnected via an intrachip network that integrates 1 million programmable spiking neurons and 256 million configurable synapses. Chips can be tiled in two dimensions via an interchip communication interface, seamlessly scaling the architecture to a cortexlike sheet of arbitrary size. The architecture is well suited to many applications that use complex neural networks in real time, for example, multiobject detection and classification. With 400-pixel-by-240-pixel video input at 30 frames per second, the chip consumes 63 milliwatts. Copyright © 2014, American Association for the Advancement of Science.
Benchmarking high performance computing architectures with CMS’ skeleton framework
NASA Astrophysics Data System (ADS)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-10-01
In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel's Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures; machines such as Cori Phase 1 and 2, Theta, and Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
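For readers unfamiliar with the selected library, a minimal Intel Threading Building Blocks (TBB) sketch follows; it is a generic range-based loop, not the CMSSW skeleton framework, and the per-element update is only a stand-in for per-event work.

#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>
#include <cstdio>

// Work is expressed over ranges and the TBB scheduler maps the resulting tasks
// onto the available cores, balancing the load dynamically.
int main() {
    const std::size_t n = 1 << 20;
    std::vector<double> data(n, 1.0);

    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n),
                      [&](const tbb::blocked_range<std::size_t>& r) {
                          for (std::size_t i = r.begin(); i != r.end(); ++i)
                              data[i] = data[i] * 0.5 + 1.0;   // stand-in for per-event work
                      });

    std::printf("data[0] = %f\n", data[0]);
    return 0;
}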
Modular multiplication in GF(p) for public-key cryptography
NASA Astrophysics Data System (ADS)
Olszyna, Jakub
Modular multiplication forms the basis of modular exponentiation, which is the core operation of the RSA cryptosystem. It is also present in many other cryptographic algorithms, including those based on ECC and HECC. Hence, an efficient implementation of PKC relies on an efficient implementation of modular multiplication. The paper presents a survey of the most common algorithms for modular multiplication along with hardware architectures especially suitable for cryptographic applications in energy-constrained environments. The motivation for studying low-power and area-efficient modular multiplication algorithms comes from enabling public-key security for ultra-low-power devices that must perform under constrained environments such as wireless sensor networks. Serial architectures for GF(p) are analyzed and presented. Finally, the proposed architectures are verified and compared according to the amount of power dissipated during operation.
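As a point of reference for the serial GF(p) datapaths surveyed, the sketch below shows the classical interleaved (bit-serial) modular multiplication in software form; it is a generic textbook algorithm, not one of the paper's architectures, and the 64-bit operand size and the p < 2^63 precondition are simplifying assumptions.

#include <cstdint>
#include <cstdio>

// Interleaved (bit-serial) modular multiplication, r = a*b mod p, processing one
// bit of b per iteration with a single conditional subtraction per step -- the
// kind of operation a serial GF(p) datapath performs per clock cycle.
// Precondition: p < 2^63 so the doubling step cannot overflow a 64-bit word.
uint64_t modmul_interleaved(uint64_t a, uint64_t b, uint64_t p) {
    a %= p;
    uint64_t r = 0;
    for (int i = 63; i >= 0; --i) {
        r <<= 1;                       // r = 2r
        if (r >= p) r -= p;            // ... reduced mod p
        if ((b >> i) & 1) {            // scan b from the most significant bit
            r += a;                    // r = r + a
            if (r >= p) r -= p;        // ... reduced mod p
        }
    }
    return r;
}

int main() {
    // Example with a small prime; in RSA/ECC the modulus would be hundreds or
    // thousands of bits and the same loop would run over multi-word operands.
    const uint64_t p = 1000000007ULL;
    std::printf("%llu\n",
                (unsigned long long)modmul_interleaved(123456789ULL, 987654321ULL, p));
    return 0;
}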
EASEE: an open architecture approach for modeling battlespace signal and sensor phenomenology
NASA Astrophysics Data System (ADS)
Waldrop, Lauren E.; Wilson, D. Keith; Ekegren, Michael T.; Borden, Christian T.
2017-04-01
Open architecture in the context of defense applications encourages collaboration across government agencies and academia. This paper describes a success story in the implementation of an open architecture framework that fosters transparency and modularity in the context of Environmental Awareness for Sensor and Emitter Employment (EASEE), a complex physics-based software package for modeling the effects of terrain and atmospheric conditions on signal propagation and sensor performance. Among the highlighted features in this paper are: (1) a code refactorization to separate sensitive parts of EASEE, thus allowing collaborators the opportunity to view and interact with non-sensitive parts of the EASEE framework with the end goal of supporting collaborative innovation, (2) a data exchange and validation effort to enable the dynamic addition of signatures within EASEE, thus supporting a modular notion that components can be easily added or removed from the software without requiring recompilation by developers, and (3) a flexible and extensible XML interface, which aids in decoupling graphical user interfaces from EASEE's calculation engine and thus encourages adaptability to many different defense applications. In addition to the points outlined above, this paper also addresses EASEE's ability to interface with proprietary systems such as ArcGIS. A specific use case is discussed regarding the implementation of an ArcGIS toolbar that leverages EASEE's XML interface and enables users to set up an EASEE-compliant configuration for probability of detection or optimal sensor placement calculations in various modalities.
Manipulating the architecture of bimetallic nanostructures and their plasmonic properties
NASA Astrophysics Data System (ADS)
DeSantis, Christopher John
There has been much interest in colloidal noble metal nanoparticles due to their fascinating plasmonic and catalytic properties. These properties make noble metal nanoparticles potentially useful for applications such as targeted drug delivery agents and hydrogen storage devices. Historically, shape-controlled noble metal nanoparticles have been predominantly monometallic. Recent synthetic advances provide access to bimetallic noble metal nanoparticles, wherein their inherent multifunctionality and the ability to fine-tune or expand the surface chemistry and light-scattering properties of metal nanoparticles make them popular candidates for many applications. Even so, there are currently few synthetic strategies to rationally design shape-controlled bimetallic nanocrystals; for this reason, few architectures are accessible. For example, the "seed-mediated method" is a popular means of achieving monodisperse shape-controlled bimetallic nanocrystals. In this process, small metal seeds are used as platforms for additional metal addition, allowing for conformal core-shell nanostructures. However, this method has only been applied to single metal core/single metal shell structures; therefore, the surface compositions and architectures achievable are limited. This thesis expands upon the seed-mediated method by coupling it with co-reduction. In short, two metal precursors are simultaneously reduced to deposit metal onto pre-formed seeds, in the hope that the interplay between the two metal species facilitates bimetallic shell nanocrystals. Au/Pd was used as a test system due to the favorable reduction potentials of the metal precursors and the good lattice match between Au and Pd. Au nanocrystals with alloyed Au/Pd shells were achieved using this "seed-mediated co-reduction" approach. Symmetric eight-branched Au/Pd nanocrystals (octopods) are also prepared using this method. This thesis investigates many synthetic parameters that determine the shape outcome of Au/Pd nanocrystals during seed-mediated co-reduction. Plasmonic, catalytic, and assembly properties are also investigated in relation to nanocrystal shape and architecture. This work provides a foundation for the rational design of architecturally defined bimetallic nanostructures.
2012-10-01
Implications of Multi-Core Architectures on the Development of ... (report, dates covered March 2010 to April 2012; only documentation-page and table-of-contents fragments survive for this record, listing "Framework for Multicore Information Flow Analysis", "A Hypothetical Reference Architecture", and Figure 2: Pentium II Block Diagram)
Stretching of Hot Lithosphere: A Significant Mode of Crustal Stretching in Southeast Asia
NASA Astrophysics Data System (ADS)
de Montserrat Navarro, A.; Morgan, J. P.; Hall, R.; White, L. T.
2017-12-01
SE Asia covers roughly 15% of the Earth's surface and represents one of the most tectonically active regions in the world, yet its tectonic evolution remains relatively poorly studied and constrained in comparison with other regions. Recent episodes of extension have been associated with sedimentary basin growth and phases of crustal melting, uplift and extremely rapid exhumation of young (<7 Ma) metamorphic core complexes. This is recorded by seismic imagery of basins offshore Sulawesi and New Guinea as well as through new field studies of the onshore geology in these regions. A growing body of new geochronological and biostratigraphic data provides some control on the rates of processes. We use two-dimensional numerical models to investigate the evolution of the distinctive extensional basins in SE Asia. Our models suggest that, at the onset of stretching, the lithosphere was considerably hotter than in more typically studied rift settings (e.g. Atlantic opening, East African Rift, Australia-Antarctica opening). High Moho temperatures are key in shaping the architecture of the stretched lithosphere: A) hot and weak lower crust fails to transmit the stress and brittle deformation, thus resulting in a strong decoupling between the crust and the lithospheric mantle; B) the mode of deformation is dominated by ductile flow and boudinage of the lower crust, yielding the exhumation of one to several partially molten lower crustal bodies, including metamorphic core complexes; C) continental break-up is often inhibited by the ductile behaviour of the crust, and is only achieved after considerable cooling of the lithosphere. To better constrain the extension rates at which these basins formed, we compare P-T and cooling paths of lower crustal material in a suite of models with newly available data from the Palu and Malino metamorphic core complexes in Sulawesi, Indonesia.
The parallel algorithm for the 2D discrete wavelet transform
NASA Astrophysics Data System (ADS)
Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel
2018-04-01
The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing on single-core CPUs. However, considering parallel processing on multi-core processors, this scheme is inappropriate due to its large number of steps. On such architectures, the number of steps corresponds to the number of synchronization points at which data are exchanged, and these points often form a performance bottleneck. Our approach appropriately rearranges calculations inside the transform and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluating on multi-core CPUs, we consistently outperform the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.
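For reference, one level of the conventional separable lifting scheme mentioned above can be sketched for the CDF 5/3 (LeGall) wavelet in 1D as below; this is the baseline formulation, not the authors' rearranged parallel scheme, and the symmetric boundary handling and even-length assumption are simplifications.

#include <vector>
#include <cassert>
#include <cstdio>

// One forward level of the CDF 5/3 lifting scheme in 1D:
//   predict: d[i] = x[2i+1] - (x[2i] + x[2i+2]) / 2
//   update:  s[i] = x[2i]   + (d[i-1] + d[i]) / 4
// The two passes are the "steps" whose data exchanges limit multi-core scaling.
void cdf53_forward(const std::vector<double>& x,
                   std::vector<double>& low, std::vector<double>& high) {
    const std::size_t n = x.size();
    assert(n >= 2 && n % 2 == 0);
    const std::size_t half = n / 2;
    high.resize(half);
    low.resize(half);

    // Predict step: detail = odd sample minus the average of its even neighbours.
    for (std::size_t i = 0; i < half; ++i) {
        const double left  = x[2 * i];
        const double right = (2 * i + 2 < n) ? x[2 * i + 2] : x[2 * i];  // symmetric edge
        high[i] = x[2 * i + 1] - 0.5 * (left + right);
    }
    // Update step: approximation = even sample plus a quarter of neighbouring details.
    for (std::size_t i = 0; i < half; ++i) {
        const double dprev = (i > 0) ? high[i - 1] : high[i];            // symmetric edge
        low[i] = x[2 * i] + 0.25 * (dprev + high[i]);
    }
}

int main() {
    std::vector<double> x{1, 2, 3, 4, 5, 6, 7, 8}, lo, hi;
    cdf53_forward(x, lo, hi);
    std::printf("low[0] = %f, high[0] = %f\n", lo[0], hi[0]);
    return 0;
}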
Application of Intel Many Integrated Core (MIC) accelerators to the Pleim-Xiu land surface scheme
NASA Astrophysics Data System (ADS)
Huang, Melin; Huang, Bormin; Huang, Allen H.
2015-10-01
The land-surface model (LSM) is one physics process in the weather research and forecast (WRF) model. The LSM includes atmospheric information from the surface layer scheme, radiative forcing from the radiation scheme, and precipitation forcing from the microphysics and convective schemes, together with internal information on the land's state variables and land-surface properties. The LSM provides heat and moisture fluxes over land points and sea-ice points. The Pleim-Xiu (PX) scheme is one LSM. The PX LSM features three pathways for moisture fluxes: evapotranspiration, soil evaporation, and evaporation from wet canopies. To accelerate this scheme, we employ the Intel Xeon Phi Many Integrated Core (MIC) architecture, a many-core coprocessor design well suited to efficient parallelization and vectorization. Our results show that the MIC-based optimization of this scheme running on a Xeon Phi coprocessor 7120P improves performance by 2.3x and 11.7x compared to the original code running on one CPU socket (eight cores) and on one CPU core of an Intel Xeon E5-2670, respectively.
Influence of magnetic materials on the transport properties of superconducting composite conductors
NASA Astrophysics Data System (ADS)
Glowacki, B. A.; Majoros, M.; Campbell, A. M.; Hopkins, S. C.; Rutter, N. A.; Kozlowski, G.; Peterson, T. L.
2009-03-01
Magnetic materials can help to improve the performance of practical superconductors on the macro/microscale as magnetic diverters and also on the nanoscale as effective pinning centres. It has been established by numerical modelling that magnetic shielding of the filaments reduces ac losses in self-field conditions due to decoupling of the filaments and, at the same time, it increases the critical current of the composite. This effect is especially beneficial for coated conductors, in which the anisotropic properties of the superconductor are amplified by the conductor architecture. However, ferromagnetic coatings are often chemically incompatible with YBa2Cu3O7 and (Pb,Bi)2Sr2Ca2Cu3O9 conductors, and buffer layers have to be used. In contrast, in MgB2 conductors an iron matrix may remain in direct contact with the superconducting core. The application of superconducting-magnetic heterostructures requires consideration of the thermal and electromagnetic stability of the superconducting materials used. On the one hand, magnetic components reduce the critical current gradient across the individual filaments but, on the other hand, they often reduce the thermal conductivity between the superconducting core and the cryogen, which may cause the destruction of the conductor in the event of thermal instability. A possible nanoscale method of improving the critical current density of superconducting conductors is the introduction of sub-micron magnetic pinning centres. However, the volumetric density and chemical compatibility of magnetic inclusions has to be controlled to avoid suppression of the superconducting properties.
Asif, Rameez
2016-01-01
Space division multiplexing (SDM), incorporating multi-core fibers (MCFs), has been demonstrated as a way of maximizing data capacity in the face of an impending capacity crunch. To achieve high spectral density through multi-carrier encoding while simultaneously maintaining transmission reach, benefits from inter-core crosstalk (XT) and non-linear compensation must be utilized. In this report, we propose a proof-of-concept unified receiver architecture that jointly compensates optical Kerr effects and intra- and inter-core XT in MCFs. The architecture is analysed in a multi-channel 512 Gbit/s dual-carrier DP-16QAM system over 800 km of 19-core MCF to validate the digital compensation of inter-core XT. Through this architecture: (a) we efficiently compensate the inter-core XT, improving the Q-factor by 4.82 dB, and (b) achieve a substantial gain in transmission reach, increasing the maximum achievable distance from 480 km to 1208 km, via analytical analysis. Simulation results confirm that inter-core XT distortions are more severe for cores fabricated around the central axis of the cladding. Predominantly, XT-induced Q-penalty can be suppressed to less than 1 dB up to −11.56 dB of inter-core XT over 800 km of MCF, offering flexibility to fabricate dense core structures with the same cladding diameter. Moreover, this report outlines the relationship between core pitch and forward-error correction (FEC). PMID:27270381
Real-space decoupling transformation for quantum many-body systems.
Evenbly, G; Vidal, G
2014-06-06
We propose a real-space renormalization group method to explicitly decouple into independent components a many-body system that, as in the phenomenon of spin-charge separation, exhibits separation of degrees of freedom at low energies. Our approach produces a branching holographic description of such systems that opens the path to the efficient simulation of the most entangled phases of quantum matter, such as those whose ground state violates a boundary law for entanglement entropy. As in the coarse-graining transformation of Vidal [Phys. Rev. Lett. 99, 220405 (2007).
Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors
NASA Astrophysics Data System (ADS)
Sourouri, Mohammed; Birger Raknes, Espen
2017-04-01
In advanced imaging methods like reverse-time migration (RTM) and full waveform inversion (FWI), the elastic wave equation (EWE) is numerically solved many times to create the seismic image or the elastic parameter model update. Thus, it is essential to optimize the solution time for solving the EWE as this will have a major impact on the total computational cost in running RTM or FWI. From a computational point of view, applications implementing EWEs are associated with two major challenges. The first challenge is the amount of memory-bound computations involved, while the second challenge is the execution of such computations over very large datasets. So far, multi-core processors have not been able to tackle these two challenges, which eventually led to the adoption of accelerators such as Graphics Processing Units (GPUs). Compared to conventional CPUs, GPUs are densely populated with many floating-point units and fast memory, a type of architecture that has proven to map well to many scientific computations. Despite these architectural advantages, full-scale adoption of accelerators has yet to materialize. First, accelerators require a significant programming effort imposed by programming models such as CUDA or OpenCL. Second, accelerators come with a limited amount of memory, which also requires explicit data transfers between the CPU and the accelerator over the slow PCI bus. The second generation of the Xeon Phi processor, based on the Knights Landing (KNL) architecture, promises the computational capabilities of an accelerator but requires the same programming effort as traditional multi-core processors. The high computational performance is realized through many integrated cores (the number of cores, tiles and memory varies with the model) organized in tiles that are connected via a 2D mesh-based interconnect. In contrast to accelerators, KNL is a self-hosted system, meaning explicit data transfers over the PCI bus are no longer required. However, like most accelerators, KNL sports a memory subsystem consisting of low-level caches and 16 GB of high-bandwidth MCDRAM memory. For capacity computing, up to 400 GB of conventional DDR4 memory is provided. Such a strict hierarchical memory layout means that data locality is imperative if the true potential of this product is to be harnessed. In this work, we study a series of optimizations specifically targeting KNL for our EWE-based application to reduce the time-to-solution for the following 3D model sizes in grid points: 128³, 256³ and 512³. We compare the results with an optimized version for multi-core CPUs running on a dual-socket Xeon E5 2680v3 system using OpenMP. Our initial naive implementation on the KNL is roughly 20% faster than the multi-core version, but by using only one thread per core and careful memory placement using the memkind library, we could achieve higher speedups. Additionally, using the MCDRAM as cache for problem sizes smaller than 16 GB unlocked further performance improvements. Depending on the problem size, our overall results indicate that the KNL-based system is approximately 2.2x faster than the 24-core Xeon E5 2680v3 system, with only modest changes to the code.
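As an illustration of the memory-placement step mentioned above, the sketch below (an assumption for illustration, not the authors' code) allocates a wavefield array in KNL's high-bandwidth MCDRAM through the memkind library's hbwmalloc interface and initializes it with OpenMP. The grid size, the compile line (e.g. icc -qopenmp -lmemkind) and the choice to fail when MCDRAM is unavailable are all illustrative.

    /* Minimal sketch: explicit MCDRAM placement with memkind's hbwmalloc API,
     * plus a flat OpenMP initialization loop. One thread per core is set at
     * run time, e.g. via OMP_NUM_THREADS and an affinity setting. */
    #include <hbwmalloc.h>
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        const size_t n = (size_t)256 * 256 * 256;      /* illustrative 256^3 grid */
        float *u = hbw_malloc(n * sizeof *u);          /* allocate in MCDRAM */
        if (!u) {                                      /* no high-bandwidth memory */
            fprintf(stderr, "hbw_malloc failed\n");
            return 1;
        }

        #pragma omp parallel for schedule(static)
        for (long i = 0; i < (long)n; ++i)             /* first touch in MCDRAM */
            u[i] = 0.0f;

        printf("initialized %zu cells with %d threads\n", n, omp_get_max_threads());
        hbw_free(u);
        return 0;
    }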
NASA Astrophysics Data System (ADS)
Lidar, Daniel A.; Brun, Todd A.
2013-09-01
Prologue; Preface; Part I. Background: 1. Introduction to decoherence and noise in open quantum systems Daniel Lidar and Todd Brun; 2. Introduction to quantum error correction Dave Bacon; 3. Introduction to decoherence-free subspaces and noiseless subsystems Daniel Lidar; 4. Introduction to quantum dynamical decoupling Lorenza Viola; 5. Introduction to quantum fault tolerance Panos Aliferis; Part II. Generalized Approaches to Quantum Error Correction: 6. Operator quantum error correction David Kribs and David Poulin; 7. Entanglement-assisted quantum error-correcting codes Todd Brun and Min-Hsiu Hsieh; 8. Continuous-time quantum error correction Ognyan Oreshkov; Part III. Advanced Quantum Codes: 9. Quantum convolutional codes Mark Wilde; 10. Non-additive quantum codes Markus Grassl and Martin Rötteler; 11. Iterative quantum coding systems David Poulin; 12. Algebraic quantum coding theory Andreas Klappenecker; 13. Optimization-based quantum error correction Andrew Fletcher; Part IV. Advanced Dynamical Decoupling: 14. High order dynamical decoupling Zhen-Yu Wang and Ren-Bao Liu; 15. Combinatorial approaches to dynamical decoupling Martin Rötteler and Pawel Wocjan; Part V. Alternative Quantum Computation Approaches: 16. Holonomic quantum computation Paolo Zanardi; 17. Fault tolerance for holonomic quantum computation Ognyan Oreshkov, Todd Brun and Daniel Lidar; 18. Fault tolerant measurement-based quantum computing Debbie Leung; Part VI. Topological Methods: 19. Topological codes Héctor Bombín; 20. Fault tolerant topological cluster state quantum computing Austin Fowler and Kovid Goyal; Part VII. Applications and Implementations: 21. Experimental quantum error correction Dave Bacon; 22. Experimental dynamical decoupling Lorenza Viola; 23. Architectures Jacob Taylor; 24. Error correction in quantum communication Mark Wilde; Part VIII. Critical Evaluation of Fault Tolerance: 25. Hamiltonian methods in QEC and fault tolerance Eduardo Novais, Eduardo Mucciolo and Harold Baranger; 26. Critique of fault-tolerant quantum information processing Robert Alicki; References; Index.
NASA Astrophysics Data System (ADS)
Jenkins, David R.; Basden, Alastair; Myers, Richard M.
2018-05-01
We propose a solution to the increased computational demands of Extremely Large Telescope (ELT) scale adaptive optics (AO) real-time control with the Intel Xeon Phi Knights Landing (KNL) Many Integrated Core (MIC) architecture. The computational demands of an AO real-time controller (RTC) scale with the fourth power of telescope diameter, and so the next-generation ELTs require orders of magnitude more processing power for the RTC pipeline than existing systems. The Xeon Phi contains a large number (≥64) of low-power x86 CPU cores and high-bandwidth memory integrated into a single socketed server CPU package. The increased parallelism and memory bandwidth are crucial to providing the performance for reconstructing wavefronts with the required precision for ELT-scale AO. Here, we demonstrate that the Xeon Phi KNL is capable of performing ELT-scale single conjugate AO real-time control computation at over 1.0 kHz with less than 20 μs RMS jitter. We have also shown that, with a wavefront sensor camera attached, the KNL can process the real-time control loop at up to 966 Hz, the maximum frame rate of the camera, with jitter remaining below 20 μs RMS. Future studies will involve exploring the use of a cluster of Xeon Phis for the real-time control of the MCAO and MOAO regimes of AO. We find that the Xeon Phi is highly suitable for ELT AO real-time control.
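The dominant operation in a typical single-conjugate AO real-time pipeline is a large matrix-vector multiply of the measured slope vector by a precomputed control matrix, which is memory-bandwidth bound. The sketch below (an illustrative assumption with made-up sizes, not the authors' RTC code) shows the core of such a reconstruction step parallelized with OpenMP.

    /* Minimal sketch: wavefront reconstruction as commands = CM * slopes,
     * parallelized over actuators. Sizes are illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>

    static void reconstruct(const float *cm, const float *slopes,
                            float *commands, int n_act, int n_slopes)
    {
        #pragma omp parallel for schedule(static)
        for (int a = 0; a < n_act; ++a) {
            float acc = 0.0f;
            for (int s = 0; s < n_slopes; ++s)         /* unit-stride, bandwidth bound */
                acc += cm[(size_t)a * n_slopes + s] * slopes[s];
            commands[a] = acc;
        }
    }

    int main(void)
    {
        int n_act = 5000, n_slopes = 10000;            /* made-up ELT-ish sizes */
        float *cm  = calloc((size_t)n_act * n_slopes, sizeof *cm);
        float *sl  = calloc(n_slopes, sizeof *sl);
        float *cmd = calloc(n_act, sizeof *cmd);
        if (!cm || !sl || !cmd) return 1;
        reconstruct(cm, sl, cmd, n_act, n_slopes);
        printf("first command: %g\n", cmd[0]);
        free(cm); free(sl); free(cmd);
        return 0;
    }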
Locality Aware Concurrent Start for Stencil Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shrestha, Sunil; Gao, Guang R.; Manzano Franco, Joseph B.
Stencil computations are at the heart of many physical simulations used in scientific codes. Thus, there exists a plethora of optimization efforts for this family of computations. Among these techniques, tiling techniques that allow concurrent start have proven to be very efficient in providing better performance for these critical kernels. Nevertheless, with many-core designs being the norm, these optimization techniques might not be able to fully exploit locality (both spatial and temporal) on multiple levels of the memory hierarchy without compromising parallelism. It is no longer true that the machine can be seen as a homogeneous collection of nodes with caches, main memory and an interconnect network. New architectural designs exhibit complex grouping of nodes, cores, threads, caches and memory connected by an ever evolving network-on-chip design. These new designs may benefit greatly from carefully crafted schedules and groupings that encourage parallel actors (i.e. threads, cores or nodes) to be aware of the computational history of other actors in close proximity. In this paper, we provide an efficient tiling technique that allows hierarchical concurrent start for memory-hierarchy-aware tile groups. Each execution schedule and tile shape exploits the available parallelism, load balance and locality present in the given applications. We demonstrate our technique on the Intel Xeon Phi architecture with selected and representative stencil kernels. We show improvement ranging from 5.58% to 31.17% over existing state-of-the-art techniques.
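For readers unfamiliar with stencil tiling, the sketch below is a deliberately simple illustration (an assumption, not the paper's hierarchical concurrent-start technique): a 1D three-point stencil whose spatial tiles are distributed across OpenMP threads. The paper's contribution lies in choosing tile shapes and schedules so that tiles across the memory hierarchy can also start concurrently across time steps.

    /* Minimal sketch: spatially tiled 1D Jacobi-like stencil with OpenMP.
     * Tiling improves cache reuse; sizes are illustrative. */
    #include <stdio.h>
    #include <string.h>

    #define N    1024
    #define TILE 128

    int main(void)
    {
        static float a[N], b[N];
        for (int i = 0; i < N; ++i) a[i] = (float)i;

        for (int t = 0; t < 100; ++t) {                    /* time steps */
            #pragma omp parallel for schedule(static)
            for (int ts = 1; ts < N - 1; ts += TILE) {     /* tiles run in parallel */
                int end = (ts + TILE < N - 1) ? ts + TILE : N - 1;
                for (int i = ts; i < end; ++i)
                    b[i] = 0.25f * a[i - 1] + 0.5f * a[i] + 0.25f * a[i + 1];
            }
            b[0] = a[0]; b[N - 1] = a[N - 1];              /* fixed boundaries */
            memcpy(a, b, sizeof a);
        }
        printf("a[N/2] = %g\n", a[N / 2]);
        return 0;
    }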
2017-01-01
Adenovirus (AdV) morphogenesis is a complex process, many aspects of which remain unclear. In particular, it is not settled where in the nucleus assembly and packaging occur, and whether these processes occur in a sequential or a concerted manner. Here we use immunofluorescence and immunoelectron microscopy (immunoEM) to trace packaging factors and structural proteins at late times post infection by either wildtype virus or a delayed packaging mutant. We show that representatives of all assembly factors are present in the previously recognized peripheral replicative zone, which therefore is the AdV assembly factory. Assembly intermediates and abortive products observed in this region favor a concurrent assembly and packaging model comprising two pathways, one for capsid proteins and another one for core components. Only when both pathways are coupled by correct interaction between packaging proteins and the genome is the viral particle produced. Decoupling generates accumulation of empty capsids and unpackaged cores. PMID:28448571
NASA Astrophysics Data System (ADS)
Guan, Zhen; Pekurovsky, Dmitry; Luce, Jason; Thornton, Katsuyo; Lowengrub, John
The structural phase field crystal (XPFC) model can be used to model grain growth in polycrystalline materials at diffusive time-scales while maintaining atomic-scale resolution. However, the governing equation of the XPFC model is an integral-partial-differential equation (IPDE), which poses challenges for implementation on high performance computing (HPC) platforms. In collaboration with the XSEDE Extended Collaborative Support Service, we developed a distributed-memory HPC solver for the XPFC model, which combines parallel multigrid and P3DFFT. Performance benchmarking on the Stampede supercomputer indicates near-linear strong and weak scaling up to 1024 cores for both the multigrid solver and the data transfer between the multigrid and FFT modules. Scalability of the FFT module begins to decline at 128 cores, but it is sufficient for the type of problem we will be examining. We have demonstrated simulations using 1024 cores, and we expect to achieve 4096 cores and beyond. Ongoing work involves optimization of MPI/OpenMP-based codes for the Intel KNL Many-Core Architecture. This optimizes the code for upcoming pre-exascale systems, in particular many-core systems such as Stampede 2.0 and Cori 2 at NERSC, without sacrificing efficiency on other general HPC systems.
NASA Astrophysics Data System (ADS)
Wattawa, Scott
1995-11-01
Offering interactive services and data in a hybrid fiber/coax cable system requires the coordination of a host of operations and business support systems. New service offerings and network growth and evolution create never-ending changes in the network infrastructure. Agent-based enterprise models provide a flexible mechanism for systems integration of service and support systems. Agent models also provide a mechanism to decouple interactive services from network architecture. By using the Java programming language, agents may be made safe, portable, and intelligent. This paper investigates the application of the Object Management Group's Common Object Request Brokering Architecture to the integration of a multiple services metropolitan area network.
Cobb, Corie L.; Solberg, Scott E.
2017-04-29
3-dimensional (3D) electrode architectures have been explored as a means to decouple power and energy trade-offs in thick battery electrodes. Limited work has been published which systematically examines the impact of these architectures at the pouch cell level. This paper conducts an analysis on the potential capacity gains that can be realized with thick co-extruded electrodes in a pouch cell. Moreover, our findings show that despite lower active material composition for each cathode layer, the effective gain in thickness and active material loading enables pouch cell capacity gains greater than 10% with a Lithium Nickel Manganese Cobalt Oxide (NMC) materials system.
High performance in silico virtual drug screening on many-core processors.
McIntosh-Smith, Simon; Price, James; Sessions, Richard B; Ibarra, Amaurys A
2015-05-01
Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multi-core CPUs with SIMD instruction sets.
High performance in silico virtual drug screening on many-core processors
Price, James; Sessions, Richard B; Ibarra, Amaurys A
2015-01-01
Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel’s Xeon Phi and multi-core CPUs with SIMD instruction sets. PMID:25972727
Optimizing zonal advection of the Advanced Research WRF (ARW) dynamics for Intel MIC
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.
2014-10-01
The Weather Research and Forecast (WRF) model is the most widely used community weather forecast and research model in the world. There are two distinct varieties of WRF. The Advanced Research WRF (ARW) is an experimental, advanced research version featuring very high resolution. The WRF Nonhydrostatic Mesoscale Model (WRF-NMM) has been designed for forecasting operations. WRF consists of dynamics code and several physics modules. The WRF-ARW core is based on an Eulerian solver for the fully compressible nonhydrostatic equations. In this paper, we use the Intel Many Integrated Core (MIC) architecture to substantially increase the performance of a zonal advection subroutine, one of the most time-consuming routines in the ARW dynamics core. Advection advances the explicit perturbation horizontal momentum equations by adding in the large-time-step tendency along with the small-time-step pressure gradient tendency. We describe the challenges we met during the development of a high-speed dynamics code subroutine for the MIC architecture. Furthermore, lessons learned from the code optimization process are discussed. The results show that the optimizations improved performance of the original code on the Xeon Phi 5110P by a factor of 2.4x.
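To make the computational pattern concrete, the sketch below (an illustrative assumption, not WRF code) shows a minimal 1D flux-form upwind zonal advection update of a scalar field, written as the kind of long, unit-stride loop that vectorizes well on the MIC architecture; the wind, grid spacing and time step are made-up constants.

    /* Minimal sketch: first-order upwind zonal advection of a scalar q,
     * assuming a positive constant wind u. The inner loop is written to
     * vectorize cleanly (e.g. with an OpenMP SIMD directive). */
    #include <stdio.h>

    #define NX 1024

    int main(void)
    {
        static float q[NX], qnew[NX];
        const float u = 10.0f, dx = 1000.0f, dt = 1.0f;    /* illustrative values */
        for (int i = 0; i < NX; ++i)
            q[i] = (i > NX / 4 && i < NX / 2) ? 1.0f : 0.0f;

        #pragma omp simd
        for (int i = 1; i < NX; ++i)                        /* upwind flux update */
            qnew[i] = q[i] - u * dt / dx * (q[i] - q[i - 1]);
        qnew[0] = q[0];                                     /* inflow boundary */

        printf("qnew[NX/3] = %g\n", qnew[NX / 3]);
        return 0;
    }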
Benchmarking high performance computing architectures with CMS’ skeleton framework
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-11-23
Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
Benchmarking high performance computing architectures with CMS’ skeleton framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
Architecture-Based Unit Testing of the Flight Software Product Line
NASA Technical Reports Server (NTRS)
Ganesan, Dharmalingam; Lindvall, Mikael; McComas, David; Bartholomew, Maureen; Slegel, Steve; Medina, Barbara
2010-01-01
This paper presents an analysis of the unit testing approach developed and used by the Core Flight Software (CFS) product line team at the NASA GSFC. The goal of the analysis is to understand, review, and recommend strategies for improving the existing unit testing infrastructure as well as to capture lessons learned and best practices that can be used by other product line teams for their unit testing. The CFS unit testing framework is designed and implemented as a set of variation points, and thus testing support is built into the product line architecture. The analysis found that the CFS unit testing approach has many practical and good solutions that are worth considering when deciding how to design the testing architecture for a product line, which are documented in this paper along with some suggested improvements.
Impact of memory bottleneck on the performance of graphics processing units
NASA Astrophysics Data System (ADS)
Son, Dong Oh; Choi, Hong Jun; Kim, Jong Myon; Kim, Cheol Hong
2015-12-01
Recent graphics processing units (GPUs) can process general-purpose applications as well as graphics applications with the help of various user-friendly application programming interfaces (APIs) supported by GPU vendors. Unfortunately, utilizing the hardware resources in the GPU efficiently is a challenging problem, since the GPU architecture is totally different from the traditional CPU architecture. To solve this problem, many studies have focused on techniques for improving the system performance using GPUs. In this work, we analyze GPU performance while varying GPU parameters such as the number of cores and clock frequency. According to our simulations, the GPU performance can be improved by 125.8% and 16.2% on average as the number of cores and clock frequency increase, respectively. However, the performance saturates when memory bottleneck problems occur due to huge data requests to the memory. The performance of GPUs can be improved as the memory bottleneck is reduced by changing GPU parameters dynamically.
NASA Astrophysics Data System (ADS)
Schneider, E. A.; Deinert, M. R.; Cady, K. B.
2006-10-01
The balance of isotopes in a nuclear reactor core is key to understanding the overall performance of a given fuel cycle. This balance is in turn most strongly affected by the time- and energy-dependent neutron flux. While many large and involved computer packages exist for determining this spectrum, a simplified approach amenable to rapid computation is missing from the literature. We present such a model, which accepts as inputs the fuel element/moderator geometry and composition, reactor geometry, fuel residence time and target burnup, and we compare it to OECD/NEA benchmarks for homogeneous MOX and UOX LWR cores. Collision probability approximations to the neutron transport equation are used to decouple the spatial and energy variables. The lethargy-dependent neutron flux, governed by coupled integral equations for the fuel and moderator/coolant regions, is treated by multigroup thermalization methods, and the transport of neutrons through space is modeled by fuel-to-moderator transport and escape probabilities. Reactivity control is achieved through use of a burnable poison or adjustable control medium. The model calculates the buildup of 24 actinides, as well as fission products, along with the lethargy-dependent neutron flux, and the results of several simulations are compared with benchmarked standards.
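As an illustration of the kind of equation such a collision-probability treatment solves (the notation and form below are a generic textbook example, not taken from the paper), a two-region multigroup balance for the fuel flux in group g can be written as

\[
\Sigma_{t,F}^{g}\,\phi_F^{g} V_F
= P_{FF}^{g}\, V_F \left( \sum_{g'} \Sigma_{s,F}^{g' \to g}\,\phi_F^{g'}
  + \chi^{g} \sum_{g'} \nu\Sigma_{f,F}^{g'}\,\phi_F^{g'} \right)
+ P_{MF}^{g}\, V_M \sum_{g'} \Sigma_{s,M}^{g' \to g}\,\phi_M^{g'} ,
\]

where \(P_{FF}^{g}\) and \(P_{MF}^{g}\) are the probabilities that a neutron emitted in the fuel or in the moderator, respectively, suffers its next collision in the fuel, and a companion equation holds for the moderator flux \(\phi_M^{g}\). Solving such coupled balances group by group is what decouples the spatial and energy variables.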
High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures.
Kim, Daehyun; Trzasko, Joshua; Smelyanskiy, Mikhail; Haider, Clifton; Dubey, Pradeep; Manduca, Armando
2011-01-01
Compressive sensing (CS) describes how sparse signals can be accurately reconstructed from many fewer samples than required by the Nyquist criterion. Since MRI scan duration is proportional to the number of acquired samples, CS has been gaining significant attention in MRI. However, the computationally intensive nature of CS reconstructions has precluded their use in routine clinical practice. In this work, we investigate how different throughput-oriented architectures can benefit one CS algorithm and what levels of acceleration are feasible on different modern platforms. We demonstrate that a CUDA-based code running on an NVIDIA Tesla C2050 GPU can reconstruct a 256 × 160 × 80 volume from an 8-channel acquisition in 19 seconds, which is in itself a significant improvement over the state of the art. We then show that Intel's Knights Ferry can perform the same 3D MRI reconstruction in only 12 seconds, bringing CS methods even closer to clinical viability.
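For context (a standard formulation, not necessarily the exact objective used in this work), 3D CS MRI reconstruction is typically posed as a regularized least-squares problem of the form

\[
\hat{x} = \arg\min_{x} \; \tfrac{1}{2}\,\| F_u x - y \|_2^2 + \lambda\, \| \Psi x \|_1 ,
\]

where \(y\) are the undersampled k-space samples, \(F_u\) is the undersampled Fourier encoding operator (including coil sensitivities for multi-channel data), \(\Psi\) is a sparsifying transform, and \(\lambda\) trades data fidelity against sparsity. Iterative solvers for this problem are dominated by FFTs and pointwise operations, which is largely what maps well onto GPUs and other many-core processors.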
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi
NASA Astrophysics Data System (ADS)
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad
2015-05-01
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
The Eukaryotic Replisome Goes Under the Microscope
O'Donnell, Mike; Li, Huilin
2016-03-21
The machinery at the eukaryotic replication fork has seen many new structural advances using EM and crystallography. Recent structures of eukaryotic replisome components include the Mcm2-7 complex, the CMG helicase, DNA polymerases, a Ctf4 trimer hub and the first look at a core replisome of 20 different proteins containing the helicase, primase, leading polymerase and a lagging strand polymerase. The eukaryotic core replisome shows an unanticipated architecture, with one polymerase sitting above the helicase and the other below. Additionally, structures of Mcm2 bound to an H3/H4 tetramer suggest a direct role of the replisome in handling nucleosomes, which are important to DNA organization and gene regulation. This review provides a summary of some of the many recent advances in the structure of the eukaryotic replisome.
NASA Technical Reports Server (NTRS)
Wilmot, Jonathan
2005-01-01
The contents include the following: High availability. Hardware is in a harsh environment. Flight processors vary widely due to power and weight constraints. Software must be remotely modifiable and still operate while changes are being made. Many custom one-of-a-kind interfaces for one-of-a-kind missions. Sustaining engineering. The price of failure is high: tens to hundreds of millions of dollars.
Architectures for Cognitive Systems
2010-02-01
A highly modular many-node chip was designed which addressed power efficiency to the maximum extent possible. Each node contains an Asynchronous Field...optimization to perform complex cognitive computing operations. This project focused on the design of the core and integration across a four-node chip. A...follow-on project will focus on creating a 3-dimensional stack of chips that is enabled by the low power usage. The chip incorporates structures to
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava
Faced with physical and energy density limitations on clock speed, contemporary microprocessor designers have increasingly turned to on-chip parallelism for performance gains. Examples include the Intel Xeon Phi, GPGPUs, and similar technologies. Algorithms should accordingly be designed with ample amounts of fine-grained parallelism if they are to realize the full performance of the hardware. This requirement can be challenging for algorithms that are naturally expressed as a sequence of small-matrix operations, such as the Kalman filter methods widely in use in high-energy physics experiments. In the High-Luminosity Large Hadron Collider (HL-LHC), for example, one of the dominant computational problems is expected to be finding and fitting charged-particle tracks during event reconstruction; today, the most common track-finding methods are those based on the Kalman filter. Experience at the LHC, both in the trigger and offline, has shown that these methods are robust and provide high physics performance. Previously we reported the significant parallel speedups that resulted from our efforts to adapt Kalman-filter-based tracking to many-core architectures such as Intel Xeon Phi. Here we report on how effectively those techniques can be applied to more realistic detector configurations and event complexity.
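For reference, the small-matrix operations referred to above are the standard Kalman filter predict and update steps, written here in generic textbook notation rather than the experiment-specific track parametrization. In track fitting the state x typically holds only a handful of helix parameters, so the matrices are tiny and fine-grained parallelism must come from processing many track candidates at once:

\[
\begin{aligned}
x_{k|k-1} &= F_k\, x_{k-1|k-1}, &\qquad P_{k|k-1} &= F_k P_{k-1|k-1} F_k^{\mathsf T} + Q_k,\\
K_k &= P_{k|k-1} H_k^{\mathsf T} \left( H_k P_{k|k-1} H_k^{\mathsf T} + R_k \right)^{-1}, & &\\
x_{k|k} &= x_{k|k-1} + K_k \left( z_k - H_k\, x_{k|k-1} \right), &\qquad P_{k|k} &= \left( I - K_k H_k \right) P_{k|k-1}.
\end{aligned}
\]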
Concise, Stereocontrolled Synthesis of the Citrinadin B Core Architecture
Guerrero, Carlos A.; Sorensen, Erik J.
2011-01-01
A concise, stereocontrolled synthesis of the citrinadin B core architecture from scalemic, readily available starting materials is disclosed. Highlights include ready access to both cyclic tryptophan tautomer and trans-2,6-disubstituted piperidine fragments, an efficient, stereoretentive mixed Claisen acylation for the coupling of these halves, and further diastereoselective carbonyl addition and oxidative rearrangement for assembly of the core. PMID:21894952
Performance of GeantV EM Physics Models
NASA Astrophysics Data System (ADS)
Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Cosmo, G.; Duhem, L.; Elvira, D.; Folger, G.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.
2017-10-01
The recent progress in parallel hardware architectures with deeper vector pipelines or many-cores technologies brings opportunities for HEP experiments to take advantage of SIMD and SIMT computing models. Launched in 2013, the GeantV project studies performance gains in propagating multiple particles in parallel, improving instruction throughput and data locality in HEP event simulation on modern parallel hardware architecture. Due to the complexity of geometry description and physics algorithms of a typical HEP application, performance analysis is indispensable in identifying factors limiting parallel execution. In this report, we will present design considerations and preliminary computing performance of GeantV physics models on coprocessors (Intel Xeon Phi and NVidia GPUs) as well as on mainstream CPUs.
NASA Astrophysics Data System (ADS)
Ristau, Henry
Many tasks in smart environments can be implemented using message-based communication paradigms that decouple applications in time, space, synchronization and semantics. Current solutions for decoupled message-based communication either do not support message processing, and thus semantic decoupling, or rely on clearly defined network structures. In this paper we present ASP, a novel concept for such communication that can operate directly on neighbor relations between brokers and does not rely on a homogeneous addressing scheme or anything more than simple link-layer communication. We show by simulation that ASP performs well in a heterogeneous scenario with mobile nodes and decreases network or processor load significantly compared to message flooding.
Effects of stochastic noise on dynamical decoupling procedures
NASA Astrophysics Data System (ADS)
Bernád, J. Z.; Frydrych, H.
2014-06-01
Dynamical decoupling is an important tool to counter decoherence and dissipation effects in quantum systems originating from environmental interactions. It has been used successfully in many experiments; however, there is still a gap between the fidelity improvements achieved in practice and theoretical predictions. We propose a model for imperfect dynamical decoupling based on a stochastic Ito differential equation which could explain the observed gap. We discuss the impact of our model on the time evolution of various quantum systems in finite- and infinite-dimensional Hilbert spaces. Analytical results are given for the limit of continuous control, whereas we present numerical simulations and upper bounds for the case of finite control.
Neural simulations on multi-core architectures.
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.
Neural Simulations on Multi-Core Architectures
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing. PMID:19636393
NASA Technical Reports Server (NTRS)
Pena, Joaquin; Hinchey, Michael G.; Ruiz-Cortes, Antonio
2006-01-01
The field of Software Product Lines (SPL) emphasizes building a core architecture for a family of software products from which concrete products can be derived rapidly. This helps to reduce time-to-market, costs, etc., and can result in improved software quality and safety. Current AOSE methodologies are concerned with developing a single Multiagent System. We propose an initial approach to developing the core architecture of a Multiagent Systems Product Line (MAS-PL), exemplifying our approach with reference to a concept NASA mission based on multiagent technology.
Optimizing Irregular Applications for Energy and Performance on the Tilera Many-core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chavarría-Miranda, Daniel; Panyala, Ajay R.; Halappanavar, Mahantesh
Optimizing applications simultaneously for energy and performance is a complex problem. High performance, parallel, irregular applications are notoriously hard to optimize due to their data-dependent memory accesses, lack of structured locality and complex data structures and code patterns. Irregular kernels are growing in importance in applications such as machine learning, graph analytics and combinatorial scientific computing. Performance- and energy-efficient implementation of these kernels on modern, energy efficient, multicore and many-core platforms is therefore an important and challenging problem. We present results from optimizing two irregular applications, the Louvain method for community detection (Grappolo) and high-performance conjugate gradient (HPCCG), on the Tilera many-core system. We have significantly extended MIT's OpenTuner auto-tuning framework to conduct a detailed study of platform-independent and platform-specific optimizations to improve performance as well as reduce total energy consumption. We explore the optimization design space along three dimensions: memory layout schemes, compiler-based code transformations, and optimization of parallel loop schedules. Using auto-tuning, we demonstrate whole-node energy savings of up to 41% relative to a baseline instantiation, and up to 31% relative to manually optimized variants.
Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
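As a concrete picture of what an "accumulator" means here, the sketch below (an illustrative assumption, not the kkSpGEMM code) computes one row of C = A*B for CSR matrices in the Gustavson style, using a dense accumulator plus a marker array; the alternative accumulator types compared in such studies (hash maps, sorted lists, and so on) plug into the same row-wise structure.

    /* Minimal sketch: one row of a Gustavson-style SpGEMM with a dense
     * accumulator. Symbolic sizing of C is omitted for brevity. */
    #include <stdio.h>

    /* Compute row i of C = A*B (both CSR). Returns the number of nonzeros. */
    static int spgemm_row(int i,
                          const int *Ap, const int *Aj, const double *Ax,
                          const int *Bp, const int *Bj, const double *Bx,
                          double *acc, int *mark, int *Cj, double *Cx)
    {
        int nnz = 0;
        for (int k = Ap[i]; k < Ap[i + 1]; ++k) {
            int col = Aj[k];
            double v = Ax[k];
            for (int l = Bp[col]; l < Bp[col + 1]; ++l) {
                int j = Bj[l];
                if (!mark[j]) { mark[j] = 1; Cj[nnz++] = j; acc[j] = 0.0; }
                acc[j] += v * Bx[l];                 /* accumulate partial products */
            }
        }
        for (int m = 0; m < nnz; ++m) {              /* gather and reset markers */
            Cx[m] = acc[Cj[m]];
            mark[Cj[m]] = 0;
        }
        return nnz;
    }

    int main(void)
    {
        /* A = B = diag(2, 3) in CSR form */
        int    Ap[] = {0, 1, 2}, Aj[] = {0, 1};
        double Ax[] = {2.0, 3.0};
        double acc[2] = {0}; int mark[2] = {0}, Cj[2]; double Cx[2];
        int nnz = spgemm_row(0, Ap, Aj, Ax, Ap, Aj, Ax, acc, mark, Cj, Cx);
        printf("row 0: nnz=%d, first value=%g\n", nnz, Cx[0]);   /* expect 4 */
        return 0;
    }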
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
Conserved and host-specific features of influenza virion architecture.
Hutchinson, Edward C; Charles, Philip D; Hester, Svenja S; Thomas, Benjamin; Trudgian, David; Martínez-Alonso, Mónica; Fodor, Ervin
2014-09-16
Viruses use virions to spread between hosts, and virion composition is therefore the primary determinant of viral transmissibility and immunogenicity. However, the virions of many viruses are complex and pleomorphic, making them difficult to analyse in detail. Here we address this by identifying and quantifying virion proteins with mass spectrometry, producing a complete and quantified model of the hundreds of host-encoded and viral proteins that make up the pleomorphic virions of influenza viruses. We show that a conserved influenza virion architecture is maintained across diverse combinations of virus and host. This 'core' architecture, which includes substantial quantities of host proteins as well as the viral protein NS1, is elaborated with abundant host-dependent features. As a result, influenza virions produced by mammalian and avian hosts have distinct protein compositions. Finally, we note that influenza virions share an underlying protein composition with exosomes, suggesting that influenza virions form by subverting microvesicle production.
Application of Advanced Multi-Core Processor Technologies to Oceanographic Research
2013-09-30
[Table and report-boilerplate residue; recoverable information: candidate embedded processor families (STM32, NXP LPC series, Microchip PIC32/dsPIC, ARM Cortex, TI OMAP, TI Sitara, Broadcom BCM2835, FPGAs) compared by power class and licensing.] ...state-of-the-art information processing architectures. OBJECTIVES: Next-generation processor architectures (multi-core, multi-threaded) hold the
Distributed Antenna-Coupled TES for FIR Detectors Arrays
NASA Technical Reports Server (NTRS)
Day, Peter K.; Leduc, Henry G.; Dowell, C. Darren; Lee, Richard A.; Zmuidzinas, Jonas
2007-01-01
We describe a new architecture for a superconducting detector for the submillimeter and far-infrared. This detector uses a distributed hot-electron transition edge sensor (TES) to collect the power from a focal-plane-filling slot antenna array. The sensors lie directly across the slots of the antenna and match the antenna impedance of about 30 ohms. Each pixel contains many sensors that are wired in parallel as a single distributed TES, which results in a low impedance that readily matches to a multiplexed SQUID readout. These detectors are inherently polarization sensitive, with very low cross-polarization response, but can also be configured to sum both polarizations. The dual-polarization design can have a bandwidth of 50%. The use of electron-phonon decoupling eliminates the need for micro-machining, making the focal plane much easier to fabricate than with absorber-coupled, mechanically isolated pixels. We discuss applications of these detectors and a hybridization scheme compatible with arrays of tens of thousands of pixels.
Architectural Improvements and New Processing Tools for the Open XAL Online Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allen, Christopher K; Pelaia II, Tom; Freed, Jonathan M
The online model is the component of Open XAL providing accelerator modeling, simulation, and dynamic synchronization to live hardware. Significant architectural changes and feature additions have been recently made in two separate areas: 1) the managing and processing of simulation data, and 2) the modeling of RF cavities. Simulation data and data processing have been completely decoupled. A single class manages all simulation data while standard tools were developed for processing the simulation results. RF accelerating cavities are now modeled as composite structures where parameter and dynamics computations are distributed. The beam and hardware models both maintain their relative phase information, which allows for dynamic phase slip and elapsed time computation.
NASA Astrophysics Data System (ADS)
Ford, Eric B.; Dindar, Saleh; Peters, Jorg
2015-08-01
The realism of astrophysical simulations and statistical analyses of astronomical data are set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to the details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphics Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network characteristics. I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield more than order-of-magnitude speed-ups and which problems are unlikely to parallelize sufficiently efficiently to be worth the development time and/or costs. I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer school on Bayesian Computing for Astronomical Data Analysis with support of the Penn State Center for Astrostatistics and Institute for CyberScience.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Souris, Kevin, E-mail: kevin.souris@uclouvain.be; Lee, John Aldo; Sterpin, Edmond
2016-04-15
Purpose: Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. Methods: A new Monte Carlo code, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Results: Comparisons with GATE/GEANT4 for various geometries show deviations within 2%–1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10⁷ primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. Conclusions: MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus be used for in vivo range verification.
A Core Knowledge Architecture of Visual Working Memory
ERIC Educational Resources Information Center
Wood, Justin N.
2011-01-01
Visual working memory (VWM) is widely thought to contain specialized buffers for retaining spatial and object information: a "spatial-object architecture." However, studies of adults, infants, and nonhuman animals show that visual cognition builds on core knowledge systems that retain more specialized representations: (1) spatiotemporal…
Antarctic Testing of the European Ultrasonic Planetary Core Drill (UPCD)
NASA Astrophysics Data System (ADS)
Timoney, R.; Worrall, K.; Li, X.; Firstbrook, D.; Harkness, P.
2018-04-01
An overview of a series of field tests in Antarctica in which the Ultrasonic Planetary Core Drill (UPCD) architecture was evaluated. The UPCD system is the product of an EC FP7 award to develop a Mars Sample Return architecture based around the ultrasonic technique.
Large longitude libration of Mercury reveals a molten core.
Margot, J L; Peale, S J; Jurgens, R F; Slade, M A; Holin, I V
2007-05-04
Observations of radar speckle patterns tied to the rotation of Mercury establish that the planet occupies a Cassini state with obliquity of 2.11 +/- 0.1 arc minutes. The measurements show that the planet exhibits librations in longitude that are forced at the 88-day orbital period, as predicted by theory. The large amplitude of the oscillations, 35.8 +/- 2 arc seconds, together with the Mariner 10 determination of the gravitational harmonic coefficient C22, indicates that the mantle of Mercury is decoupled from a core that is at least partially molten.
Introducing Argonne’s Theta Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
Theta, the Argonne Leadership Computing Facility’s (ALCF) new Intel-Cray supercomputer, is officially open to the research community. Theta’s massively parallel, many-core architecture puts the ALCF on the path to Aurora, the facility’s future Intel-Cray system. Capable of nearly 10 quadrillion calculations per second, Theta enables researchers to break new ground in scientific investigations that range from modeling the inner workings of the brain to developing new materials for renewable energy applications.
Wilk, S; Michalowski, W; O'Sullivan, D; Farion, K; Sayyad-Shirabad, J; Kuziemsky, C; Kukawka, B
2013-01-01
The purpose of this study was to create a task-based support architecture for developing clinical decision support systems (CDSSs) that assist physicians in making decisions at the point-of-care in the emergency department (ED). The backbone of the proposed architecture was established by a task-based emergency workflow model for a patient-physician encounter. The architecture was designed according to an agent-oriented paradigm. Specifically, we used the O-MaSE (Organization-based Multi-agent System Engineering) method that allows for iterative translation of functional requirements into architectural components (e.g., agents). The agent-oriented paradigm was extended with ontology-driven design to implement ontological models representing knowledge required by specific agents to operate. The task-based architecture allows for the creation of a CDSS that is aligned with the task-based emergency workflow model. It facilitates decoupling of executable components (agents) from embedded domain knowledge (ontological models), thus supporting their interoperability, sharing, and reuse. The generic architecture was implemented as a pilot system, MET3-AE--a CDSS to help with the management of pediatric asthma exacerbation in the ED. The system was evaluated in a hospital ED. The architecture allows for the creation of a CDSS that integrates support for all tasks from the task-based emergency workflow model, and interacts with hospital information systems. Proposed architecture also allows for reusing and sharing system components and knowledge across disease-specific CDSSs.
NASA Astrophysics Data System (ADS)
Glowacki, B. A.; Majoros, M.
2009-06-01
Magnetic materials can help to improve the performance of practical superconductors on the macroscale/microscale as magnetic diverters and also on the nanoscale as effective pinning centres. It has been established by numerical modelling that magnetic shielding of the filaments reduces AC losses in self-field conditions due to decoupling of the filaments and, at the same time, it increases the critical current of the composite. This effect is especially beneficial for coated conductors, in which the anisotropic properties of the superconductor are amplified by the conductor architecture. However, ferromagnetic coatings are often chemically incompatible with YBa2Cu3O7 and (Pb,Bi)2Sr2Ca2Cu3O9 conductors, and buffer layers have to be used. In contrast, in MgB2 conductors an iron matrix may remain in direct contact with the superconducting core. The application of superconducting-magnetic heterostructures requires consideration of the thermal and electromagnetic stability of the superconducting materials used. On one hand, magnetic materials reduce the critical current gradient across the individual filaments but, on the other hand, they often reduce the thermal conductivity between the superconducting core and the cryogen, which may cause destruction of the conductor in the event of thermal instability. A possible nanoscale method of improving the critical current density of superconducting conductors is the introduction of sub-micron magnetic pinning centres. However, the volumetric density and chemical compatibility of magnetic inclusions has to be controlled to avoid suppression of the superconducting properties.
Using the CoRE Requirements Method with ADARTS. Version 01.00.05
1994-03-01
requirements; combining ADARTS processes and objects derived from CoRE requirements into an ADARTS software architecture design; and taking advantage of...CoRE's precision in the ADARTS process structuring, class structuring, and software architecture design activities. Object-oriented requirements and
Heterogeneous high throughput scientific computing with APM X-Gene and Intel Xeon Phi
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; ...
2015-05-22
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. As a result, we report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
Sankaranarayanan, Ganesh; Halic, Tansel; Arikatla, Venkata Sreekanth; Lu, Zhonghua; De, Suvranu
2010-01-01
Purpose: Surgical simulations require haptic interactions and collaboration in a shared virtual environment. A software framework for decoupled surgical simulation based on a multi-controller and multi-viewer model-view-controller (MVC) pattern was developed and tested. Methods: A software framework for multimodal virtual environments was designed, supporting both visual interactions and haptic feedback while providing developers with an integration tool for heterogeneous architectures maintaining high performance, simplicity of implementation, and straightforward extension. The framework uses decoupled simulation with updates of over 1,000 Hz for haptics and accommodates networked simulation with delays of over 1,000 ms without performance penalty. Results: The simulation software framework was implemented and was used to support the design of virtual reality-based surgery simulation systems. The framework supports the high level of complexity of such applications and the fast response required for interaction with haptics. The efficacy of the framework was tested by implementation of a minimally invasive surgery simulator. Conclusion: A decoupled simulation approach can be implemented as a framework to handle simultaneous processes of the system at the various frame rates each process requires. The framework was successfully used to develop collaborative virtual environments (VEs) involving geographically distributed users connected through a network, with the results comparable to VEs for local users. PMID:20714933
Maciel, Anderson; Sankaranarayanan, Ganesh; Halic, Tansel; Arikatla, Venkata Sreekanth; Lu, Zhonghua; De, Suvranu
2011-07-01
Surgical simulations require haptic interactions and collaboration in a shared virtual environment. A software framework for decoupled surgical simulation based on a multi-controller and multi-viewer model-view-controller (MVC) pattern was developed and tested. A software framework for multimodal virtual environments was designed, supporting both visual interactions and haptic feedback while providing developers with an integration tool for heterogeneous architectures maintaining high performance, simplicity of implementation, and straightforward extension. The framework uses decoupled simulation with updates of over 1,000 Hz for haptics and accommodates networked simulation with delays of over 1,000 ms without performance penalty. The simulation software framework was implemented and was used to support the design of virtual reality-based surgery simulation systems. The framework supports the high level of complexity of such applications and the fast response required for interaction with haptics. The efficacy of the framework was tested by implementation of a minimally invasive surgery simulator. A decoupled simulation approach can be implemented as a framework to handle simultaneous processes of the system at the various frame rates each process requires. The framework was successfully used to develop collaborative virtual environments (VEs) involving geographically distributed users connected through a network, with the results comparable to VEs for local users.
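Both of the preceding records describe the same decoupled, multi-rate design: the haptic process runs above 1,000 Hz while the visual process runs much slower, with the two sharing the scene state. The threading sketch below only illustrates that general pattern of two loops updating shared state at different rates; the loop rates, names and the trivial "physics" are our stand-ins, not the framework's API.

```python
import threading
import time

# Two decoupled loops share a scene state: a fast "haptics" loop (~1 kHz)
# and a slower "rendering" loop (~60 Hz).  Purely illustrative of the
# multi-rate MVC idea; rates, names and the fake force model are ours.
state = {"tool_depth": 0.0}
lock = threading.Lock()
stop = threading.Event()

def haptics_loop(rate_hz=1000):
    dt = 1.0 / rate_hz
    while not stop.is_set():
        with lock:
            state["tool_depth"] += 0.001            # fake tool motion
            force = 50.0 * state["tool_depth"]      # fake stiffness response
        time.sleep(dt)

def render_loop(rate_hz=60):
    dt = 1.0 / rate_hz
    while not stop.is_set():
        with lock:
            depth = state["tool_depth"]             # snapshot for drawing
        # A real viewer would redraw the scene here using `depth`.
        time.sleep(dt)

threads = [threading.Thread(target=haptics_loop), threading.Thread(target=render_loop)]
for t in threads:
    t.start()
time.sleep(0.5)
stop.set()
for t in threads:
    t.join()
print("final tool depth:", round(state["tool_depth"], 3))
```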
High-performance multiprocessor architecture for a 3-D lattice gas model
NASA Technical Reports Server (NTRS)
Lee, F.; Flynn, M.; Morf, M.
1991-01-01
The lattice gas method has recently emerged as a promising discrete particle simulation method in areas such as fluid dynamics. We present a very high-performance scalable multiprocessor architecture, called ALGE, proposed for the simulation of a realistic 3-D lattice gas model, Henon's 24-bit FCHC isometric model. Each of these VLSI processors is as powerful as a CRAY-2 for this application. ALGE is scalable in the sense that it achieves linear speedup for both fixed and increasing problem sizes with more processors. The core computation of a lattice gas model consists of many repetitions of two alternating phases: particle collision and propagation. Functional decomposition by symmetry group and virtual move are the respective keys to efficient implementation of collision and propagation.
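The FCHC model simulated by ALGE is a 24-bit, four-dimensional lattice; as a minimal illustration of the two alternating phases the abstract describes (collision, then propagation), the sketch below implements the much simpler 2-D HPP gas with four velocity directions. All names and the grid size are ours, not taken from the paper.

```python
import numpy as np

# Toy 2-D HPP lattice gas: each site holds 4 occupation bits, one per
# direction (0:+x, 1:-x, 2:+y, 3:-y).  A stand-in for the far richer
# 24-bit FCHC model discussed in the abstract.
rng = np.random.default_rng(0)
state = rng.random((4, 64, 64)) < 0.2        # boolean occupation numbers

def collide(s):
    # Head-on pairs (+x,-x) or (+y,-y) with the other pair empty are
    # rotated by 90 degrees; every other configuration passes through.
    swap_x = s[0] & s[1] & ~s[2] & ~s[3]
    swap_y = s[2] & s[3] & ~s[0] & ~s[1]
    out = s.copy()
    out[0] = (s[0] & ~swap_x) | swap_y
    out[1] = (s[1] & ~swap_x) | swap_y
    out[2] = (s[2] & ~swap_y) | swap_x
    out[3] = (s[3] & ~swap_y) | swap_x
    return out

def propagate(s):
    # Each particle hops one site along its velocity (periodic boundaries).
    out = np.empty_like(s)
    out[0] = np.roll(s[0], +1, axis=1)       # +x
    out[1] = np.roll(s[1], -1, axis=1)       # -x
    out[2] = np.roll(s[2], +1, axis=0)       # +y
    out[3] = np.roll(s[3], -1, axis=0)       # -y
    return out

for _ in range(100):                         # many repetitions of the two phases
    state = propagate(collide(state))

print("particles conserved:", int(state.sum()))
```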
Molecular Architecture of the Retroviral Capsid.
Perilla, Juan R; Gronenborn, Angela M
2016-05-01
Retroviral capsid cores are proteinaceous containers that self-assemble to encase the viral genome and a handful of proteins that promote infection. Their function is to protect and aid in the delivery of viral genes to the nucleus of the host, and, in many cases, infection pathways are influenced by capsid-cellular interactions. From a mathematical perspective, capsid cores are polyhedral cages and, as such, follow well-defined geometric rules. However, marked morphological differences in shapes exist, depending on virus type. Given the specific roles of capsid in the viral life cycle, the availability of detailed molecular structures, particularly at assembly interfaces, opens novel avenues for targeted drug development against these pathogens. Here, we summarize recent advances in the structure and understanding of retroviral capsid, with particular emphasis on assemblies and the capsid cores. Copyright © 2016 Elsevier Ltd. All rights reserved.
Graphical Representation and Origin of Piezoresistance Effect in Germanium
NASA Astrophysics Data System (ADS)
Matsuda, K.; Nagaoka, S.; Kanda, Y.
2017-06-01
The longitudinal and transverse piezoresistance coefficients of Ge at room temperature are represented graphically as functions of the crystal direction for the (001), (110) and (211) planes. The many-valley model of the conduction band and the stress-induced decoupling of the degenerate valence band into two bands with prolate and oblate ellipsoidal energy surfaces are shown to explain the origin of the piezoresistance. On this basis, a comparison between the piezoresistance coefficients and the theoretical model is discussed.
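The abstract does not reproduce the expressions being plotted; for context, the longitudinal and transverse piezoresistance coefficients of a cubic crystal such as Ge are commonly written in terms of the three independent tensor components and the direction cosines of the current and stress directions. This is the standard textbook form, added here for reference rather than taken from the paper:

```latex
\pi_l = \pi_{11} - 2\left(\pi_{11}-\pi_{12}-\pi_{44}\right)
        \left(l_1^{2}m_1^{2} + m_1^{2}n_1^{2} + n_1^{2}l_1^{2}\right), \qquad
\pi_t = \pi_{12} + \left(\pi_{11}-\pi_{12}-\pi_{44}\right)
        \left(l_1^{2}l_2^{2} + m_1^{2}m_2^{2} + n_1^{2}n_2^{2}\right),
```

where (l1, m1, n1) are the direction cosines of the current direction and (l2, m2, n2) those of the transverse stress direction.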
The Core Services of the European Plate Observing System (EPOS)
NASA Astrophysics Data System (ADS)
Hoffmann, T. L.; Euteneuer, F. H.; Lauterjung, J.
2013-12-01
The ESFRI project European Plate Observing System (EPOS) was launched in November 2010 and has now completed year 3 of the four-year preparatory phase. EPOS will create a single sustainable, permanent observation infrastructure, integrating existing geophysical monitoring networks, local observatories and experimental laboratories in Europe and adjacent regions. EPOS' technical Work Package 6 has developed a three-layer architectural model for the construction of the EPOS Core Services (CS) during the subsequent implementation phase. The poster will present and detail these three layers, consisting of the EPOS Integrated Core Services (ICS), the Thematic Core Services (TCS) and the existing National Research Infrastructures & Data Centers. The basic layer of the architecture is established by the National Research Infrastructures (RIs) & Data Centers, which generate data and information and are responsible for the operation of the instrumentation. National RIs will provide their data to the Thematic Core Services. The Thematic Core Services constitute the community layer of the EPOS architecture and they will: 1) consist of existing (e.g. ORFEUS, EMSC), developing (e.g. EUREF/GNSS) or still to be developed Service Providers for specific thematic communities, as represented within EPOS through the technical EPOS Working Groups (e.g., seismology, volcanology, geodesy, geology, analytic labs for rock physics, geomagnetism, geo-resources ... and many others), 2) provide data services to specific communities, 3) link the National Research Infrastructures to the EPOS Integrated Services, 4) include Service Providers (e.g. OneGeology+, Intermagnet) that may be merely linked or partially integrated and 5) consist of Integrated Laboratories and RIs spanning multiple EPOS disciplines and taking advantage of other existing Thematic Services. The EPOS Integrated Services constitute the ICT layer of the EPOS portal and they will: 1) provide access to multidisciplinary data from different EPOS Thematic Core Services and from the National RIs & Data Centers, 2) provide access to data products, synthetic data from simulations, data processing and data visualization tools, 3) serve science, industry, education, government, legal and other stakeholders in an integrated fashion through the EPOS User Interface, and 4) provide a variety of ICT technological services including (but not limited to) discovery functions, data mining, access to modeling tools and high performance computing, and training & tutorials.
Computational Cosmology at the Bleeding Edge
NASA Astrophysics Data System (ADS)
Habib, Salman
2013-04-01
Large-area sky surveys are providing a wealth of cosmological information to address the mysteries of dark energy and dark matter. Observational probes based on tracking the formation of cosmic structure are essential to this effort, and rely crucially on N-body simulations that solve the Vlasov-Poisson equation in an expanding Universe. As statistical errors from survey observations continue to shrink, and cosmological probes increase in number and complexity, simulations are entering a new regime in their use as tools for scientific inference. Changes in supercomputer architectures provide another rationale for developing new parallel simulation and analysis capabilities that can scale to computational concurrency levels measured in the millions to billions. In this talk I will outline the motivations behind the development of the HACC (Hardware/Hybrid Accelerated Cosmology Code) extreme-scale cosmological simulation framework and describe its essential features. By exploiting a novel algorithmic structure that allows flexible tuning across diverse computer architectures, including accelerated and many-core systems, HACC has attained a performance of 14 PFlops on the IBM BG/Q Sequoia system at 69% of peak, using more than 1.5 million cores.
NASA Astrophysics Data System (ADS)
Fang, Juan; Hao, Xiaoting; Fan, Qingwen; Chang, Zeqing; Song, Shuying
2017-05-01
In a heterogeneous multi-core architecture, the CPU and GPU are integrated on the same chip, which poses a new challenge for last-level cache (LLC) management. In this architecture, CPU applications and GPU applications execute concurrently and both access the last-level cache. CPU and GPU have different memory access characteristics, so they differ in their sensitivity to LLC capacity. For many CPU applications, a reduced share of the LLC can lead to significant performance degradation. GPU applications, on the contrary, can tolerate increased memory access latency when there is sufficient thread-level parallelism. Taking the memory-latency tolerance of GPU programs into account, this paper presents a method that lets GPU applications access memory directly, bypassing the LLC and leaving the LLC space to CPU applications; this improves the performance of CPU applications without affecting the performance of GPU applications. When the CPU application is cache sensitive and the GPU application is insensitive to the cache, the overall performance of the system improves significantly.
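The effect the abstract describes can be illustrated with a tiny LRU model of a shared LLC: interleave a small, reuse-heavy CPU working set with a large streaming GPU access pattern and compare the CPU hit rate when GPU requests share the cache versus when they bypass it. The sizes, access mix and LRU model below are invented for illustration; they are not the paper's simulation setup.

```python
from collections import OrderedDict
import random

# Toy LRU model of a shared last-level cache (LLC).  "CPU" accesses reuse a
# small hot set; "GPU" accesses stream through a huge address range.  With
# gpu_bypass=True the GPU traffic goes straight to memory and never evicts
# CPU lines.  All sizes and mixes are illustrative only.
def run(llc_lines=256, accesses=20000, gpu_bypass=False, seed=1):
    rng = random.Random(seed)
    lru = OrderedDict()
    cpu_hits = cpu_refs = 0

    def touch(addr):
        if addr in lru:
            lru.move_to_end(addr)
            return True
        lru[addr] = True
        if len(lru) > llc_lines:
            lru.popitem(last=False)          # evict least recently used line
        return False

    for _ in range(accesses):
        if rng.random() < 0.5:               # CPU access, small hot set
            cpu_refs += 1
            cpu_hits += touch(("cpu", rng.randrange(200)))
        elif not gpu_bypass:                 # GPU access, streaming pattern
            touch(("gpu", rng.randrange(100000)))
    return cpu_hits / cpu_refs

print("CPU LLC hit rate, shared LLC :", round(run(gpu_bypass=False), 3))
print("CPU LLC hit rate, GPU bypass :", round(run(gpu_bypass=True), 3))
```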
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amadio, G.; et al.
An intensive R&D and programming effort is required to meet the new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting the latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting particles in parallel through complex geometries, exploiting instruction-level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.
HYDRA : High-speed simulation architecture for precision spacecraft formation simulation
NASA Technical Reports Server (NTRS)
Martin, Bryan J.; Sohl, Garett.
2003-01-01
HYDRA, the Hierarchical Distributed Reconfigurable Architecture, is a scalable simulation architecture that provides flexibility and ease of use while taking advantage of modern computation and communication hardware. It also provides the ability to implement distributed- or workstation-based simulations and high-fidelity real-time simulation from a common core. Originally designed to serve as a research platform for examining fundamental challenges in formation flying simulation for future space missions, it is also finding use in other missions and applications, all of which can take advantage of the underlying object-oriented structure to easily produce distributed simulations. Hydra automates the process of connecting disparate simulation components (Hydra Clients) through a client-server architecture that uses high-level descriptions of the data associated with each client to find and forge desirable connections (Hydra Services) at run time. Services communicate through the use of Connectors, which abstract messaging to provide single-interface access to any desired communication protocol, from shared-memory message passing to TCP/IP to ACE and CORBA. Hydra shares many features with the HLA, although it provides more flexibility in connectivity services and behavior overriding.
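The Connector idea, a single messaging interface with the transport chosen behind it, can be mimicked with an abstract base class and one in-process implementation, as in the sketch below. The class names and the shared-memory stand-in are ours, not the actual Hydra API.

```python
from abc import ABC, abstractmethod
from queue import Queue

# Sketch of a Connector-style abstraction: services talk to one send/receive
# interface, and the transport (shared memory, TCP/IP, CORBA, ...) is chosen
# behind it.  These class names are illustrative, not Hydra's.
class Connector(ABC):
    @abstractmethod
    def send(self, message: dict) -> None: ...

    @abstractmethod
    def receive(self) -> dict: ...

class InProcessConnector(Connector):
    """Shared-memory stand-in: a thread-safe queue inside one process."""
    def __init__(self) -> None:
        self._q: Queue = Queue()

    def send(self, message: dict) -> None:
        self._q.put(message)

    def receive(self) -> dict:
        return self._q.get()

# A client publishes a high-level description of its data; a server side
# could use such descriptions to forge connections at run time.
link: Connector = InProcessConnector()
link.send({"topic": "spacecraft.state", "position_m": [7.0e6, 0.0, 0.0]})
print(link.receive())
```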
2015-09-30
NPS-NRL-Rice-UIUC Collaboration on Navy Atmosphere...portability. There is still a gap in the OCCA support for Fortran programmers who do not have accelerator experience. Activities at Rice/Virginia Tech are...for automated data movement and for kernel optimization using source code analysis and run-time detective work. In this quarter the Rice/Virginia
2012-08-01
The first phase consisted of Shared Services, Threat Detection and Reporting, and the Remote Weapon Station (RWS) build up and validation. The...Awareness build up and validation. The first phase consisted of the development of the shared services or core services that are required by many...C4ISR/EW systems. The shared services include: time synchronization, position, direction of travel, and orientation. Time synchronization is
NASA Astrophysics Data System (ADS)
Kajiyama, Shinya; Fujito, Masamichi; Kasai, Hideo; Mizuno, Makoto; Yamaguchi, Takanori; Shinagawa, Yutaka
A novel 300 MHz embedded flash memory for dual-core microcontrollers with a shared ROM architecture is proposed. One of its features is a three-stage pipeline read operation, which enables a reduced access pitch and therefore reduces the performance penalty due to conflicting shared-ROM accesses. Another feature is a highly sensitive sense amplifier that achieves efficient pipeline operation with two-cycle latency and one-cycle pitch as a result of a shortened sense time of 0.63 ns. The combination of the pipeline architecture and the proposed sense amplifiers significantly reduces access-conflict penalties with the shared ROM and enhances the performance of 32-bit RISC dual-core microcontrollers by 30%.
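The quoted figures (300 MHz clock, two-cycle latency, one-cycle pitch) imply that back-to-back reads can be accepted every cycle once the pipeline is full. The sketch below just walks that timing against an assumed un-pipelined baseline whose pitch equals its two-cycle latency; it is our back-of-the-envelope arithmetic, not a circuit model.

```python
# Compare read throughput of an assumed non-pipelined access (a new read can
# start only after the previous one finishes, so pitch = latency = 2 cycles)
# with the 3-stage pipelined read described in the abstract (2-cycle latency,
# 1-cycle pitch) at a 300 MHz clock.
CLOCK_MHZ = 300
CYCLE_NS = 1e3 / CLOCK_MHZ                  # ~3.33 ns per cycle

def finish_times(n_reads, latency_cycles, pitch_cycles):
    # Read i is issued at i * pitch and completes `latency` cycles later.
    return [(i * pitch_cycles + latency_cycles) * CYCLE_NS for i in range(n_reads)]

n = 8
unpipelined = finish_times(n, latency_cycles=2, pitch_cycles=2)
pipelined   = finish_times(n, latency_cycles=2, pitch_cycles=1)
print(f"last of {n} reads, un-pipelined: {unpipelined[-1]:.1f} ns")
print(f"last of {n} reads, pipelined  : {pipelined[-1]:.1f} ns")
```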
Recent advances in nuclear magnetic resonance quantum information processing.
Criger, Ben; Passante, Gina; Park, Daniel; Laflamme, Raymond
2012-10-13
Quantum information processors have the potential to drastically change the way we communicate and process information. Nuclear magnetic resonance (NMR) has been one of the first experimental implementations of quantum information processing (QIP) and continues to be an excellent testbed to develop new QIP techniques. We review the recent progress made in NMR QIP, focusing on decoupling, pulse engineering and indirect nuclear control. These advances have enhanced the capabilities of NMR QIP, and have useful applications in both traditional NMR and other QIP architectures.
Trusted measurement model based on multitenant behaviors.
Ning, Zhen-Hu; Shen, Chang-Xiang; Zhao, Yong; Liang, Peng
2014-01-01
With the fast growth of pervasive computing, and of cloud computing in particular, behaviour measurement is at the core of establishing trust and plays a vital role. A new behaviour measurement tailored to multitenants in cloud computing is urgently needed to fundamentally establish trust relationships. Based on our previous research, we propose an improved trust relationship scheme that captures the world of cloud computing, where multiple tenants share the same physical computing platform. Here, we first present the related work on multitenant behaviour; secondly, we give the scheme of behaviour measurement, in which the decoupling of multitenants is taken into account; thirdly, we explicitly explain our decoupling algorithm for multitenants; fourthly, we introduce a new way of calculating similarity for deviation control, which fits the coupled multitenants under study well; lastly, we design experiments to test our scheme.
Trusted Measurement Model Based on Multitenant Behaviors
Ning, Zhen-Hu; Shen, Chang-Xiang; Zhao, Yong; Liang, Peng
2014-01-01
With the fast growth of pervasive computing, and of cloud computing in particular, behaviour measurement is at the core of establishing trust and plays a vital role. A new behaviour measurement tailored to multitenants in cloud computing is urgently needed to fundamentally establish trust relationships. Based on our previous research, we propose an improved trust relationship scheme that captures the world of cloud computing, where multiple tenants share the same physical computing platform. Here, we first present the related work on multitenant behaviour; secondly, we give the scheme of behaviour measurement, in which the decoupling of multitenants is taken into account; thirdly, we explicitly explain our decoupling algorithm for multitenants; fourthly, we introduce a new way of calculating similarity for deviation control, which fits the coupled multitenants under study well; lastly, we design experiments to test our scheme. PMID:24987731
The Transition to a Many-core World
NASA Astrophysics Data System (ADS)
Mattson, T. G.
2012-12-01
The need to increase performance within a fixed energy budget has pushed the computer industry to many-core processors. This is grounded in the physics of computing and is not a trend that will just go away. It is hard to overestimate the profound impact of many-core processors on software developers. Virtually every facet of the software development process will need to change to adapt to these new processors. In this talk, we will look at many-core hardware and consider its evolution from a perspective grounded in the CPU. We will show that the number of cores will inevitably increase, but in addition, a quest to maximize performance per watt will push these cores to be heterogeneous. We will show that the inevitable result of these changes is a computing landscape where the distinction between the CPU and the GPU is blurred. We will then consider the much more pressing problem of software in a many-core world. Writing software for heterogeneous many-core processors is well beyond the ability of current programmers. One solution is to support a software development process where programmer teams are split into two distinct groups: a large group of domain-expert productivity programmers and a much smaller team of computer-scientist efficiency programmers. The productivity programmers work in terms of high-level frameworks to express the concurrency in their problems while avoiding any details of how that concurrency is exploited. The second group, the efficiency programmers, map applications expressed in terms of these frameworks onto the target many-core system. In other words, we can solve the many-core software problem by creating a software infrastructure that only requires a small subset of programmers to become master parallel programmers. This is different from the discredited dream of automatic parallelism. Note that productivity programmers still need to define the architecture of their software in a way that exposes the concurrency inherent in their problem. We submit that domain-expert programmers understand "what is concurrent". The parallel programming problem emerges from the complexity of "how that concurrency is utilized" on real hardware. The research described in this talk was carried out in collaboration with the ParLab at UC Berkeley. We use a design pattern language to define the high-level frameworks exposed to domain-expert, productivity programmers. We then use tools from the SEJITS project (Selective Embedded Just-In-Time Specializers) to build the software transformation tool chains that turn these framework-oriented designs into highly efficient code. The final ingredient is a software platform to serve as a target for these tools. One such platform is the OpenCL industry standard for programming heterogeneous systems. We will briefly describe OpenCL and show how it provides a vendor-neutral software target for current and future many-core systems, whether CPU-based, GPU-based, or heterogeneous combinations of the two.
Progress in a novel architecture for high performance processing
NASA Astrophysics Data System (ADS)
Zhang, Zhiwei; Liu, Meng; Liu, Zijun; Du, Xueliang; Xie, Shaolin; Ma, Hong; Ding, Guangxin; Ren, Weili; Zhou, Fabiao; Sun, Wenqin; Wang, Huijuan; Wang, Donglin
2018-04-01
High performance processing (HPP) is an innovative architecture that targets high-performance computing with excellent power efficiency and computing performance. It is suitable for data-intensive applications like supercomputing, machine learning and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores, the first generation of HPP cores, has been taped out successfully using the Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low-power process. The innovative architecture shows much better energy efficiency than the traditional central processing unit (CPU) and general-purpose computing on graphics processing units (GPGPU). Compared with MaPU, HPP has made great improvements in architecture. A chip with 32 HPP cores is being developed using the TSMC 16 nm FinFET (FFC) technology process and is planned for commercial use. The peak performance of this chip can reach 4.3 teraFLOPS (TFLOPS) and its power efficiency is up to 89.5 gigaFLOPS per watt (GFLOPS/W).
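As a quick consistency check on the quoted figures (our arithmetic, not a number from the paper), 4.3 TFLOPS at 89.5 GFLOPS/W corresponds to roughly 48 W of chip power:

```python
peak_tflops = 4.3                  # quoted peak performance
efficiency_gflops_per_w = 89.5     # quoted power efficiency
power_w = (peak_tflops * 1000) / efficiency_gflops_per_w
print(f"implied chip power: {power_w:.1f} W")   # about 48 W
```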
Gädt, Torben; Ieong, Nga Sze; Cambridge, Graeme; Winnik, Mitchell A; Manners, Ian
2009-02-01
Block copolymers consist of two or more chemically distinct polymer segments, or blocks, connected by a covalent link. In a selective solvent for one of the blocks, core-corona micelle structures are formed. We demonstrate that living polymerizations driven by the epitaxial crystallization of a core-forming metalloblock represent a synthetic tool that can be used to generate complex and hierarchical micelle architectures from diblock copolymers. The use of platelet micelles as initiators enables the formation of scarf-like architectures in which cylindrical micelle tassels of controlled length are grown from specific crystal faces. A similar process enables the fabrication of brushes of cylindrical micelles on a crystalline homopolymer substrate. Living polymerizations driven by heteroepitaxial growth can also be accomplished and are illustrated by the formation of tri- and pentablock and scarf architectures with cylinder-cylinder and platelet-cylinder connections, respectively, that involve different core-forming metalloblocks.
UMA/GAN network architecture analysis
NASA Astrophysics Data System (ADS)
Yang, Liang; Li, Wensheng; Deng, Chunjian; Lv, Yi
2009-07-01
This paper critically analyzes the architecture of UMA, which is one of the Fixed Mobile Convergence (FMC) solutions and is also included by the Third Generation Partnership Project (3GPP). In the UMA/GAN network architecture, the UMA Network Controller (UNC) is the key equipment that connects the cellular core network with the mobile station (MS). A UMA network can be easily integrated into existing cellular networks without influencing the mobile core network, and can provide high-quality mobile services with preferentially priced indoor voice and data usage. This helps to improve the subscriber's experience. On the other hand, the UMA/GAN architecture helps to integrate other radio techniques into the cellular network, including WiFi, Bluetooth, WiMax and so on. This offers traditional mobile operators an opportunity to integrate the WiMax technique into the cellular network. At the end of this article, we also give an analysis of the potential influence of the UMA network on the cellular core network.
Mooneyham, Benjamin W; Schooler, Jonathan W
2016-08-01
Mind wandering is associated with perceptual decoupling: the disengagement of attention from perception. This decoupling is deleterious to performance in many situations; however, we sought to determine whether it might occur in the service of performance in certain circumstances. In two studies, we examined the role of mind wandering in a test of "semantic satiation," a phenomenon in which the repeated presentation of a word reduces semantic priming for a subsequently presented semantic associate. We posited that the attentional and perceptual decoupling associated with mind wandering would reduce the amount of satiation in the semantic representations of repeatedly presented words, thus leading to a reduced semantic-satiation effect. Our results supported this hypothesis: Self-reported mind-wandering episodes (Study 1) and behavioral indices of decoupled attention (Study 2) were both predictive of maintained semantic priming in situations predicted to induce semantic satiation. Additionally, our results suggest that moderate inattention to repetitive stimuli is not sufficient to enable "dishabituation": the refreshment of cognitive performance that results from diverting attention away from the task at hand. Rather, full decoupling is necessary to reap the benefits of mind wandering and to minimize mind numbing.
Abdelkarim, Noha; Mohamed, Amr E; El-Garhy, Ahmed M; Dorrah, Hassen T
2016-01-01
The two-coupled distillation column process is a physically complicated system in many aspects. Specifically, the nested interrelationship between system inputs and outputs constitutes one of the significant challenges in system control design. Mostly, such a process is to be decoupled into several input/output pairings (loops), so that a single controller can be assigned for each loop. In the frame of this research, the Brain Emotional Learning Based Intelligent Controller (BELBIC) forms the control structure for each decoupled loop. The paper's main objective is to develop a parameterization technique for decoupling and control schemes, which ensures robust control behavior. In this regard, the novel optimization technique Bacterial Swarm Optimization (BSO) is utilized for the minimization of summation of the integral time-weighted squared errors (ITSEs) for all control loops. This optimization technique constitutes a hybrid between two techniques, which are the Particle Swarm and Bacterial Foraging algorithms. According to the simulation results, this hybridized technique ensures low mathematical burdens and high decoupling and control accuracy. Moreover, the behavior analysis of the proposed BELBIC shows a remarkable improvement in the time domain behavior and robustness over the conventional PID controller.
Mohamed, Amr E.; Dorrah, Hassen T.
2016-01-01
The two-coupled distillation column process is a physically complicated system in many aspects. Specifically, the nested interrelationship between system inputs and outputs constitutes one of the significant challenges in system control design. Mostly, such a process is to be decoupled into several input/output pairings (loops), so that a single controller can be assigned for each loop. In the frame of this research, the Brain Emotional Learning Based Intelligent Controller (BELBIC) forms the control structure for each decoupled loop. The paper's main objective is to develop a parameterization technique for decoupling and control schemes, which ensures robust control behavior. In this regard, the novel optimization technique Bacterial Swarm Optimization (BSO) is utilized for the minimization of summation of the integral time-weighted squared errors (ITSEs) for all control loops. This optimization technique constitutes a hybrid between two techniques, which are the Particle Swarm and Bacterial Foraging algorithms. According to the simulation results, this hybridized technique ensures low mathematical burdens and high decoupling and control accuracy. Moreover, the behavior analysis of the proposed BELBIC shows a remarkable improvement in the time domain behavior and robustness over the conventional PID controller. PMID:27807444
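The objective both of the preceding records minimize is the sum of the integral time-weighted squared errors (ITSE), where ITSE = the integral of t·e(t)² over time, taken over all decoupled control loops. A minimal numerical version of that cost, with made-up error signals in place of the real closed-loop responses and without the BSO optimizer itself, looks like this:

```python
import numpy as np

# Sum of integral time-weighted squared errors (ITSE) over several control
# loops: J = sum_i integral( t * e_i(t)^2 dt ), approximated with the
# trapezoid rule.  The error signals below are synthetic stand-ins for the
# closed-loop responses a BSO-style optimizer would evaluate per candidate.
def itse(t, e):
    w = t * e**2
    return float(np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(t)))

t = np.linspace(0.0, 20.0, 2001)
loop_errors = [
    np.exp(-0.5 * t) * np.cos(2.0 * t),   # fake error of loop 1
    np.exp(-0.3 * t),                     # fake error of loop 2
]
J = sum(itse(t, e) for e in loop_errors)
print(f"total ITSE cost J = {J:.4f}")
```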
High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures
Kim, Daehyun; Trzasko, Joshua; Smelyanskiy, Mikhail; Haider, Clifton; Dubey, Pradeep; Manduca, Armando
2011-01-01
Compressive sensing (CS) describes how sparse signals can be accurately reconstructed from many fewer samples than required by the Nyquist criterion. Since MRI scan duration is proportional to the number of acquired samples, CS has been gaining significant attention in MRI. However, the computationally intensive nature of CS reconstructions has precluded their use in routine clinical practice. In this work, we investigate how different throughput-oriented architectures can benefit one CS algorithm and what levels of acceleration are feasible on different modern platforms. We demonstrate that a CUDA-based code running on an NVIDIA Tesla C2050 GPU can reconstruct a 256 × 160 × 80 volume from an 8-channel acquisition in 19 seconds, which is in itself a significant improvement over the state of the art. We then show that Intel's Knights Ferry can perform the same 3D MRI reconstruction in only 12 seconds, bringing CS methods even closer to clinical viability. PMID:21922017
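The abstract does not spell out the reconstruction algorithm; as a generic illustration of the recovery problem such systems accelerate, the sketch below runs plain iterative soft-thresholding (ISTA) on a 1-D signal observed through far fewer Fourier samples than its length. It is a toy stand-in under our own parameter choices, not the paper's 3-D, multi-channel method.

```python
import numpy as np

# Toy compressive-sensing recovery: a sparse signal is observed through a
# random subset of its Fourier coefficients and recovered with iterative
# soft-thresholding (ISTA).  A 1-D illustration only; the paper reconstructs
# 3-D, multi-channel MRI volumes.
rng = np.random.default_rng(0)
n, k, m = 256, 8, 80                      # signal length, sparsity, samples
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)

mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, size=m, replace=False)] = True

def A(x):                                 # undersampled, unitary-normalised FFT
    return np.fft.fft(x, norm="ortho")[mask]

def At(v):                                # adjoint: zero-fill and inverse FFT
    full = np.zeros(n, dtype=complex)
    full[mask] = v
    return np.fft.ifft(full, norm="ortho")

y = A(x_true)                             # "acquired" samples, m << n

x = np.zeros(n)
lam = 0.01                                # l1 weight; unit step is safe, ||A|| <= 1
for _ in range(500):                      # ISTA: gradient step, then shrinkage
    z = x - At(A(x) - y).real
    x = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"relative reconstruction error: {err:.3f}")
```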
Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sczyrba, Alex; Pratap, Abhishek; Canon, Shane
2011-03-22
Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor, based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to a very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
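As a minimal picture of the data structure behind Velvet-style assembly, and of why its access pattern is so memory-random, the sketch below builds a small de Bruijn graph from short reads and walks one unambiguous path. It illustrates the general approach only; it is not Convey's FPGA-backed constructor.

```python
from collections import defaultdict

# Build a tiny de Bruijn graph: nodes are (k-1)-mers, and every k-mer in a
# read adds an edge prefix -> suffix.  Walking such a graph is pointer
# chasing through a large hash table, which is why cache-based servers
# struggle and a highly parallel memory subsystem helps.
def de_bruijn(reads, k):
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk(graph, start):
    # Follow unambiguous edges to reconstruct a contig (no error handling).
    contig, node = start, start
    while len(graph.get(node, ())) == 1:
        node = next(iter(graph[node]))
        contig += node[-1]
    return contig

reads = ["ACGTTG", "CGTTGC", "GTTGCA"]
graph = de_bruijn(reads, k=4)
print(walk(graph, "ACG"))   # reconstructs ACGTTGCA from the overlapping reads
```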
Wyszogrodzka, Monika; Haag, Rainer
2008-01-01
Dendrimers are an important class of polymeric materials for a broad range of applications in which monodispersity and multivalency are of interest. Here we report on a highly efficient synthetic route towards bifunctional polyglycerol dendrons on a multigram scale. Commercially available triglycerol (1), which is highly biocompatible, was used as starting material. By applying Williamson ether synthesis followed by an ozonolysis/reduction procedure, glycerol-based dendrons up to the fourth generation were prepared. The obtained products have a reactive core, which was further functionalized to the corresponding monoazido derivatives. By applying copper(I)-catalyzed 1,3-dipolar cycloaddition, so-called "click" coupling, a library of core-shell architectures was prepared. After removal of the 1,2-diol protecting groups, water-soluble core-shell architectures 24-27 of different generations were obtained in high yields. In the structure-transport relationship with Nile red we observe a clear dependence on core size and generation of the polyglycerol dendrons.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Marquez, Andres; Choudhury, Sutanay
2012-09-01
Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis of large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.
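The full directed census distinguishes 16 triad types; the sketch below shows the much simpler undirected case, classifying every node triple by how many of its three possible edges are present, as a naive O(n^3) baseline. The point of the abstract is precisely that this kind of counting has to be restructured to run in parallel on Cray XMT / Superdome / NUMA class shared-memory machines.

```python
from itertools import combinations

# Naive undirected triad census: classify every 3-node subset by how many of
# the three possible edges are present (0, 1, 2 or 3).  A directed census
# distinguishes 16 configurations and, at millions to billions of nodes,
# needs the shared-memory parallel restructuring the abstract describes.
def triad_census(nodes, edges):
    adj = {frozenset(e) for e in edges}
    census = [0, 0, 0, 0]
    for a, b, c in combinations(nodes, 3):
        present = sum(frozenset(pair) in adj
                      for pair in ((a, b), (b, c), (a, c)))
        census[present] += 1
    return census

nodes = range(6)
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (4, 5)]
print(triad_census(nodes, edges))   # counts of empty, 1-edge, 2-edge, full triads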
Histomorphometric analysis of collagen architecture of auricular keloids in an Asian population.
Chong, Yosep; Park, Tae Hwan; Seo, Sang won; Chang, Choong Hyun
2015-03-01
Keloids are a pathologic condition of the reparative process, which present as excessive scar formation that involves various cells and cytokines. Many studies focusing on the histologic features of keloids, however, have shown discordant results without consideration of the architectural aspect of collagen structure. The purpose of this study was to demonstrate a schematic illustration of the collagen architecture of keloids, specifically auricular keloids, and to analyze each part on a histomorphologic and morphometric basis. Thirty-nine surgically excised auricular keloids were retrieved from the files of Kangbuk Samsung Hospital. After exhaustive histomorphologic analysis, 3 distinctive structural parts, keloidal collagen, organizing collagen, and proliferating core collagen, were identified and mapped in every case. Fibroblast cellularity, blood vessel density, degree of inflammatory cell infiltration, and mast cell counts using Masson trichrome stain, Van Gieson stain, toluidine blue stain, and immunohistochemical stains for CD31 and smooth muscle actin were analyzed in each part of each case. Morphometric analysis of these parameters using ImageJ software was performed using 3 representative images of each part. The three parts were histomorphologically distinct by shape and array of collagen bundles, fibroblast cellularity, blood vessel density, degree of inflammatory cells, and mast cell infiltration. Morphometric analysis revealed statistically significant differences between the parts in fibroblast cellularity, blood vessel density, degree of inflammatory cell infiltration, and mast cell count. All parameters were exceedingly high in the whorling hypercellular fibrous nodules of the proliferating core collagen, with simultaneous changes in the other parts. Morphologically and morphometrically, 3 distinctive parts were identified in auricular keloids. Mast cell infiltration, blood vessel density, and fibroblast cellularity are simultaneously increased or decreased according to these parts. The proliferating core collagen might serve as a proliferating center of keloids and might be a key portion for tumor growth and recurrence.
Huang, Chi-Hsin; Chang, Wen-Chih; Huang, Jian-Shiou; Lin, Shih-Ming; Chueh, Yu-Lun
2017-05-25
Core-shell NWs offer an innovative approach to achieve nanoscale metal-insulator-metal (MIM) heterostructures along the wire radial direction, realizing three-dimensional geometry architecture rather than planar type thin film devices. This work demonstrated the tunable resistive switching characteristics of ITO/HfO2 core-shell nanowires with controllable shell thicknesses by the atomic layer deposition (ALD) process for the first time. Compared to planar HfO2 thin film device configuration, ITO/HfO2 core-shell nanowire shows a prominent resistive memory behavior, including lower power consumption with a smaller SET voltage of ∼0.6 V and better switching voltage uniformity with variations (standard deviation (σ)/mean value (μ)) of V_SET and V_RESET from 0.38 to 0.14 and from 0.33 to 0.05 for ITO/HfO2 core-shell nanowire and planar HfO2 thin film, respectively. In addition, endurance over 10^3 cycles resulting from the local electric field enhancement can be achieved, which is attributed to geometry architecture engineering. The concept of geometry architecture engineering provides a promising strategy to modify the electric-field distribution for solving the non-uniformity issue of future RRAM.
AthenaMT: upgrading the ATLAS software framework for the many-core world with multi-threading
NASA Astrophysics Data System (ADS)
Leggett, Charles; Baines, John; Bold, Tomasz; Calafiura, Paolo; Farrell, Steven; van Gemmeren, Peter; Malon, David; Ritsch, Elmar; Stewart, Graeme; Snyder, Scott; Tsulaia, Vakhtang; Wynne, Benjamin; ATLAS Collaboration
2017-10-01
ATLAS’s current software framework, Gaudi/Athena, has been very successful for the experiment in LHC Runs 1 and 2. However, its single threaded design has been recognized for some time to be increasingly problematic as CPUs have increased core counts and decreased available memory per core. Even the multi-process version of Athena, AthenaMP, will not scale to the range of architectures we expect to use beyond Run2. After concluding a rigorous requirements phase, where many design components were examined in detail, ATLAS has begun the migration to a new data-flow driven, multi-threaded framework, which enables the simultaneous processing of singleton, thread unsafe legacy Algorithms, cloned Algorithms that execute concurrently in their own threads with different Event contexts, and fully re-entrant, thread safe Algorithms. In this paper we report on the process of modifying the framework to safely process multiple concurrent events in different threads, which entails significant changes in the underlying handling of features such as event and time dependent data, asynchronous callbacks, metadata, integration with the online High Level Trigger for partial processing in certain regions of interest, concurrent I/O, as well as ensuring thread safety of core services. We also report on upgrading the framework to handle Algorithms that are fully re-entrant.
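As a schematic of the three algorithm flavours the framework has to schedule, the sketch below runs a thread pool over several event contexts: a "cloned" algorithm gets one instance per in-flight event, a re-entrant algorithm shares one stateless instance, and a legacy singleton is serialized behind a lock. This is our own toy illustration of the idea, not Gaudi/AthenaMT code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Toy scheduling of three algorithm flavours over concurrent event contexts:
#  - cloned: one instance per in-flight event (thread-unsafe internals allowed)
#  - re-entrant: a single shared, stateless instance
#  - legacy singleton: one instance serialized behind a lock
class ClonedAlg:
    def execute(self, ctx):
        return f"cloned[{ctx}]"

class ReentrantAlg:
    def execute(self, ctx):
        return f"reentrant[{ctx}]"

class LegacyAlg:
    _lock = threading.Lock()
    def execute(self, ctx):
        with self._lock:                  # whole-algorithm serialization
            return f"legacy[{ctx}]"

reentrant, legacy = ReentrantAlg(), LegacyAlg()

def process_event(ctx):
    cloned = ClonedAlg()                  # fresh clone for this event context
    return [cloned.execute(ctx), reentrant.execute(ctx), legacy.execute(ctx)]

with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(process_event, range(4)):
        print(result)
```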
Parallel processing architecture for H.264 deblocking filter on multi-core platforms
NASA Astrophysics Data System (ADS)
Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao
2012-03-01
Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software-based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low latency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in the H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor-based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color subsampling patterns like YUV 4:2:2 or 4:4:4 formats. Low-power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programming model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264-compliant deblocking filter for multi-core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub-blocks, and pixel rows are examined in this work. The deblocking architecture consists of a basic cell called the deblocking filter unit (DFU) and a dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs. The DFM serves the data required for the different numbers of DFUs, and also manages all the neighboring data required for future data processing of DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
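Macroblock-level parallelism in a deblocking filter is constrained by the left and top neighbour dependencies: macroblocks on the same anti-diagonal are mutually independent and can be filtered together, the classic wavefront order. The sketch below only illustrates that scheduling; the per-edge filtering math, and the DFU/DFM organisation of the paper, are omitted.

```python
# Wavefront schedule for an H.264-style deblocking pass: macroblock (x, y)
# depends on its left (x-1, y) and top (x, y-1) neighbours, so all
# macroblocks with the same x + y can be filtered in parallel.
def wavefront(mb_cols, mb_rows):
    for d in range(mb_cols + mb_rows - 1):
        yield [(x, d - x)
               for x in range(max(0, d - mb_rows + 1), min(d, mb_cols - 1) + 1)]

for wave, mbs in enumerate(wavefront(mb_cols=5, mb_rows=3)):
    # Each list below could be dispatched to a group of DFU-like cells.
    print(f"wave {wave}: {mbs}")
```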
2012-08-01
ELECTRONICS AND ARCHITECTURE (VEA) MINI-SYMPOSIUM, AUGUST 14-16, TROY, MICHIGAN. Performance of an Embedded Platform Aggregating and Executing... (UBT Technologies, 3250 W. Big Beaver Rd., Ste. 329, Troy, MI). The Vehicular Integration for C4ISR/EW Interoperability (VICTORY) Standard adopts many
GPU and APU computations of Finite Time Lyapunov Exponent fields
NASA Astrophysics Data System (ADS)
Conti, Christian; Rossinelli, Diego; Koumoutsakos, Petros
2012-03-01
We present GPU and APU accelerated computations of Finite-Time Lyapunov Exponent (FTLE) fields. The calculation of FTLEs is a computationally intensive process, as in order to obtain the sharp ridges associated with the Lagrangian Coherent Structures an extensive resampling of the flow field is required. The computational performance of this resampling is limited by the memory bandwidth of the underlying computer architecture. The present technique harnesses data-parallel execution of many-core architectures and relies on fast and accurate evaluations of moment conserving functions for the mesh to particle interpolations. We demonstrate how the computation of FTLEs can be efficiently performed on a GPU and on an APU through OpenCL and we report over one order of magnitude improvements over multi-threaded executions in FTLE computations of bluff body flows.
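The FTLE field is obtained from the gradient of the flow map: the Cauchy-Green tensor C = (grad Phi)^T (grad Phi) is formed at every grid point and the exponent is ln(sqrt(lambda_max(C))) / |T|. The NumPy sketch below computes this for an analytic toy flow map on a coarse grid; the dense resampling and GPU/APU kernels are exactly what the abstract accelerates, and the shear map used here is our own stand-in.

```python
import numpy as np

# FTLE from a flow map Phi: sigma = ln(sqrt(lambda_max(C))) / |T|, with
# C = (grad Phi)^T (grad Phi).  The flow map is a toy shear evaluated on a
# coarse grid; real pipelines resample the flow field far more densely.
T = 1.0
x = np.linspace(0.0, 1.0, 64)
y = np.linspace(0.0, 1.0, 64)
X, Y = np.meshgrid(x, y, indexing="ij")

# Toy flow map after time T: a shear whose strength varies with y.
PhiX = X + T * np.sin(2 * np.pi * Y)
PhiY = Y

dPhiX_dx, dPhiX_dy = np.gradient(PhiX, x, y, edge_order=2)
dPhiY_dx, dPhiY_dy = np.gradient(PhiY, x, y, edge_order=2)

# Cauchy-Green tensor components and its largest eigenvalue at every point.
C11 = dPhiX_dx**2 + dPhiY_dx**2
C12 = dPhiX_dx * dPhiX_dy + dPhiY_dx * dPhiY_dy
C22 = dPhiX_dy**2 + dPhiY_dy**2
lam_max = 0.5 * (C11 + C22) + np.sqrt(0.25 * (C11 - C22)**2 + C12**2)

ftle = np.log(np.sqrt(lam_max)) / abs(T)
print("max FTLE on the grid:", float(ftle.max()))
```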
ERIC Educational Resources Information Center
Erickson, Mary; Delahunt, Michael
2010-01-01
Most art teachers would agree that architecture is an important form of visual art, but they do not always include it in their curriculums. In this article, the authors share core ideas from "Architecture and Environment," a teaching resource that they developed out of a long-term interest in teaching architecture and their fascination with the…
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
NASA Astrophysics Data System (ADS)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.; Porter, D.; O'Neill, B. J.; Nolting, C.; Edmon, P.; Donnert, J. M. F.; Jones, T. W.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
Blue guardian: an open architecture for rapid ISR demonstration
NASA Astrophysics Data System (ADS)
Barrett, Donald A.; Borntrager, Luke A.; Green, David M.
2016-05-01
Throughout the Department of Defense (DoD), acquisition, platform integration, and life cycle costs for weapons systems have continued to rise. Although Open Architecture (OA) interface standards are one of the primary methods being used to reduce these costs, the Air Force Rapid Capabilities Office (AFRCO) has extended the OA concept and chartered the Open Mission System (OMS) initiative with industry to develop and demonstrate a consensus-based, non-proprietary, OA standard for integrating subsystems and services into airborne platforms. The new OMS standard provides the capability to decouple vendor-specific sensors, payloads, and service implementations from platform-specific architectures and is still in the early stages of maturation and demonstration. The Air Force Research Laboratory (AFRL) - Sensors Directorate has developed the Blue Guardian program to demonstrate advanced sensing technology utilizing open architectures in operationally relevant environments. Over the past year, Blue Guardian has developed a platform architecture using the Air Force's OMS reference architecture and conducted a ground and flight test program of multiple payload combinations. Systems tested included a vendor-unique variety of Full Motion Video (FMV) systems, a Wide Area Motion Imagery (WAMI) system, a multi-mode radar system, processing and database functions, multiple decompression algorithms, multiple communications systems, and a suite of software tools. Initial results of the Blue Guardian program show the promise of OA to DoD acquisitions, especially for Intelligence, Surveillance and Reconnaissance (ISR) payload applications. Specifically, the OMS reference architecture was extremely useful in reducing the cost and time required for integrating new systems.
Witzel, Wayne; Montano, Ines; Muller, Richard P.; ...
2015-08-19
In this paper, we present a strategy for producing multiqubit gates that promise high fidelity with minimal tuning requirements. Our strategy combines gap protection from the adiabatic theorem with dynamical decoupling in a complementary manner. Energy-level transition errors are protected by adiabaticity and remaining phase errors are mitigated via dynamical decoupling. This is a powerful way to divide and conquer the various error channels. In order to accomplish this without violating a no-go theorem regarding black-box dynamically corrected gates [Phys. Rev. A 80, 032314 (2009)], we require a robust operating point (sweet spot) in control space where the qubits interact with little sensitivity to noise. There are also energy gap requirements for effective adiabaticity. We apply our strategy to an architecture in Si with P donors where we assume we can shuttle electrons between different donors. Electron spins act as mobile ancillary qubits and P nuclear spins act as long-lived data qubits. Furthermore, this system can have a very robust operating point where the electron spin is bound to a donor in the quadratic Stark shift regime. High fidelity single qubit gates may be performed using well-established global magnetic resonance pulse sequences. Single electron-spin preparation and measurement has also been demonstrated. Thus, putting this all together, we present a robust universal gate set for quantum computation.
Structural basis for diversity in the SAM clan of riboswitches.
Trausch, Jeremiah J; Xu, Zhenjiang; Edwards, Andrea L; Reyes, Francis E; Ross, Phillip E; Knight, Rob; Batey, Robert T
2014-05-06
In bacteria, sulfur metabolism is regulated in part by seven known families of riboswitches that bind S-adenosyl-l-methionine (SAM). Direct binding of SAM to these mRNA regulatory elements governs a downstream secondary structural switch that communicates with the transcriptional and/or translational expression machinery. The most widely distributed SAM-binding riboswitches belong to the SAM clan, comprising three families that share a common SAM-binding core but differ radically in their peripheral architecture. Although the structure of the SAM-I member of this clan has been extensively studied, how the alternative peripheral architecture of the other families supports the common SAM-binding core remains unknown. We have therefore solved the X-ray structure of a member of the SAM-I/IV family containing the alternative "PK-2" subdomain shared with the SAM-IV family. This structure reveals that this subdomain forms extensive interactions with the helix housing the SAM-binding pocket, including a highly unusual mode of helix packing in which two helices pack in a perpendicular fashion. Biochemical and genetic analysis of this RNA reveals that SAM binding induces many of these interactions, including stabilization of a pseudoknot that is part of the regulatory switch. Despite strong structural similarity between the cores of SAM-I and SAM-I/IV members, a phylogenetic analysis of sequences does not indicate that they derive from a common ancestor.
Overview of Key Saturn Probe Mission Trades
NASA Technical Reports Server (NTRS)
Balint, Tibor S.; Kowalkowski, Theresa; Folkner, Bill
2007-01-01
Ongoing studies, performed at NASA/JPL over the past two years in support of NASA's SSE Roadmap activities, proved the feasibility of a New Frontiers (NF) class Saturn probe mission. I. This proposed mission could also provide a good opportunity for international collaboration with the proposed Cosmic Vision KRONOS mission: a) with ESA-contributed probes (descent modules) on a NASA-led mission; b) an early 2017 launch could be a good programmatic option for ESA-CV/NASA-NF. II. A number of mission architectures could be suitable for this mission: a) a probe-relay based architecture with a short flight time (approx. 6.3-7 years); b) a direct-to-Earth (DTE) probe telecom based architecture with a long flight time (~11 years) and a low probe data rate, but with the probes decoupled from the carrier, allowing for polar trajectories / orbiter; this option may need technology development for telecom; c) an orbiter would likely increase mission cost over a flyby, but would provide significantly higher science return. The Saturn probes mission is expected to be identified in NASA's New Frontiers AO. Thus, further studies are recommended to refine the most suitable architecture. International collaboration has started through the KRONOS proposal work; further collaborative studies will follow once KRONOS is selected in October under ESA's Cosmic Vision Program.
Examination of Multi-Core Architectures
2010-11-01
Interim Technical Report, February 2010 – July 2010. The report examines the architecture characteristics of the NVIDIA Tesla, the TILE64 (including its single-tile architecture), and the STI Cell Broadband Engine.
Komatsoulis, George A; Warzel, Denise B; Hartel, Francis W; Shanbhag, Krishnakant; Chilukuri, Ram; Fragoso, Gilberto; Coronado, Sherri de; Reeves, Dianne M; Hadfield, Jillaine B; Ludet, Christophe; Covitz, Peter A
2008-02-01
One of the requirements for a federated information system is interoperability, the ability of one computer system to access and use the resources of another system. This feature is particularly important in biomedical research systems, which need to coordinate a variety of disparate types of data. In order to meet this need, the National Cancer Institute Center for Bioinformatics (NCICB) has created the cancer Common Ontologic Representation Environment (caCORE), an interoperability infrastructure based on Model Driven Architecture. The caCORE infrastructure provides a mechanism to create interoperable biomedical information systems. Systems built using the caCORE paradigm address both aspects of interoperability: the ability to access data (syntactic interoperability) and understand the data once retrieved (semantic interoperability). This infrastructure consists of an integrated set of three major components: a controlled terminology service (Enterprise Vocabulary Services), a standards-based metadata repository (the cancer Data Standards Repository) and an information system with an Application Programming Interface (API) based on Domain Model Driven Architecture. This infrastructure is being leveraged to create a Semantic Service-Oriented Architecture (SSOA) for cancer research by the National Cancer Institute's cancer Biomedical Informatics Grid (caBIG).
Komatsoulis, George A.; Warzel, Denise B.; Hartel, Frank W.; Shanbhag, Krishnakant; Chilukuri, Ram; Fragoso, Gilberto; de Coronado, Sherri; Reeves, Dianne M.; Hadfield, Jillaine B.; Ludet, Christophe; Covitz, Peter A.
2008-01-01
One of the requirements for a federated information system is interoperability, the ability of one computer system to access and use the resources of another system. This feature is particularly important in biomedical research systems, which need to coordinate a variety of disparate types of data. In order to meet this need, the National Cancer Institute Center for Bioinformatics (NCICB) has created the cancer Common Ontologic Representation Environment (caCORE), an interoperability infrastructure based on Model Driven Architecture. The caCORE infrastructure provides a mechanism to create interoperable biomedical information systems. Systems built using the caCORE paradigm address both aspects of interoperability: the ability to access data (syntactic interoperability) and understand the data once retrieved (semantic interoperability). This infrastructure consists of an integrated set of three major components: a controlled terminology service (Enterprise Vocabulary Services), a standards-based metadata repository (the cancer Data Standards Repository) and an information system with an Application Programming Interface (API) based on Domain Model Driven Architecture. This infrastructure is being leveraged to create a Semantic Service Oriented Architecture (SSOA) for cancer research by the National Cancer Institute’s cancer Biomedical Informatics Grid (caBIG™). PMID:17512259
Local, regional and national interoperability in hospital-level systems architecture.
Mykkänen, J; Korpela, M; Ripatti, S; Rannanheimo, J; Sorri, J
2007-01-01
Interoperability of applications in health care is faced with various needs by patients, health professionals, organizations and policy makers. A combination of existing and new applications is a necessity. Hospitals are in a position to drive many integration solutions, but need approaches which combine local, regional and national requirements and initiatives with open standards to support flexible processes and applications on a local hospital level. We discuss systems architecture of hospitals in relation to various processes and applications, and highlight current challenges and prospects using a service-oriented architecture approach. We also illustrate these aspects with examples from Finnish hospitals. A set of main services and elements of service-oriented architectures for health care facilities are identified, with medium-term focus which acknowledges existing systems as a core part of service-oriented solutions. The services and elements are grouped according to functional and interoperability cohesion. A transition towards service-oriented architecture in health care must acknowledge existing health information systems and promote the specification of central processes and software services locally and across organizations. Software industry best practices such as SOA must be combined with health care knowledge to respond to central challenges such as continuous change in health care. A service-oriented approach cannot entirely rely on common standards and frameworks but it must be locally adapted and complemented.
A heterogeneous system based on GPU and multi-core CPU for real-time fluid and rigid body simulation
NASA Astrophysics Data System (ADS)
da Silva Junior, José Ricardo; Gonzalez Clua, Esteban W.; Montenegro, Anselmo; Lage, Marcos; Dreux, Marcelo de Andrade; Joselli, Mark; Pagliosa, Paulo A.; Kuryla, Christine Lucille
2012-03-01
Computational fluid dynamics in simulation has become an important field not only for physics and engineering areas but also for simulation, computer graphics, virtual reality and even video game development. Many efficient models have been developed over the years, but when many contact interactions must be processed, most models present difficulties or cannot achieve real-time results when executed. The advent of parallel computing has enabled the development of many strategies for accelerating the simulations. Our work proposes a new system which uses some successful algorithms already proposed, as well as a data structure organisation based on a heterogeneous architecture using CPUs and GPUs, in order to process the simulation of the interaction of fluids and rigid bodies. This successfully results in a two-way interaction between them and their surrounding objects. As far as we know, this is the first work that presents a computational collaborative environment which makes use of two different paradigms of hardware architecture for this specific kind of problem. Since our method achieves real-time results, it is suitable for virtual reality, simulation and video game fluid simulation problems.
Origin and structure of major orogen-scale exhumed strike-slip faults
NASA Astrophysics Data System (ADS)
Cao, Shuyun; Neubauer, Franz
2016-04-01
The formation of major exhumed strike-slip faults represents one of the most important dynamic processes affecting the evolution of the Earth's lithosphere and surface. Detailed models of the initiation, properties and architecture of orogen-scale exhumed strike-slip faults, and of how these relate to exhumation, are rare. In this study, we deal with key properties controlling the development of major exhumed strike-slip fault systems, which are equivalent to the deep crustal sections of active fault zones. We also propose two dominant processes for the initiation of orogen-scale exhumed strike-slip faults: (1) pluton-controlled and (2) metamorphic core complex-controlled strike-slip faults. In these tectonic settings, the initiation of faults occurs by rheological weakening along hot-to-cool contacts and guides the overall displacement and ultimate exhumation. These processes result in a specific thermal and structural architecture of such faults. These types of strike-slip dominated fault zones are often subparallel to mountain ranges and expose a wide variety of mylonitic, cataclastic and non-cohesive fault rocks, which were formed at different structural levels of the crust during various stages of faulting. The high variety of distinctive fault rocks is potential evidence for the recognition of these types of strike-slip faults. Exhumation of mylonitic rocks is, therefore, a common feature of such reverse oblique-slip strike-slip faults, implying major transtensive and/or transpressive processes accompanying pure strike-slip motion during exhumation. Some orogen-scale strike-slip faults nucleate and initiate along rheologically weak zones, e.g. at granite intrusions, zones of low-strength minerals, thermally weakened crust due to ascending fluids, and lateral borders of hot metamorphic core complexes. A further mechanism is the juxtaposition of mechanically strong mantle lithosphere to hot asthenosphere in continental transform faults (e.g., San Andreas Fault, Alpine Fault in New Zealand) and transtensional rift zones such as the East African rift. In many cases, subsequent shortening exhumes such faults from depth to the surface. A major aspect of many exhumed strike-slip faults is their lateral thermal gradient, induced by the juxtaposition of hot and cool levels of the crust, which controls relevant properties of such fault zones, e.g. the overall fault architecture (e.g., fault core, damage zone, shear lenses, fault rocks) and the thermal structure. These properties and the overall fault architecture include strength of fault rocks, permeability and porosity, the hydrological regime, as well as the nature and origin of circulating hydrothermal fluids.
Dhara, Animesh; de Paula Baptista, Rodrigo; Kissinger, Jessica C; Snow, E Charles; Sinai, Anthony P
2017-11-21
The Toxoplasma genome encodes the capacity for distinct architectures underlying cell cycle progression in a life cycle stage-dependent manner. Replication in intermediate hosts occurs by endodyogeny, whereas a hybrid of schizogony and endopolygeny occurs in the gut of the definitive feline host. Here, we characterize the consequence of the loss of a cell cycle-regulated ovarian tumor (OTU family) deubiquitinase, OTUD3A of Toxoplasma gondii (TgOTUD3A; TGGT1_258780), in T. gondii tachyzoites. Rather than the mutation being detrimental, mutant parasites exhibited a fitness advantage, outcompeting the wild type. This phenotype was due to roughly one-third of TgOTUD3A-knockout (TgOTUD3A-KO) tachyzoites exhibiting deviations from endodyogeny by employing replication strategies that produced 3, 4, or 5 viable progeny within a gravid mother instead of the usual 2. We established the mechanistic basis underlying these altered replication strategies to be a dysregulation of centrosome duplication, causing a transient loss of stoichiometry between the inner and outer cores that resulted in a failure to terminate S phase at the attainment of 2N ploidy and/or the decoupling of mitosis and cytokinesis. The resulting dysregulation manifested as deviations in the normal transitions from S phase to mitosis (S/M) (endopolygeny-like) or M phase to cytokinesis (M/C) (schizogony-like). Notably, these imbalances are corrected prior to cytokinesis, resulting in the generation of normal progeny. Our findings suggest that decisions regarding the utilization of specific cell cycle architectures are controlled by a ubiquitin-mediated mechanism that is dependent on the absolute threshold levels of an as-yet-unknown target(s). Analysis of the TgOTUD3A-KO mutant provides new insights into mechanisms underlying the plasticity of apicomplexan cell cycle architecture. IMPORTANCE Replication by Toxoplasma gondii can occur by 3 distinct cell cycle architectures. Endodyogeny is used by asexual stages, while a hybrid of schizogony and endopolygeny is used by merozoites in the definitive feline host. Here, we establish that the disruption of an ovarian-tumor (OTU) family deubiquitinase, TgOTUD3A, in tachyzoites results in dysregulation of the mechanism controlling the selection of replication strategy in a subset of parasites. The mechanistic basis for these altered cell cycles lies in the unique biology of the bipartite centrosome that is associated with the transient loss of stoichiometry between the inner and outer centrosome cores in the TgOTUD3A-KO mutant. This highlights the importance of ubiquitin-mediated regulation in the transition from the nuclear to the budding phases of the cell cycle and provides new mechanistic insights into the regulation of the organization of the apicomplexan cell cycle. Copyright © 2017 Dhara et al.
NASA Astrophysics Data System (ADS)
Huang, Melin; Huang, Bormin; Huang, Allen H.
2014-10-01
For weather forecasting and research, the Weather Research and Forecasting (WRF) model has been developed, consisting of several components such as dynamic solvers and physical simulation modules. WRF includes several Land-Surface Models (LSMs). The LSMs use atmospheric information, the radiative and precipitation forcing from the surface layer scheme, the radiation scheme, and the microphysics/convective scheme, together with the land's state variables and land-surface properties, to provide heat and moisture fluxes over land and sea-ice points. The WRF 5-layer thermal diffusion simulation is an LSM based on the MM5 5-layer soil temperature model with an energy budget that includes radiation, sensible, and latent heat flux. The WRF LSMs are very suitable for massively parallel computation as there are no interactions among horizontal grid points. The features of the Intel Many Integrated Core (MIC) architecture, which support efficient parallelization and vectorization, allow us to optimize this WRF 5-layer thermal diffusion scheme. In this work, we present the results of the computing performance of this scheme on the Intel MIC architecture. Our results show that the MIC-based optimization improved the performance of the first version of multi-threaded code on the Xeon Phi 5110P by a factor of 2.1x. Accordingly, the same CPU-based optimizations improved the performance on the Intel Xeon E5-2603 by a factor of 1.6x as compared to the first version of multi-threaded code.
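As a rough illustration of why such a scheme vectorizes and parallelizes so readily (there is no horizontal coupling between grid points), the sketch below implements a generic explicit multi-layer soil heat-diffusion update in NumPy; the layer thicknesses, diffusivity and surface forcing are hypothetical placeholders, not the WRF/MM5 coefficients.

```python
import numpy as np

def step_soil_temperature(T, dz, kappa, dt, surface_forcing):
    """One explicit time step of vertical heat diffusion in a multi-layer
    soil column, vectorized over all horizontal grid points.

    T               : (nlayers, npoints) layer temperatures [K]
    dz              : (nlayers,) layer thicknesses [m]      (hypothetical)
    kappa           : thermal diffusivity [m^2 s^-1]        (hypothetical)
    dt              : time step [s]
    surface_forcing : (npoints,) net surface energy input expressed as a
                      temperature tendency for the top layer [K s^-1]
    """
    flux = np.zeros((T.shape[0] + 1, T.shape[1]))
    # Internal fluxes between adjacent layers; note there is no horizontal
    # coupling at all, which is what makes the scheme embarrassingly parallel.
    dz_mid = 0.5 * (dz[:-1] + dz[1:])
    flux[1:-1] = kappa * (T[:-1] - T[1:]) / dz_mid[:, None]
    dTdt = (flux[:-1] - flux[1:]) / dz[:, None]
    dTdt[0] += surface_forcing
    return T + dt * dTdt

# Toy usage: 5 layers, 10^5 independent grid points.
nlay, npts = 5, 100_000
T = np.full((nlay, npts), 285.0)
dz = np.array([0.01, 0.03, 0.09, 0.27, 0.81])    # hypothetical thicknesses
forcing = np.random.uniform(-1e-4, 1e-4, npts)
T = step_soil_temperature(T, dz, kappa=7e-7, dt=60.0, surface_forcing=forcing)
```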
3D nanostar dimers with a sub-10-nm gap for single-/few-molecule surface-enhanced Raman scattering.
Chirumamilla, Manohar; Toma, Andrea; Gopalakrishnan, Anisha; Das, Gobind; Zaccaria, Remo Proietti; Krahne, Roman; Rondanina, Eliana; Leoncini, Marco; Liberale, Carlo; De Angelis, Francesco; Di Fabrizio, Enzo
2014-04-16
Plasmonic nanostar dimers, decoupled from the substrate, have been fabricated by combining electron-beam lithography and reactive-ion etching techniques. The 3D architecture, the sharp tips of the nanostars and the sub-10 nm gap size promote the formation of giant electric fields in highly localized hot spots. The single-/few-molecule detection capability of the 3D nanostar dimers has been demonstrated by surface-enhanced Raman scattering. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Lamandé, Mathieu; Schjønning, Per; Dal Ferro, Nicola; Morari, Francesco
2017-04-01
Pore system architecture is a key feature for understanding physical, biological and chemical processes in soils. Development of visualisation techniques, especially X-ray CT, during recent years has been useful in describing the complex relationships between soil architecture and soil functions. We believe that combining visualization with physical models is a step further towards a better understanding of these relationships. We conducted a concept study using natural, artificial and 3D-printed soil cores. Eight natural soil cores (100 cm3) were sampled in a cultivated stagnic Luvisol at two depths (topsoil and subsoil), representing contrasting soil pore systems. Cylinders (100 cm3) were produced from plastic or from autoclaved aerated concrete (AAC). Holes of 1.5 and 3 mm diameter were drilled along the cylinder axis in the plastic cylinder and in one of the AAC cylinders. All natural and artificial cores were scanned in a micro X-ray CT scanner at a resolution of 35 µm. The reconstructed image of each soil core was printed with 3D multijet printing technology at a resolution of 29 µm. In some reconstructed digital volumes of the natural soil cores, pores of different sizes (equivalent diameter of 35, 70, 100, and 200 µm) were removed before additional 3D printing. Effective air-filled porosity, Darcian air permeability, and oxygen diffusion were measured on all natural, artificial and printed cores. The comparison of the natural and the artificial cores emphasized the difference in pore architecture between topsoil (sponge-like) and subsoil (dominated by large vertical macropores). This study showed the high potential of using printed soil cores for understanding soil pore functions. The results confirm the suitability of the Ball model, which partitions the pore system into arterial, marginal and remote pores, for describing effects of soil structure on gas transport.
Multi-core processing and scheduling performance in CMS
NASA Astrophysics Data System (ADS)
Hernández, J. M.; Evans, D.; Foulkes, S.
2012-12-01
Commodity hardware is going many-core. We might soon not be able to satisfy the job memory needs per core in the current single-core processing model in High Energy Physics. In addition, an ever increasing number of independent and incoherent jobs running on the same physical hardware and not sharing resources might significantly affect processing performance. It will be essential to effectively utilize the multi-core architecture. CMS has incorporated support for multi-core processing in the event processing framework and the workload management system. Multi-core processing jobs share common data in memory, such as the code libraries, detector geometry and conditions data, resulting in a much lower memory usage than standard single-core independent jobs. Exploiting this new processing model requires a new model in computing resource allocation, departing from the standard single-core allocation for a job. The experiment job management system needs to have control over a larger quantum of resource, since multi-core aware jobs require the scheduling of multiple cores simultaneously. CMS is exploring the approach of using whole nodes as the unit in the workload management system, where all cores of a node are allocated to a multi-core job. Whole-node scheduling allows for optimization of the data/workflow management (e.g. I/O caching, local merging), but efficient utilization of all scheduled cores is challenging. Dedicated whole-node queues have been set up at all Tier-1 centers for exploring multi-core processing workflows in CMS. We present an evaluation of the performance of scheduling and executing multi-core workflows in whole-node queues, compared to the standard single-core processing workflows.
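A minimal sketch of the memory-sharing argument (not CMS software): worker processes forked from a common parent inherit large read-only structures, such as conditions data, by copy-on-write, so a whole-node multi-core job pays the memory cost once per node rather than once per core. All names and sizes below are hypothetical, and the copy-on-write benefit assumes the fork start method (the Linux default).

```python
import os
import numpy as np
from multiprocessing import Pool

# Hypothetical stand-in for detector geometry / conditions data that every
# event-processing worker needs read-only access to.
CONDITIONS = np.random.random(5_000_000)

def process_event(event_id):
    # With fork, workers share CONDITIONS pages via copy-on-write, so the
    # physical memory cost is paid once per node, not once per core.
    return float(CONDITIONS[event_id % CONDITIONS.size]) * event_id

if __name__ == "__main__":
    ncores = os.cpu_count()              # whole-node allocation: use every core
    with Pool(processes=ncores) as pool:
        results = pool.map(process_event, range(10_000))
    print(f"processed {len(results)} events on {ncores} cores")
```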
Efficient parallel implementation of active appearance model fitting algorithm on GPU.
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods and has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design is based on fine-grained parallelism, in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models with textures of different dimensionality. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods and has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design is based on fine-grained parallelism, in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models with textures of different dimensionality. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812
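A simplified sketch of the fine-grained parallelism idea: the per-pixel texture error and its projection onto steepest-descent images are independent across pixels, which is what maps naturally onto thousands of GPU threads. Here plain NumPy vectorization stands in for the CUDA kernels, the update shown is a generic Gauss-Newton-style AAM step rather than the paper's exact algorithm, and all array names and sizes are illustrative.

```python
import numpy as np

def aam_parameter_update(sampled_texture, model_mean, sd_images):
    """One Gauss-Newton-style AAM update expressed so that the per-pixel work
    is fully data-parallel (one GPU thread per pixel in the paper's design).

    sampled_texture : (npix,) image texture sampled under the current warp
    model_mean      : (npix,) mean appearance texture
    sd_images       : (nparams, npix) steepest-descent images
    """
    # Per-pixel error: embarrassingly parallel across pixels.
    error = sampled_texture - model_mean
    # Per-pixel products followed by reductions over pixels.
    grad = sd_images @ error                        # (nparams,)
    hessian = sd_images @ sd_images.T               # (nparams, nparams)
    return np.linalg.solve(hessian, grad)           # parameter increment

# Toy usage with a hypothetical high-dimensional texture (100k pixels).
npix, nparams = 100_000, 20
rng = np.random.default_rng(0)
dp = aam_parameter_update(rng.random(npix), rng.random(npix),
                          rng.random((nparams, npix)))
print(dp.shape)
```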
NASA Astrophysics Data System (ADS)
Fiorani, D.; Acierno, M.
2017-05-01
The aim of the present research is to develop an instrument able to adequately support the conservation process by means of a twofold approach, based on both a BIM environment and ontology formalisation. Although BIM has been successfully applied within the AEC (Architecture Engineering Construction) field, it has shown many drawbacks for architectural heritage. To cope with the uniqueness and, more generally, the complexity of ancient buildings, the applications developed so far have adapted BIM poorly to conservation design, with unsatisfactory results (Dore, Murphy 2013; Carrara 2014). The goal is to combine achievements reached within AEC through the BIM environment (design control and management) with an appropriate, semantically enriched and flexible knowledge representation. The presented model has at its core a knowledge base developed through information ontologies and oriented around the formalization and computability of all the knowledge necessary for the full comprehension of the object of architectural heritage and its conservation. Such a knowledge representation is worked out upon conceptual categories defined above all within the scope of architectural criticism and conservation. The present paper aims at further extending the scope of conceptual modelling within cultural heritage conservation already formalized by the model. A special focus is directed at decay analysis and the surface conservation project.
NASA Technical Reports Server (NTRS)
Smith, Dan
2007-01-01
The Goddard Mission Services Evolution Center, or GMSEC, was started in 2001 to create a new standard approach for managing GSFC missions. Standardized approaches in the past involved selecting and then integrating the most appropriate set of functional tools. Assumptions were made that "one size fits all" and that tool changes would not be necessary for many years. GMSEC took a very different approach and has proven to be very successful. The core of the GMSEC architecture consists of a publish/subscribe message bus, standardized message formats, and an Applications Programming Interface (API). The API supports multiple operating systems, programming languages and messaging middleware products. We use a GMSEC-developed free middleware for low-cost development. A high capacity, robust middleware is used for operations and a messaging system with a very small memory footprint is used for on-board flight software. Software components can use the standard message formats or develop adapters to convert from their native formats to the GMSEC formats. We do not want vendors to modify their core products. Over 50 software components are now available for use with the GMSEC architecture. Most available commercial telemetry and command systems, including the GMV hifly Satellite Control System, have been adapted to run in the GMSEC labs.
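A toy in-process sketch of the publish/subscribe pattern at the heart of this kind of architecture; it is not the GMSEC API, and the subject names and wildcard matching below are hypothetical. Real middleware adds network transport, queuing and delivery guarantees, but the decoupling of publishers from subscribers through standardized message subjects is the same idea.

```python
from collections import defaultdict
from fnmatch import fnmatch

class MessageBus:
    """Tiny in-process publish/subscribe bus: components exchange messages by
    subject instead of calling each other directly."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, pattern, callback):
        self._subscribers[pattern].append(callback)

    def publish(self, subject, message):
        for pattern, callbacks in self._subscribers.items():
            if fnmatch(subject, pattern):
                for callback in callbacks:
                    callback(subject, message)

bus = MessageBus()
# Hypothetical subject naming; a real system defines its own standard subjects
# and message formats so components can interoperate without adapters.
bus.subscribe("SAT1.TLM.*", lambda s, m: print("telemetry:", s, m))
bus.publish("SAT1.TLM.TEMP", {"value": 21.4, "units": "degC"})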
A Bandwidth-Optimized Multi-Core Architecture for Irregular Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
This paper presents an architecture template for next-generation high performance computing systems specifically targeted to irregular applications. We start our work by considering that future generation interconnection and memory bandwidth full-system numbers are expected to grow by a factor of 10. In order to keep up with such a communication capacity, while still resorting to fine-grained multithreading as the main way to tolerate unpredictable memory access latencies of irregular applications, we show how overall performance scaling can benefit from the multi-core paradigm. At the same time, we also show how such an architecture template must be coupled with specific techniques in order to optimize bandwidth utilization and achieve the maximum scalability. We propose a technique based on memory references aggregation, together with the related hardware implementation, as one of such optimization techniques. We explore the proposed architecture template by focusing on the Cray XMT architecture and, using a dedicated simulation infrastructure, validate the performance of our template with two typical irregular applications. Our experimental results prove the benefits provided by both the multi-core approach and the bandwidth optimization reference aggregation technique.
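A toy sketch of the reference-aggregation idea: fine-grained word references are grouped by memory line so that each line is requested once rather than once per word. The line size, the synthetic address stream and the grouping policy are assumptions for illustration; the paper's hardware mechanism is not reproduced here.

```python
import random
from collections import defaultdict

LINE_BYTES = 64   # hypothetical memory-line / network-packet granularity

def aggregate_references(addresses):
    """Group outstanding fine-grained word references by memory line so each
    line is requested once instead of once per word; return the line requests
    and the achieved reduction factor."""
    lines = defaultdict(list)
    for addr in addresses:
        lines[addr // LINE_BYTES].append(addr)
    return list(lines.keys()), len(addresses) / max(len(lines), 1)

# Toy irregular access stream (e.g. pointer chasing over a graph): 8-byte
# word references, some of which happen to fall in the same line.
random.seed(1)
refs = [random.randrange(0, 1 << 16) * 8 for _ in range(10_000)]
line_requests, factor = aggregate_references(refs)
print(f"{len(refs)} word references -> {len(line_requests)} line requests "
      f"({factor:.2f}x fewer)")
```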
Reference Architecture for MNE 5 Technical System
2007-05-30
Only fragments of the report text are preserved; they describe core services (a core set of applications including directories, a web portal, and collaboration and messaging applications), message classifications (XML, JMS, content level), metadata filtering and control over who can initiate services, web browsing, and audit logging of persons and machines at the data level (objects, web services, messages).
NASA Astrophysics Data System (ADS)
Rodriguez, M.; Brualla, L.
2018-04-01
Monte Carlo simulation of radiation transport is computationally demanding when reasonably low statistical uncertainties of the estimated quantities are required. Therefore, it can benefit to a large extent from high-performance computing. This work is aimed at assessing the performance of the first generation of the Many Integrated Core (MIC) architecture Xeon Phi coprocessor with respect to that of a CPU consisting of a double 12-core Xeon processor in Monte Carlo simulation of coupled electron-photon showers. The comparison was made twofold: first, through a suite of basic tests, including parallel versions of the random number generators Mersenne Twister and a modified implementation of RANECU; these tests were intended to establish a baseline comparison between both devices. Secondly, through the pDPM code developed in this work. pDPM is a parallel version of the Dose Planning Method (DPM) program for fast Monte Carlo simulation of radiation transport in voxelized geometries. A variety of techniques intended to obtain large scalability on the Xeon Phi were implemented in pDPM. Maximum scalabilities of 84.2× and 107.5× were obtained in the Xeon Phi for simulations of electron and photon beams, respectively. Nevertheless, in none of the tests involving radiation transport did the Xeon Phi perform better than the CPU. The disadvantage of the Xeon Phi with respect to the CPU is due to the low performance of the single core of the former. A single core of the Xeon Phi was more than 10 times less efficient than a single core of the CPU for all radiation transport simulations.
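A small sketch of the prerequisite for such parallel Monte Carlo runs, namely giving every worker an independent, reproducible random stream. It uses NumPy's SeedSequence spawning rather than the parallel Mersenne Twister or RANECU implementations benchmarked in the paper, and the "transport" kernel is a toy placeholder.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def simulate_histories(seed_seq, n_histories):
    """Toy 'transport' kernel: each worker gets its own independent stream."""
    rng = np.random.default_rng(seed_seq)
    # Stand-in for sampling interaction lengths / scattering angles.
    steps = rng.exponential(scale=1.0, size=n_histories)
    return steps.mean()

if __name__ == "__main__":
    n_workers, n_histories = 8, 1_000_000
    # SeedSequence.spawn yields statistically independent child streams,
    # so results are reproducible and free of inter-worker correlations.
    children = np.random.SeedSequence(20180401).spawn(n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        means = list(pool.map(simulate_histories, children,
                              [n_histories] * n_workers))
    print("per-worker mean free path estimates:", np.round(means, 4))
```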
Federal Register 2010, 2011, 2012, 2013, 2014
2011-06-23
... Architecture Proposal Review Meetings and Webinars; Notice of Public Meeting AGENCY: Research and Innovative... Requirements and Architecture Proposal. The first meeting, June 28-30, 2011, 9 a.m.-4:30 p.m. at the University..., will walk through the review of System Requirements Specification and Architecture Proposal. The second...
Fast Image Subtraction Using Multi-cores and GPUs
NASA Astrophysics Data System (ADS)
Hartung, Steven; Shukla, H.
2013-01-01
Many important image processing techniques in astronomy require a massive number of computations per pixel. Among them is an image differencing technique known as Optimal Image Subtraction (OIS), which is very useful for detecting and characterizing transient phenomena. Like many image processing routines, OIS computations increase proportionally with the number of pixels being processed, and the number of pixels in need of processing is increasing rapidly. Utilizing many-core graphics processing unit (GPU) technology in hybrid conjunction with multi-core CPU and computer clustering technologies, this work presents a new astronomy image processing pipeline architecture. The chosen OIS implementation focuses on the 2nd-order spatially varying kernel with the Dirac delta function basis, a powerful image differencing method that has seen limited deployment in part because of the heavy computational burden. This tool can process standard image calibration and OIS differencing in a fashion that is scalable with the increasing data volume. It employs several parallel processing technologies in a hierarchical fashion in order to best utilize each of their strengths. The Linux/Unix-based application can operate on a single computer, or on an MPI-configured cluster, with or without GPU hardware. With GPU hardware available, even low-cost commercial video cards, the OIS convolution and subtraction times for large images can be accelerated by up to three orders of magnitude.
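A stripped-down sketch of the kernel fit underlying OIS with a delta-function basis: each kernel pixel corresponds to a shifted copy of the reference image, and the kernel is obtained by linear least squares. The spatial variation of the kernel (the 2nd-order spatially varying form used in the pipeline) is deliberately omitted here, and the images are synthetic.

```python
import numpy as np

def fit_constant_kernel(ref, sci, half=2):
    """Fit a (2*half+1)^2 convolution kernel K (delta-function basis) that
    minimizes || ref * K - sci ||^2, then return K and the difference image.
    The spatially varying kernel of full OIS is omitted for brevity."""
    size = 2 * half + 1
    # Each kernel pixel corresponds to a shifted copy of the reference image.
    shifted = [np.roll(np.roll(ref, dy, axis=0), dx, axis=1)
               for dy in range(-half, half + 1)
               for dx in range(-half, half + 1)]
    A = np.stack([s.ravel() for s in shifted], axis=1)     # (npix, size*size)
    kernel, *_ = np.linalg.lstsq(A, sci.ravel(), rcond=None)
    diff = sci - (A @ kernel).reshape(sci.shape)
    return kernel.reshape(size, size), diff

# Toy usage: the 'science' image is a slightly smeared, rescaled reference.
rng = np.random.default_rng(3)
ref = rng.random((64, 64))
sci = 0.9 * (np.roll(ref, 1, axis=0) + ref) / 2 + 0.01 * rng.random((64, 64))
K, diff = fit_constant_kernel(ref, sci)
print("kernel sum:", K.sum().round(3), "residual rms:", diff.std().round(4))
```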
Kryvenko, Oleksandr N
2017-06-01
There is limited literature on renal oncocytic neoplasms diagnosed on core biopsy. All renal oncocytic neoplasm core biopsies from 2006 to 2013 were retrospectively reviewed. Morphologic features and an immunohistochemical panel of CK7, c-KIT, and S100A1 were assessed. Concordance with the resection diagnosis, statistical analysis including a random forest classification, and follow-up were recorded. The postimmunohistochemical diagnoses of 144 renal oncocytic core biopsies were favor oncocytoma (67%), favor renal cell carcinoma (RCC) (12%), and cannot exclude RCC (21%). The diagnosis was revised following immunohistochemistry in 7% of cases. The most common features for oncocytoma (excluding dense granular cytoplasm) were nested architecture, edematous stroma, binucleation and tubular architecture; the most common features for favor RCC were sheet-like architecture, nuclear pleomorphism, papillary architecture, and prominent cell borders. High nuclear grade, necrosis, extensive papillary architecture, raisinoid nuclei, and frequent mitoses were not seen in oncocytomas. Comparing the pathologist and the random forest classification, the overall out-of-bag estimate of classification error dropped from 23% to 13% when favor RCC and cannot exclude RCC were combined into 1 category. Resection was performed in 19% (28 cases) with a 94% concordance (100% of favor RCC biopsies and 90% of cannot exclude RCC biopsies confirmed as RCC; 83% of favor oncocytomas confirmed); ablation in 23%; and surveillance in 46%. Follow-up was available in 92% (median follow-up, 33 months) with no adverse outcomes. Renal oncocytic neoplasms comprise a significant subset (16%) of all core biopsies, and the majority (78%) can be classified as favor oncocytoma or favor RCC. Copyright © 2017 Elsevier Inc. All rights reserved.
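The random forest analysis mentioned above can be reproduced in outline with a standard library; the sketch below uses scikit-learn's out-of-bag error estimate on a synthetic placeholder feature matrix, not the study's biopsy data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder design matrix: one row per biopsy, one column per binary
# morphologic feature (nested architecture, edematous stroma, binucleation...).
n_biopsies, n_features = 144, 10
X = rng.integers(0, 2, size=(n_biopsies, n_features))
# Placeholder labels: 0 = favor oncocytoma, 1 = favor / cannot exclude RCC.
y = (X[:, :3].sum(axis=1) + rng.integers(0, 2, n_biopsies) > 2).astype(int)

clf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
clf.fit(X, y)

# Out-of-bag estimate of the classification error, as reported in the study.
print(f"OOB classification error: {1 - clf.oob_score_:.1%}")
# Feature importances indicate the most discriminating morphologic features.
print("top feature columns:", np.argsort(clf.feature_importances_)[::-1][:3])
```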
Xu, Shangjie; Luo, Ying; Haag, Rainer
2007-08-07
A simple general synthetic concept to build dendritic core-shell architectures with pH-labile linkers, based on hyperbranched PEI cores and biocompatible PEG shells, is presented. Using these dendritic core-shell architectures as nanocarriers, the encapsulation and transport of polar dyes of different sizes is studied. The results show that the acid-labile nanocarriers exhibit much higher transport capacities for dyes than unfunctionalized hyperbranched PEI. The cleavage of imine bonds and controlled release of the polar dyes revealed that weakly acidic conditions (pH approximately 5.0) cleave the imine linkers and release the dyes up to five times faster than neutral conditions (pH = 7.4).
Simplified Parallel Domain Traversal
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erickson III, David J
2011-01-01
Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO2 and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.
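A toy, single-machine sketch of the programming model such a system exposes: a user-supplied map over domain blocks followed by a keyed reduction. The real system adds the two-tiered asynchronous communication and parallel I/O described above, and the "CO2 field" here is synthetic.

```python
from collections import defaultdict
from multiprocessing import Pool
import numpy as np

def domain_map(block):
    """User-supplied work on one domain block: emit (key, value) pairs.
    Here: per-latitude-band partial sums of a toy CO2 field."""
    band_id, data = block
    return [(band_id, (data.sum(), data.size))]

def keyed_reduce(pairs):
    """Keyed reduction of everything the map phase emitted."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, (total, count) in pairs:
        acc[key][0] += total
        acc[key][1] += count
    return {key: total / count for key, (total, count) in acc.items()}

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    blocks = [(band, 380 + rng.random(10_000)) for band in range(16)]
    with Pool() as pool:
        emitted = [kv for part in pool.map(domain_map, blocks) for kv in part]
    print(keyed_reduce(emitted))          # per-band mean concentration
```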
N-CET: Network-Centric Exploitation and Tracking
2009-10-01
Report covering October 2008 – August 2009. At the core of N-CET are information management services that decouple data producers and consumers, allowing reconfiguration to suit mission needs. The hardware deployed around the head-node includes Sony PlayStation 3 (PS3) nodes used for computationally demanding tasks.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
Data Parallel Bin-Based Indexing for Answering Queries on Multi-Core Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gosink, Luke; Wu, Kesheng; Bethel, E. Wes
2009-06-02
The multi-core trend in CPUs and general purpose graphics processing units (GPUs) offers new opportunities for the database community. The increase of cores at exponential rates is likely to affect virtually every server and client in the coming decade, and presents database management systems with a huge, compelling disruption that will radically change how processing is done. This paper presents a new parallel indexing data structure for answering queries that takes full advantage of the increasing thread-level parallelism emerging in multi-core architectures. In our approach, our Data Parallel Bin-based Index Strategy (DP-BIS) first bins the base data, and then partitions and stores the values in each bin as a separate, bin-based data cluster. In answering a query, the procedures for examining the bin numbers and the bin-based data clusters offer the maximum possible level of concurrency; each record is evaluated by a single thread and all threads are processed simultaneously in parallel. We implement and demonstrate the effectiveness of DP-BIS on two multi-core architectures: a multi-core CPU and a GPU. The concurrency afforded by DP-BIS allows us to fully utilize the thread-level parallelism provided by each architecture: for example, our GPU-based DP-BIS implementation simultaneously evaluates over 12,000 records with an equivalent number of concurrently executing threads. In comparing DP-BIS's performance across these architectures, we show that the GPU-based DP-BIS implementation requires significantly less computation time to answer a query than the CPU-based implementation. We also demonstrate in our analysis that DP-BIS provides better overall performance than the commonly utilized CPU- and GPU-based projection index. Finally, due to data encoding, we show that DP-BIS accesses significantly smaller amounts of data than index strategies that operate solely on a column's base data; this smaller data footprint is critical for parallel processors that possess limited memory resources (e.g., GPUs).
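A sketch of the bin-based indexing idea (not the DP-BIS implementation): the base data are binned once, a range query takes all fully covered bins outright and scans only the boundary bins' data clusters. NumPy vectorization stands in for the one-thread-per-record GPU evaluation, and the bin count and data are arbitrary.

```python
import numpy as np

class BinIndex:
    """Equal-width bin index: each bin stores the row ids of its records
    (its 'bin-based data cluster'), so a range query only scans edge bins."""
    def __init__(self, values, nbins=64):
        self.values = values
        self.nbins = nbins
        self.edges = np.linspace(values.min(), values.max(), nbins + 1)
        bin_of = np.clip(np.digitize(values, self.edges) - 1, 0, nbins - 1)
        self.clusters = [np.where(bin_of == b)[0] for b in range(nbins)]

    def range_query(self, lo, hi):
        first = int(np.clip(np.searchsorted(self.edges, lo, "right") - 1,
                            0, self.nbins - 1))
        last = int(np.clip(np.searchsorted(self.edges, hi, "left") - 1,
                           0, self.nbins - 1))
        # Bins strictly inside (lo, hi) are fully covered: take them whole.
        hits = [self.clusters[b] for b in range(first + 1, last)]
        # Boundary bins may straddle the range: scan their records.
        for b in sorted({first, last}):
            ids = self.clusters[b]
            vals = self.values[ids]
            hits.append(ids[(vals >= lo) & (vals <= hi)])
        return np.concatenate(hits)

rng = np.random.default_rng(0)
data = rng.random(1_000_000)
index = BinIndex(data)
rows = index.range_query(0.25, 0.75)
print(len(rows), "of", data.size, "records match")
```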
NASA Astrophysics Data System (ADS)
Rumbaugh, Roy N.; Grealish, Kevin; Kacir, Tom; Arsenault, Barry; Murphy, Robert H.; Miller, Scott
2003-09-01
A new 4th generation MicroIR architecture is introduced as the latest in the highly successful Standard Camera Core (SCC) series by BAE SYSTEMS to offer an infrared imaging engine with greatly reduced size, weight, power, and cost. The advanced SCC500 architecture provides great flexibility in configuration to include multiple resolutions, an industry standard Real Time Operating System (RTOS) for customer specific software application plug-ins, and a highly modular construction for unique physical and interface options. These microbolometer based camera cores offer outstanding and reliable performance over an extended operating temperature range to meet the demanding requirements of real-world environments. A highly integrated lens and shutter is included in the new SCC500 product enabling easy, drop-in camera designs for quick time-to-market product introductions.
Decoupling global biases and local interactions between cell biological variables
Zaritsky, Assaf; Obolski, Uri; Gan, Zhuo; Reis, Carlos R; Kadlecova, Zuzana; Du, Yi; Schmid, Sandra L; Danuser, Gaudenz
2017-01-01
Analysis of coupled variables is a core concept of cell biological inference, with co-localization of two molecules as a proxy for protein interaction being a ubiquitous example. However, external effectors may influence the observed co-localization independently from the local interaction of two proteins. Such global bias, although biologically meaningful, is often neglected when interpreting co-localization. Here, we describe DeBias, a computational method to quantify and decouple global bias from local interactions between variables by modeling the observed co-localization as the cumulative contribution of a global and a local component. We showcase four applications of DeBias in different areas of cell biology, and demonstrate that the global bias encapsulates fundamental mechanistic insight into cellular behavior. The DeBias software package is freely accessible online via a web-server at https://debias.biohpc.swmed.edu. DOI: http://dx.doi.org/10.7554/eLife.22323.001 PMID:28287393
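A simplified sketch in the spirit of this decomposition (not the DeBias algorithm itself): the observed alignment between two matched variables is split into the part expected from their marginal distributions alone, estimated by shuffling the pairing, and the local excess over that expectation. The angle fields below are synthetic.

```python
import numpy as np

def decouple_alignment(theta_a, theta_b, n_perm=2000, seed=0):
    """Split observed alignment between two angle fields into a global-bias
    term (what the marginal orientation distributions alone produce, estimated
    by shuffling the pairing) and a local-interaction term (the excess)."""
    rng = np.random.default_rng(seed)
    align = lambda a, b: np.mean(np.cos(a - b))
    observed = align(theta_a, theta_b)
    shuffled = np.array([align(theta_a, rng.permutation(theta_b))
                         for _ in range(n_perm)])
    global_bias = shuffled.mean()
    return global_bias, observed - global_bias

rng = np.random.default_rng(1)
n = 5000
# A common external effector biases both orientation fields toward 0 rad...
theta_a = rng.normal(0.0, 0.6, n)
# ...and each measurement additionally tracks its partner locally.
theta_b = 0.5 * theta_a + rng.normal(0.0, 0.6, n)
g, l = decouple_alignment(theta_a, theta_b)
print(f"global bias: {g:.3f}, local interaction: {l:.3f}")
```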
SMT-Aware Instantaneous Footprint Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy, Probir; Liu, Xu; Song, Shuaiwen
Modern architectures employ simultaneous multithreading (SMT) to increase thread-level parallelism. SMT threads share many functional units and the whole memory hierarchy of a physical core. Without a careful code design, SMT threads can easily contend with each other for these shared resources, causing severe performance degradation. Minimizing SMT thread contention for HPC applications running on dedicated platforms is very challenging, because they usually spawn threads within Single Program Multiple Data (SPMD) models. To address this important issue, we introduce a simple scheme for SMT-aware code optimization, which aims to reduce the memory contention across SMT threads.
2015-06-01
... very coarse architectural model proposed in Section 2.4 into something that might be implemented. Figure 11 shows the model we have created, based ... interoperability through common data models. So many of the pieces are either in place or are being developed currently. However, SEA still needs: a core ... of knowledge derived through the scientific method. In NATO, S&T is addressed using different business models, namely a collaborative business model ...
2014-09-30
... portability is difficult to achieve on future supercomputers that use various types of accelerators (GPUs, Xeon Phi, SIMD, etc.). All of these ... bottlenecks of NUMA. For example, in the CG code the state vector was originally stored as q(1:Nvar, 1:Npoin), where Nvar is the number of ... a Global Grid Point (GGP) storage. On the other hand, in the DG code the state vector is typically stored as q(1:Nvar, 1:Npts, 1:Nelem), where ...
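The two storage layouts quoted in the fragment can be contrasted directly; the sketch below uses Fortran-ordered NumPy arrays with hypothetical dimensions to show that gathering one element's data is an indirect, generally scattered access in the global-grid-point (CG) layout but a contiguous slice in the element-wise (DG) layout.

```python
import numpy as np

# Hypothetical sizes: 5 state variables, 1000 elements of 64 points each.
nvar, npts, nelem = 5, 64, 1000
npoin = npts * nelem                 # assume no shared points, for simplicity

# CG-style global-grid-point (GGP) storage: q(1:Nvar, 1:Npoin), Fortran order.
q_cg = np.zeros((nvar, npoin), order="F")
# DG-style element-wise storage: q(1:Nvar, 1:Npts, 1:Nelem), Fortran order.
q_dg = np.zeros((nvar, npts, nelem), order="F")

# Gathering one element's data from the GGP layout goes through a
# connectivity table, i.e. indirect and generally scattered addressing.
connectivity = np.arange(npoin).reshape(nelem, npts)   # trivial map here
elem_cg = q_cg[:, connectivity[42]]                    # gather (copy)

# The element-wise layout keeps each element's data in one contiguous block,
# which is friendlier to SIMD units and accelerator kernels.
elem_dg = q_dg[:, :, 42]                               # contiguous view
print(elem_cg.shape, elem_dg.shape)                    # both (5, 64)
print("DG element slice contiguous:", elem_dg.flags["F_CONTIGUOUS"])
```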
Understanding Evolutionary Potential in Virtual CPU Instruction Set Architectures
Bryson, David M.; Ofria, Charles
2013-01-01
We investigate fundamental decisions in the design of instruction set architectures for linear genetic programs that are used as both model systems in evolutionary biology and underlying solution representations in evolutionary computation. We subjected digital organisms with each tested architecture to seven different computational environments designed to present a range of evolutionary challenges. Our goal was to engineer a general purpose architecture that would be effective under a broad range of evolutionary conditions. We evaluated six different types of architectural features for the virtual CPUs: (1) genetic flexibility: we allowed digital organisms to more precisely modify the function of genetic instructions, (2) memory: we provided an increased number of registers in the virtual CPUs, (3) decoupled sensors and actuators: we separated input and output operations to enable greater control over data flow. We also tested a variety of methods to regulate expression: (4) explicit labels that allow programs to dynamically refer to specific genome positions, (5) position-relative search instructions, and (6) multiple new flow control instructions, including conditionals and jumps. Each of these features also adds complication to the instruction set and risks slowing evolution due to epistatic interactions. Two features (multiple argument specification and separated I/O) demonstrated substantial improvements in the majority of test environments, along with versions of each of the remaining architecture modifications that show significant improvements in multiple environments. However, some tested modifications were detrimental, though most exhibit no systematic effects on evolutionary potential, highlighting the robustness of digital evolution. Combined, these observations enhance our understanding of how instruction architecture impacts evolutionary potential, enabling the creation of architectures that support more rapid evolution of complex solutions to a broad range of challenges. PMID:24376669
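A toy register-machine interpreter of the kind linear genetic programs run on, illustrating two of the design axes discussed above (multiple general-purpose registers, and input and output decoupled into separate instructions). The instruction set is invented for illustration and is not the one studied in the paper.

```python
import random

def execute(genome, inputs, n_regs=4, max_steps=200):
    """Run a linear program on a tiny register machine.
    Each instruction is (op, a, b); registers are addressed modulo n_regs."""
    regs = [0] * n_regs
    in_queue = list(inputs)
    output = []
    ip = 0
    for _ in range(max_steps):          # hard step limit prevents runaway loops
        if ip >= len(genome):
            break
        op, a, b = genome[ip]
        a, b = a % n_regs, b % n_regs
        if op == "add":
            regs[a] = regs[a] + regs[b]
        elif op == "nand":
            regs[a] = ~(regs[a] & regs[b])
        elif op == "input" and in_queue:        # decoupled I/O: reading...
            regs[a] = in_queue.pop(0)
        elif op == "output":                    # ...and writing are separate ops
            output.append(regs[a])
        elif op == "jump-if-zero" and regs[b] == 0:
            ip = a % len(genome)
            continue
        ip += 1
    return output

# Random genome as a stand-in for an evolved digital organism.
random.seed(0)
ops = ["add", "nand", "input", "output", "jump-if-zero"]
genome = [(random.choice(ops), random.randrange(8), random.randrange(8))
          for _ in range(32)]
print(execute(genome, inputs=[3, 5]))
```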
NASA Astrophysics Data System (ADS)
Parlett, Christopher M. A.; Isaacs, Mark A.; Beaumont, Simon K.; Bingham, Laura M.; Hondow, Nicole S.; Wilson, Karen; Lee, Adam F.
2016-02-01
The chemical functionality within porous architectures dictates their performance as heterogeneous catalysts; however, synthetic routes to control the spatial distribution of individual functions within porous solids are limited. Here we report the fabrication of spatially orthogonal bifunctional porous catalysts, through the stepwise template removal and chemical functionalization of an interconnected silica framework. Selective removal of polystyrene nanosphere templates from a lyotropic liquid crystal-templated silica sol-gel matrix, followed by extraction of the liquid crystal template, affords a hierarchical macroporous-mesoporous architecture. Decoupling of the individual template extractions allows independent functionalization of macropore and mesopore networks on the basis of chemical and/or size specificity. Spatial compartmentalization of, and directed molecular transport between, chemical functionalities affords control over the reaction sequence in catalytic cascades; herein illustrated by the Pd/Pt-catalysed oxidation of cinnamyl alcohol to cinnamic acid. We anticipate that our methodology will prompt further design of multifunctional materials comprising spatially compartmentalized functions.
Transition in Gas Turbine Control System Architecture: Modular, Distributed, and Embedded
NASA Technical Reports Server (NTRS)
Culley, Dennis
2010-01-01
Control systems are an increasingly important component of turbine-engine system technology. However, as engines become more capable, the control system itself becomes ever more constrained by the inherent environmental conditions of the engine; a relationship forced by the continued reliance on commercial electronics technology. A revolutionary change in the architecture of turbine-engine control systems will change this paradigm and result in fully distributed engine control systems. Initially, the revolution will begin with the physical decoupling of the control law processor from the hostile engine environment using a digital communications network and engine-mounted high temperature electronics requiring little or no thermal control. The vision for the evolution of distributed control capability from this initial implementation to fully distributed and embedded control is described in a roadmap and implementation plan. The development of this plan is the result of discussions with government and industry stakeholders.
Space Generic Open Avionics Architecture (SGOAA): Overview
NASA Technical Reports Server (NTRS)
Wray, Richard B.; Stovall, John R.
1992-01-01
A space generic open avionics architecture created for NASA is described. It will serve as the basis for entities in spacecraft core avionics, capable of being tailored by NASA for future space program avionics ranging from small vehicles such as Moon ascent/descent vehicles to large ones such as Mars transfer vehicles or orbiting stations. The standard consists of: (1) a system architecture; (2) a generic processing hardware architecture; (3) a six class architecture interface model; (4) a system services functional subsystem architectural model; and (5) an operations control functional subsystem architectural model.
Synthesis of multimetallic nanoparticles by seeded methods
NASA Astrophysics Data System (ADS)
Weiner, Rebecca Gayle
This dissertation focuses on the synthesis of metal nanocrystals (NCs) by seeded methods, in which preformed seeds serve as platforms for growth. Metal NCs are of interest due to their tunable optical and catalytic properties, which arise from their composition and crystallite size and shape. Moreover, multimetallic NCs are potentially multifunctional due to the integration of the properties of each metal within one structure. However, such structures are difficult to synthesize with structural definition due to differences in precursor reduction rates and the size-dependent solubility of bimetallic phases. Seed-mediated co-reduction (SMCR) is a method developed in the Skrabalak Laboratory that couples the advantages of a seeded method with co-reduction methods to achieve multimetallic nanomaterials with defined shape and architecture. This approach was originally demonstrated in a model Au-Pd system in which Au and Pd precursors were simultaneously reduced to deposit metal onto shape-controlled Au or Pd NC seeds. Using SMCR, uniformly branched core-shell Au@Au-Pd and Pd@Au-Pd NCs were synthesized, with the shape of the seeds directing the symmetry of the final structures. By varying the seed shape and the temperature at which metal deposition occurs, the roles of adatom diffusion and seed shape on final NC morphology were decoupled. Moreover, by selecting seeds of a composition (Ag) different than the depositing metals (Au and Pd), trimetallic nanostructures are possible, including shape-controlled Ag@Au-Pd NCs and hollow Au-Pd-Ag nanoparticles (NPs). The latter architecture arises through galvanic replacement. Shape-controlled core-shell NCs with trimetallic shells are also possible by co-reducing three metal precursors (Ag, Au, and Pd) with shape-controlled Au seeds; for example, convex octopods, concave cubes, and truncated octahedra were achieved in this initial demonstration, enabled by varying the ratio of Ag to Au/Pd in the overgrowth step as well as reaction pH. Ultimately, the final multimetallic nanostructure depends on the kinetics of metal deposition as well as seed composition, shape, reactivity, and crystallinity. In elucidating the roles of these parameters in nanomaterial synthesis, the rational design of new functional NCs becomes possible, which capitalize on the unique optical and catalytic properties of structurally defined multimetallic structures. In fact, branched Au-Pd NCs with high symmetry were found to be effective refractive index-based hydrogen sensors.
Arc4nix: A cross-platform geospatial analytical library for cluster and cloud computing
NASA Astrophysics Data System (ADS)
Tang, Jingyin; Matyas, Corene J.
2018-02-01
Big Data in geospatial technology is a grand challenge for processing capacity. The ability to use a GIS for geospatial analysis on Cloud Computing and High Performance Computing (HPC) clusters has emerged as a new approach to provide feasible solutions. However, users lack the ability to migrate existing research tools to a Cloud Computing or HPC-based environment because of the incompatibility of the market-dominating ArcGIS software stack and Linux operating system. This manuscript details a cross-platform geospatial library "arc4nix" to bridge this gap. Arc4nix provides an application programming interface compatible with ArcGIS and its Python library "arcpy". Arc4nix uses a decoupled client-server architecture that permits geospatial analytical functions to run on the remote server and other functions to run on the native Python environment. It uses functional programming and meta-programming language to dynamically construct Python codes containing actual geospatial calculations, send them to a server and retrieve results. Arc4nix allows users to employ their arcpy-based script in a Cloud Computing and HPC environment with minimal or no modification. It also supports parallelizing tasks using multiple CPU cores and nodes for large-scale analyses. A case study of geospatial processing of a numerical weather model's output shows that arcpy scales linearly in a distributed environment. Arc4nix is open-source software.
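A generic sketch of the decoupled client-server pattern described above, in which the client constructs the code for a calculation, ships it to a remote worker that hosts the heavy geospatial stack, and retrieves the result. None of this is arc4nix's actual API; the transport, the executed expression and the port number are hypothetical, and a real deployment would add sandboxing and authentication.

```python
import json
import socket
import subprocess
import sys
import textwrap
import threading
import time

def run_remote(expression, host="localhost", port=9100):
    """Client side: build the source of a calculation, send it to the worker
    that hosts the heavy processing stack, and return its JSON result."""
    code = textwrap.dedent(f"""
        import json
        result = {expression}
        print(json.dumps(result))
    """)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(code.encode())
        sock.shutdown(socket.SHUT_WR)                 # signal end of request
        payload = b"".join(iter(lambda: sock.recv(4096), b""))
    return json.loads(payload)

def serve_once(port=9100):
    """Worker side: receive a snippet, execute it in a fresh interpreter
    (a real system would sandbox this), and stream stdout back."""
    with socket.create_server(("", port)) as srv:
        conn, _ = srv.accept()
        with conn:
            code = b"".join(iter(lambda: conn.recv(4096), b"")).decode()
            out = subprocess.run([sys.executable, "-c", code],
                                 capture_output=True, text=True)
            conn.sendall(out.stdout.encode())

if __name__ == "__main__":
    threading.Thread(target=serve_once, daemon=True).start()
    time.sleep(0.5)                                   # let the worker bind
    # Stands in for a remote map-algebra or zonal-statistics call.
    print(run_remote("sum(range(10))"))
```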
Managing the Evolution of an Enterprise Architecture using a MAS-Product-Line Approach
NASA Technical Reports Server (NTRS)
Pena, Joaquin; Hinchey, Michael G.; Resinas, manuel; Sterritt, Roy; Rash, James L.
2006-01-01
We view an evolutionary system as being a software product line. The core architecture is the unchanging part of the system, and each version of the system may be viewed as a product from the product line. Each "product" may be described as the core architecture with some agent-based additions. The result is a multiagent system software product line. We describe such a Software Product Line-based approach using the MaCMAS Agent-Oriented methodology. The approach scales to enterprise architectures, as a multiagent system is an appropriate means of representing a changing enterprise architecture and the interaction between components in it.
Adaptive Code Division Multiple Access Protocol for Wireless Network-on-Chip Architectures
NASA Astrophysics Data System (ADS)
Vijayakumaran, Vineeth
Massive levels of integration following Moore's Law ushered in a paradigm shift in the way on-chip interconnections were designed. With an ever higher number of cores on the same die, traditional bus-based interconnections are no longer a scalable communication infrastructure. On-chip networks were proposed to enable a scalable plug-and-play mechanism for interconnecting hundreds of cores on the same chip. Wired interconnects between the cores in a traditional Network-on-Chip (NoC) system become a bottleneck as the number of cores increases, raising the latency and energy needed to transmit signals over them. Hence, many alternative emerging interconnect technologies have been proposed, namely 3D, photonic and multi-band RF interconnects. Although they provide better connectivity, higher speed and higher bandwidth compared to wired interconnects, they also face challenges with heat dissipation and manufacturing difficulties. On-chip wireless interconnects are another proposed alternative, which does not need a physical interconnection layout as data travel over the wireless medium. They are integrated into a hybrid NoC architecture consisting of both wired and wireless links, which provides higher bandwidth, lower latency, lower area overhead and reduced energy dissipation in communication. However, as the bandwidth of the wireless channels is limited, an efficient media access control (MAC) scheme is required to enhance the utilization of the available bandwidth. This thesis proposes using a multiple access mechanism such as Code Division Multiple Access (CDMA) to enable multiple transmitter-receiver pairs to send data over the wireless channel simultaneously. It will be shown that such a hybrid wireless NoC with an efficient CDMA-based MAC protocol can significantly increase the performance of the system while lowering the energy dissipation in data transfer. In this work it is shown that the wireless NoC with the proposed CDMA-based MAC protocol outperformed the wired counterparts and several other wireless architectures proposed in the literature in terms of bandwidth and packet energy dissipation. Significant gains were observed in packet energy dissipation and bandwidth even when scaling the system to higher numbers of cores. Non-uniform traffic simulations showed that the proposed CDMA-WiNoC was consistent in bandwidth across all traffic patterns. It is also shown that the CDMA-based MAC scheme does not introduce additional reliability concerns in data transfer over the on-chip wireless interconnects.
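A numerical sketch of the CDMA mechanism itself (not the thesis's MAC protocol or NoC model): each transmitter-receiver pair spreads its bits with an orthogonal Walsh-Hadamard code, the chip streams are summed on the shared channel, and each receiver recovers its own bits by correlating with its code. The code length and number of pairs are illustrative.

```python
import numpy as np

def walsh_codes(order):
    """Walsh-Hadamard spreading codes of length 2**order (rows are orthogonal)."""
    h = np.array([[1]])
    for _ in range(order):
        h = np.block([[h, h], [h, -h]])
    return h

def transmit(bits_per_pair, codes):
    """Sum the spread chip streams of all transmitter-receiver pairs, as on a
    single shared wireless channel."""
    symbols = 2 * np.array(bits_per_pair) - 1               # map 0/1 -> -1/+1
    return (symbols[:, :, None] * codes[:, None, :]).sum(axis=0)

def receive(channel, code):
    """Despread by correlating the summed chips with one pair's code."""
    return (channel @ code > 0).astype(int)

codes = walsh_codes(3)                    # 8 orthogonal codes of length 8
pairs = 4                                 # 4 concurrent transmitter-receiver pairs
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(pairs, 16))                 # 16 bits per pair
channel = transmit(bits, codes[:pairs])
decoded = np.vstack([receive(channel, codes[i]) for i in range(pairs)])
print("all bits recovered:", bool(np.all(decoded == bits)))
```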
NASA Astrophysics Data System (ADS)
Yang, Hui; Zhang, Jie; Ji, Yuefeng; He, Yongqi; Lee, Young
2016-07-01
Cloud radio access network (C-RAN) is a promising scenario for accommodating high-performance services with ubiquitous user coverage and real-time cloud computing in the 5G era. However, the radio network, the optical network and the processing unit cloud have been decoupled from each other, so that their resources are controlled independently. A traditional architecture cannot implement resource optimization and scheduling for high-level service guarantees, due to the communication obstacle among these domains as the number of mobile internet users grows. In this paper, we report a study on multi-dimensional resources integration (MDRI) for service provisioning in a cloud radio over fiber network (C-RoFN). A resources integrated provisioning (RIP) scheme using an auxiliary graph is introduced based on the proposed architecture. The MDRI can enhance the responsiveness to dynamic end-to-end user demands and globally optimize radio frequency, optical network and processing resources effectively to maximize radio coverage. The feasibility of the proposed architecture is experimentally verified on an OpenFlow-based enhanced SDN testbed. The performance of the RIP scheme under a heavy traffic load scenario is also quantitatively evaluated to demonstrate the efficiency of the proposal based on the MDRI architecture, in terms of resource utilization, path blocking probability, network cost and path provisioning latency, compared with other provisioning schemes.
Yang, Hui; Zhang, Jie; Ji, Yuefeng; He, Yongqi; Lee, Young
2016-07-28
Cloud radio access network (C-RAN) is a promising scenario for accommodating high-performance services with ubiquitous user coverage and real-time cloud computing in the 5G era. However, the radio network, the optical network and the processing unit cloud have been decoupled from each other, so that their resources are controlled independently. A traditional architecture cannot implement resource optimization and scheduling for high-level service guarantees, due to the communication obstacle among these domains as the number of mobile internet users grows. In this paper, we report a study on multi-dimensional resources integration (MDRI) for service provisioning in a cloud radio over fiber network (C-RoFN). A resources integrated provisioning (RIP) scheme using an auxiliary graph is introduced based on the proposed architecture. The MDRI can enhance the responsiveness to dynamic end-to-end user demands and globally optimize radio frequency, optical network and processing resources effectively to maximize radio coverage. The feasibility of the proposed architecture is experimentally verified on an OpenFlow-based enhanced SDN testbed. The performance of the RIP scheme under a heavy traffic load scenario is also quantitatively evaluated to demonstrate the efficiency of the proposal based on the MDRI architecture, in terms of resource utilization, path blocking probability, network cost and path provisioning latency, compared with other provisioning schemes.
Yang, Hui; Zhang, Jie; Ji, Yuefeng; He, Yongqi; Lee, Young
2016-01-01
Cloud radio access network (C-RAN) is a promising scenario for accommodating high-performance services with ubiquitous user coverage and real-time cloud computing in the 5G era. However, the radio network, the optical network and the processing unit cloud have been decoupled from each other, so that their resources are controlled independently. A traditional architecture cannot implement resource optimization and scheduling for high-level service guarantees, due to the communication obstacle among these domains as the number of mobile internet users grows. In this paper, we report a study on multi-dimensional resources integration (MDRI) for service provisioning in a cloud radio over fiber network (C-RoFN). A resources integrated provisioning (RIP) scheme using an auxiliary graph is introduced based on the proposed architecture. The MDRI can enhance the responsiveness to dynamic end-to-end user demands and globally optimize radio frequency, optical network and processing resources effectively to maximize radio coverage. The feasibility of the proposed architecture is experimentally verified on an OpenFlow-based enhanced SDN testbed. The performance of the RIP scheme under a heavy traffic load scenario is also quantitatively evaluated to demonstrate the efficiency of the proposal based on the MDRI architecture, in terms of resource utilization, path blocking probability, network cost and path provisioning latency, compared with other provisioning schemes. PMID:27465296
A Multi-Agent System Architecture for Sensor Networks
Fuentes-Fernández, Rubén; Guijarro, María; Pajares, Gonzalo
2009-01-01
The design of the control systems for sensor networks presents important challenges. Besides the traditional problems about how to process the sensor data to obtain the target information, engineers need to consider additional aspects such as the heterogeneity and high number of sensors, and the flexibility of these networks regarding topologies and the sensors in them. Although there are partial approaches for resolving these issues, their integration relies on ad hoc solutions requiring significant development effort. In order to provide an effective approach for this integration, this paper proposes an architecture based on the multi-agent system paradigm with a clear separation of concerns. The architecture considers sensors as devices used by an upper layer of manager agents. These agents are able to communicate and negotiate services to achieve the required functionality. Activities are organized according to roles related to the different aspects to integrate, mainly sensor management, data processing, communication and adaptation to changes in the available devices and their capabilities. This organization largely isolates and decouples the data management from the changing network, while encouraging reuse of solutions. The use of the architecture is facilitated by a specific modelling language developed through metamodelling. A case study concerning a generic distributed system for fire fighting illustrates the approach and supports a comparison with related work. PMID:22303172
Multi-core processing and scheduling performance in CMS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hernandez, J. M.; Evans, D.; Foulkes, S.
2012-01-01
Commodity hardware is going many-core. We might soon not be able to satisfy the job memory needs per core in the current single-core processing model in High Energy Physics. In addition, an ever increasing number of independent and incoherent jobs running on the same physical hardware without sharing resources might significantly affect processing performance. It will be essential to effectively utilize the multi-core architecture. CMS has incorporated support for multi-core processing in the event processing framework and the workload management system. Multi-core processing jobs share common data in memory, such as the code libraries, detector geometry and conditions data, resulting in much lower memory usage than standard single-core independent jobs. Exploiting this new processing model requires a new model of computing resource allocation, departing from the standard single-core allocation for a job. The experiment job management system needs to have control over a larger quantum of resources, since multi-core aware jobs require the scheduling of multiple cores simultaneously. CMS is exploring the approach of using whole nodes as the unit in the workload management system, where all cores of a node are allocated to a multi-core job. Whole-node scheduling allows for optimization of the data/workflow management (e.g. I/O caching, local merging), but efficient utilization of all scheduled cores is challenging. Dedicated whole-node queues have been set up at all Tier-1 centers for exploring multi-core processing workflows in CMS. We present the evaluation of the performance of scheduling and executing multi-core workflows in whole-node queues compared to the standard single-core processing workflows.
Recent Developments in Hardware-in-the-Loop Formation Navigation and Control
NASA Technical Reports Server (NTRS)
Mitchell, Jason W.; Luquette, Richard J.
2005-01-01
The Formation Flying Test-Bed (FFTB) at NASA Goddard Space Flight Center (GSFC) provides a hardware-in-the-loop test environment for formation navigation and control. The facility is evolving as a modular, hybrid, dynamic simulation facility for end-to-end guidance, navigation, and control (GN&C) design and analysis of formation flying spacecraft. The core capabilities of the FFTB, as a platform for testing critical hardware and software algorithms in-the-loop, are reviewed with a focus on many recent improvements. Two significant upgrades to the FFTB are a message-oriented middleware (MOM) architecture, and a software crosslink for inter-spacecraft ranging. The MOM architecture provides a common messaging bus for software agents, easing integration, and supporting the GSFC Mission Services Evolution Center (GMSEC) architecture via a software bridge. Additionally, the FFTB's hardware capabilities are expanding. Recently, two Low-Power Transceivers (LPTs) with ranging capability have been introduced into the FFTB. The LPT crosslinks will be connected to a modified Crosslink Channel Simulator (CCS), which applies realistic space-environment effects to the Radio Frequency (RF) signals produced by the LPTs.
Field studies in architectural acoustics using Tablet PCs
NASA Astrophysics Data System (ADS)
Boye, Daniel
2005-04-01
Core requirements for the sciences within the liberal arts curriculum challenge students to become directly involved in scientific study. These requirements seek to develop scientifically literate leaders and members of society. Formal laboratory periods are not usually associated with these courses. Thus, conceptual discovery and quantitative experimentation must take place outside of the classroom. Physics 115: Musical Technology at Davidson College is such a course and contains a section dealing with architectural acoustics. Field studies in the past have been an awkward and cumbersome activity, especially for non-science majors. The emerging technology of Tablet PCs overcomes many of the problems of mobile data acquisition and analysis, and allows the students to determine the locations of the rooms to be studied. The impulse method for determining reverberation time is used and compared with calculations based on room size and absorption media. The use of Tablet PCs and the publicly available freeware Audacity in field studies investigating architectural acoustics will be discussed. [Work supported in part by the Associated Colleges of the South through their Technology Fellowship program.]
LOSITAN: a workbench to detect molecular adaptation based on a Fst-outlier method.
Antao, Tiago; Lopes, Ana; Lopes, Ricardo J; Beja-Pereira, Albano; Luikart, Gordon
2008-07-28
Testing for selection is becoming one of the most important steps in the analysis of multilocus population genetics data sets. Existing applications are difficult to use, leaving many non-trivial, error-prone tasks to the user. Here we present LOSITAN, a selection detection workbench based on a well evaluated Fst-outlier detection method. LOSITAN greatly facilitates correct approximation of model parameters (e.g., genome-wide average, neutral Fst), provides data import and export functions, iterative contour smoothing and generation of graphics in an easy-to-use graphical user interface. LOSITAN is able to use modern multi-core processor architectures by locally parallelizing fdist, reducing computation time by half on current dual-core machines, with almost linear performance gains on machines with more cores. LOSITAN makes selection detection feasible for a much wider range of users, even for large population genomic datasets, by providing both an easy-to-use interface and the essential functionality to complete the whole selection detection process.
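The speed-up reported above comes from the fact that the fdist simulations are independent and embarrassingly parallel. A minimal sketch of that idea, assuming a hypothetical simulate_fst_batch function standing in for one fdist batch (this is not LOSITAN's actual code), might look like this:

```python
# Hedged sketch: distribute independent Fst simulations over local cores,
# in the spirit of LOSITAN's local parallelization of fdist.
# `simulate_fst_batch` is a hypothetical stand-in for one fdist batch.
from concurrent.futures import ProcessPoolExecutor
import os
import random

def simulate_fst_batch(seed, n_loci=1000):
    """Pretend coalescent batch: returns simulated (He, Fst) pairs."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random() * 0.5) for _ in range(n_loci)]

def parallel_fst(total_loci=100_000, workers=None):
    workers = workers or os.cpu_count()
    per_worker = total_loci // workers
    with ProcessPoolExecutor(max_workers=workers) as pool:
        batches = pool.map(simulate_fst_batch, range(workers),
                           [per_worker] * workers)
    return [pair for batch in batches for pair in batch]

if __name__ == "__main__":
    results = parallel_fst(total_loci=10_000, workers=2)
    print(len(results), "simulated loci")
```

Because each batch is independent, scaling stays close to linear until the final merge step dominates, which matches the near-linear gains reported above.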
Tier-2 Optimisation for Computational Density/Diversity and Big Data
NASA Astrophysics Data System (ADS)
Fay, R. B.; Bland, J.
2014-06-01
As the number of cores on chip continues to trend upwards and new CPU architectures emerge, increasing CPU density and diversity presents multiple challenges to site administrators. These include scheduling for massively multi-core systems (potentially including Graphical Processing Units (GPUs), integrated and dedicated, and Many Integrated Core (MIC) devices) to ensure a balanced throughput of jobs while preserving overall cluster throughput, as well as the increasing complexity of developing for these heterogeneous platforms, and the challenge in managing this more complex mix of resources. In addition, meeting data demands as both dataset sizes increase and as the rate of demand scales with increased computational power requires additional performance from the associated storage elements. In this report, we evaluate one emerging technology, Solid State Drive (SSD) caching for RAID controllers, with consideration to its potential to assist in meeting evolving demand. We also briefly consider the broader developing trends outlined above in order to identify issues that may develop and assess what actions should be taken in the immediate term to address those.
Evolutionary dynamics of protein domain architecture in plants
2012-01-01
Background: Protein domains are the structural, functional and evolutionary units of the protein. Protein domain architectures are the linear arrangements of domain(s) in individual proteins. Although the evolutionary history of protein domain architecture has been extensively studied in microorganisms, the evolutionary dynamics of domain architecture in the plant kingdom remains largely undefined. To address this question, we analyzed the lineage-based protein domain architecture content in 14 completed green plant genomes. Results: Our analyses show that all 14 plant genomes maintain similar distributions of species-specific, single-domain, and multi-domain architectures. Approximately 65% of plant domain architectures are universally present in all plant lineages, while the remaining architectures are lineage-specific. Clear examples are seen of both the loss and gain of specific protein architectures in higher plants. There has been a dynamic, lineage-wise expansion of domain architectures during plant evolution. The data suggest that this expansion can be largely explained by changes in nuclear ploidy resulting from rounds of whole genome duplications. Indeed, there has been a decrease in the number of unique domain architectures when the genomes were normalized into a presumed ancestral genome that has not undergone whole genome duplications. Conclusions: Our data show the conservation of universal domain architectures in all available plant genomes, indicating the presence of an evolutionarily conserved, core set of protein components. However, the occurrence of lineage-specific domain architectures indicates that domain architecture diversity has been maintained beyond these core components in plant genomes. Although several features of genome-wide domain architecture content are conserved in plants, the data clearly demonstrate lineage-wise, progressive changes and expansions of individual protein domain architectures, reinforcing the notion that plant genomes have undergone dynamic evolution. PMID:22252370
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sitaraman, Hariswaran; Grout, Ray W
This work investigates novel algorithm designs and optimization techniques for restructuring chemistry integrators in zero- and multidimensional combustion solvers, which can then be effectively used on the emerging generation of Intel's Many Integrated Core/Xeon Phi processors. These processors offer increased computing performance via a large number of lightweight cores at relatively lower clock speeds compared to traditional processors (e.g. Intel Sandybridge/Ivybridge) used in current supercomputers. This style of processor can be productively used for chemistry integrators that form a costly part of computational combustion codes, in spite of their relatively lower clock speeds. Performance commensurate with traditional processors is achieved here through the combination of careful memory layout, exposing multiple levels of fine-grain parallelism, and extensive use of vendor-supported libraries (Cilk Plus and Math Kernel Libraries). Important optimization techniques for efficient memory usage and vectorization have been identified and quantified. These optimizations resulted in a factor of ~3 speed-up using the Intel 2013 compiler and ~1.5 using the Intel 2017 compiler for large chemical mechanisms compared to the unoptimized version on the Intel Xeon Phi. The strategies, especially with respect to memory usage and vectorization, should also be beneficial for general purpose computational fluid dynamics codes.
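One of the memory-layout ideas mentioned above, keeping the state of many cells in a structure-of-arrays form so that per-species sweeps are contiguous and vectorizable, can be illustrated with a small hedged sketch; the species, reactions and rate constants below are placeholders, not the paper's mechanisms:

```python
# Hedged illustration of a structure-of-arrays (SoA) layout for batched
# chemistry: one contiguous row per species across many cells, so a rate
# evaluation vectorizes over cells. Species and rates are made up.
import numpy as np

n_cells, n_species = 1024, 3               # placeholder sizes
conc = np.random.rand(n_species, n_cells)  # SoA: conc[species, cell]

k = np.array([1.0e3, 5.0e2])               # placeholder rate constants

def reaction_rates(conc):
    # Two toy reactions, A + B -> C and C -> A, evaluated for all cells
    # at once; each line is a single vectorized sweep over contiguous data.
    r1 = k[0] * conc[0] * conc[1]
    r2 = k[1] * conc[2]
    dA = -r1 + r2
    dB = -r1
    dC = r1 - r2
    return np.stack([dA, dB, dC])

dcdt = reaction_rates(conc)
print(dcdt.shape)   # (3, 1024): one derivative per species per cell
```

The same data would be awkward to vectorize in an array-of-structures layout, where the concentrations of one cell's species are interleaved in memory.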
Shultz, Randall W.; Tatineni, Vinaya M.; Hanley-Bowdoin, Linda; Thompson, William F.
2007-01-01
Core DNA replication proteins mediate the initiation, elongation, and Okazaki fragment maturation functions of DNA replication. Although this process is generally conserved in eukaryotes, important differences in the molecular architecture of the DNA replication machine and the function of individual subunits have been reported in various model systems. We have combined genome-wide bioinformatic analyses of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) with published experimental data to provide a comprehensive view of the core DNA replication machinery in plants. Many components identified in this analysis have not been studied previously in plant systems, including the GINS (go ichi ni san) complex (PSF1, PSF2, PSF3, and SLD5), MCM8, MCM9, MCM10, NOC3, POLA2, POLA3, POLA4, POLD3, POLD4, and RNASEH2. Our results indicate that the core DNA replication machinery from plants is more similar to vertebrates than single-celled yeasts (Saccharomyces cerevisiae), suggesting that animal models may be more relevant to plant systems. However, we also uncovered some important differences between plants and vertebrate machinery. For example, we did not identify geminin or RNASEH1 genes in plants. Our analyses also indicate that plants may be unique among eukaryotes in that they have multiple copies of numerous core DNA replication genes. This finding raises the question of whether specialized functions have evolved in some cases. This analysis establishes that the core DNA replication machinery is highly conserved across plant species and displays many features in common with other eukaryotes and some characteristics that are unique to plants. PMID:17556508
Bonsai: an event-based framework for processing and controlling data streams
Lopes, Gonçalo; Bonacchi, Niccolò; Frazão, João; Neto, Joana P.; Atallah, Bassam V.; Soares, Sofia; Moreira, Luís; Matias, Sara; Itskov, Pavel M.; Correia, Patrícia A.; Medina, Roberto E.; Calcaterra, Lorenza; Dreosti, Elena; Paton, Joseph J.; Kampff, Adam R.
2015-01-01
The design of modern scientific experiments requires the control and monitoring of many different data streams. However, the serial execution of programming instructions in a computer makes it a challenge to develop software that can deal with the asynchronous, parallel nature of scientific data. Here we present Bonsai, a modular, high-performance, open-source visual programming framework for the acquisition and online processing of data streams. We describe Bonsai's core principles and architecture and demonstrate how it allows for the rapid and flexible prototyping of integrated experimental designs in neuroscience. We specifically highlight some applications that require the combination of many different hardware and software components, including video tracking of behavior, electrophysiology and closed-loop control of stimulation. PMID:25904861
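The core notion of composing asynchronous data streams can be conveyed with a deliberately small push-based pipeline; this is a generic sketch of the observable-stream style, not Bonsai's actual API (Bonsai itself is a visual language in the Rx style):

```python
# A deliberately small, hedged sketch of an event-based stream pipeline:
# sources push events to subscribers, and operators compose new streams.
# This mimics the flavour of observable pipelines, not Bonsai's real API.
class Stream:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, fn):
        self._subscribers.append(fn)

    def push(self, value):
        for fn in self._subscribers:
            fn(value)

    def map(self, fn):
        out = Stream()
        self.subscribe(lambda v: out.push(fn(v)))
        return out

    def filter(self, pred):
        out = Stream()
        self.subscribe(lambda v: out.push(v) if pred(v) else None)
        return out

# Example: scale a stream of "frame intensities" and log threshold crossings.
frames = Stream()
frames.map(lambda x: x * 2).filter(lambda x: x > 10).subscribe(print)
for sample in [1, 4, 6, 9]:
    frames.push(sample)     # prints 12 and 18
```

Each operator only reacts when data arrives, which is what lets heterogeneous sources (cameras, electrophysiology, stimulation hardware) be combined without a central polling loop.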
Proscene: A feature-rich framework for interactive environments
NASA Astrophysics Data System (ADS)
Charalambos, Jean Pierre
We introduce Proscene, a feature-rich, open-source framework for interactive environments. The design of Proscene comprises a three-layered onion-like software architecture, promoting different possible development scenarios. The framework innermost layer decouples user gesture parsing from user-defined actions. The in-between layer implements a feature-rich set of widely-used motion actions allowing the selection and manipulation of objects, including the scene viewpoint. The outermost layer exposes those features as a Processing library. The results have shown the feasibility of our approach together with the simplicity and flexibility of the Proscene framework API.
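The innermost-layer idea, decoupling user gesture parsing from user-defined actions, amounts to a binding table between parsed gesture events and callbacks. A hedged sketch follows; class and method names here are illustrative, not Proscene's API:

```python
# Hedged sketch of the "innermost layer" idea: raw gestures are parsed
# into symbolic events, and a separate binding table maps events to
# user-defined actions. Names here are illustrative, not Proscene's API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Gesture:
    device: str      # e.g. "mouse"
    kind: str        # e.g. "drag", "wheel"
    button: int = 0

class InputHandler:
    def __init__(self):
        self._bindings = {}

    def bind(self, gesture, action):
        """Associate a parsed gesture with any callable action."""
        self._bindings[gesture] = action

    def handle(self, gesture, *args):
        action = self._bindings.get(gesture)
        if action:
            action(*args)

# Example bindings: the same parsing layer drives interchangeable actions.
handler = InputHandler()
handler.bind(Gesture("mouse", "drag", 1), lambda dx, dy: print("rotate", dx, dy))
handler.bind(Gesture("mouse", "wheel"), lambda d: print("zoom", d))
handler.handle(Gesture("mouse", "drag", 1), 5, -3)
handler.handle(Gesture("mouse", "wheel"), 2)
```

Because actions are plain callables, the middle layer can supply a stock set of motion actions while the outer layer, or the user, swaps in custom ones without touching the parser.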
Leveraging of Open EMR Architecture for Clinical Trial Accrual
Afrin, Lawrence B.; Oates, James C.; Boyd, Caroline K.; Daniels, Mark S.
2003-01-01
Accrual to clinical trials is a major bottleneck in scientific progress in clinical medicine. Many methods for identifying potential subjects and improving accrual have been pursued; few have succeeded, and none have proven generally reproducible or scalable. We leveraged the open architecture of the core clinical data repository of our electronic medical record system to prototype a solution for this problem in a manner consistent with contemporary regulations and research ethics. We piloted the solution with a local investigator-initiated trial for which candidate identification was expected to be difficult. Key results in the eleven months of experience to date include automated screening of 7,296,708 lab results from 69,288 patients, detection of 1,768 screening tests of interest, identification of 70 potential candidates who met all further automated criteria, and accrual of three candidates to the trial. Hypotheses for this disappointing impact on accrual, and directions for future research, are discussed. PMID:14728125
A Locality-Based Threading Algorithm for the Configuration-Interaction Method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; Johnson, Calvin
The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrodinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. Here in this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.
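The locality-based partitioning described above can be pictured as grouping work items by the data block they touch and handing whole blocks to threads. The sketch below is a hedged, language-neutral illustration of that grouping step (Python threads stand in for OpenMP threads; it is not the CI code itself):

```python
# Hedged sketch of locality-based work partitioning: work items are
# grouped by the data block they touch, and each thread is handed the
# items for "its" blocks, so most accesses stay local to that thread.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition_by_locality(work_items, n_threads):
    """work_items: list of (block_id, payload). Returns per-thread buckets."""
    by_block = defaultdict(list)
    for block_id, payload in work_items:
        by_block[block_id].append(payload)
    buckets = [[] for _ in range(n_threads)]
    # Assign whole blocks to threads (round-robin here; a real scheme
    # would balance by estimated cost).
    for i, (block_id, items) in enumerate(sorted(by_block.items())):
        buckets[i % n_threads].append((block_id, items))
    return buckets

def process_bucket(bucket):
    # Each thread sweeps only the blocks assigned to it.
    return sum(len(items) for _, items in bucket)

work = [(i % 8, f"elem{i}") for i in range(100)]   # 8 data blocks
buckets = partition_by_locality(work, n_threads=4)
with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(process_bucket, buckets))
print(counts)      # per-thread work counts
```

Keeping whole blocks on one thread is what limits cross-thread data movement, which matters far more on a 64-core KNL node than on a small dual-socket Xeon node.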
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek
2016-10-06
We profile and optimize calculations performed with the BerkeleyGW code on the Xeon-Phi architecture. BerkeleyGW depends both on hand-tuned critical kernels as well as on BLAS and FFT libraries. We describe the optimization process and performance improvements achieved. We discuss a layered parallelization strategy to take advantage of vector, thread and node-level parallelism. We discuss locality changes (including the consequence of the lack of L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights-Landing including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well-suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-wave components, band-pairs, and frequencies.
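The roofline study referred to here rests on the standard roofline model, in which attainable performance is bounded by whichever is smaller, the peak compute rate or the product of arithmetic intensity and memory bandwidth; the generic form is:

```latex
% Standard roofline bound (generic form, not BerkeleyGW-specific numbers):
P_{\text{attainable}} = \min\bigl(P_{\text{peak}},\; I \cdot B_{\text{mem}}\bigr),
\qquad I = \frac{\text{flops performed}}{\text{bytes moved}}
```

Kernels whose intensity I falls left of the ridge point are bandwidth-bound, which is why effective use of the on-package high-bandwidth memory matters on this architecture.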
Space Generic Open Avionics Architecture (SGOAA) standard specification
NASA Technical Reports Server (NTRS)
Wray, Richard B.; Stovall, John R.
1994-01-01
This standard establishes the Space Generic Open Avionics Architecture (SGOAA). The SGOAA includes a generic functional model, a processing structural model, and an architecture interface model. This standard defines the requirements for applying these models to the development of spacecraft core avionics systems. The purpose of this standard is to provide an umbrella set of requirements for applying the generic architecture models to the design of a specific avionics hardware/software processing system. This standard defines a generic set of system interface points to facilitate identification of critical services and interfaces. It establishes the requirement for applying appropriate low-level detailed implementation standards at those interface points. The generic core avionics functions and processing structural models provided herein are robustly tailorable to specific system applications and provide a platform upon which the interface model is to be applied.
Estimation and Control for Autonomous Coring from a Rover Manipulator
NASA Technical Reports Server (NTRS)
Hudson, Nicolas; Backes, Paul; DiCicco, Matt; Bajracharya, Max
2010-01-01
A system consisting of a set of estimators and autonomous behaviors has been developed which allows robust coring from a low-mass rover platform, while accommodating moderate rover slip. A redundant set of sensors, including a force-torque sensor, visual odometry, and accelerometers, is used to monitor discrete critical and operational modes, as well as to estimate continuous drill parameters during the coring process. A set of critical failure modes pertinent to shallow coring from a mobile platform is defined, and autonomous behaviors associated with each critical mode are used to maintain nominal coring conditions. Autonomous shallow coring is demonstrated from a low-mass rover using a rotary-percussive coring tool mounted on a 5 degree-of-freedom (DOF) arm. A new architecture is validated in which an arm-stabilized, rotary-percussive tool is used while the robotic arm provides the drill z-axis linear feed. Particular attention is paid to starting the hole with this architecture. An end-to-end coring sequence is demonstrated, where the rover autonomously detects and then recovers from a series of slip events that exceeded 9 cm of total displacement.
Biocompatible magnetic core-shell nanocomposites for engineered magnetic tissues
NASA Astrophysics Data System (ADS)
Rodriguez-Arco, Laura; Rodriguez, Ismael A.; Carriel, Victor; Bonhome-Espinosa, Ana B.; Campos, Fernando; Kuzhir, Pavel; Duran, Juan D. G.; Lopez-Lopez, Modesto T.
2016-04-01
The inclusion of magnetic nanoparticles into biopolymer matrixes enables the preparation of magnetic field-responsive engineered tissues. Here we describe a synthetic route to prepare biocompatible core-shell nanostructures consisting of a polymeric core and a magnetic shell, which are used for this purpose. We show that using a core-shell architecture is doubly advantageous. First, gravitational settling for core-shell nanocomposites is slower because of the reduction of the composite average density connected to the light polymer core. Second, the magnetic response of core-shell nanocomposites can be tuned by changing the thickness of the magnetic layer. The incorporation of the composites into biopolymer hydrogels containing cells results in magnetic field-responsive engineered tissues whose mechanical properties can be controlled by external magnetic forces. Indeed, we obtain a significant increase of the viscoelastic moduli of the engineered tissues when exposed to an external magnetic field. Because the composites are functionalized with polyethylene glycol, the prepared bio-artificial tissue-like constructs also display excellent ex vivo cell viability and proliferation. When implanted in vivo, the engineered tissues show good biocompatibility and outstanding interaction with the host tissue. Actually, they only cause a localized transitory inflammatory reaction at the implantation site, without any effect on other organs. Altogether, our results suggest that the inclusion of magnetic core-shell nanocomposites into biomaterials would enable tissue engineering of artificial substitutes whose mechanical properties could be tuned to match those of the potential target tissue. In a wider perspective, the good biocompatibility and magnetic behavior of the composites could be beneficial for many other applications.
An Efficient VLSI Architecture for Multi-Channel Spike Sorting Using a Generalized Hebbian Algorithm
Chen, Ying-Lun; Hwang, Wen-Jyi; Ke, Chi-En
2015-01-01
A novel VLSI architecture for multi-channel online spike sorting is presented in this paper. In the architecture, the spike detection is based on nonlinear energy operator (NEO), and the feature extraction is carried out by the generalized Hebbian algorithm (GHA). To lower the power consumption and area costs of the circuits, all of the channels share the same core for spike detection and feature extraction operations. Each channel has dedicated buffers for storing the detected spikes and the principal components of that channel. The proposed circuit also contains a clock gating system supplying the clock to only the buffers of channels currently using the computation core to further reduce the power consumption. The architecture has been implemented by an application-specific integrated circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture has lower power consumption and hardware area costs for real-time multi-channel spike detection and feature extraction. PMID:26287193
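The spike-detection stage is based on the standard discrete nonlinear energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]; a hedged software sketch of that detector is shown below (the threshold rule and data are illustrative only, and this is not the VLSI implementation):

```python
# Hedged sketch of NEO-based spike detection. The discrete nonlinear
# energy operator is psi[n] = x[n]^2 - x[n-1]*x[n+1]; samples whose NEO
# value exceeds a threshold (here a simple multiple of the mean) are
# flagged as candidate spikes. Data and threshold factor are illustrative.
import numpy as np

def neo(x):
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def detect_spikes(x, k=8.0):
    psi = neo(x)
    threshold = k * psi.mean()
    return np.flatnonzero(psi > threshold)

rng = np.random.default_rng(0)
signal = rng.normal(0, 1, 2000)
signal[500], signal[1200] = 15.0, -12.0    # two artificial "spikes"
print(detect_spikes(signal))               # indices near 500 and 1200
```

Because the operator only needs three neighbouring samples and a multiply-accumulate, it maps naturally onto a single shared hardware core that the channels take turns using.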
Mental models for cognitive control
NASA Astrophysics Data System (ADS)
Schilling, Malte; Cruse, Holk; Schmitz, Josef
2007-05-01
Even so-called "simple" organisms such as insects are able to adapt quickly to changing conditions in their environment. Their behaviour is affected by many external influences, and only its variability and adaptivity permits their survival. An intensively studied example concerns hexapod walking [1,2]. Complex walking behaviours in stick insects have been analysed and the results were used to construct a reactive model that controls walking in a robot. This model is now extended by higher levels of control: as a bottom-up approach, the low-level reactive behaviours are modulated and activated through a medium level. In addition, the system grows up to an upper level for cognitive control of the robot: cognition - as the ability to plan ahead - and cognitive skills involve internal representations of the subject itself and its environment. These representations are used for mental simulations: in difficult situations, for which neither motor primitives nor whole sequences of these exist, available behaviours are varied and applied in the internal model while the body itself is decoupled from the controlling modules. The result of the internal simulation is evaluated. Successful actions are learned and applied to the robot. This constitutes a level for planning. Its elements (movements, behaviours) are embodied in the lower levels, whereby their meaning arises directly from these levels. The motor primitives are situation models represented as neural networks. The focus of this work concerns the general architecture of the framework, the reactive basic layer of the bottom-up architecture, its connection to higher-level functions, and its application to an internal model.
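The planning level described here, varying available behaviours inside the internal model while the body is decoupled and executing only the candidate that evaluates best, is essentially a simulate-evaluate-select loop. A hedged sketch of that loop follows; the toy model and scoring function are made up and are not the authors' neural-network implementation:

```python
# Hedged sketch of "planning by internal simulation": candidate
# behaviours are tried in an internal model of body + environment,
# scored, and only the best one is executed on the real robot.
def plan_by_internal_simulation(candidates, internal_model, evaluate):
    best_behaviour, best_score = None, float("-inf")
    for behaviour in candidates:
        predicted_outcome = internal_model(behaviour)   # body stays decoupled
        score = evaluate(predicted_outcome)
        if score > best_score:
            best_behaviour, best_score = behaviour, score
    return best_behaviour

# Toy example: pick the step length that gets closest to a 1.0 m target.
internal_model = lambda step: {"distance": step * 3}     # three simulated steps
evaluate = lambda outcome: -abs(outcome["distance"] - 1.0)
print(plan_by_internal_simulation([0.2, 0.3, 0.4], internal_model, evaluate))
```

Only the winning behaviour is then learned and applied to the robot, which is what distinguishes this cognitive level from the purely reactive layer below it.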
Development of the RANCOR Rotary-Percussive Coring System for Mars Sample Return
NASA Technical Reports Server (NTRS)
Paulsen, Gale; Indyk, Stephen; Zacny, Kris
2014-01-01
A RANCOR drill was designed to fit a Mars Exploration Rover (MER) class vehicle. The low mass of 3 kg was achieved by using the same actuator for three functions: rotation, percussions, and core break-off. Initial testing of the drill exposed an unexpected behavior of an off-the-shelf sprag clutch used to couple and decouple rotary-percussive function from the core break off function. Failure of the sprag was due to the vibration induced during percussive drilling. The sprag clutch would back drive in conditions where it was expected to hold position. Although this did not affect the performance of the drill, it nevertheless reduced the quality of the cores produced. Ultimately, the sprag clutch was replaced with a custom ratchet system that allowed for some angular displacement without advancing in either direction. Replacing the sprag with the ratchet improved the collected core quality. Also, premature failure of a 300-series stainless steel percussion spring was observed. The 300-series percussion spring was ultimately replaced with a music wire spring based on performances of previously designed rotary-percussive drill systems.
FPGA-based RF spectrum merging and adaptive hopset selection
NASA Astrophysics Data System (ADS)
McLean, R. K.; Flatley, B. N.; Silvius, M. D.; Hopkinson, K. M.
The radio frequency (RF) spectrum is a limited resource. Spectrum allotment disputes stem from this scarcity as many radio devices are confined to a fixed frequency or frequency sequence. One alternative is to incorporate cognition within a reconfigurable radio platform, therefore enabling the radio to adapt to dynamic RF spectrum environments. In this way, the radio is able to actively sense the RF spectrum, decide, and act accordingly, thereby sharing the spectrum and operating in more flexible manner. In this paper, we present a novel solution for merging many distributed RF spectrum maps into one map and for subsequently creating an adaptive hopset. We also provide an example of our system in operation, the result of which is a pseudorandom adaptive hopset. The paper then presents a novel hardware design for the frequency merger and adaptive hopset selector, both of which are written in VHDL and implemented as a custom IP core on an FPGA-based embedded system using the Xilinx Embedded Development Kit (EDK) software tool. The design of the custom IP core is optimized for area, and it can process a high-volume digital input via a low-latency circuit architecture. The complete embedded system includes the Xilinx PowerPC microprocessor, UART serial connection, and compact flash memory card IP cores, and our custom map merging/hopset selection IP core, all of which are targeted to the Virtex IV FPGA. This system is then incorporated into a cognitive radio prototype on a Rice University Wireless Open Access Research Platform (WARP) reconfigurable radio.
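The two functions carried out by the custom IP core, merging distributed spectrum occupancy maps and drawing a pseudorandom hopset from the channels that remain free, can be modelled in software as follows; this is a hedged behavioural sketch, not the VHDL design:

```python
# Hedged software model of the two IP-core functions: (1) merge several
# binary spectrum-occupancy maps (1 = channel occupied somewhere) and
# (2) build a pseudorandom hopset from the channels left free everywhere.
import random

def merge_maps(maps):
    """Element-wise OR of equally sized occupancy maps."""
    merged = [0] * len(maps[0])
    for m in maps:
        merged = [a | b for a, b in zip(merged, m)]
    return merged

def adaptive_hopset(merged, hopset_size, seed=0):
    free = [ch for ch, occupied in enumerate(merged) if not occupied]
    rng = random.Random(seed)            # pseudorandom but repeatable
    rng.shuffle(free)
    return free[:hopset_size]

# Three sensors report occupancy over 8 channels.
maps = [[0, 1, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 0, 0, 0],
        [1, 0, 0, 0, 1, 0, 0, 0]]
merged = merge_maps(maps)                # [1, 1, 0, 1, 1, 0, 0, 0]
print(adaptive_hopset(merged, hopset_size=3))
```

In the hardware version the same logic is pipelined so the merge keeps up with a high-volume digital input stream, which is the low-latency property emphasised above.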
NASA Technical Reports Server (NTRS)
Hinchey, Michael G. (Inventor); Rash, James L. (Inventor); Pena, Joaquin (Inventor)
2011-01-01
Systems, methods and apparatus are provided through which an evolutionary system is managed and viewed as a software product line. In some embodiments, the core architecture is a relatively unchanging part of the system, and each version of the system is viewed as a product from the product line. Each software product is generated from the core architecture with some agent-based additions. The result may be a multi-agent system software product line.
2012-07-01
This paper describes data and software services that enable a set of Unmanned Aircraft (UA) platforms to operate in a wide range of air domains, implemented by MIT Lincoln Laboratory in the form of a Sense and Avoid (SAA) testbed that provides some of the core services. The paper describes the general architecture and an SAA testbed implementation.
Programming for 1.6 Million cores: Early experiences with IBM's BG/Q SMP architecture
NASA Astrophysics Data System (ADS)
Glosli, James
2013-03-01
With the stall in clock cycle improvements a decade ago, the drive for computational performance has continued along a path of increasing core counts on a processor. The multi-core evolution has been expressed in both symmetric multiprocessor (SMP) architectures and CPU/GPU architectures. Debates rage in the high performance computing (HPC) community over which architecture best serves HPC. In this talk I will not attempt to resolve that debate but perhaps fuel it. I will discuss the experience of exploiting Sequoia, a 98304-node IBM Blue Gene/Q SMP machine at Lawrence Livermore National Laboratory. The advantages and challenges of leveraging the computational power of BG/Q will be detailed through the discussion of two applications. The first application is a molecular dynamics code called ddcMD. This is a code developed over the last decade at LLNL and ported to BG/Q. The second application is a cardiac modeling code called Cardioid. This is a code that was recently designed and developed at LLNL to exploit the fine-scale parallelism of BG/Q's SMP architecture. Through the lens of these efforts I'll illustrate the need to rethink how we express and implement our computational approaches. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Spontaneous symmetry breaking in coupled parametrically driven waveguides.
Dror, Nir; Malomed, Boris A
2009-01-01
We introduce a system of linearly coupled parametrically driven damped nonlinear Schrödinger equations, which models a laser based on a nonlinear dual-core waveguide with parametric amplification symmetrically applied to both cores. The model may also be realized in terms of parallel ferromagnetic films, in which the parametric gain is provided by an external field. We analyze spontaneous symmetry breaking (SSB) of fundamental and multiple solitons in this system, which was not studied systematically before in linearly coupled dissipative systems with intrinsic nonlinearity. For fundamental solitons, the analysis reveals three distinct SSB scenarios. Unlike the standard dual-core-fiber model, the present system gives rise to a vast bistability region, which may be relevant to applications. Other noteworthy findings are restabilization of the symmetric soliton after it was destabilized by the SSB bifurcation, and the existence of a generic situation with all solitons unstable in the single-component (decoupled) model, while both symmetric and asymmetric solitons may be stable in the coupled system. The stability of the asymmetric solitons is identified via direct simulations, while for symmetric and antisymmetric ones the stability is verified too through the computation of stability eigenvalues, families of antisymmetric solitons being entirely unstable. In this way, full stability maps for the symmetric solitons are produced. We also investigate the SSB bifurcation of two-soliton bound states (it breaks the symmetry between the two components, while the two peaks in the shape of the soliton remain mutually symmetric). The family of the asymmetric double-peak states may decouple from its symmetric counterpart, being no longer connected to it by the bifurcation, with a large portion of the asymmetric family remaining stable.
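For reference, a schematic form of linearly coupled, parametrically driven, damped nonlinear Schrödinger equations consistent with this description is given below, with gamma the damping, h the parametric-drive strength and kappa the linear coupling; this is the generic form of such systems (detuning terms omitted), and the paper's exact normalization may differ:

```latex
% Schematic coupled, parametrically driven, damped NLS system
% (generic normalization; gamma = damping, h = parametric gain,
% kappa = linear coupling between the two cores):
i u_t + u_{xx} + 2|u|^2 u = -i\gamma u + h\,u^{*} + \kappa v,\\
i v_t + v_{xx} + 2|v|^2 v = -i\gamma v + h\,v^{*} + \kappa u.
```

Symmetric solitons correspond to u = v; the SSB bifurcations discussed above are the points where solutions with u distinct from v branch off.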
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes
Vincenti, H.; Lobet, M.; Lehe, R.; ...
2016-09-19
In current computer architectures, data movement (from die to network) is by far the most energy-consuming part of an algorithm (≈20 pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute node that will have a reduced clock speed to allow for efficient cooling. To compensate for the frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scatter instructions that can significantly affect vectorization performance on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2, 256-bit-wide data registers). Results show a factor of ×2 to ×2.5 speed-up in double precision for particle shape factors of orders 1–3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512-bit register lengths (8 doubles/16 singles). Program summary: Program Title: vec_deposition. Program Files doi: http://dx.doi.org/10.17632/nh77fv9k8c.1. Licensing provisions: BSD 3-Clause. Programming language: Fortran 90. External routines/libraries: OpenMP > 4.0. Nature of problem: Exascale architectures will have many-core processors per node with long vector data registers capable of performing one single instruction on multiple data during one clock cycle. Data register lengths are expected to double every four years, and this pushes for new portable solutions for efficiently vectorizing Particle-In-Cell codes on these future many-core architectures. One of the main hotspot routines of the PIC algorithm is the current/charge deposition, for which there is no efficient and portable vector algorithm. Solution method: Here we provide an efficient and portable vector algorithm for current/charge deposition routines that uses a new data structure, which significantly reduces gather/scatter operations. Vectorization is controlled using OpenMP 4.0 compiler directives for vectorization, which ensures portability across different architectures. Restrictions: Here we do not provide the full PIC algorithm with an executable but only vector routines for current/charge deposition. These scalar/vector routines can be used as library routines in your 3D Particle-In-Cell code. However, to get the best performance out of the vector routines you have to satisfy the two following requirements: (1) Your code should implement particle tiling (as explained in the manuscript) to allow for maximized cache reuse and reduce memory accesses that can hinder vector performance. The routines can be used directly on each particle tile.
(2) You should compile your code with a Fortran 90 compiler (e.g. Intel, gnu or cray) and provide proper alignment flags and compiler alignment directives (more details in the README file).
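The particle-tiling requirement in the restrictions above is easier to see with a small sketch: particles are binned into cell-aligned tiles so that each tile deposits into a compact local buffer that is written back to the grid in one contiguous update. The NumPy illustration below (nearest-grid-point deposition in 1D) is a hedged model of that idea, not the PICSAR Fortran routines:

```python
# Hedged sketch of tiled charge deposition (nearest-grid-point, 1D):
# particles are grouped per tile, and each tile deposits into a small
# local buffer that is added back to the global grid in one contiguous
# update, mimicking the cache-friendly layout described for PICSAR.
import numpy as np

nx, tile_width = 64, 8
grid = np.zeros(nx + 1)

rng = np.random.default_rng(1)
x = rng.uniform(0, nx, 10_000)          # particle positions in cell units
w = np.ones_like(x)                     # particle charges/weights

tile_id = (x // tile_width).astype(int)
order = np.argsort(tile_id)             # bin particles by tile
x, w, tile_id = x[order], w[order], tile_id[order]

for t in np.unique(tile_id):
    sel = tile_id == t
    local = np.zeros(tile_width + 1)    # tile-local deposition buffer
    idx = (x[sel] - t * tile_width).astype(int)
    np.add.at(local, idx, w[sel])       # deposit inside the tile only
    grid[t * tile_width: t * tile_width + tile_width + 1] += local

print(grid.sum(), w.sum())              # total deposited charge matches
```

Because each tile touches only a short, contiguous slice of the grid, the inner loop avoids the scattered accesses that otherwise force gather/scatter instructions.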
Cool Apps: Building Cryospheric Data Applications With Standards-Based Service Oriented Architecture
NASA Astrophysics Data System (ADS)
Collins, J. A.; Truslove, I.; Billingsley, B. W.; Oldenburg, J.; Brodzik, M.; Lewis, S.; Liu, M.
2012-12-01
The National Snow and Ice Data Center (NSIDC) holds a large collection of cryospheric data, and is involved in a number of informatics research and development projects aimed at improving the discoverability and accessibility of these data. To develop high-quality software in a timely manner, we have adopted a Service-Oriented Architecture (SOA) approach for our core technical infrastructure development. Data services at NSIDC are internally exposed to other tools and applications through standards-based service interfaces. These standards include OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), various OGC (Open Geospatial Consortium) standards including WMS (Web Map Service) and WFS (Web Feature Service), ESIP (Federation of Earth Sciences Information Partners) OpenSearch, and NSIDC-specific RESTful services. By taking a standards-based approach, we are able to use off-the-shelf tools and libraries to consume, translate and broker these data services, and thus develop applications faster. Additionally, by exposing public interfaces to these services we provide valuable data services to technical collaborators; for example, NASA Reverb (http://reverb.echo.nasa.gov) uses NSIDC's WMS services. Our latest generation of web applications consume these data services directly. The most complete example of this is the Operation IceBridge Data Portal (http://nsidc.org/icebridge/portal) which depends on many of the aforementioned services, and clearly exhibits many of the advantages of building applications atop a service-oriented architecture. This presentation outlines the architectural approach and components and open standards and protocols adopted at NSIDC, demonstrates the interactions and uses of public and internal service interfaces currently powering applications including the IceBridge Data Portal, and outlines the benefits and challenges of this approach.
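As a concrete flavour of consuming one of these standards-based services, an OGC WMS GetMap request is simply a parameterized HTTP GET. The sketch below builds such a URL with a placeholder endpoint and layer name; the actual NSIDC service URLs and layer identifiers are not reproduced here:

```python
# Hedged sketch of building an OGC WMS 1.3.0 GetMap request URL.
# The endpoint and layer name are placeholders, not NSIDC's actual ones.
from urllib.parse import urlencode

def wms_getmap_url(endpoint, layer, bbox, width=800, height=600,
                   crs="EPSG:4326", fmt="image/png"):
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),   # lat/lon order for EPSG:4326 in WMS 1.3.0
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{endpoint}?{urlencode(params)}"

# Placeholder endpoint and layer purely for illustration.
print(wms_getmap_url("https://example.org/wms", "sea_ice_extent",
                     bbox=(-90, -180, 90, 180)))
```

Any off-the-shelf WMS client can issue the same request, which is the practical payoff of exposing data through open standards rather than bespoke interfaces.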
NASA Astrophysics Data System (ADS)
Liu, Yin; Wu, Kongyou; Wang, Xi; Liu, Bo; Guo, Jianxun; Du, Yannan
2017-12-01
It is widely accepted that faults can act as conduits or barriers for oil and gas migration. Years of study suggest that the internal architecture of a fault zone is complicated and composed of distinct components with different physical features, which can strongly influence the migration of oil and gas along the fault. Field observation is the most useful method for examining fault zone architecture; in petroleum exploration, however, the main concern is buried faults in sedimentary basins. Meanwhile, most studies have focused on strike-slip or normal faults, while the architecture of reverse faults has attracted less attention. To address these questions, the Hong-Che Fault Zone in the northwest margin of the Junggar Basin, Xinjiang Province, is chosen as an example. Combining seismic data, well logs and drill core data, we put forward a comprehensive method to recognize the internal architecture of buried faults. High-precision seismic data show the fault zone as a disturbed seismic reflection belt. Four types of well logs, which are sensitive to fractures, and a comprehensive discriminant parameter, named the fault zone index, are used to identify the fault zone architecture. Drill core provides a direct way to identify the different components of the fault zone: the fault core is composed of breccia, gouge, and serpentinized or foliated fault rocks, and the damage zone develops multiple phases of fractures, which are usually cemented. Based on the recognition results, we found an obvious positive relationship between the width of the fault zone and the displacement, and a power-law relationship between the width of the fault core and that of the damage zone. The width of the damage zone in the hanging wall is not noticeably larger than that in the footwall of the reverse fault, showing characteristics different from those of normal faults. This study provides a comprehensive method for identifying the architecture of buried faults in sedimentary basins and should be helpful in evaluating fault sealing behavior.
NASA Astrophysics Data System (ADS)
Cooper, R. F.
2010-12-01
Measurements of redox dynamics in silicate melts and glasses suggest that, for many compositions and for many external environments, the reaction proceeds and is rate-limited by the diffusive flux of divalent-cation network modifiers. Application of ion-backscattering spectrometry either (i) on oxidized or reduced melts (subsequently quenched before analysis) or (ii) on similarly reacted glasses, both of basalt-composition polymerization, demonstrates that the network modifiers move relative to the (first-order-rigid) aluminosilicate network. Thus, the textures associated with such reactions are often surprising, and frequently include metastable or unstable phases and/or spatial compositional differences. This response is only possible if the motion of cations can be decoupled from that of anions. In many cases, decoupling is accomplished by the presence in the melt/glass of transition-metal cations, whose heterovalency creates distortions in the electronic band structure resulting in electronic defects: electron “holes” in the valence band or electrons in the conduction band. (The prevalence of holes or electrons being a function of bulk chemistry and oxygen activity.) These electronic species make the melt/glass a “defect semiconductor.” Because (a) the critical issue in reaction dynamics is the transport coefficient (the product of species mobility and species concentration) and (b) the electronic species are many orders of magnitude more mobile than are the ions, very low concentrations of transition-metal ions are required for flux decoupling. For example, 0.04 at% Fe keeps a magnesium aluminosilicate melt/glass a defect semiconductor down to 800°C [Cook & Cooper, 2000]. Depending on composition, high-temperature melts can see ion species having a high-enough transport coefficient to allow decoupling, e.g., alkali cations in a basaltic melt [e.g., Pommier et al., 2010]. In this presentation, these ideas will be illustrated by examining redox dynamics in basaltic melts [e.g., Burgess et al., 2010; Cooper et al., 2010] and the reaction of magnesium aluminosilicate melts (transition-metal-ion-free and -doped) with liquid bronze (Cu-Sn alloy) [Pettersen et al., 2008], the latter demonstrating the importance of heterovalency in silicon [e.g., Borman et al., 1991] in effecting the reaction dynamics and resultant texture. Borman, V.D. et al. (1991) Phys. Rev. Lett. 67:2387-2390. Burgess, K. et al. (2010) Geochem. Geophys. Geosyst. 11:in press. Cook, G.B., and R.F. Cooper (2000) Am. Mineral. 85:397-406. Cooper, R.F. et al. (2010) Am. Mineral. 95:810-824. Pettersen, C., and R.F. Cooper (2008) J. Non-Crys. Solids 354:3194-3206. Pommier, A. et al. (2010) Geochim. Cosmochim. Acta 74:1653-1671.
Application of ant colony Algorithm and particle swarm optimization in architectural design
NASA Astrophysics Data System (ADS)
Song, Ziyi; Wu, Yunfa; Song, Jianhua
2018-02-01
By studying the development of the ant colony algorithm and the particle swarm optimization algorithm, this paper explains the core ideas of the two algorithms, explores how they can be combined with architectural design, and summarizes rules for applying intelligent algorithms in architectural design. By combining the characteristics of the two algorithms, it derives a research route and a practical way of realizing intelligent algorithms in architectural design, establishing algorithm rules that assist the design process. Taking intelligent algorithms as a starting point for architectural design research, the authors provide a theoretical foundation for the ant colony algorithm and the particle swarm algorithm in architectural design, broaden the range of applications of intelligent algorithms in architectural design, and offer architects a new way of thinking.
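For reference, the particle swarm update underlying this kind of design-space search is the standard one, with omega the inertia weight, c1 and c2 the acceleration coefficients and r1, r2 uniform random numbers; this is the generic formulation, not one specific to the paper:

```latex
% Standard PSO velocity/position update (generic form):
v_i^{t+1} = \omega\, v_i^{t}
          + c_1 r_1 \bigl(p_i^{\text{best}} - x_i^{t}\bigr)
          + c_2 r_2 \bigl(g^{\text{best}} - x_i^{t}\bigr),
\qquad
x_i^{t+1} = x_i^{t} + v_i^{t+1}
```

In a design context, x_i encodes candidate design parameters and the fitness function expresses the design rules being optimized.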
Fujii, Ritsuko; Shimonaka, Shozo; Uchida, Naoko; Gardiner, Alastair T; Cogdell, Richard J; Sugisaki, Mitsuru; Hashimoto, Hideki
2008-01-01
Typical purple bacterial photosynthetic units consist of supra-molecular arrays of peripheral (LH2) and core (LH1-RC) antenna complexes. Recent atomic force microscopy pictures of photosynthetic units in intact membranes have revealed that the architecture of these units is variable (Scheuring et al. (2005) Biochim Biophys Acta 1712:109-127). In this study, we describe methods for the construction of heterologous photosynthetic units in lipid bilayers from mixtures of purified LH2 (from Rhodopseudomonas acidophila) and LH1-RC (from Rhodopseudomonas viridis) core complexes. The architecture of these reconstituted photosynthetic units can be varied by controlling the ratio of added LH2 to core complexes. The arrangement of the complexes was visualized by electron microscopy in combination with Fourier analysis. The regular trigonal array of the core complexes seen in the native photosynthetic membrane could be regenerated in the reconstituted membranes by temperature cycling. In the presence of added LH2 complexes, this trigonal symmetry was replaced with orthorhombic symmetry. The small lattice lengths for the latter suggest that the constituent unit of the orthorhombic lattice is the LH2. Fluorescence and fluorescence-excitation spectroscopy was applied to the set of reconstituted membranes prepared with various proportions of LH2 to core complexes. Remarkably, even though the LH2 complexes contain bacteriochlorophyll a, and the core complexes contain bacteriochlorophyll b, it was possible to demonstrate energy transfer from LH2 to the core complexes. These experiments provide a first step along the path toward investigating how changing the architecture of purple bacterial photosynthetic units affects the overall efficiency of light-harvesting.
NASA Astrophysics Data System (ADS)
Hassan, A. H.; Fluke, C. J.; Barnes, D. G.
2012-09-01
Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets, moving astronomy into the petascale data era. While such facilities will provide astronomers with unprecedented levels of accuracy and coverage, the increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualization tools. With such data sizes, even simple data analysis tasks (e.g. calculating a histogram or computing data minimum/maximum) may not be achievable without access to a supercomputing facility. To effectively handle such dataset sizes, which exceed today's single-machine memory and processing limits, we present a framework that exploits the distributed power of GPUs and many-core CPUs, with a goal of providing data analysis and visualization tasks as a service for astronomers. By mixing shared and distributed memory architectures, our framework effectively utilizes the underlying hardware infrastructure, handling both batched and real-time data analysis and visualization tasks. Offering such functionality as a service in a "software as a service" manner will reduce the total cost of ownership, provide an easy-to-use tool to the wider astronomical community, and enable a more optimized utilization of the underlying hardware infrastructure.
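A minimal sketch of the map-reduce pattern behind serving such "simple" analysis tasks (global minimum/maximum and a histogram) over data split across workers; this illustrates the general idea only, not the authors' framework, and uses Python multiprocessing as a stand-in for a distributed GPU/CPU backend.

```python
# Each worker reduces its own chunk; only tiny summaries travel back to be merged.
import numpy as np
from multiprocessing import Pool

BINS = np.linspace(0.0, 1.0, 65)

def partial_stats(chunk):
    hist, _ = np.histogram(chunk, bins=BINS)
    return chunk.min(), chunk.max(), hist

if __name__ == "__main__":
    chunks = [np.random.rand(1_000_000) for _ in range(8)]   # stand-in for distributed data
    with Pool(4) as pool:
        partials = pool.map(partial_stats, chunks)
    global_min = min(p[0] for p in partials)
    global_max = max(p[1] for p in partials)
    global_hist = np.sum([p[2] for p in partials], axis=0)
    print(global_min, global_max, global_hist.sum())
```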
Low power test architecture for dynamic read destructive fault detection in SRAM
NASA Astrophysics Data System (ADS)
Takher, Vikram Singh; Choudhary, Rahul Raj
2018-06-01
A dynamic read destructive fault (dRDF) is the outcome of resistive open defects in the core cells of static random-access memories (SRAMs). Sensitising a dRDF involves either performing multiple read operations or creating a number of read equivalent stresses (RES) on the core cell under test. Although creating RES is preferred over performing multiple read operations on the core cell, the cell dissipates more power during an RES than during a read or write operation. This paper focuses on reducing power dissipation by optimising the number of RESs required to sensitise the dRDF during the test mode of SRAM operation. A novel pre-charge architecture is proposed to reduce power dissipation by limiting the number of RESs to an optimised number of two. The proposed low-power architecture is simulated and analysed, showing a reduction in power dissipation of up to 18.18% by reducing the number of RESs.
NASA Astrophysics Data System (ADS)
Limaye, A. B.; Komatsu, Y.; Suzuki, K.; Paola, C.
2017-12-01
Turbidity currents deliver clastic sediment from continental margins to the deep ocean, and are the main driver of landscape and stratigraphic evolution in many low-relief, submarine environments. The sedimentary architecture of turbidites—including the spatial organization of coarse and fine sediments—is closely related to the aggradation, scour, and lateral shifting of channels. Seismic stratigraphy indicates that submarine, meandering channels often aggrade rapidly relative to lateral shifting, and develop channel sand bodies with high vertical connectivity. In comparison, the stratigraphic architecture developed by submarine, braided channels is relatively uncertain. We present a new stratigraphic model for submarine braided channels that integrates predictions from laboratory experiments and flow modeling with constraints from sediment cores. In the laboratory experiments, a saline density current developed subaqueous channels in plastic sediment. The channels aggraded to form a deposit with a vertical scale of approximately five channel depths. We collected topography data during aggradation to (1) establish relative stratigraphic age, and (2) estimate the sorting patterns of a hypothetical grain size distribution. We applied a numerical flow model to each topographic surface and used modeled flow depth as a proxy for relative grain size. We then conditioned the resulting stratigraphic model to observed grain size distributions using sediment core data from the Nankai Trough, offshore Japan. Using this stratigraphic model, we establish new, quantitative predictions for the two- and three-dimensional connectivity of coarse sediment as a function of fine-sediment fraction. Using this case study as an example, we will highlight outstanding challenges in relating the evolution of low-relief landscapes to the stratigraphic record.
GPU: the biggest key processor for AI and parallel processing
NASA Astrophysics Data System (ADS)
Baji, Toru
2017-07-01
Two types of processors exist in the market. One is the conventional CPU and the other is the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. CPUs are good for sequential processing, while GPUs are good at accelerating software with heavy parallel execution. GPUs were initially dedicated to 3D graphics. However, from 2006, when GPUs started to adopt general-purpose cores, it was recognized that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs came to be widely used in workstations and supercomputers. Recently, two key technologies are highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massively parallel operations to train many-layer neural networks. With CPUs alone, it was impossible to finish the training in a practical time; the latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization and path-planning processing, and again an SoC with an integrated GPU will play a key role there. In this paper, the evolution of the GPU, one of the biggest commercial devices requiring state-of-the-art fabrication technology, will be introduced, together with an overview of the key GPU-demanding applications described above.
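A minimal sketch of the contrast described above, assuming NumPy and the optional CuPy package (not discussed in the paper): the same element-wise arithmetic runs either as a sequential/vectorised CPU operation or as a data-parallel GPU operation in which every element is handled by a GPU thread.

```python
# Illustrative only; falls back to CPU execution if no GPU/CuPy is available.
import numpy as np

try:
    import cupy as cp          # GPU array library with a NumPy-like API
except ImportError:
    cp = None

def saxpy_cpu(a, x, y):
    # One fused, vectorised operation; NumPy loops internally on the CPU.
    return a * x + y

def saxpy_gpu(a, x, y):
    # Same arithmetic, executed element-wise across thousands of GPU cores.
    xg, yg = cp.asarray(x), cp.asarray(y)
    return cp.asnumpy(a * xg + yg)

x = np.random.rand(1_000_000)
y = np.random.rand(1_000_000)
result = saxpy_gpu(2.0, x, y) if cp is not None else saxpy_cpu(2.0, x, y)
print(result[:3])
```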
NASA Astrophysics Data System (ADS)
Tanikawa, Ataru; Yoshikawa, Kohji; Okamoto, Takashi; Nitadori, Keigo
2012-02-01
We present a high-performance N-body code for self-gravitating collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one processor core of Intel Core i7-2600 processor (8 MB cache and 3.40 GHz) based on Sandy Bridge micro-architecture, we implemented a fourth-order Hermite scheme with individual timestep scheme (Makino and Aarseth, 1992), and achieved the performance of ~20 giga floating point number operations per second (GFLOPS) for double-precision accuracy, which is two times and five times higher than that of the previously developed code implemented with the SSE instructions (Nitadori et al., 2006b), and that of a code implemented without any explicit use of SIMD instructions with the same processor core, respectively. We have parallelized the code by using so-called NINJA scheme (Nitadori et al., 2006a), and achieved ~90 GFLOPS for a system containing more than N = 8192 particles with 8 MPI processes on four cores. We expect to achieve about 10 tera FLOPS (TFLOPS) for a self-gravitating collisional system with N ~ 10^5 on massively parallel systems with at most 800 cores with Sandy Bridge micro-architecture. This performance will be comparable to that of Graphic Processing Unit (GPU) cluster systems, such as the one with about 200 Tesla C1070 GPUs (Spurzem et al., 2010). This paper offers an alternative to collisional N-body simulations with GRAPEs and GPUs.
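As an illustration of the scheme named above, a minimal NumPy sketch (not the authors' AVX-optimised code) of the Taylor-series predictor and the O(N²) pairwise acceleration/jerk kernel at the heart of a fourth-order Hermite integrator; the softening parameter eps2 and the random initial conditions are assumptions for illustration.

```python
import numpy as np

def acc_jerk(pos, vel, mass, eps2=1e-6):
    """Pairwise accelerations and jerks: the O(N^2) kernel that SIMD/GPU codes accelerate."""
    dx = pos[None, :, :] - pos[:, None, :]          # (N, N, 3)
    dv = vel[None, :, :] - vel[:, None, :]
    r2 = (dx ** 2).sum(-1) + eps2
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                   # exclude self-interaction
    rv = (dx * dv).sum(-1)
    acc = (mass[None, :, None] * dx * inv_r3[..., None]).sum(1)
    jerk = (mass[None, :, None] * (dv * inv_r3[..., None]
            - 3.0 * rv[..., None] * dx * inv_r3[..., None] / r2[..., None])).sum(1)
    return acc, jerk

def hermite_predict(pos, vel, acc, jerk, dt):
    """Taylor-series predictor applied before the Hermite corrector step."""
    pos_p = pos + dt * vel + dt**2 / 2 * acc + dt**3 / 6 * jerk
    vel_p = vel + dt * acc + dt**2 / 2 * jerk
    return pos_p, vel_p

n = 4
rng = np.random.default_rng(1)
pos, vel = rng.standard_normal((n, 3)), rng.standard_normal((n, 3))
mass = np.full(n, 1.0 / n)
a, j = acc_jerk(pos, vel, mass)
print(hermite_predict(pos, vel, a, j, dt=1e-3)[0].shape)   # (4, 3)
```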
Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications
NASA Astrophysics Data System (ADS)
Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei
2007-04-01
In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Using the PE rings efficiently and an intelligent memory-interleaving organization, the efficiency of the architecture can be increased. Moreover, using efficient on-chip memories and a data management technique can effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core in multimedia system-on-chip applications.
Enhancing power density of biophotovoltaics by decoupling storage and power delivery
NASA Astrophysics Data System (ADS)
Saar, Kadi L.; Bombelli, Paolo; Lea-Smith, David J.; Call, Toby; Aro, Eva-Mari; Müller, Thomas; Howe, Christopher J.; Knowles, Tuomas P. J.
2018-01-01
Biophotovoltaic devices (BPVs), which use photosynthetic organisms as active materials to harvest light, have a range of attractive features relative to synthetic and non-biological photovoltaics, including their environmentally friendly nature and ability to self-repair. However, efficiencies of BPVs are currently lower than those of synthetic analogues. Here, we demonstrate BPVs delivering anodic power densities of over 0.5 W m-2, a value five times that for previously described BPVs. We achieved this through the use of cyanobacterial mutants with increased electron export characteristics together with a microscale flow-based design that allowed independent optimization of the charging and power delivery processes, as well as membrane-free operation by exploiting laminar flow to separate the catholyte and anolyte streams. These results suggest that miniaturization of active elements and flow control for decoupled operation and independent optimization of the core processes involved in BPV design are effective strategies for enhancing power output and thus the potential of BPVs as viable systems for sustainable energy generation.
A High Rigidity and Precision Scanning Tunneling Microscope with Decoupled XY and Z Scans.
Chen, Xu; Guo, Tengfei; Hou, Yubin; Zhang, Jing; Meng, Wenjie; Lu, Qingyou
2017-01-01
A new scan-head structure for the scanning tunneling microscope (STM) is proposed, featuring high scan precision and rigidity. The core structure consists of a piezoelectric tube scanner of quadrant type (for XY scans) coaxially housed in a piezoelectric tube with single inner and outer electrodes (for the Z scan). They are fixed at one end (called the common end). A hollow tantalum shaft is coaxially housed in the XY-scan tube, and the two are mutually fixed at both ends. When the XY scanner scans, its free end brings the shaft along, and the tip, which is coaxially inserted in the shaft at the common end, scans a smaller area provided it protrudes only a short distance from the common end. The decoupled XY and Z scans are desirable for reduced image distortion, and the mechanically reduced scan range has the advantage of lessening the impact of background electronic noise on the scanner and enhancing the tip-positioning precision. High-quality atomic-resolution images are also shown.
Liu, Wenbo; Chen, Long; Dong, Xin; Yan, Jiazhen; Li, Ning; Shi, Sanqiang; Zhang, Shichao
2016-01-01
In this report, a facile and effective one-pot oxidation-assisted dealloying protocol has been developed to massively synthesize monolithic core-shell architectured nanoporous copper@cuprous oxide nanonetworks (C-S NPC@Cu2O NNs) by chemical dealloying of melt-spun Al 37 at.% Cu alloy in an oxygen-rich alkaline solution at room temperature, which possesses superior photocatalytic activity towards photodegradation of methyl orange (MO). The experimental results show that the as-prepared nanocomposite exhibits an open, bicontinuous interpenetrating ligament-pore structure with length scales of 20 ± 5 nm, in which the ligaments comprising Cu and Cu2O are typical of core-shell architecture with uniform shell thickness of ca. 3.5 nm. The photodegradation experiments of C-S NPC@Cu2O NNs show their superior photocatalytic activities for the MO degradation under visible light irradiation with degradation rate as high as 6.67 mg min−1 gcat−1, which is a diffusion-controlled kinetic process in essence in light of the good linear correlation between photodegradation ratio and square root of irradiation time. The excellent photocatalytic activity can be ascribed to the synergistic effects between unique core-shell architecture and 3D nanoporous network with high specific surface area and fast mass transfer channel, indicating that the C-S NPC@Cu2O NNs will be a promising candidate for photocatalysts of MO degradation. PMID:27830720
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morscher, Meagan; Pattabiraman, Bharath; Rodriguez, Carl
Our current understanding of the stellar initial mass function and massive star evolution suggests that young globular clusters (GCs) may have formed hundreds to thousands of stellar-mass black holes (BHs), the remnants of stars with initial masses from ∼20-100 M_☉. Birth kicks from supernova explosions may eject some BHs from their birth clusters, but most should be retained. Using a Monte Carlo method we investigate the long-term dynamical evolution of GCs containing large numbers of stellar BHs. We describe numerical results for 42 models, covering a broad range of realistic initial conditions, including up to 1.6 × 10^6 stars. In almost all models we find that significant numbers of BHs (up to ∼10^3) are retained all the way to the present. This is in contrast to previous theoretical expectations that most BHs should be ejected dynamically within a few gigayears. The main reason for this difference is that core collapse driven by BHs (through the Spitzer "mass segregation instability") is easily reverted through three-body processes, and involves only a small number of the most massive BHs, while lower-mass BHs remain well-mixed with ordinary stars far from the central cusp. Thus the rapid segregation of stellar BHs does not lead to a long-term physical separation of most BHs into a dynamically decoupled inner core, as often assumed previously. Combined with the recent detections of several BH X-ray binary candidates in Galactic GCs, our results suggest that stellar BHs could still be present in large numbers in many GCs today, and that they may play a significant role in shaping the long-term dynamical evolution and the present-day dynamical structure of many clusters.
Adaptive and technology-independent architecture for fault-tolerant distributed AAL solutions.
Schmidt, Michael; Obermaisser, Roman
2018-04-01
Today's architectures for Ambient Assisted Living (AAL) must cope with a variety of challenges like flawless sensor integration and time synchronization (e.g. for sensor data fusion) while abstracting from the underlying technologies at the same time. Furthermore, an architecture for AAL must be capable of managing distributed application scenarios in order to support elderly people in all situations of their everyday life. This encompasses not just life at home but in particular also the mobility of elderly people (e.g. when going for a walk or doing sports). Within this paper we introduce a novel architecture for distributed AAL solutions whose design follows a modern microservices approach by providing small core services instead of a monolithic application framework. The architecture comprises core services for sensor integration and service discovery while supporting several communication models (periodic, sporadic, streaming). We extend the state of the art by introducing a fault-tolerance model for our architecture on the basis of a fault hypothesis describing the fault-containment regions (FCRs) with their respective failure modes and failure rates in order to support safety-critical AAL applications. Copyright © 2017 Elsevier Ltd. All rights reserved.
Selective randomized load balancing and mesh networks with changing demands
NASA Astrophysics Data System (ADS)
Shepherd, F. B.; Winzer, P. J.
2006-05-01
We consider the problem of building cost-effective networks that are robust to dynamic changes in demand patterns. We compare several architectures using demand-oblivious routing strategies. Traditional approaches include single-hop architectures based on a (static or dynamic) circuit-switched core infrastructure and multihop (packet-switched) architectures based on point-to-point circuits in the core. To address demand uncertainty, we seek minimum cost networks that can carry the class of hose demand matrices. Apart from shortest-path routing, Valiant's randomized load balancing (RLB), and virtual private network (VPN) tree routing, we propose a third, highly attractive approach: selective randomized load balancing (SRLB). This is a blend of dual-hop hub routing and randomized load balancing that combines the advantages of both architectures in terms of network cost, delay, and delay jitter. In particular, we give empirical analyses for the cost (in terms of transport and switching equipment) for the discussed architectures, based on three representative carrier networks. Of these three networks, SRLB maintains the resilience properties of RLB while achieving significant cost reduction over all other architectures, including RLB and multihop Internet protocol/multiprotocol label switching (IP/MPLS) networks using VPN-tree routing.
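A minimal sketch of the two-phase routing idea behind RLB and SRLB (illustrative toy topology and node names, not the paper's cost model): traffic is first sent to an intermediate hub chosen at random, then forwarded to its destination; SRLB restricts the hub choice to a selected subset of nodes.

```python
import random

def rlb_route(src, dst, nodes, hub_set=None):
    """Return a two-phase route src -> hub -> dst.
    hub_set=None gives classic RLB over all nodes; a small hub_set gives SRLB."""
    candidates = [n for n in (hub_set or nodes) if n not in (src, dst)]
    hub = random.choice(candidates)
    return [src, hub, dst]

nodes = ["A", "B", "C", "D", "E", "F"]
print(rlb_route("A", "F", nodes))                      # RLB: any node may be the hub
print(rlb_route("A", "F", nodes, hub_set=["C", "D"]))  # SRLB: dual-hop via selected hubs
```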
Accuracy of dynamical-decoupling-based spectroscopy of Gaussian noise
NASA Astrophysics Data System (ADS)
Szańkowski, Piotr; Cywiński, Łukasz
2018-03-01
The fundamental assumption of dynamical-decoupling-based noise spectroscopy is that the coherence decay rate of a qubit (or qubits) driven with a sequence of many pulses is well approximated by the environmental noise spectrum sampled on the frequency comb defined by the sequence. Here we investigate the precise conditions under which this commonly used spectroscopic approach is quantitatively correct. To this end we focus on two representative examples of spectral densities: the long-tailed Lorentzian, and the finite-ranged Gaussian—both expected to be encountered when using the qubit for nanoscale nuclear resonance imaging. We have found that, in contrast to the Lorentzian spectrum, for which the corrections to the standard spectroscopic formulas can easily be made negligible, spectra with a finite range are more challenging to reconstruct accurately. For a Gaussian line shape of the environmental spectral density, direct application of standard dynamical-decoupling-based spectroscopy leads to erroneous attribution of a long-tail behavior to the reconstructed spectrum. Fortunately, artifacts such as this can be completely avoided with a simple extension of the standard reconstruction method.
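A minimal numerical sketch of the relation the spectroscopy builds on, with conventions chosen here for illustration rather than taken from the paper: for Gaussian dephasing noise the coherence is W(T) = exp(-χ(T)), where χ(T) is the overlap of the noise spectrum with the filter of the pulse sequence; for many CPMG pulses the filter concentrates on a frequency comb. The one-sided spectra, prefactors and parameter values below are assumptions.

```python
# chi(T) = (1/2) * Integral dw/(2*pi) S(w) |f(w)|^2, with f(w) the Fourier
# transform of the +/-1 switching function f(t) of the pulse sequence.
import numpy as np

def cpmg_switching_function(n_pulses, total_time, n_samples=20000):
    """+/-1 switching function f(t) for an n-pulse CPMG sequence."""
    t = np.linspace(0.0, total_time, n_samples)
    flips = (np.arange(1, n_pulses + 1) - 0.5) * total_time / n_pulses  # pi-pulse times
    sign = (-1.0) ** np.searchsorted(flips, t)                          # sign flips at each pulse
    return t, sign

def attenuation(spectrum, n_pulses, total_time, omega):
    t, f = cpmg_switching_function(n_pulses, total_time)
    dt, dw = t[1] - t[0], omega[1] - omega[0]
    filt = np.array([abs(np.sum(f * np.exp(1j * w * t)) * dt) ** 2 for w in omega])
    return 0.5 * np.sum(spectrum(omega) * filt) * dw / (2.0 * np.pi)

gauss = lambda w, s0=1.0, wc=2.0, width=0.3: s0 * np.exp(-((w - wc) / width) ** 2)
lorentz = lambda w, s0=1.0, wc=2.0, width=0.3: s0 / (1.0 + ((w - wc) / width) ** 2)

omega = np.linspace(0.01, 8.0, 800)
for n in (8, 32):
    T = n * np.pi / 2.0          # fixed pulse spacing, so the comb peak sits near w = 2
    print(n, attenuation(gauss, n, T, omega), attenuation(lorentz, n, T, omega))
```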
Titan's rotation reveals an internal ocean and changing zonal winds
Lorenz, R.D.; Stiles, B.W.; Kirk, R.L.; Allison, M.D.; Del Marmo, P.P.; Iess, L.; Lunine, J.I.; Ostro, S.J.; Hensley, S.
2008-01-01
Cassini radar observations of Saturn's moon Titan over several years show that its rotational period is changing and is different from its orbital period. The present-day rotation period difference from synchronous spin leads to a shift of ∼0.36° per year in apparent longitude and is consistent with seasonal exchange of angular momentum between the surface and Titan's dense superrotating atmosphere, but only if Titan's crust is decoupled from the core by an internal water ocean like that on Europa.
A differential memristive synapse circuit for on-line learning in neuromorphic computing systems
NASA Astrophysics Data System (ADS)
Nair, Manu V.; Muller, Lorenz K.; Indiveri, Giacomo
2017-12-01
Spike-based learning with memristive devices in neuromorphic computing architectures typically uses learning circuits that require overlapping pulses from pre- and post-synaptic nodes. This imposes severe constraints on the length of the pulses transmitted in the network, and on the network’s throughput. Furthermore, most of these circuits do not decouple the currents flowing through memristive devices from the one stimulating the target neuron. This can be a problem when using devices with high conductance values, because of the resulting large currents. In this paper, we propose a novel circuit that decouples the current produced by the memristive device from the one used to stimulate the post-synaptic neuron, by using a novel differential scheme based on the Gilbert normalizer circuit. We show how this circuit is useful for reducing the effect of variability in the memristive devices, and how it is ideally suited for spike-based learning mechanisms that do not require overlapping pre- and post-synaptic pulses. We demonstrate the features of the proposed synapse circuit with SPICE simulations, and validate its learning properties with high-level behavioral network simulations which use a stochastic gradient descent learning rule in two benchmark classification tasks.
Evaluation of CDMA system capacity for mobile satellite system applications
NASA Technical Reports Server (NTRS)
Smith, Partrick O.; Geraniotis, Evaggelos A.
1988-01-01
A specific Direct-Sequence/Pseudo-Noise (DS/PN) Code-Division Multiple-Access (CDMA) mobile satellite system (MSAT) architecture is discussed. The performance of this system is evaluated in terms of the maximum number of active MSAT subscribers that can be supported at a given uncoded bit-error probability. The evaluation decouples the analysis of the multiple-access capability (i.e., the number of instantaneous user signals) from the analysis of the multiple-access multiplier effect allowed by the use of CDMA with burst-modem operation. We combine the results of these two analyses and present numerical results for scenarios of interest to the mobile satellite system community.
Separating hydrogen and oxygen evolution in alkaline water electrolysis using nickel hydroxide
Chen, Long; Dong, Xiaoli; Wang, Yonggang; Xia, Yongyao
2016-01-01
Low-cost alkaline water electrolysis has been considered a sustainable approach to producing hydrogen using renewable energy inputs, but preventing hydrogen/oxygen mixing and efficiently using the unstable renewable energy are challenging. Here, using nickel hydroxide as a redox mediator, we decouple the hydrogen and oxygen production in alkaline water electrolysis, which overcomes the gas-mixing issue and may increase the use of renewable energy. In this architecture, the hydrogen production occurs at the cathode by water reduction, and the anodic Ni(OH)2 is simultaneously oxidized into NiOOH. The subsequent oxygen production involves a cathodic NiOOH reduction (NiOOH→Ni(OH)2) and an anodic OH− oxidization. Alternatively, the NiOOH formed during hydrogen production can be coupled with a zinc anode to form a NiOOH-Zn battery, and its discharge product (that is, Ni(OH)2) can be used to produce hydrogen again. This architecture brings a potential solution to facilitate renewables-to-hydrogen conversion. PMID:27199009
NASA Astrophysics Data System (ADS)
Chung, Pil Seung; Song, Wonyup; Biegler, Lorenz T.; Jhon, Myung S.
2017-05-01
During the operation of a hard disk drive (HDD), the perfluoropolyether (PFPE) lubricant experiences elastic or viscous shear/elongation deformations, which affect the performance and reliability of the HDD. Therefore, the viscoelastic responses of PFPE could provide a fingerprint analysis for designing optimal molecular architectures of lubricants to control the tribological phenomena. In this paper, we examine the rheological responses of PFPEs, including the storage (elastic) and loss (viscous) moduli (G' and G″), by monitoring the time-dependent stress-strain relationship via non-equilibrium molecular dynamics simulations. We analyze the rheological responses using the Cox-Merz rule and investigate the molecular structural and thermal effects on the solid-like and liquid-like behaviors of PFPEs. The temperature dependence of the endgroup agglomeration phenomena is examined, where the functional endgroups decouple as the temperature increases. By analyzing the relaxation processes, these molecular rheological studies provide lubricant selection criteria to enhance HDD performance and reliability for heat-assisted magnetic recording applications.
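A minimal sketch (not the authors' NEMD analysis) of how G' and G″ follow from a time-dependent stress-strain record: projecting the stress response to a sinusoidal strain onto its in-phase and quadrature components recovers the storage and loss moduli. The synthetic data and numbers are purely illustrative.

```python
# For an imposed strain gamma(t) = gamma0*sin(w*t), a viscoelastic stress is
# sigma(t) = gamma0*(G1*sin(w*t) + G2*cos(w*t)); Fourier projection gives G' = G1, G'' = G2.
import numpy as np

def extract_moduli(t, stress, gamma0, omega):
    dt = t[1] - t[0]
    duration = t[-1] - t[0]                     # assume an integer number of cycles
    g_storage = 2.0 / (gamma0 * duration) * np.sum(stress * np.sin(omega * t)) * dt
    g_loss    = 2.0 / (gamma0 * duration) * np.sum(stress * np.cos(omega * t)) * dt
    return g_storage, g_loss

# Synthetic check with known moduli
omega, gamma0 = 2.0 * np.pi, 0.01
t = np.linspace(0.0, 5.0, 50001)                # exactly 5 cycles
stress = gamma0 * (3.0e4 * np.sin(omega * t) + 1.0e4 * np.cos(omega * t))
print(extract_moduli(t, stress, gamma0, omega))  # approximately (3.0e4, 1.0e4)
```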
Bending strain engineering in quantum spin hall system for controlling spin currents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, Bing; Jin, Kyung-Hwan; Cui, Bin
A quantum spin Hall system can exhibit exotic spin transport phenomena, mediated by its topological edge states. The concept of bending strain engineering to tune the spin transport properties of a quantum spin Hall system is demonstrated. Here, we show that bending strain can be used to control the spin orientation of counter-propagating edge states of a quantum spin system to generate a non-zero spin current. This physical mechanism can be applied to effectively tune the spin current and the pure spin current decoupled from the charge current in a quantum spin Hall system by controlling its bending curvature. Moreover, the curved quantum spin Hall system can be achieved by the concept of topological nanomechanical architecture in a controllable way, as demonstrated by the material example of a Bi/Cl/Si(111) nanofilm. This concept of bending strain engineering of spins via topological nanomechanical architecture affords a promising route towards the realization of topological nano-mechanospintronics.
Pica, G.; Lovett, B. W.; Bhatt, R. N.; ...
2016-01-14
A scaled quantum computer with donor spins in silicon would benefit from a viable semiconductor framework and a strong inherent decoupling of the qubits from the noisy environment. Coupling neighboring spins via the natural exchange interaction according to current designs requires gate control structures with extremely small length scales. In this work, we present a silicon architecture where bismuth donors with long coherence times are coupled to electrons that can shuttle between adjacent quantum dots, thus relaxing the pitch requirements and allowing space between donors for classical control devices. An adiabatic SWAP operation within each donor/dot pair solves the scalability issues intrinsic to exchange-based two-qubit gates, as it does not rely on subnanometer precision in donor placement and is robust against noise in the control fields. In conclusion, we use this SWAP together with well-established global microwave Rabi pulses and parallel electron shuttling to construct a surface code that needs minimal, feasible local control.
PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.
Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A
2016-06-01
New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPUs). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA with the aim of significantly shortening the code's execution time. Selected routines were parallelised using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs
NASA Astrophysics Data System (ADS)
Dias, Tiago; Roma, Nuno; Sousa, Leonel
2014-12-01
A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. In contrast to other designs with similar functionality, the presented architecture is based on a scalable, modular and completely configurable processing structure. This flexible structure not only allows the architecture to be easily reconfigured to support different transform kernels, but it also permits resizing to efficiently support transforms of different orders (e.g. order-4, order-8, order-16 and order-32). Consequently, it is not only highly suitable for realizing high-performance multi-standard transform cores, but it also offers highly efficient implementations of specialized processing structures addressing only the reduced subset of transforms that is used by a specific video standard. The experimental results that were obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, such results also demonstrate the ability of this processing structure to realize multi-standard transform cores supporting all the standards mentioned above that are capable of processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 × 4,320 at 30 fps) in real time.
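A minimal sketch of the separable 2-D transform that such cores implement (illustrative only; the video standards use scaled integer approximations of matrices like the orthonormal DCT below): the 2-D transform is computed as a row pass followed by a column pass, which is why one configurable 1-D datapath can serve kernels of different standards and orders.

```python
# Y = T * X * T^T : 1-D transforms over rows, then over columns.
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of order n."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    t = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    t[0, :] = np.sqrt(1.0 / n)
    return t

def transform_2d(block, t):
    return t @ block @ t.T

def inverse_2d(coeff, t):
    return t.T @ coeff @ t

block = np.arange(64, dtype=float).reshape(8, 8)
t8 = dct_matrix(8)
coeff = transform_2d(block, t8)
print(np.allclose(inverse_2d(coeff, t8), block))   # True: lossless round trip
```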
An MPI-based MoSST core dynamics model
NASA Astrophysics Data System (ADS)
Jiang, Weiyuan; Kuang, Weijia
2008-09-01
Distributed systems are among the main cost-effective and expandable platforms for high-end scientific computing. Therefore scalable numerical models are important for effective use of such systems. In this paper, we present an MPI-based numerical core dynamics model for simulation of geodynamo and planetary dynamos, and for simulation of core-mantle interactions. The model is developed based on MPI libraries. Two algorithms are used for node-node communication: a "master-slave" architecture and a "divide-and-conquer" architecture. The former is easy to implement but not scalable in communication. The latter is scalable in both computation and communication. The model scalability is tested on Linux PC clusters with up to 128 nodes. This model is also benchmarked with a published numerical dynamo model solution.
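A minimal mpi4py sketch of the "master-slave" communication pattern mentioned above (an illustration under assumed task contents and counts, not the MoSST code): rank 0 hands out work items and collects results, which is simple to implement but funnels all communication through one node, limiting scalability.

```python
# Run with, e.g., mpiexec -n 4 python master_worker.py (assumes mpi4py installed
# and more tasks than workers).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
TAG_WORK, TAG_STOP = 1, 2

def process(item):
    return item * item            # stand-in for one chunk of the simulation

if rank == 0:                     # master: distribute tasks, gather results
    tasks, results = list(range(100)), []
    for w in range(1, size):
        comm.send(tasks.pop(), dest=w, tag=TAG_WORK)
    while len(results) < 100:
        status = MPI.Status()
        results.append(comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status))
        dest = status.Get_source()
        if tasks:
            comm.send(tasks.pop(), dest=dest, tag=TAG_WORK)
        else:
            comm.send(None, dest=dest, tag=TAG_STOP)
    print("sum =", sum(results))
else:                             # worker: process items until told to stop
    while True:
        status = MPI.Status()
        item = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == TAG_STOP:
            break
        comm.send(process(item), dest=0, tag=TAG_WORK)
```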
NASA Astrophysics Data System (ADS)
Abeywickrama, Sandu; Furdek, Marija; Monti, Paolo; Wosinska, Lena; Wong, Elaine
2016-12-01
Core network survivability affects the reliability performance of telecommunication networks and remains one of the most important network design considerations. This paper critically examines the benefits arising from utilizing dual-homing in optical access networks to provide resource-efficient protection against link and node failures in the optical core segment. Four novel, heuristic-based routing and wavelength assignment (RWA) algorithms that provide dedicated path protection in networks with dual-homing are proposed and studied. These algorithms protect against different failure scenarios (i.e. single link or node failures) and are implemented with different optimization objectives (i.e., minimization of wavelength usage and path length). Results obtained through simulations and comparison with baseline architectures indicate that exploiting a dual-homed architecture in the access segment can bring significant improvements in terms of core network resource usage, connection availability, and power consumption.
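A minimal sketch of the idea behind dedicated path protection with dual-homing (hypothetical topology, node names and weights; not the paper's heuristics): a working path is routed from one home node and a node-disjoint backup path from the second home node, so no single core link or node failure can take down both.

```python
import networkx as nx

def protected_pair(g, home1, home2, dst):
    working = nx.shortest_path(g, home1, dst, weight="weight")
    pruned = g.copy()
    # remove the working path's intermediate nodes to enforce node-disjointness
    pruned.remove_nodes_from(n for n in working[1:-1] if n != home2)
    backup = nx.shortest_path(pruned, home2, dst, weight="weight")
    return working, backup

g = nx.Graph()
g.add_weighted_edges_from([("H1", "A", 1), ("A", "B", 1), ("B", "D", 1),
                           ("H2", "C", 1), ("C", "D", 1), ("H1", "C", 5)])
print(protected_pair(g, "H1", "H2", "D"))   # (['H1','A','B','D'], ['H2','C','D'])
```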
Exploring Manycore Multinode Systems for Irregular Applications with FPGA Prototyping
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ceriani, Marco; Palermo, Gianluca; Secchi, Simone
We present a prototype of a multi-core architecture implemented on FPGA, designed to enable efficient execution of irregular applications on distributed shared-memory machines while maintaining high performance on regular workloads. The architecture is composed of off-the-shelf soft cores, local interconnect and a memory interface, integrated with custom components that optimize it for irregular applications. It relies on three key elements: a global address space, multithreading, and fine-grained synchronization. Global addresses are scrambled to reduce the formation of network hot-spots, while the latency of the transactions is covered by integrating a hardware scheduler within the custom load/store buffers to take advantage of the availability of multiple execution threads, increasing efficiency in a way that is transparent to the application. We evaluated a dual-node system on irregular kernels, showing scalability in the number of cores and threads.
Hardware/software codesign for embedded RISC core
NASA Astrophysics Data System (ADS)
Liu, Peng
2001-12-01
This paper describes a hardware/software codesign method for the extendible embedded RISC core VIRGO, which is based on the MIPS-I instruction set architecture. VIRGO is described in the Verilog hardware description language, has a five-stage pipeline with a shared 32-bit cache/memory interface, and is controlled by a distributed control scheme. Every pipeline stage has one small controller, which controls the status of that stage and the cooperation among the pipeline phases. Since the description uses a high-level language and the control structure is distributed, the VIRGO core is highly extensible and can meet the requirements of the application. Taking the high-definition television MPEG2 MPHL decoder chip as an example, we constructed a hardware/software codesign virtual prototyping machine that supports research on the VIRGO core instruction set architecture, system-on-chip memory size requirements, system-on-chip software, etc. The system-on-chip design and the RISC instruction set can also be evaluated on the virtual prototyping machine platform.
Krull, Sandra; Thyberg, Johan; Björkroth, Birgitta; Rackwitz, Hans-Richard; Cordes, Volker C
2004-09-01
The vertebrate nuclear pore complex (NPC) is a macromolecular assembly of protein subcomplexes forming a structure of eightfold radial symmetry. The NPC core consists of globular subunits sandwiched between two coaxial ring-like structures of which the ring facing the nuclear interior is capped by a fibrous structure called the nuclear basket. By postembedding immunoelectron microscopy, we have mapped the positions of several human NPC proteins relative to the NPC core and its associated basket, including Nup93, Nup96, Nup98, Nup107, Nup153, Nup205, and the coiled coil-dominated 267-kDa protein Tpr. To further assess their contributions to NPC and basket architecture, the genes encoding Nup93, Nup96, Nup107, and Nup205 were posttranscriptionally silenced by RNA interference (RNAi) in HeLa cells, complementing recent RNAi experiments on Nup153 and Tpr. We show that Nup96 and Nup107 are core elements of the NPC proper that are essential for NPC assembly and docking of Nup153 and Tpr to the NPC. Nup93 and Nup205 are other NPC core elements that are important for long-term maintenance of NPCs but initially dispensable for the anchoring of Nup153 and Tpr. Immunogold-labeling for Nup98 also results in preferential labeling of NPC core regions, whereas Nup153 is shown to bind via its amino-terminal domain to the nuclear coaxial ring linking the NPC core structures and Tpr. The position of Tpr in turn is shown to coincide with that of the nuclear basket, with different Tpr protein domains corresponding to distinct basket segments. We propose a model in which Tpr constitutes the central architectural element that forms the scaffold of the nuclear basket.
Model-Unified Planning and Execution for Distributed Autonomous System Control
NASA Technical Reports Server (NTRS)
Aschwanden, Pascal; Baskaran, Vijay; Bernardini, Sara; Fry, Chuck; Moreno, Maria; Muscettola, Nicola; Plaunt, Chris; Rijsman, David; Tompkins, Paul
2006-01-01
The Intelligent Distributed Execution Architecture (IDEA) is a real-time architecture that exploits artificial intelligence planning as the core reasoning engine for interacting autonomous agents. Rather than enforcing separate deliberation and execution layers, IDEA unifies them under a single planning technology. Deliberative and reactive planners reason about and act according to a single representation of the past, present and future domain state. The domain state evolves according to the rules dictated by a declarative model of the subsystem to be controlled, the internal processes of the IDEA controller, and interactions with other agents. We present IDEA concepts - modeling, the IDEA core architecture, the unification of deliberation and reaction under planning - and illustrate its use in a simple example. Finally, we present several real-world applications of IDEA, and compare IDEA to other high-level control approaches.
Multi-level Hierarchical Poly Tree computer architectures
NASA Technical Reports Server (NTRS)
Padovan, Joe; Gute, Doug
1990-01-01
Based on the concept of hierarchical substructuring, this paper develops an optimal multi-level Hierarchical Poly Tree (HPT) parallel computer architecture scheme which is applicable to the solution of finite element and difference simulations. Emphasis is given to minimizing computational effort, in-core/out-of-core memory requirements, and the data transfer between processors. In addition, a simplified communications network that reduces the number of I/O channels between processors is presented. HPT configurations that yield optimal superlinearities are also demonstrated. Moreover, to generalize the scope of applicability, special attention is given to developing: (1) multi-level reduction trees which provide an orderly/optimal procedure by which model densification/simplification can be achieved, as well as (2) methodologies enabling processor grading that yields architectures with varying types of multi-level granularity.
NASA Astrophysics Data System (ADS)
Heinzeller, Dominikus; Duda, Michael G.; Kunstmann, Harald
2017-04-01
With strong financial and political support from national and international initiatives, exascale computing is projected for the end of this decade. Energy requirements and physical limitations imply the use of accelerators and scaling out to orders of magnitude larger numbers of cores than today to achieve this milestone. In order to fully exploit the capabilities of these exascale computing systems, existing applications need to undergo significant development. The Model for Prediction Across Scales (MPAS) is a novel set of Earth system simulation components and consists of an atmospheric core, an ocean core, a land-ice core and a sea-ice core. Its distinct features are the use of unstructured Voronoi meshes and C-grid discretisation to address the shortcomings of global models on regular grids and of limited-area models nested in a forcing data set, with respect to parallel scalability, numerical accuracy and physical consistency. Here, we present work towards the application of the atmospheric core (MPAS-A) on current and future high performance computing systems for problems at extreme scale. In particular, we address the issue of massively parallel I/O by extending the model to support the highly scalable SIONlib library. Using global uniform meshes with a convection-permitting resolution of 2-3 km, we demonstrate the ability of MPAS-A to scale out to half a million cores while maintaining a high parallel efficiency. We also demonstrate the potential benefit of a hybrid parallelisation of the code (MPI/OpenMP) on the latest generation of Intel's Many Integrated Core Architecture, the Intel Xeon Phi Knights Landing.
Evolution of dynamo-generated magnetic fields in accretion disks around compact and young stars
NASA Technical Reports Server (NTRS)
Stepinski, Tomasz F.
1994-01-01
Geometrically thin, optically thick, turbulent accretion disks are believed to surround many stars. Some of them are the compact components of close binaries, while the others are thought to be T Tauri stars. These accretion disks must be magnetized objects because the accreted matter, whether it comes from the companion star (binaries) or from a collapsing molecular cloud core (single young stars), carries an embedded magnetic field. In addition, most accretion disks are hot and turbulent, thus meeting the condition for the MHD turbulent dynamo to maintain and amplify any seed magnetic field. In fact, for a disk's magnetic field to persist long enough in comparison with the disk viscous time, it must be contemporaneously regenerated because the characteristic diffusion time of a magnetic field is typically much shorter than a disk's viscous time. This is true for most thin accretion disks. Consequently, studying magnetic fields in thin disks is usually synonymous with studying magnetic dynamos, a fact that is not commonly recognized in the literature. Progress in studying the structure of many accretion disks was achieved mainly because most disks can be regarded as two-dimensional flows in which vertical and radial structures are largely decoupled. By analogy, in a thin disk, one may expect that the vertical and radial structures of the magnetic field are decoupled because the magnetic field diffuses more rapidly to the vertical boundary of the disk than along the radius. Thus, an asymptotic method, called an adiabatic approximation, can be applied to the accretion disk dynamo. We can represent the solution to the dynamo equation in the form B = Q(r)b(r,z), where Q(r) describes the field distribution along the radius, while the field distribution across the disk is included in the vector function b, which parametrically depends on r and is normalized by the condition max(b(z)) = 1. The field distribution across the disk is established rapidly, while the radial distribution Q(r) evolves on a considerably longer timescale. It is this evolution that is the subject of this paper.
NASA Astrophysics Data System (ADS)
Urfianto, Mohammad Zalfany; Isshiki, Tsuyoshi; Khan, Arif Ullah; Li, Dongju; Kunieda, Hiroaki
This paper presents a Multiprocessor System-on-Chip (MPSoC) architecture used as an execution platform for the new C-language based MPSoC design framework we are currently developing. The MPSoC architecture is based on an existing SoC platform with a commercial RISC core acting as the host CPU. We extend the existing SoC with a multiprocessor-array block that is used as the main engine to run parallel applications modeled in our design framework. Utilizing several optimizations provided by our compiler, efficient inter-communication between processing elements is implemented with minimal overhead. A host interface is designed to integrate the existing RISC core with the multiprocessor array. The experimental results show that an effective integration is achieved, proving that the designed communication module can be used to efficiently incorporate off-the-shelf processors as processing elements for MPSoC architectures designed using our framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gebis, Joseph; Oliker, Leonid; Shalf, John
The disparity between microprocessor clock frequencies and memory latency is a primary reason why many demanding applications run well below peak achievable performance. Software-controlled scratchpad memories, such as the Cell local store, attempt to ameliorate this discrepancy by enabling precise control over memory movement; however, scratchpad technology confronts the programmer and compiler with an unfamiliar and difficult programming model. In this work, we present the Virtual Vector Architecture (ViVA), which combines the memory semantics of vector computers with a software-controlled scratchpad memory in order to provide a more effective and practical approach to latency hiding. ViVA requires minimal changes to the core design and could thus be easily integrated with conventional processor cores. To validate our approach, we implemented ViVA on the Mambo cycle-accurate full-system simulator, which was carefully calibrated to match the performance of our underlying PowerPC Apple G5 architecture. Results show that ViVA is able to deliver significant performance benefits over scalar techniques for a variety of memory access patterns as well as two important memory-bound compact kernels, corner turn and sparse matrix-vector multiplication, achieving a 2x-13x improvement compared to the scalar version. Overall, our preliminary ViVA exploration points to a promising approach for improving application performance on leading microprocessors with minimal design and complexity costs, in a power-efficient manner.
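A minimal sketch of the sparse matrix-vector multiplication kernel mentioned above, in compressed sparse row (CSR) form (illustrative only, not ViVA): the indirect, irregular loads of x[col] are exactly the memory accesses that latency-hiding schemes such as scratchpads or virtual vectors target.

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        # gather: indexed loads from x with little spatial locality
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# 3x3 example:  [[4, 0, 1],
#                [0, 3, 0],
#                [2, 0, 5]]
values  = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
print(spmv_csr(values, col_idx, row_ptr, np.array([1.0, 2.0, 3.0])))  # [ 7.  6. 17.]
```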
Constellation Architecture Team-Lunar: Lunar Habitat Concepts
NASA Technical Reports Server (NTRS)
Toups, Larry; Kennedy, Kriss J.
2008-01-01
This paper will describe lunar habitat concepts that were defined as part of the Constellation Architecture Team-Lunar (CxAT-Lunar) in support of the Vision for Space Exploration. There are many challenges to designing lunar habitats, such as mission objectives, launch packaging, lander capability, and risks. Surface habitats are required in support of sustaining human life to meet the mission objectives of lunar exploration, operations, and sustainability. Lunar surface operations consist of crew operations, mission operations, EVA operations, science operations, and logistics operations. Habitats are crewed pressurized vessels that include surface mission operations, science laboratories, living support capabilities, EVA support, logistics, and maintenance facilities. The challenge is to deliver, unload, and deploy self-contained habitats and laboratories to the lunar surface. The CxAT-Lunar surface campaign analysis focused on three primary trade sets of analysis. Trade set one (TS1) investigated sustaining a crew of four for six months with full outpost capability and the ability to perform long surface mission excursions using large mobility systems. Two basic habitat concepts, a hard metallic horizontal cylinder and a larger inflatable torus, were investigated as options in response to the surface exploration architecture campaign analysis. Figures 1 and 2 depict the notional outpost configurations for this trade set. Trade set two (TS2) investigated a mobile architecture approach with the campaign focused on early exploration using two small pressurized rovers and a mobile logistics support capability. This exploration concept will not be described in this paper. Trade set three (TS3) investigated delivery of a "core" habitation capability in support of an early outpost that would mature into the TS1 full outpost capability. Three core habitat concepts were defined for this campaign analysis: one with a four-port core habitat, another with a two-port core habitat, and a third that investigated leveraging commonality of the lander ascent module and airlock pressure vessel hard shell. The paper will describe an overview of the various habitat concepts and their functionality. The Crew Operations area includes basic crew accommodations such as sleeping, eating, hygiene and stowage. The EVA Operations area includes additional EVA capability beyond the suit-port airlock function, such as redundant airlock(s), suit maintenance, spares stowage, and suit stowage. The Logistics Operations area includes the enhanced accommodations for 180 days, such as closed-loop life support systems hardware, consumable stowage, spares stowage, interconnection to the other Hab units, and a common interface mechanism for future growth and mating to a pressurized rover. The Mission & Science Operations area includes enhanced outpost autonomy such as an IVA glove box, life support, and medical operations.
Air core poloidal magnetic field system for a toroidal plasma producing device
Marcus, Frederick B.
1978-01-01
A poloidal magnetics system for a plasma producing device of toroidal configuration is provided that reduces both the total volt-seconds requirement and the magnitude of the field change at the toroidal field coils. The system utilizes an air core transformer wound between the toroidal field (TF) coils and the major axis outside the TF coils. Electric current in the primary windings of this transformer is distributed, and the magnetic flux is returned by air core windings wrapped outside the toroidal field coils. A shield winding that is closely coupled to the plasma carries a current equal and opposite to the plasma current. This winding provides the shielding function and in addition serves in a fashion similar to a driven conducting shell to provide the equilibrium vertical field for the plasma. The shield winding is in series with a power supply and a decoupling coil located outside the TF coil at the primary winding locations. The present invention requires much less energy than the usual air core transformer and is capable of substantially shielding the toroidal field coils from poloidal field flux.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Learn, Mark Walter
Sandia National Laboratories is currently developing new processing and data communication architectures for use in future satellite payloads. These architectures will leverage the flexibility and performance of state-of-the-art static-random-access-memory-based Field Programmable Gate Arrays (FPGAs). One such FPGA is the radiation-hardened version of the Virtex-5 being developed by Xilinx. However, not all features of this FPGA are being radiation-hardened by design and could still be susceptible to on-orbit upsets. One such feature is the embedded hard-core PPC440 processor. Since this processor is implemented in the FPGA as a hard-core, traditional mitigation approaches such as Triple Modular Redundancy (TMR) are not available to improve the processor's on-orbit reliability. The goal of this work is to investigate techniques other than TMR that can help mitigate the embedded hard-core PPC440 processor within the Virtex-5 FPGA. Implementing various mitigation schemes reliably within the PPC440 offers a powerful reconfigurable computing resource to these node-based processing architectures. This document summarizes the work done on the cache mitigation scheme for the embedded hard-core PPC440 processor within the Virtex-5 FPGAs, and describes in detail the design of the cache mitigation scheme and the testing conducted at the radiation effects facility on the Texas A&M campus.
A novel virtual hub approach for multisource downstream service integration
NASA Astrophysics Data System (ADS)
Previtali, Mattia; Cuca, Branka; Barazzetti, Luigi
2016-08-01
Substantial development of downstream services is expected to be stimulated by the Earth observation (EO) datasets acquired by the Copernicus satellites. An important challenge connected with the availability of downstream services is the possibility of their integration in order to create innovative applications with added value for different categories of users. At the moment, the world of geo-information (GI) is extremely heterogeneous in terms of the standards and formats used, thus preventing facilitated access to and integration of downstream services. Indeed, different users and data providers also have different requirements in terms of communication protocols and technology advancement. In recent years, many important programmes and initiatives have tried to address this issue even at trans-regional and international level (e.g. the INSPIRE Directive, GEOSS, Eye on Earth and SEIS). However, a lack of interoperability between systems and services still exists. In order to facilitate the interaction between different downstream services, a new architectural approach (developed within the European project ENERGIC OD) is proposed in this paper. The brokering-oriented architecture introduces a new mediation layer (the Virtual Hub) which works as an intermediary to bridge the gaps linked to interoperability issues. This intermediation layer decouples the server and the client, allowing facilitated access to multiple downstream services and also to Open Data provided by national and local SDIs. In particular, this paper presents an application integrating four services on the topic of agriculture: (i) the service provided by Space4Agri (services based on MODIS and Landsat data); (ii) Gicarus Lab (sample services based on Landsat datasets); (iii) FRESHMON (sample services for water quality); and services from several regional SDIs.
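A minimal sketch of the brokering idea (hypothetical class and adapter names, placeholder URL and paths; not the ENERGIC OD implementation): client code talks to a single Virtual Hub facade while per-provider adapters hide the heterogeneous protocols and formats of the underlying downstream services and open data sources.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Dataset:
    provider: str
    layer: str
    payload: dict

class ServiceAdapter(Protocol):
    def fetch(self, layer: str, bbox: tuple) -> Dataset: ...

class WmsAdapter:                      # e.g. an OGC-style web service
    def __init__(self, name, url):
        self.name, self.url = name, url
    def fetch(self, layer, bbox):
        # a real adapter would issue an HTTP request here
        return Dataset(self.name, layer, {"bbox": bbox, "source": self.url})

class FileAdapter:                     # e.g. open data published as files
    def __init__(self, name, root):
        self.name, self.root = name, root
    def fetch(self, layer, bbox):
        return Dataset(self.name, layer, {"bbox": bbox, "path": f"{self.root}/{layer}.tif"})

class VirtualHub:
    """The mediation layer: one entry point over many heterogeneous services."""
    def __init__(self):
        self.adapters = {}
    def register(self, adapter):
        self.adapters[adapter.name] = adapter
    def query(self, layer, bbox):
        return [a.fetch(layer, bbox) for a in self.adapters.values()]

hub = VirtualHub()
hub.register(WmsAdapter("space4agri", "https://example.org/wms"))   # placeholder URL
hub.register(FileAdapter("regional_sdi", "/data/open"))
print(hub.query("ndvi", (9.0, 45.0, 10.0, 46.0)))
```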
Feature recognition and detection for ancient architecture based on machine vision
NASA Astrophysics Data System (ADS)
Zou, Zheng; Wang, Niannian; Zhao, Peng; Zhao, Xuefeng
2018-03-01
Ancient architecture has very high historical and artistic value. Ancient buildings have a wide variety of textures and decorative paintings, which carry a great deal of historical meaning. Therefore, research and statistics on these different compositional and decorative features play an important role in subsequent studies. However, until recently, the statistics on those components have mainly been compiled manually, which consumes a lot of labor and time and is inefficient. At present, with the strong support of big data and GPU-accelerated training, machine vision with deep learning at its core has developed rapidly and is widely used in many fields. This paper proposes an approach to recognize and detect the textures, decorations and other features of ancient buildings based on machine vision. First, a large number of surface texture images of ancient building components are classified manually as a sample set. Then, a convolutional neural network is trained on the samples in order to obtain a classification detector. Finally, its precision is verified.
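A minimal sketch of the train-a-classifier step described above, assuming PyTorch/torchvision and a folder of manually labelled texture images; the directory layout, network choice and hyper-parameters are assumptions for illustration, not the authors' setup.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# one sub-folder per component/texture class, e.g. data/train/<class_name>/*.jpg
train_set = datasets.ImageFolder("data/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(num_classes=len(train_set.classes))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```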
Scout: high-performance heterogeneous computing made simple
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jablin, James; Mc Cormick, Patrick; Herlihy, Maurice
2011-01-26
Researchers must often write their own simulation and analysis software. During this process they simultaneously confront both computational and scientific problems. Current strategies for aiding the generation of performance-oriented programs do not abstract the software development from the science. Furthermore, the problem is becoming increasingly complex and pressing with the continued development of many-core and heterogeneous (CPU-GPU) architectures. To achieve high performance, scientists must expertly navigate both software and hardware. Co-design between computer scientists and research scientists can alleviate but not solve this problem. The science community requires better tools for developing, optimizing, and future-proofing codes, allowing scientists to focus on their research while still achieving high computational performance. Scout is a parallel programming language and extensible compiler framework targeting heterogeneous architectures. It provides the abstraction required to buffer scientists from the constantly shifting details of hardware while still realizing high performance by encapsulating software and hardware optimization within a compiler framework.
NASA Astrophysics Data System (ADS)
Postadjian, T.; Le Bris, A.; Sahbi, H.; Mallet, C.
2017-05-01
Semantic classification is a core remote sensing task as it provides the fundamental input for land-cover map generation. The very recent literature has shown the superior performance of deep convolutional neural networks (DCNN) for many classification tasks including the automatic analysis of Very High Spatial Resolution (VHR) geospatial images. Most of the recent initiatives have focused on very high discrimination capacity combined with accurate object boundary retrieval. Therefore, current architectures are perfectly tailored for urban areas over restricted areas but not designed for large-scale purposes. This paper presents an end-to-end automatic processing chain, based on DCNNs, that aims at performing large-scale classification of VHR satellite images (here SPOT 6/7). Since this work assesses, through various experiments, the potential of DCNNs for country-scale VHR land-cover map generation, a simple yet effective architecture is proposed, efficiently discriminating the main classes of interest (namely buildings, roads, water, crops, vegetated areas) by exploiting existing VHR land-cover maps for training.
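Independently of the specific network used in this work, country-scale inference over VHR imagery is usually organized as tiled, patch-wise processing of very large rasters. The sketch below assumes NumPy only; `classify_patch` is a stand-in for any trained classifier, and the tile size is an arbitrary choice.

    # Minimal sketch of tiling a very large image for patch-wise inference;
    # classify_patch is a placeholder for a trained DCNN.
    import numpy as np

    def classify_patch(patch):
        # Placeholder: a real system would run the trained network here.
        return int(patch.mean() > 0.5)

    def classify_large_image(image, tile=512):
        h, w = image.shape[:2]
        labels = np.zeros((h, w), dtype=np.uint8)
        for y in range(0, h, tile):
            for x in range(0, w, tile):
                block = image[y:y + tile, x:x + tile]
                labels[y:y + tile, x:x + tile] = classify_patch(block)
        return labels

    # Example on a synthetic "satellite scene".
    scene = np.random.rand(2048, 2048)
    label_map = classify_large_image(scene)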
Polar wander of an ice shell on Europa
NASA Technical Reports Server (NTRS)
Ojakangas, Gregory W.; Stevenson, David J.
1989-01-01
The present consideration of a hypothesized ice shell around Europa, which is decoupled from the silicate core by a liquid water layer and possesses a spatially varying thermal equilibrium thickness profile, proceeds through the development of equations for variations in the inertia tensor of a body when second-harmonic-degree topography is added to the crustal base. Attention is given to a realistic model in which the shell and ocean are assumed to undergo reorientations as a single entity independently of the core, but subject to viscous dissipation within the shell. Shell friction is in this case noted to preclude polar wander, unless a low conductivity regolith increases the near-surface temperature by a few tens of degrees C; the ice beneath the regolith would then behave viscously on the time-scale of polar wander.
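For readers unfamiliar with the quantity being perturbed here: the contribution of any added mass distribution (such as basal shell topography) to the inertia tensor follows the textbook definition below; this is a general statement, not the paper's specific degree-2 harmonic expansion.

    % General inertia-tensor contribution of an added density anomaly \Delta\rho;
    % symbols are generic, not the authors' notation.
    \Delta I_{ij} \;=\; \int_V \Delta\rho(\mathbf{r})\,
      \bigl( r^{2}\,\delta_{ij} - x_i x_j \bigr)\, dV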
NASA Astrophysics Data System (ADS)
Chapela Lara, M.; Schuessler, J. A.; Buss, H. L.; McDowell, W. H.
2017-12-01
During the evolution of the critical zone, the predominant source of nutrients to the vegetation changes from bedrock weathering to atmospheric inputs and biological recycling. In parallel, the architecture of the critical zone changes with time, promoting a change in water flow regime from near-surface porous flow during early weathering stages to more complex flow regimes modulated by clay-rich regolith during the late stages of weathering. As a consequence of these two concurrent processes, we can expect the predominant sources and pathways of solutes to the streams to also change during critical zone evolution. If this is true, we would observe a decoupling between the solutes used by the vegetation and those that determine the composition of the streams during the late stages of weathering, represented by geomorphically stable tropical settings. To test these hypotheses, we are analyzing the elemental and Mg isotopic composition of regolith and streams at the humid tropical Luquillo Critical Zone Observatory. We aim to trace the relative contributions of the surficial, biologically mediated pathways and the deeper, weathering controlled nutrient pathways. We also investigate the role of lithology on the solute decoupling between the vegetation and the stream, by examining two similar headwater catchments draining two different bedrocks (andesitic volcaniclastic and granitic). Our preliminary elemental and Mg isotope results are consistent with atmospheric inputs in the upper 2 m of regolith in both lithologies and with bedrock weathering at depth. During a short storm event (~6 h), a headwater stream draining volcaniclastic bedrock showed a large variation in Mg and δ26Mg, correlated with total suspended solids, while another similar headwater granitic stream showed a much narrower variation. A larger stream draining volcaniclastic bedrock showed changes in Mg concentration in response to rain during the same storm event, but did not change in δ26Mg, suggesting the surficial-deep decoupling of solutes we observe in regolith profiles and headwater catchments might be overwhelmed by storage effects at increasing water residence times.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arumugam, Kamesh
Efficient parallel implementation of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. This requires exploiting the data-parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of these applications employ irregular algorithms which exhibit data-dependent control-flow and irregular memory accesses. Furthermore, these applications are often iterative with dependencies between steps, making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particle beam dynamics is one such application where the distribution of work and memory access pattern at each time step is irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control-flow during a single step of the application, independent of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine learning based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-flow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particle beam dynamics as a motivating example throughout the dissertation to present our new approach, though the techniques should be equally applicable to a wide range of irregular applications. The machine learning approach presented here uses predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution which improve the performance of the application at a future time step based on the observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver a good aggregate performance.
We used these optimization techniques and the anticipation strategy to design a cache-aware, memory-efficient parallel algorithm to address the irregularities in the parallel implementation of charged particle beam dynamics simulation on different HPC architectures. Experimental results using a diverse mix of HPC architectures show that our approach of using an anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and improving resource utilization.
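The core idea, forecasting the next step's access structure from earlier steps and scheduling work accordingly, can be sketched with a simple per-bin autoregressive fit; this NumPy example stands in for the dissertation's supervised-learning models and uses entirely synthetic data.

    # Minimal sketch of forecasting next-step memory-access structure from history;
    # a simple least-squares AR fit per bin stands in for the actual ML models.
    import numpy as np

    def forecast_next(history, order=3):
        """history: (steps, bins) array of access counts per memory bin."""
        steps, bins = history.shape
        pred = np.empty(bins)
        for b in range(bins):
            y = history[order:, b]
            X = np.column_stack([history[i:steps - order + i, b] for i in range(order)])
            coef, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(len(y))]), y, rcond=None)
            pred[b] = np.r_[history[-order:, b], 1.0] @ coef
        return pred

    history = np.abs(np.random.randn(20, 8)).cumsum(axis=0)   # synthetic access counts
    expected = forecast_next(history)
    schedule = np.argsort(expected)[::-1]   # e.g. assign the hottest bins first for balance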
Verifying Architectural Design Rules of the Flight Software Product Line
NASA Technical Reports Server (NTRS)
Ganesan, Dharmalingam; Lindvall, Mikael; Ackermann, Chris; McComas, David; Bartholomew, Maureen
2009-01-01
This paper presents experiences of verifying architectural design rules of the NASA Core Flight Software (CFS) product line implementation. The goal of the verification is to check whether the implementation is consistent with the CFS architectural rules derived from the developer's guide. The results indicate that consistency checking helps a) identifying architecturally significant deviations that were eluded during code reviews, b) clarifying the design rules to the team, and c) assessing the overall implementation quality. Furthermore, it helps connecting business goals to architectural principles, and to the implementation. This paper is the first step in the definition of a method for analyzing and evaluating product line implementations from an architecture-centric perspective.
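At its simplest, this kind of consistency checking compares dependencies recovered from the implementation against an allowed-dependency rule set. The sketch below is a generic illustration; the layer names and rules are invented for the example and are not the CFS rules.

    # Minimal sketch of checking observed dependencies against architectural rules;
    # layer names and the rule set are illustrative only.
    ALLOWED = {
        "app": {"app", "library", "os_api"},    # apps may use the library and OS API
        "library": {"library", "os_api"},       # the library must not call back into apps
        "os_api": {"os_api"},
    }

    # (caller_layer, callee_layer) pairs, e.g. recovered by static analysis.
    observed = [("app", "library"), ("library", "os_api"), ("library", "app")]

    violations = [(src, dst) for src, dst in observed if dst not in ALLOWED.get(src, set())]
    for src, dst in violations:
        print(f"rule violation: {src} -> {dst} is not permitted by the architecture")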
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.
2015-10-01
The next-generation mesoscale numerical weather prediction system, the Weather Research and Forecasting (WRF) model, is designed for dual use in forecasting and research. WRF offers multiple physics options that can be combined in any way. One of the physics options is radiance computation. The major source of energy for the earth's climate is solar radiation. Thus, it is imperative to accurately model the horizontal and vertical distribution of the heating. The Goddard solar radiative transfer model includes the absorption due to water vapor, ozone, oxygen, carbon dioxide, clouds and aerosols. The model computes the interactions among the absorption and scattering by clouds, aerosols, molecules and surface. Finally, fluxes are integrated over the entire longwave spectrum. In this paper, we present our results of optimizing the Goddard longwave radiative transfer scheme on Intel Many Integrated Core (MIC) architecture hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture, and it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques, which are discussed in this paper. The optimizations improved the performance of the original Goddard longwave radiative transfer scheme on the Xeon Phi 7120P by a factor of 2.2x. Furthermore, the same optimizations improved the performance of the Goddard longwave radiative transfer scheme on a dual-socket configuration of eight-core Intel Xeon E5-2670 CPUs by a factor of 2.1x compared to the original Goddard longwave radiative transfer scheme code.
Development of an extensible dual-core wireless sensing node for cyber-physical systems
NASA Astrophysics Data System (ADS)
Kane, Michael; Zhu, Dapeng; Hirose, Mitsuhito; Dong, Xinjun; Winter, Benjamin; Häckell, Mortiz; Lynch, Jerome P.; Wang, Yang; Swartz, A.
2014-04-01
The introduction of wireless telemetry into the design of monitoring and control systems has been shown to reduce system costs while simplifying installations. To date, wireless nodes proposed for sensing and actuation in cyberphysical systems have been designed using microcontrollers with one computational pipeline (i.e., single-core microcontrollers). While concurrent code execution can be implemented on single-core microcontrollers, concurrency is emulated by splitting the pipeline's resources to support multiple threads of code execution. For many applications, this approach to multi-threading is acceptable in terms of speed and function. However, some applications such as feedback controls demand deterministic timing of code execution and maximum computational throughput. For these applications, the adoption of multi-core processor architectures represents one effective solution. Multi-core microcontrollers have multiple computational pipelines that can execute embedded code in parallel and can be interrupted independent of one another. In this study, a new wireless platform named Martlet is introduced with a dual-core microcontroller adopted in its design. The dual-core microcontroller design allows Martlet to dedicate one core to standard wireless sensor operations while the other core is reserved for embedded data processing and real-time feedback control law execution. Another distinct feature of Martlet is a standardized hardware interface that allows specialized daughter boards (termed wing boards) to be interfaced to the Martlet baseboard. This extensibility opens opportunity to encapsulate specialized sensing and actuation functions in a wing board without altering the design of Martlet. In addition to describing the design of Martlet, a few example wings are detailed, along with experiments showing the Martlet's ability to monitor and control physical systems such as wind turbines and buildings.
Scalable and Power Efficient Data Analytics for Hybrid Exascale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choudhary, Alok; Samatova, Nagiza; Wu, Kesheng
This project developed a generic and optimized set of core data analytics functions. These functions organically consolidate a broad constellation of high performance analytical pipelines. As the architectures of emerging HPC systems become inherently heterogeneous, there is a need to design algorithms for data analysis kernels accelerated on hybrid multi-node, multi-core HPC architectures comprised of a mix of CPUs, GPUs, and SSDs. Furthermore, the power-aware trend drives the advances in our performance-energy tradeoff analysis framework which enables our data analysis kernels algorithms and software to be parameterized so that users can choose the right power-performance optimizations.
Hudry, Damien; Busko, Dmitry; Popescu, Radian; ...
2017-11-02
Core@shell design represents an important class of architectures because of its capability to dramatically increase the absolute upconversion quantum yield (UCQY) of upconverting nanocrystals (UCNCs) but also to tune energy migration pathways. A relatively new trend towards the use of very thick optically inert shells affording significantly higher absolute UCQYs raises the question of the crystallographic and chemical characteristics of such nanocrystals (NCs). In this article, local chemical analyses performed by scanning transmission electron microscopy (STEM) combined with energy dispersive x-ray spectroscopy (EDXS) and x-ray total scattering experiments together with pair distribution function (PDF) analyses were used to probe the local chemical and structural characteristics of hexagonal β-NaGd0.78Yb0.2Er0.02F4@NaYF4 core@shell UCNCs. The investigations lead to a new crystallochemical model to describe core@shell UCNCs that considerably digresses from the commonly accepted epitaxial growth concept with sharp interfaces. The results obtained on ultra-small (4.8 ± 0.5 nm) optically active cores (β-NaGd0.78Yb0.2Er0.02F4) surrounded by an optically inert shell (NaYF4) of tunable thickness (roughly 0, 1, 2, and 3.5 nm) clearly indicate the massive dissolution of the starting seeds and the inter-diffusion of the shell element (such as Y) into the Gd/Yb/Er-containing core giving rise to the formation of a non-homogeneous solid solution characterized by concentration gradients and the lack of sharp interfaces. Independently of the inert shell thickness, core/interface/shell architectures were observed for all synthesized UCNCs. The presented results constitute a significant step towards the comprehensive understanding of the “structure - property” relationship of upconverting core@shell architectures, which is of prime interest not only in the development of more efficient structures but also to provide new physical insights at the nanoscale to better explain upconversion (UC) properties alterations.
Evidence of common and separate eye and hand accumulators underlying flexible eye-hand coordination
Jana, Sumitash; Gopal, Atul
2016-01-01
Eye and hand movements are initiated by anatomically separate regions in the brain, and yet these movements can be flexibly coupled and decoupled, depending on the need. The computational architecture that enables this flexible coupling of independent effectors is not understood. Here, we studied the computational architecture that enables flexible eye-hand coordination using a drift diffusion framework, which predicts that the variability of the reaction time (RT) distribution scales with its mean. We show that a common stochastic accumulator to threshold, followed by a noisy effector-dependent delay, explains eye-hand RT distributions and their correlation in a visual search task that required decision-making, while an interactive eye and hand accumulator model did not. In contrast, in an eye-hand dual task, an interactive model better predicted the observed correlations and RT distributions than a common accumulator model. Notably, these two models could only be distinguished on the basis of the variability and not the means of the predicted RT distributions. Additionally, signatures of separate initiation signals were also observed in a small fraction of trials in the visual search task, implying that these distinct computational architectures were not a manifestation of the task design per se. Taken together, our results suggest two unique computational architectures for eye-hand coordination, with task context biasing the brain toward instantiating one of the two architectures. NEW & NOTEWORTHY Previous studies on eye-hand coordination have considered mainly the means of eye and hand reaction time (RT) distributions. Here, we leverage the approximately linear relationship between the mean and standard deviation of RT distributions, as predicted by the drift-diffusion model, to propose the existence of two distinct computational architectures underlying coordinated eye-hand movements. These architectures, for the first time, provide a computational basis for the flexible coupling between eye and hand movements. PMID:27784809
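The mean-variance relationship exploited above can be reproduced with a few lines of simulation: a single-boundary accumulator driven by noisy drift yields decision times whose standard deviation grows roughly in proportion to their mean across drift rates. The parameters below are arbitrary and purely illustrative.

    # Minimal sketch of the drift-diffusion property used in the study: across
    # drift rates, the SD of first-passage times scales roughly with their mean.
    import numpy as np

    rng = np.random.default_rng(0)

    def first_passage_times(drift, threshold=1.0, dt=1e-3, n=2000):
        times = np.empty(n)
        for i in range(n):
            x, t = 0.0, 0.0
            while x < threshold:
                x += drift * dt + rng.normal(0.0, np.sqrt(dt))
                t += dt
            times[i] = t
        return times

    for drift in (2.0, 3.0, 4.0):
        rt = first_passage_times(drift)
        print(f"drift={drift}: mean={rt.mean():.3f} s, sd={rt.std():.3f} s")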
The new landscape of parallel computer architecture
NASA Astrophysics Data System (ADS)
Shalf, John
2007-07-01
The past few years have seen a sea change in computer architecture that will impact every facet of our society, as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.
Polytopol computing for multi-core and distributed systems
NASA Astrophysics Data System (ADS)
Spaanenburg, Henk; Spaanenburg, Lambert; Ranefors, Johan
2009-05-01
Multi-core computing presents new challenges to software engineering. The paper addresses such issues in the general setting of polytopol computing, which takes multi-core problems in such widely differing areas as ambient-intelligence sensor networks and cloud computing into account. It argues that the essence lies in a suitable allocation of free-moving tasks. Where hardware is ubiquitous and pervasive, the network is virtualized into a collection of software snippets judiciously injected into the hardware so that a system function appears as a whole again. The concept of polytopol computing provides a further formalization in terms of the partitioning of labor between collector and sensor nodes. Collectors provide functions such as knowledge integration, awareness collection, situation display/reporting, communication of clues, and an inquiry interface. Sensors provide functions such as anomaly detection (communicating only singularities, not continuous observations); they are generally powered or self-powered, amorphous (not on a grid) with generation and attrition, field re-programmable, and plug-and-play-able. Together the collector and the sensor are part of the skeleton-injector mechanism, added to every node, which gives the network the ability to organize itself into some of many topologies. Finally, we discuss a number of applications and indicate how a multi-core architecture supports the security aspects of the skeleton injector.
Uranus and Neptune: Refugees from the Jupiter-Saturn zone?
NASA Astrophysics Data System (ADS)
Thommes, E. W.; Duncan, M. J.; Levison, H. F.
1999-09-01
Planetesimal accretion models of planet formation have been quite successful at reproducing the terrestrial region of the Solar System. However, in the outer Solar System these models run into problems, and it becomes very difficult to grow bodies to the current mass of the "ice giants," Uranus and Neptune. Here we present an alternative scenario to in-situ formation of the ice giants. In addition to the Jupiter and Saturn solid cores, several more bodies of mass ~10 MEarth or more are likely to have formed in the region between 4 and 10 AU. As Jupiter's core, and perhaps Saturn's, accreted nebular gas, the other nearby bodies must have been scattered outward. Dynamical friction with the trans-Saturnian part of the planetesimal disk would have acted to decouple these "failed cores" from their scatterer, and to circularize their orbits. Numerical simulations presented here show that systems very similar to our outer Solar System (including Uranus, Neptune, the Kuiper belt, and the scattered disk) are a natural product of this process.
Architecture of the Yeast RNA Polymerase II Open Complex and Regulation of Activity by TFIIF
Fishburn, James
2012-01-01
To investigate the function and architecture of the open complex state of RNA polymerase II (Pol II), Saccharomyces cerevisiae minimal open complexes were assembled by using a series of heteroduplex HIS4 promoters, TATA binding protein (TBP), TFIIB, and Pol II. The yeast system demonstrates great flexibility in the position of active open complexes, spanning 30 to 80 bp downstream from TATA, consistent with the transcription start site scanning behavior of yeast Pol II. TFIIF unexpectedly modulates the activity of the open complexes, either repressing or stimulating initiation. The response to TFIIF was dependent on the sequence of the template strand within the single-stranded bubble. Mutations in the TFIIB reader and linker region, which were inactive on duplex DNA, were suppressed by the heteroduplex templates, showing that a major function of the TFIIB reader and linker is in the initiation or stabilization of single-stranded DNA. Probing of the architecture of the minimal open complexes with TFIIB-FeBABE [TFIIB–p-bromoacetamidobenzyl–EDTA-iron(III)] derivatives showed that the TFIIB core domain is surprisingly positioned away from Pol II, and the addition of TFIIF repositions the TFIIB core domain to the Pol II wall domain. Together, our results show an unexpected architecture of minimal open complexes and the regulation of activity by TFIIF and the TFIIB core domain. PMID:22025674
Nonpolar InGaN/GaN Core-Shell Single Nanowire Lasers.
Li, Changyi; Wright, Jeremy B; Liu, Sheng; Lu, Ping; Figiel, Jeffrey J; Leung, Benjamin; Chow, Weng W; Brener, Igal; Koleske, Daniel D; Luk, Ting-Shan; Feezell, Daniel F; Brueck, S R J; Wang, George T
2017-02-08
We report lasing from nonpolar p-i-n InGaN/GaN multi-quantum well core-shell single-nanowire lasers by optical pumping at room temperature. The nanowire lasers were fabricated using a hybrid approach consisting of a top-down two-step etch process followed by a bottom-up regrowth process, enabling precise geometrical control and high material gain and optical confinement. The modal gain spectra and the gain curves of the core-shell nanowire lasers were measured using micro-photoluminescence and analyzed using the Hakki-Paoli method. Significantly lower lasing thresholds due to high optical gain were measured compared to previously reported semipolar InGaN/GaN core-shell nanowires, despite significantly shorter cavity lengths and reduced active region volume. Mode simulations show that due to the core-shell architecture, annular-shaped modes have higher optical confinement than solid transverse modes. The results show the viability of this p-i-n nonpolar core-shell nanowire architecture, previously investigated for next-generation light-emitting diodes, as low-threshold, coherent UV-visible nanoscale light emitters, and open a route toward monolithic, integrable, electrically injected single-nanowire lasers operating at room temperature.
Architecture and Assembly of HIV Integrase Multimers in the Absence of DNA Substrates*
Bojja, Ravi Shankar; Andrake, Mark D.; Merkel, George; Weigand, Steven; Dunbrack, Roland L.; Skalka, Anna Marie
2013-01-01
We have applied small angle x-ray scattering and protein cross-linking coupled with mass spectrometry to determine the architectures of full-length HIV integrase (IN) dimers in solution. By blocking interactions that stabilize either a core-core domain interface or N-terminal domain intermolecular contacts, we show that full-length HIV IN can form two dimer types. One is an expected dimer, characterized by interactions between two catalytic core domains. The other dimer is stabilized by interactions of the N-terminal domain of one monomer with the C-terminal domain and catalytic core domain of the second monomer as well as direct interactions between the two C-terminal domains. This organization is similar to the “reaching dimer” previously described for wild type ASV apoIN and resembles the inner, substrate binding dimer in the crystal structure of the PFV intasome. Results from our small angle x-ray scattering and modeling studies indicate that in the absence of its DNA substrate, the HIV IN tetramer assembles as two stacked reaching dimers that are stabilized by core-core interactions. These models of full-length HIV IN provide new insight into multimer assembly and suggest additional approaches for enzyme inhibition. PMID:23322775
[caCORE: core architecture of bioinformation on cancer research in America].
Gao, Qin; Zhang, Yan-lei; Xie, Zhi-yun; Zhang, Qi-peng; Hu, Zhang-zhi
2006-04-18
A critical factor in the advancement of biomedical research is the ease with which data can be integrated, redistributed and analyzed both within and across domains. This paper summarizes the Biomedical Information Core Infrastructure built by the National Cancer Institute Center for Bioinformatics (NCICB) in the United States. The main product of the Core Infrastructure is caCORE, the cancer Common Ontologic Reference Environment, which is the infrastructure backbone supporting data management and application development at NCICB. The paper explains the structure and function of caCORE: (1) Enterprise Vocabulary Services (EVS). They provide controlled vocabulary, dictionary and thesaurus services, and EVS produces the NCI Thesaurus and the NCI Metathesaurus. (2) The Cancer Data Standards Repository (caDSR). It provides a metadata registry for common data elements. (3) Cancer Bioinformatics Infrastructure Objects (caBIO). They provide Java, Simple Object Access Protocol and HTTP-XML application programming interfaces. The vision for caCORE is to provide a common data management framework that will support the consistency, clarity, and comparability of biomedical research data and information. In addition to providing facilities for data management and redistribution, caCORE helps solve problems of data integration. All NCICB-developed caCORE components are distributed under open-source licenses that support unrestricted usage by both non-profit and commercial entities, and caCORE has laid the foundation for a number of scientific and clinical applications. On this basis, the paper briefly describes caCORE-based applications in several NCI projects, including CMAP (Cancer Molecular Analysis Project) and caBIG (Cancer Biomedical Informatics Grid). Finally, the paper outlines the prospects of caCORE: although it was born out of the needs of the cancer research community, it is intended to serve as a general resource, and cancer research has historically contributed to many areas beyond tumor biology. The paper also offers some suggestions for current biomedical informatics research in China.
NASA Astrophysics Data System (ADS)
Tabik, S.; Romero, L. F.; Mimica, P.; Plata, O.; Zapata, E. L.
2012-09-01
A broad area in astronomy focuses on simulating extragalactic objects based on Very Long Baseline Interferometry (VLBI) radio-maps. Several algorithms in this scope simulate what would be the observed radio-maps if emitted from a predefined extragalactic object. This work analyzes the performance and scaling of this kind of algorithms on multi-socket, multi-core architectures. In particular, we evaluate a sharing approach, a privatizing approach and a hybrid approach on systems with complex memory hierarchy that includes shared Last Level Cache (LLC). In addition, we investigate which manual processes can be systematized and then automated in future works. The experiments show that the data-privatizing model scales efficiently on medium scale multi-socket, multi-core systems (up to 48 cores) while regardless of algorithmic and scheduling optimizations, the sharing approach is unable to reach acceptable scalability on more than one socket. However, the hybrid model with a specific level of data-sharing provides the best scalability over all used multi-socket, multi-core systems.
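As a loose illustration of the privatizing strategy evaluated above, the following sketch uses process-based parallelism (rather than threads on shared memory, which is what the paper studies): each worker accumulates into its own private array and a single reduction replaces contended shared writes. The radio-map computation is replaced by a placeholder.

    # Minimal sketch of "privatize then reduce": per-worker private accumulators
    # avoid shared-write contention; the workload itself is a stand-in.
    import numpy as np
    from multiprocessing import Pool

    BINS = 1024

    def partial_map(seed):
        rng = np.random.default_rng(seed)
        private = np.zeros(BINS)                 # per-worker (privatized) accumulator
        samples = rng.integers(0, BINS, size=100_000)
        np.add.at(private, samples, 1.0)         # scatter into the private copy only
        return private

    if __name__ == "__main__":
        with Pool(4) as pool:
            partials = pool.map(partial_map, range(4))
        radio_map = np.sum(partials, axis=0)     # single reduction replaces shared writes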
Self isolating high frequency saturable reactor
Moore, James A.
1998-06-23
The present invention discloses a saturable reactor and a method for decoupling the interwinding capacitance from the frequency limitations of the reactor so that the equivalent electrical circuit of the saturable reactor comprises a variable inductor. The saturable reactor comprises a plurality of physically symmetrical magnetic cores with closed loop magnetic paths and a novel method of wiring a control winding and a RF winding. The present invention additionally discloses a matching network and method for matching the impedances of a RF generator to a load. The matching network comprises a matching transformer and a saturable reactor.
El-Toni, Ahmed Mohamed; Habila, Mohamed A; Labis, Joselito Puzon; ALOthman, Zeid A; Alhoshan, Mansour; Elzatahry, Ahmed A; Zhang, Fan
2016-02-07
With the evolution of nanoscience and nanotechnology, studies have been focused on manipulating nanoparticle properties through the control of their size, composition, and morphology. As nanomaterial research has progressed, the foremost focus has gradually shifted from synthesis, morphology control, and characterization of properties to the investigation of function and the utility of integrating these materials and chemical sciences with the physical, biological, and medical fields, which therefore necessitates the development of novel materials that are capable of performing multiple tasks and functions. The construction of multifunctional nanomaterials that integrate two or more functions into a single geometry has been achieved through the surface-coating technique, which created a new class of substances designated as core-shell nanoparticles. Core-shell materials have growing and expanding applications due to the multifunctionality that is achieved through the formation of multiple shells as well as the manipulation of core/shell materials. Moreover, core removal from core-shell-based structures offers excellent opportunities to construct multifunctional hollow core architectures that possess huge storage capacities, low densities, and tunable optical properties. Furthermore, the fabrication of nanomaterials that have the combined properties of a core-shell structure with that of a hollow one has resulted in the creation of a new and important class of substances, known as the rattle core-shell nanoparticles, or nanorattles. The design strategies of these new multifunctional nanostructures (core-shell, hollow core, and nanorattle) are discussed in the first part of this review. In the second part, different synthesis and fabrication approaches for multifunctional core-shell, hollow core-shell and rattle core-shell architectures are highlighted. Finally, in the last part of the article, the versatile and diverse applications of these nanoarchitectures in catalysis, energy storage, sensing, and biomedicine are presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Unseren, M.A.
A rigid body model for the entire system which accounts for the load distribution scheme proposed in Part 1 as well as for the dynamics of the manipulators and the kinematic constraints is derived in the joint space. A technique is presented for expressing the object dynamics in terms of the joint variables of both manipulators which leads to a positive definite and symmetric inertia matrix. The model is then transformed to obtain reduced order equations of motion and a separate set of equations which govern the behavior of the internal contact forces. The control architecture is applied to the model which results in the explicit decoupling of the position and internal contact force-controlled degrees of freedom (DOF).
Architecting the Finite Element Method Pipeline for the GPU.
Fu, Zhisong; Lewis, T James; Kirby, Robert M; Whitaker, Ross T
2014-02-01
The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core streaming processors like the graphical processing unit (GPU). In this paper, we present the algorithms and data-structures necessary to move the entire FEM pipeline to the GPU. First we propose an efficient GPU-based algorithm to generate local element information and to assemble the global linear system associated with the FEM discretization of an elliptic PDE. To solve the corresponding linear system efficiently on the GPU, we implement a conjugate gradient method preconditioned with a geometry-informed algebraic multi-grid (AMG) method preconditioner. We propose a new fine-grained parallelism strategy, a corresponding multigrid cycling stage and efficient data mapping to the many-core architecture of GPU. Comparison of our on-GPU assembly versus a traditional serial implementation on the CPU achieves up to an 87 × speedup. Focusing on the linear system solver alone, we achieve a speedup of up to 51 × versus use of a comparable state-of-the-art serial CPU linear system solver. Furthermore, the method compares favorably with other GPU-based, sparse, linear solvers.
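The solver stage described above can be illustrated independently of the GPU specifics with a preconditioned conjugate-gradient loop; the sketch below runs on the CPU with NumPy/SciPy, uses a 1-D Poisson matrix as a stand-in system, and substitutes a simple Jacobi preconditioner for the paper's geometry-informed AMG.

    # Minimal sketch of the solver stage only: Jacobi-preconditioned CG on a
    # sparse FEM-style system (illustrative stand-in, not the paper's AMG-PCG).
    import numpy as np
    import scipy.sparse as sp

    def pcg(A, b, M_inv, tol=1e-8, max_iter=500):
        x = np.zeros_like(b)
        r = b - A @ x
        z = M_inv @ r
        p = z.copy()
        rz = r @ z
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol:
                break
            z = M_inv @ r
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

    n = 200                                   # 1D Poisson stencil as a stand-in system
    A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)
    M_inv = sp.diags(1.0 / A.diagonal())      # Jacobi preconditioner
    x = pcg(A, b, M_inv)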
Control of the Speed of a Light-Induced Spin Transition through Mesoscale Core-Shell Architecture.
Felts, Ashley C; Slimani, Ahmed; Cain, John M; Andrus, Matthew J; Ahir, Akhil R; Abboud, Khalil A; Meisel, Mark W; Boukheddaden, Kamel; Talham, Daniel R
2018-05-02
The rate of the light-induced spin transition in a coordination polymer network solid dramatically increases when included as the core in mesoscale core-shell particles. A series of photomagnetic coordination polymer core-shell heterostructures, based on the light-switchable RbaCob[Fe(CN)6]c·mH2O (RbCoFe-PBA) as core with the isostructural KjNik[Cr(CN)6]l·nH2O (KNiCr-PBA) as shell, are studied using temperature-dependent powder X-ray diffraction and SQUID magnetometry. The core RbCoFe-PBA exhibits a charge transfer-induced spin transition (CTIST), which can be thermally and optically induced. When coupled to the shell, the rate of the optically induced transition from low spin to high spin increases. Isothermal relaxation from the optically induced high spin state of the core back to the low spin state and activation energies associated with the transition between these states were measured. The presence of a shell decreases the activation energy, which is associated with the elastic properties of the core. Numerical simulations using an electro-elastic model for the spin transition in core-shell particles supports the findings, demonstrating how coupling of the core to the shell changes the elastic properties of the system. The ability to tune the rate of optically induced magnetic and structural phase transitions through control of mesoscale architecture presents a new approach to the development of photoswitchable materials with tailored properties.
Density-matrix description of heteronuclear decoupling in AmXn systems
NASA Astrophysics Data System (ADS)
McClung, R. E. D.; John, Boban K.
A detailed investigation of the effects of ordinary noise decoupling and spherical randomization decoupling on the elements of the density matrix for AmXn spin systems is presented. The elements are shown to reach steady-state values in the rotating frame of the decoupled nuclei when the decoupling field is strong and is applied for a sufficient time interval. The steady-state values are found to be linear combinations of the density-matrix elements at the beginning of the decoupling period, and often involve mixing of populations with multiple-quantum coherences, and mixing of the perpendicular components of the magnetization with higher coherences. This description of decoupling is shown to account for the "illusions" of spin decoupling in 2D gated-decoupler 13C J-resolved spectra reported by Levitt et al.
A High Rigidity and Precision Scanning Tunneling Microscope with Decoupled XY and Z Scans
Chen, Xu; Guo, Tengfei; Hou, Yubin; Zhang, Jing
2017-01-01
A new scan-head structure for the scanning tunneling microscope (STM) is proposed, featuring high scan precision and rigidity. The core structure consists of a piezoelectric tube scanner of quadrant type (for XY scans) coaxially housed in a piezoelectric tube with single inner and outer electrodes (for Z scan). They are fixed at one end (called common end). A hollow tantalum shaft is coaxially housed in the XY-scan tube and they are mutually fixed at both ends. When the XY scanner scans, its free end will bring the shaft to scan and the tip which is coaxially inserted in the shaft at the common end will scan a smaller area if the tip protrudes short enough from the common end. The decoupled XY and Z scans are desired for less image distortion and the mechanically reduced scan range has the superiority of reducing the impact of the background electronic noise on the scanner and enhancing the tip positioning precision. High quality atomic resolution images are also shown. PMID:29270242
Vigelius, Matthias; Meyer, Bernd
2012-01-01
For many biological applications, a macroscopic (deterministic) treatment of reaction-drift-diffusion systems is insufficient. Instead, one has to properly handle the stochastic nature of the problem and generate true sample paths of the underlying probability distribution. Unfortunately, stochastic algorithms are computationally expensive and, in most cases, the large number of participating particles renders the relevant parameter regimes inaccessible. In an attempt to address this problem we present a genuine stochastic, multi-dimensional algorithm that solves the inhomogeneous, non-linear, drift-diffusion problem on a mesoscopic level. Our method improves on existing implementations in being multi-dimensional and handling inhomogeneous drift and diffusion. The algorithm is well suited for an implementation on data-parallel hardware architectures such as general-purpose graphics processing units (GPUs). We integrate the method into an operator-splitting approach that decouples chemical reactions from the spatial evolution. We demonstrate the validity and applicability of our algorithm with a comprehensive suite of standard test problems that also serve to quantify the numerical accuracy of the method. We provide a freely available, fully functional GPU implementation. Integration into Inchman, a user-friendly web service, that allows researchers to perform parallel simulations of reaction-drift-diffusion systems on GPU clusters is underway. PMID:22506001
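The operator-splitting idea mentioned above, handling the stochastic reaction and the spatial (diffusion) update as separate sub-steps within each time step, can be sketched on a 1-D grid; the decay rate, hop probability, and grid size below are arbitrary, and this serial NumPy version only illustrates the splitting, not the GPU implementation.

    # Minimal sketch of operator splitting: reaction sub-step (tau-leaped decay)
    # followed by a diffusion sub-step (random hops to neighbouring cells).
    import numpy as np

    rng = np.random.default_rng(1)
    cells = 50
    counts = np.zeros(cells, dtype=np.int64)
    counts[cells // 2] = 10_000                 # initial bolus in the centre cell
    k_decay, hop_prob, dt, steps = 0.05, 0.2, 0.1, 100

    for _ in range(steps):
        # Reaction sub-step: A -> 0 with rate k_decay (tau-leap approximation).
        decayed = rng.binomial(counts, 1.0 - np.exp(-k_decay * dt))
        counts -= decayed
        # Diffusion sub-step: each particle hops left/right with probability hop_prob/2.
        left = rng.binomial(counts, hop_prob / 2)
        right = rng.binomial(counts - left, (hop_prob / 2) / (1 - hop_prob / 2))
        counts -= left + right
        counts[:-1] += left[1:]                  # hops to the left neighbour
        counts[1:] += right[:-1]                 # hops to the right neighbour
        counts[0] += left[0]                     # reflecting boundaries
        counts[-1] += right[-1]

    print("total particles remaining:", counts.sum())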
HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi
Dongarra, Jack; Gates, Mark; Haidar, Azzam; ...
2015-01-01
This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open source, high performance library, that incorporates the developments presented here and, more broadly, provides the DLA functionality equivalent to that of the popular LAPACK library while targeting heterogeneous architectures that feature a mix of multicore CPUs and coprocessors. The LAPACK-compliance simplifies the use of the MAGMA MIC library in applications, while providing them with portably performant DLA. High performance is obtained through the use of the high-performance BLAS, hardware-specific tuning, and a hybridization methodology whereby we split the algorithm into computational tasks of various granularities. Execution of those tasks is properly scheduled over the heterogeneous hardware by minimizing data movements and mapping algorithmic requirements to the architectural strengths of the various heterogeneous hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.
NASA Astrophysics Data System (ADS)
Zhang, Xianxia; Wang, Jian; Qin, Tinggao
2003-09-01
Intelligent control algorithms are introduced into the control system of temperature and humidity. A multi-mode control algorithm of PI-Single Neuron is proposed for single loop control of temperature and humidity. In order to remove the coupling between temperature and humidity, a new decoupling method is presented, which is called fuzzy decoupling. The decoupling is achieved by using a fuzzy controller that dynamically modifies the static decoupling coefficient. Taking the control algorithm of PI-Single Neuron as the single loop control of temperature and humidity, the paper provides the simulated output response curves with no decoupling control, static decoupling control and fuzzy decoupling control. Those control algorithms are easily implemented in singlechip-based hardware systems.
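The static-decoupling-plus-fuzzy-adjustment idea can be illustrated with a toy two-loop example: the humidity control action injected into the temperature loop is scaled by a decoupling coefficient that a small rule base tunes at run time. The PI gains, membership breakpoints, and the first-order plant model are all invented for the sketch and are not the paper's design (the paper uses a PI-single-neuron controller rather than plain PI).

    # Minimal sketch of static decoupling with a fuzzy-adjusted coefficient;
    # all gains, rules and the plant model are illustrative.
    class PI:
        def __init__(self, kp, ki, dt):
            self.kp, self.ki, self.dt, self.acc = kp, ki, dt, 0.0
        def step(self, error):
            self.acc += error * self.dt
            return self.kp * error + self.ki * self.acc

    def fuzzy_gain(temp_error, base=0.1):
        """Crude fuzzy-style adjustment of the static decoupling coefficient.
        The nominal value matches the toy plant's coupling-to-direct gain ratio."""
        e = abs(temp_error)
        if e < 0.5:
            return base * 0.8                    # "small error" -> weaker compensation
        elif e < 2.0:
            return base                          # "medium error" -> nominal
        return base * 1.3                        # "large error" -> stronger compensation

    temp_pi, hum_pi, dt = PI(1.2, 0.3, 0.1), PI(0.8, 0.2, 0.1), 0.1
    temp, hum = 18.0, 40.0                       # plant states (toy first-order model)
    for _ in range(300):
        e_t, e_h = 22.0 - temp, 55.0 - hum
        u_h = hum_pi.step(e_h)
        u_t = temp_pi.step(e_t) - fuzzy_gain(e_t) * u_h      # decoupling compensation
        temp += dt * (-0.1 * temp + 0.5 * u_t + 0.05 * u_h)  # humidity action leaks in
        hum += dt * (-0.2 * hum + 0.6 * u_h)
    print(round(temp, 2), round(hum, 2))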
NASA Astrophysics Data System (ADS)
Hershey, Kyle W.; Suddard-Bangsund, John; Qian, Gang; Holmes, Russell J.
2017-09-01
The analysis of organic light-emitting device degradation is typically restricted to fitting the overall luminance loss as a function of time or the characterization of fully degraded devices. To develop a more complete understanding of degradation, additional specific data are needed as a function of luminance loss. The overall degradation in luminance during testing can be decoupled into a loss in emitter photoluminescence efficiency and a reduction in the exciton formation efficiency. Here, we demonstrate a method that permits separation of these component efficiencies, yielding the time evolution of two additional specific device parameters that can be used in interpreting and modeling degradation without modification to the device architecture or introduction of any additional post-degradation characterization steps. Here, devices based on the phosphor tris[2-phenylpyridinato-C2,N]iridium(III) (Ir(ppy)3) are characterized as a function of initial luminance and emissive layer thickness. The overall loss in device luminance is found to originate primarily from a reduction in the exciton formation efficiency which is exacerbated in devices with thinner emissive layers. Interestingly, the contribution to overall degradation from a reduction in the efficiency of exciton recombination (i.e., photoluminescence) is unaffected by thickness, suggesting a fixed exciton recombination zone width and degradation at an interface.
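The decomposition described in this abstract can be restated compactly as a product of normalized component efficiencies; the symbols below are illustrative rather than the authors' notation.

    % Hedged restatement: normalized luminance loss treated as the product of the
    % normalized photoluminescence and exciton-formation efficiencies.
    \frac{L(t)}{L(0)} \;=\;
      \frac{\eta_{\mathrm{PL}}(t)}{\eta_{\mathrm{PL}}(0)}
      \cdot
      \frac{\eta_{\mathrm{form}}(t)}{\eta_{\mathrm{form}}(0)}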
NASA Astrophysics Data System (ADS)
Luo, Yugong; Chen, Tao; Li, Keqiang
2015-12-01
The paper presents a novel active distance control strategy for intelligent hybrid electric vehicles (IHEV) with the purpose of guaranteeing optimal performance of the driving functions along with optimum safety, fuel economy and ride comfort. Considering the complexity of driving situations, the objectives of safety and ride comfort are decoupled from that of fuel economy, and a hierarchical control architecture is adopted to improve real-time performance and adaptability. The hierarchical control structure consists of four layers: active distance control object determination, comprehensive driving and braking torque calculation, comprehensive torque distribution and torque coordination. The safety distance control and the emergency stop algorithms are designed to achieve the safety and ride comfort goals. The optimal rule-based energy management algorithm of the hybrid electric system is developed to improve the fuel economy. The torque coordination control strategy is proposed to regulate engine torque, motor torque and hydraulic braking torque to improve the ride comfort. This strategy is verified by simulation and experiment using a forward simulation platform and a prototype vehicle. The results show that the novel control strategy can achieve the integrated and coordinated control of its multiple subsystems, which guarantees top performance of the driving functions and optimum safety, fuel economy and ride comfort.
Zandveld, Jelle; van den Heuvel, Joost; Mulder, Maarten; Brakefield, Paul M; Kirkwood, Thomas B L; Shanley, Daryl P; Zwaan, Bas J
2017-11-01
Phenotypic plasticity is an important concept in life-history evolution, and most organisms, including Drosophila melanogaster, show a plastic life-history response to diet. However, little is known about how these life-history responses are mediated. In this study, we compared adult female flies fed an alternating diet (yoyo flies) with flies fed a constant low (CL) or high (CH) diet and tested how whole genome expression was affected by these diet regimes and how the transcriptional responses related to different life-history traits. We showed that flies were able to respond quickly to diet fluctuations throughout life span by drastically changing their transcription. Importantly, by measuring the response of multiple life-history traits we were able to decouple groups of genes associated with life span or reproduction, life-history traits that often covary with a diet change. A coexpression network analysis uncovered which genes underpin the separate and shared regulation of these life-history traits. Our study provides essential insights to help unravel the genetic architecture mediating life-history responses to diet, and it shows that the flies' whole genome transcription response is highly plastic. © 2017 The Author(s). Evolution © 2017 The Society for the Study of Evolution.
Information systems in healthcare - state and steps towards sustainability.
Lenz, R
2009-01-01
To identify core challenges and first steps on the way to sustainable information systems in healthcare. Recent articles on healthcare information technology and related articles from Medical Informatics and Computer Science were reviewed and analyzed. Core challenges that couldn't be solved over the years are identified. The two core problem areas are process integration, meaning to effectively embed IT-systems into routine workflows, and systems integration, meaning to reduce the effort for interconnecting independently developed IT-components. Standards for systems integration have improved a lot, but their usefulness is limited where system evolution is needed. Sustainable Healthcare Information Systems should be based on system architectures that support system evolution and avoid costly system replacements every five to ten years. Some basic principles for the design of such systems are separation of concerns, loose coupling, deferred systems design, and service oriented architectures.
Thermal Hotspots in CPU Die and It's Future Architecture
NASA Astrophysics Data System (ADS)
Wang, Jian; Hu, Fu-Yuan
Owing to increasing core frequency and chip integration and the limited die dimensions, power densities in CPU chips have been increasing rapidly. The high on-chip temperatures resulting from these power densities threaten the processor's performance and the chip's reliability. This paper analyzes the thermal hotspots in the die and their properties. A new architecture of function units in the die, a hot-units distributed architecture, is suggested to cope with the problems of high power densities in future processor chips.
Security for IP Multimedia Services in the 3GPP Third Generation Mobile System.
ERIC Educational Resources Information Center
Horn, G.; Kroselberg, D.; Muller, K.
2003-01-01
Presents an overview of the security architecture of the IP multimedia core network subsystem (IMS) of the third generation mobile system, known in Europe as UMTS. Discusses IMS security requirements; IMS security architecture; authentication between IMS user and home network; integrity and confidentiality for IMS signalling; and future aspects of…
Designing the invisible architecture of your hospital.
Tye, Joe
2011-01-01
Before building or remodeling a hospital, architects develop a complete set of blueprints. That same sort of detailed attention should be given to the "invisible architecture" of core values, corporate culture, and emotional attitude because this has a much greater impact on the patient and employee experience than do the bricks and mortar.
NASA Technical Reports Server (NTRS)
1983-01-01
The remote manipulating system, the pointing control system, and the external radiator for the core module of the space station are discussed. The principal interfaces for four basic classes of user and transportation vehicles or facilities associated with the space station were examined.
ERIC Educational Resources Information Center
Alexander, Christopher; And Others
This is the third of three works that lay the basis for a new approach to architecture, building, and planning. At the core of these books is the idea that people should design for themselves their own houses, streets, and communities. Although the idea implies a radical transformation of the architectural profession, it comes from the observation…
Hofmann, Hannes G; Keck, Benjamin; Rohkohl, Christopher; Hornegger, Joachim
2011-01-01
Interventional reconstruction of 3-D volumetric data from C-arm CT projections is a computationally demanding task. Hardware optimization is not an option but mandatory for interventional image processing and, in particular, for image reconstruction due to the high demands on performance. Several groups have published fast analytical 3-D reconstruction on highly parallel hardware such as GPUs to mitigate this issue. The authors show that the performance of modern CPU-based systems is in the same order as current GPUs for static 3-D reconstruction and outperforms them for a recent motion compensated (3-D+time) image reconstruction algorithm. This work investigates two algorithms: Static 3-D reconstruction as well as a recent motion compensated algorithm. The evaluation was performed using a standardized reconstruction benchmark, RABBITCT, to get comparable results and two additional clinical data sets. The authors demonstrate for a parametric B-spline motion estimation scheme that the derivative computation, which requires many write operations to memory, performs poorly on the GPU and can highly benefit from modern CPU architectures with large caches. Moreover, on a 32-core Intel Xeon server system, the authors achieve linear scaling with the number of cores used and reconstruction times almost in the same range as current GPUs. Algorithmic innovations in the field of motion compensated image reconstruction may lead to a shift back to CPUs in the future. For analytical 3-D reconstruction, the authors show that the gap between GPUs and CPUs became smaller. It can be performed in less than 20 s (on-the-fly) using a 32-core server.
Exascale Hardware Architectures Working Group
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hemmert, S; Ang, J; Chiang, P
2011-03-15
The ASC Exascale Hardware Architecture working group is challenged to provide input on the following areas impacting the future use and usability of potential exascale computer systems: processor, memory, and interconnect architectures, as well as the power and resilience of these systems. Going forward, there are many challenging issues that will need to be addressed. First, power constraints in processor technologies will lead to steady increases in parallelism within a socket. Additionally, all cores may not be fully independent nor fully general purpose. Second, there is a clear trend toward less balanced machines, in terms of compute capability compared to memory and interconnect performance. In order to mitigate the memory issues, memory technologies will introduce 3D stacking, eventually moving on-socket and likely on-die, providing greatly increased bandwidth but unfortunately also likely providing smaller memory capacity per core. Off-socket memory, possibly in the form of non-volatile memory, will create a complex memory hierarchy. Third, communication energy will dominate the energy required to compute, such that interconnect power and bandwidth will have a significant impact. All of the above changes are driven by the need for greatly increased energy efficiency, as current technology will prove unsuitable for exascale, due to unsustainable power requirements of such a system. These changes will have the most significant impact on programming models and algorithms, but they will be felt across all layers of the machine. There is clear need to engage all ASC working groups in planning for how to deal with technological changes of this magnitude. The primary function of the Hardware Architecture Working Group is to facilitate codesign with hardware vendors to ensure future exascale platforms are capable of efficiently supporting the ASC applications, which in turn need to meet the mission needs of the NNSA Stockpile Stewardship Program. This issue is relatively immediate, as there is only a small window of opportunity to influence hardware design for 2018 machines. Given the short timeline a firm co-design methodology with vendors is of prime importance.
PyNEST: A Convenient Interface to the NEST Simulator.
Eppler, Jochen Martin; Helias, Moritz; Muller, Eilif; Diesmann, Markus; Gewaltig, Marc-Oliver
2008-01-01
The neural simulation tool NEST (http://www.nest-initiative.org) is a simulator for heterogeneous networks of point neurons or neurons with a small number of compartments. It aims at simulations of large neural systems with more than 10^4 neurons and 10^7 to 10^9 synapses. NEST is implemented in C++ and can be used on a large range of architectures from single-core laptops over multi-core desktop computers to super-computers with thousands of processor cores. Python (http://www.python.org) is a modern programming language that has recently received considerable attention in Computational Neuroscience. Python is easy to learn and has many extension modules for scientific computing (e.g. http://www.scipy.org). In this contribution we describe PyNEST, the new user interface to NEST. PyNEST combines NEST's efficient simulation kernel with the simplicity and flexibility of Python. Compared to NEST's native simulation language SLI, PyNEST makes it easier to set up simulations, generate stimuli, and analyze simulation results. We describe how PyNEST connects NEST and Python and how it is implemented. With a number of examples, we illustrate how it is used.
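As a rough illustration of the workflow PyNEST enables, the sketch below builds a tiny network, drives it with Poisson noise, and simulates one second of activity. The model, device, and parameter values are illustrative only, and the exact call names (for example, spike_recorder versus the older spike_detector) vary between NEST releases.

```python
# Minimal PyNEST-style workflow: build a small network, stimulate it, simulate.
# Model names and parameters are illustrative; the PyNEST API has changed
# between NEST releases, so treat this as a sketch rather than exact usage.
import nest

nest.ResetKernel()

neurons = nest.Create("iaf_psc_alpha", 100)            # 100 integrate-and-fire neurons
noise = nest.Create("poisson_generator", params={"rate": 8000.0})
recorder = nest.Create("spike_recorder")               # called "spike_detector" in older NEST

nest.Connect(noise, neurons, syn_spec={"weight": 10.0})
nest.Connect(neurons, recorder)

nest.Simulate(1000.0)                                   # one second of biological time
print(nest.GetStatus(recorder, "n_events"))
```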
NASA Astrophysics Data System (ADS)
Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.
2016-04-01
We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, which forms part of the AMBER v14 MD software package and makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2 × 12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P, 1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve the performance on Intel Xeon processors, while retaining simulation accuracy.
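The two-fold decomposition described above (spatial decomposition across MPI ranks, thread-level decomposition within each rank) can be sketched generically as follows. This is not pmemd/AMBER code; the mpi4py usage, the thread count, and the placeholder energy kernel are assumptions made purely to illustrate the strategy.

```python
# Conceptual sketch of a two-level decomposition: atoms are split spatially
# across MPI ranks, and each rank splits its local work across threads.
# This is NOT pmemd/AMBER code; it only illustrates the strategy.
from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(seed=rank)
local_atoms = rng.random((1000, 3))        # this rank's spatial domain (placeholder data)

def partial_energy(chunk):
    # Placeholder for the direct-sum work a thread (or coprocessor) would do.
    return float(np.sum(chunk ** 2))

chunks = np.array_split(local_atoms, 4)    # thread-level decomposition (4 "OpenMP" threads)
with ThreadPoolExecutor(max_workers=4) as pool:
    local_sum = sum(pool.map(partial_energy, chunks))

total = comm.allreduce(local_sum, op=MPI.SUM)   # combine across ranks
if rank == 0:
    print("total (placeholder) energy:", total)
```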
A data grid for imaging-based clinical trials
NASA Astrophysics Data System (ADS)
Zhou, Zheng; Chao, Sander S.; Lee, Jasper; Liu, Brent; Documet, Jorge; Huang, H. K.
2007-03-01
Clinical trials play a crucial role in testing new drugs or devices in modern medicine. Medical imaging has also become an important tool in clinical trials because images provide a unique and fast diagnosis with visual observation and quantitative assessment. A typical imaging-based clinical trial consists of: 1) A well-defined rigorous clinical trial protocol, 2) a radiology core that has a quality control mechanism, a biostatistics component, and a server for storing and distributing data and analysis results; and 3) many field sites that generate and send image studies to the radiology core. As the number of clinical trials increases, it becomes a challenge for a radiology core servicing multiple trials to have a server robust enough to administrate and quickly distribute information to participating radiologists/clinicians worldwide. The Data Grid can satisfy the aforementioned requirements of imaging based clinical trials. In this paper, we present a Data Grid architecture for imaging-based clinical trials. A Data Grid prototype has been implemented in the Image Processing and Informatics (IPI) Laboratory at the University of Southern California to test and evaluate performance in storing trial images and analysis results for a clinical trial. The implementation methodology and evaluation protocol of the Data Grid are presented.
The Strange Game of Prestige Scholarships
ERIC Educational Resources Information Center
Knox, John A.
2017-01-01
Honors programs, as home to the highest test scores and highest GPAs on many campuses (for reasons that are not particularly justifiable), can become assembly lines for prestige-scholarship applications and their dangling appendages, the applicants themselves. As honors programs become cogs in universities' PR machines, they decouple from their…
Stamatakis, Alexandros; Ott, Michael
2008-12-27
The continuous accumulation of sequence data, for example, due to novel wet-laboratory techniques such as pyrosequencing, coupled with the increasing popularity of multi-gene phylogenies and emerging multi-core processor architectures that face problems of cache congestion, poses new challenges with respect to the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here, we propose two approaches that can significantly speed up likelihood computations that typically represent over 95 per cent of the computational effort conducted by current ML or Bayesian inference programs. Initially, we present a method and an appropriate data structure to efficiently compute the likelihood score on 'gappy' multi-gene alignments. By 'gappy' we denote sampling-induced gaps owing to missing sequences in individual genes (partitions), i.e. not real alignment gaps. A first proof-of-concept implementation in RAXML indicates that this approach can accelerate inferences on large and gappy alignments by approximately one order of magnitude. Moreover, we present insights and initial performance results on multi-core architectures obtained during the transition from an OpenMP-based to a Pthreads-based fine-grained parallelization of the ML function.
Low loss hollow-core waveguide on a silicon substrate
NASA Astrophysics Data System (ADS)
Yang, Weijian; Ferrara, James; Grutter, Karen; Yeh, Anthony; Chase, Chris; Yue, Yang; Willner, Alan E.; Wu, Ming C.; Chang-Hasnain, Connie J.
2012-07-01
Optical-fiber-based, hollow-core waveguides (HCWs) have opened up many new applications in laser surgery, gas sensors, and non-linear optics. Chip-scale HCWs are desirable because they are compact, light-weight and can be integrated with other devices into systems-on-a-chip. However, their progress has been hindered by the lack of a low loss waveguide architecture. Here, a completely new waveguiding concept is demonstrated using two planar, parallel, silicon-on-insulator wafers with high-contrast subwavelength gratings to reflect light in-between. We report a record low optical loss of 0.37 dB/cm for a 9-μm waveguide, mode-matched to a single mode fiber. Two-dimensional light confinement is experimentally realized without sidewalls in the HCWs, which is promising for ultrafast sensing response with nearly instantaneous flow of gases or fluids. This unique waveguide geometry establishes an entirely new scheme for low-cost chip-scale sensor arrays and lab-on-a-chip applications.
Large-Scale Compute-Intensive Analysis via a Combined In-situ and Co-scheduling Workflow Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Messer, Bronson; Sewell, Christopher; Heitmann, Katrin
2015-01-01
Large-scale simulations can produce tens of terabytes of data per analysis cycle, complicating and limiting the efficiency of workflows. Traditionally, outputs are stored on the file system and analyzed in post-processing. With the rapidly increasing size and complexity of simulations, this approach faces an uncertain future. Trending techniques consist of performing the analysis in situ, utilizing the same resources as the simulation, and/or off-loading subsets of the data to a compute-intensive analysis system. We introduce an analysis framework developed for HACC, a cosmological N-body code, that uses both in situ and co-scheduling approaches for handling Petabyte-size outputs. An initial in situ step is used to reduce the amount of data to be analyzed, and to separate out the data-intensive tasks handled off-line. The analysis routines are implemented using the PISTON/VTK-m framework, allowing a single implementation of an algorithm that simultaneously targets a variety of GPU, multi-core, and many-core architectures.
Reconfigurable modular computer networks for spacecraft on-board processing
NASA Technical Reports Server (NTRS)
Rennels, D. A.
1978-01-01
The core electronics subsystems on unmanned spacecraft, which have been sent over the last 20 years to investigate the moon, Mars, Venus, and Mercury, have progressed through an evolution from simple fixed controllers and analog computers in the 1960's to general-purpose digital computers in current designs. This evolution is now moving in the direction of distributed computer networks. Current Voyager spacecraft already use three on-board computers. One is used to store commands and provide overall spacecraft management. Another is used for instrument control and telemetry collection, and the third computer is used for attitude control and scientific instrument pointing. An examination of the control logic in the instruments shows that, for many, it is cost-effective to replace the sequencing logic with a microcomputer. The Unified Data System architecture considered consists of a set of standard microcomputers connected by several redundant buses. A typical self-checking computer module will contain 23 RAMs, two microprocessors, one memory interface, three bus interfaces, and one core building block.
Reconfigurable Very Long Instruction Word (VLIW) Processor
NASA Technical Reports Server (NTRS)
Velev, Miroslav N.
2015-01-01
Future NASA missions will depend on radiation-hardened, power-efficient processing systems-on-a-chip (SOCs) that consist of a range of processor cores custom tailored for space applications. Aries Design Automation, LLC, has developed a processing SOC that is optimized for software-defined radio (SDR) uses. The innovation implements the Institute of Electrical and Electronics Engineers (IEEE) RazorII voltage management technique, a microarchitectural mechanism that allows processor cores to self-monitor, self-analyze, and self-heal after timing errors, regardless of their cause (e.g., radiation; chip aging; variations in the voltage, frequency, temperature, or manufacturing process). This highly automated SOC can also execute legacy binary code for the PowerPC 750 instruction set architecture (ISA), which is used in the flight-control computers of many previous NASA space missions. In developing this innovation, Aries Design Automation has made significant contributions to the fields of formal verification of complex pipelined microprocessors and Boolean satisfiability (SAT) and has developed highly efficient electronic design automation tools that hold promise for future developments.
NASA Astrophysics Data System (ADS)
Willans, Mathew J.; Sears, Devin N.; Wasylishen, Roderick E.
2008-03-01
The use of continuous-wave (CW) 1H decoupling has generally provided little improvement in the 13C MAS NMR spectroscopy of paramagnetic organic solids. Recent solid-state 13C NMR studies have demonstrated that at rapid magic-angle spinning rates CW decoupling can result in reductions in signal-to-noise and that 1H decoupling should be omitted when acquiring 13C MAS NMR spectra of paramagnetic solids. However, studies of the effectiveness of modern 1H decoupling sequences are lacking, and the performance of such sequences over a variety of experimental conditions must be investigated before 1H decoupling is discounted altogether. We have studied the performance of several commonly used advanced decoupling pulse sequences, namely the TPPM, SPINAL-64, XiX, and eDROOPY sequences, in 13C MAS NMR experiments performed under four combinations of the magnetic field strength (7.05 or 11.75 T), rotor frequency (15 or 30 kHz), and 1H rf-field strength (71, 100, or 140 kHz). The effectiveness of these sequences has been evaluated by comparing the 13C signal intensity, linewidth at half-height, LWHH, and coherence lifetimes, T2', of the methine carbon of copper(II) bis(DL-alanine) monohydrate, Cu(ala)2·H2O, and methylene carbon of copper(II) bis(DL-2-aminobutyrate), Cu(ambut)2, obtained with the advanced sequences to those obtained without 1H decoupling, with CW decoupling, and for fully deuterium labelled samples. The latter have been used as model compounds with perfect 1H decoupling and provide a measure of the efficiency of the 1H decoupling sequence. Overall, the effectiveness of 1H decoupling depends strongly on the decoupling sequence utilized, the experimental conditions and the sample studied. Of the decoupling sequences studied, the XiX sequence consistently yielded the best results, although any of the advanced decoupling sequences strongly outperformed the CW sequence and provided improvements over no 1H decoupling. Experiments performed at 7.05 T demonstrate that the XiX decoupling sequence is the least sensitive to changes in the 1H transmitter frequency and may explain the superior performance of this decoupling sequence. Overall, the most important factor in the effectiveness of 1H decoupling was the carbon type studied, with the methylene carbon of Cu(ambut)2 being substantially more sensitive to 1H decoupling than the methine carbon of Cu(ala)2·H2O. An analysis of the various broadening mechanisms contributing to 13C linewidths has been performed in order to rationalize the different sensitivities of the two carbon sites under the four experimental conditions.
Surface-atmosphere decoupling limits accumulation at Summit, Greenland.
Berkelhammer, Max; Noone, David C; Steen-Larsen, Hans Christian; Bailey, Adriana; Cox, Christopher J; O'Neill, Michael S; Schneider, David; Steffen, Konrad; White, James W C
2016-04-01
Despite rapid melting in the coastal regions of the Greenland Ice Sheet, a significant area (~40%) of the ice sheet rarely experiences surface melting. In these regions, the controls on annual accumulation are poorly constrained owing to surface conditions (for example, surface clouds, blowing snow, and surface inversions), which render moisture flux estimates from myriad approaches (that is, eddy covariance, remote sensing, and direct observations) highly uncertain. Accumulation is partially determined by the temperature dependence of saturation vapor pressure, which influences the maximum humidity of air parcels reaching the ice sheet interior. However, independent proxies for surface temperature and accumulation from ice cores show that the response of accumulation to temperature is variable and not generally consistent with a purely thermodynamic control. Using three years of stable water vapor isotope profiles from a high altitude site on the Greenland Ice Sheet, we show that as the boundary layer becomes increasingly stable, a decoupling between the ice sheet and atmosphere occurs. The limited interaction between the ice sheet surface and free tropospheric air reduces the capacity for surface condensation to achieve the rate set by the humidity of the air parcels reaching interior Greenland. The isolation of the surface also acts to recycle sublimated moisture by recondensing it onto fog particles, which returns the moisture back to the surface through gravitational settling. The observations highlight a unique mechanism by which ice sheet mass is conserved, which has implications for understanding both past and future changes in accumulation rate and the isotopic signal in ice cores from Greenland.
Evidence for Cluster to Cluster Variations in Low-mass Stellar Rotational Evolution
NASA Astrophysics Data System (ADS)
Coker, Carl T.; Pinsonneault, Marc; Terndrup, Donald M.
2016-12-01
The concordance model for angular momentum evolution postulates that star-forming regions and clusters are an evolutionary sequence that can be modeled with assumptions about protostar-disk coupling, angular momentum loss from magnetized winds that saturates in a mass-dependent fashion at high rotation rates, and core-envelope decoupling for solar analogs. We test this approach by combining established data with the large h Per data set from the MONITOR project and new low-mass Pleiades data. We confirm prior results that young low-mass stars can be used to test star-disk coupling and angular momentum loss independent of the treatment of internal angular momentum transport. For slow rotators, we confirm the need for star-disk interactions to evolve the ONC to older systems, using h Per (age 13 Myr) as our natural post-disk case. There is no evidence for extremely long-lived disks as an alternative to core-envelope decoupling. However, our wind models cannot evolve rapid rotators from h Per to older systems consistently, and we find that this result is robust with respect to the choice of angular momentum loss prescription. We outline two possible solutions: either there is cosmic variance in the distribution of stellar rotation rates in different clusters or there are substantially enhanced torques in low-mass rapid rotators. We favor the former explanation and discuss observational tests that could be used to distinguish them. If the distribution of initial conditions depends on environment, models that test parameters by assuming a universal underlying distribution of initial conditions will need to be re-evaluated.
Reiher, Markus; Wolf, Alexander
2004-12-08
In order to achieve exact decoupling of the Dirac Hamiltonian within a unitary transformation scheme, we have discussed in part I of this series that either a purely numerical iterative technique (the Barysz-Sadlej-Snijders method) or a stepwise analytic approach (the Douglas-Kroll-Hess method) are possible. For the evaluation of Douglas-Kroll-Hess Hamiltonians up to a pre-defined order it was shown that a symbolic scheme has to be employed. In this work, an algorithm for this analytic derivation of Douglas-Kroll-Hess Hamiltonians up to any arbitrary order in the external potential is presented. We discuss how an estimate for the necessary order for exact decoupling (within machine precision) for a given system can be determined from the convergence behavior of the Douglas-Kroll-Hess expansion prior to a quantum chemical calculation. Once this maximum order has been accomplished, the spectrum of the positive-energy part of the decoupled Hamiltonian, e.g., for electronic bound states, cannot be distinguished from the corresponding part of the spectrum of the Dirac operator. An efficient scalar-relativistic implementation of the symbolic operations for the evaluation of the positive-energy part of the block-diagonal Hamiltonian is presented, and its accuracy is tested for ground-state energies of one-electron ions over the whole periodic table. Furthermore, the first many-electron calculations employing sixth up to fourteenth order DKH Hamiltonians are presented. (c) 2004 American Institute of Physics.
NASA Astrophysics Data System (ADS)
Clenet, A.; Ravera, L.; Bertrand, B.; den Hartog, R.; Jackson, B.; van Leeuwen, B.-J.; van Loon, D.; Parot, Y.; Pointecouteau, E.; Sournac, A.
2014-11-01
IRAP is developing the readout electronics of the SPICA-SAFARI TES bolometer arrays. Based on the frequency-domain multiplexing technique, the readout electronics provides the AC signals to voltage-bias the detectors, demodulates the data, and computes a feedback to linearize the detection chain. The feedback is computed with a specific technique, the so-called baseband feedback (BBFB), which ensures that the loop is stable even with long propagation and processing delays (i.e., several μs) and with fast signals (i.e., frequency carriers of the order of 5 MHz). To optimize the power consumption we took advantage of the reduced science signal bandwidth to decouple the signal sampling frequency from the data processing rate. This technique allowed a reduction of the power consumption of the circuit by a factor of 10. Beyond the firmware architecture, the optimization of the instrument concerns the characterization routines and the definition of the optimal parameters. Indeed, to operate a TES array one has to properly define about 21000 parameters. We defined a set of procedures to automatically characterize these parameters and find the optimal settings.
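The power saving comes from decoupling the sampling frequency from the processing rate: once the science signal has been demodulated from its ~5 MHz carrier, its bandwidth is small enough that it can be decimated before further processing. The sketch below illustrates that general idea only; the sample rate, filter, and decimation factor are arbitrary assumptions, not the SAFARI/BBFB design values.

```python
# Illustration of decoupling sampling rate from processing rate: the science
# signal rides on a ~5 MHz carrier, but after demodulation its bandwidth is
# small, so it can be decimated and processed at a much lower rate.
# All numbers here are illustrative, not the SAFARI/BBFB design values.
import numpy as np

fs = 20e6                      # ADC sampling rate (assumed)
fc = 5e6                       # carrier frequency (order of magnitude from the text)
decim = 10                     # processing-rate reduction factor (assumed)

t = np.arange(0, 1e-3, 1 / fs)
science = np.sin(2 * np.pi * 200.0 * t)             # slow "science" signal (200 Hz tone)
adc = science * np.cos(2 * np.pi * fc * t)          # amplitude-modulated carrier seen by the ADC

mixed = adc * np.cos(2 * np.pi * fc * t)            # mix back to baseband
baseband = np.convolve(mixed, np.ones(64) / 64, mode="same")  # crude low-pass filter

processed = 2.0 * baseband[::decim]                 # process at fs/decim instead of fs
print(len(adc), "->", len(processed), "samples per block")
```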
A universal data access and protocol integration mechanism for smart home
NASA Astrophysics Data System (ADS)
Shao, Pengfei; Yang, Qi; Zhang, Xuan
2013-03-01
Because communication interfaces in home electronics are either non-standardized or missing entirely, no existing protocol or technology offers a complete solution for smart homes. In addition, a central control unit (CCU) that works point-to-point between multiple application interfaces and the underlying hardware interfaces ends up with a complicated architecture and poor performance. A flexible data access and protocol integration mechanism is therefore required. This paper offers a universal, comprehensive data access and protocol integration mechanism for the smart home. The mechanism works as a middleware adapter with unified agreements on the communication interfaces and protocols; it abstracts the application level away from hardware specifics and decouples the hardware interface modules from the application level. Further abstraction of the application interfaces and the underlying hardware interfaces is carried out in an adaptation layer, which provides unified interfaces for more flexible user applications and hardware protocol integration. This universal mechanism fundamentally changes the architecture of the smart home and better meets practical requirements, making smart homes more flexible and desirable.
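The "middleware adapter with unified agreements" described above is essentially the classic adapter pattern: every protocol module implements one common interface, so the application layer never touches device-specific details. The class and method names below are hypothetical and are not taken from the cited system.

```python
# Adapter-pattern sketch of a unified data-access layer for heterogeneous
# smart-home protocols. Class and method names are hypothetical.
from abc import ABC, abstractmethod

class ProtocolAdapter(ABC):
    """Unified interface the application layer programs against."""

    @abstractmethod
    def read(self, device_id: str) -> dict: ...

    @abstractmethod
    def write(self, device_id: str, command: dict) -> None: ...

class ZigbeeAdapter(ProtocolAdapter):
    def read(self, device_id):
        # Would translate to ZigBee-specific frames in a real system.
        return {"device": device_id, "protocol": "zigbee", "value": 21.5}

    def write(self, device_id, command):
        print(f"zigbee -> {device_id}: {command}")

class CentralControlUnit:
    """Application layer sees only ProtocolAdapter, never hardware details."""

    def __init__(self):
        self.adapters: dict[str, ProtocolAdapter] = {}

    def register(self, name: str, adapter: ProtocolAdapter) -> None:
        self.adapters[name] = adapter

    def read(self, name: str, device_id: str) -> dict:
        return self.adapters[name].read(device_id)

ccu = CentralControlUnit()
ccu.register("zigbee", ZigbeeAdapter())
print(ccu.read("zigbee", "thermostat-1"))
```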
The Use of Polymer Design in Resorbable Colloids
NASA Astrophysics Data System (ADS)
Finne-Wistrand, Anna; Albertsson, Ann-Christine
2006-08-01
During the past decade, researchers in the field of polymer chemistry have developed a wide range of very powerful procedures for constructing ever-more-sophisticated polymers. These methods subsequently have been used in suitable systems to solve specific medical problems. This is complicated, and many key factors such as mechanical properties, biocompatibility, biodegradation, stability, and degradation profile must be considered. Colloid particle systems can be used to solve many biomedical- and pharmaceutical-related problems, and it is expected that nanotechnology can be used to develop these materials, devices, and systems even further. For example, an injectable scaffold system with a defined release and degradation profile has huge potential for the repair and regeneration of damaged tissues. This short, nonexhaustive review presents examples of polymer architecture in resorbable particles that have been compared and tested in biomedical applications. We also discuss the design of polymers for core-shell structures.
Aprà, E; Kowalski, K
2016-03-08
In this paper we discuss the implementation of multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of the parallel performance by task reordering that has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects regarding efficient optimization and vectorization strategies.
GPU/MIC Acceleration of the LHC High Level Trigger to Extend the Physics Reach at the LHC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Halyo, Valerie; Tully, Christopher
The quest for rare new physics phenomena leads the PI [3] to propose evaluation of coprocessors based on Graphics Processing Units (GPUs) and the Intel Many Integrated Core (MIC) architecture for integration into the trigger system at the LHC. This will require development of a new massively parallel implementation of the well-known Combinatorial Track Finder, which uses the Kalman Filter to accelerate processing of data from the silicon pixel and microstrip detectors and reconstruct the trajectories of all charged particles down to momenta of 100 MeV. It is expected to run at least one order of magnitude faster than an equivalent algorithm on a quad-core CPU for extreme pileup scenarios of 100 interactions per bunch crossing. The new tracking algorithms will be developed and optimized separately on the GPU and Intel MIC and then evaluated against each other for performance and power efficiency. The results will be used to project the cost of the proposed hardware architectures for the HLT server farm, taking into account the long-term projections of the main vendors in the market (AMD, Intel, and NVIDIA) over the next 10 years. Extensive experience and familiarity of the PI with the LHC tracker and trigger requirements led to the development of a complementary tracking algorithm described in [arxiv: 1305.4855] and [arxiv: 1309.6275], with preliminary results accepted to JINST.
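For reference, the per-candidate arithmetic that such a track finder repeats for every hit is a Kalman filter predict/update step; the generic NumPy version below is only a reminder of that arithmetic, not the CMS Combinatorial Track Finder implementation, and the toy constant-velocity model is an assumption.

```python
# Generic linear Kalman filter predict/update step (NumPy), shown only to
# illustrate the per-candidate arithmetic a parallel track finder repeats for
# every hit; it is not the CMS Combinatorial Track Finder implementation.
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with measurement z
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy 1-D constant-velocity example: state = [position, velocity].
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[0.05]])

x, P = np.zeros(2), np.eye(2)
for z in [1.0, 2.1, 2.9, 4.2]:
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
print("estimated position/velocity:", x)
```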
Utilizing IHE-based Electronic Health Record systems for secondary use.
Holzer, K; Gall, W
2011-01-01
Due to the increasing adoption of Electronic Health Records (EHRs) for primary use, the number of electronic documents stored in such systems will soar in the near future. In order to benefit from this development in secondary fields such as medical research, it is important to define requirements for the secondary use of EHR data. Furthermore, analyses of the extent to which an IHE (Integrating the Healthcare Enterprise)-based architecture would fulfill these requirements could provide further information on upcoming obstacles for the secondary use of EHRs. A catalog of eight core requirements for secondary use of EHR data was deduced from the published literature, the risk analysis of the IHE profile MPQ (Multi-Patient Queries) and the analysis of relevant questions. The IHE-based architecture for cross-domain, patient-centered document sharing was extended to a cross-patient architecture. We propose an IHE-based architecture for cross-patient and cross-domain secondary use of EHR data. Evaluation of this architecture concerning the eight core requirements revealed positive fulfillment of six and the partial fulfillment of two requirements. Although not regarded as a primary goal in modern electronic healthcare, the re-use of existing electronic medical documents in EHRs for research and other fields of secondary application holds enormous potential for the future. Further research in this respect is necessary.
Velsko, Stephan; Bates, Thomas
2016-01-01
Despite numerous calls for improvement, the US biosurveillance enterprise remains a patchwork of uncoordinated systems that fail to take advantage of the rapid progress in information processing, communication, and analytics made in the past decade. By synthesizing components from the extensive biosurveillance literature, we propose a conceptual framework for a national biosurveillance architecture and provide suggestions for implementation. The framework differs from the current federal biosurveillance development pathway in that it is not focused on systems useful for "situational awareness" but is instead focused on the long-term goal of having true warning capabilities. Therefore, a guiding design objective is the ability to digitally detect emerging threats that span jurisdictional boundaries, because attempting to solve the most challenging biosurveillance problem first provides the strongest foundation to meet simpler surveillance objectives. Core components of the vision are: (1) a whole-of-government approach to support currently disparate federal surveillance efforts that have a common data need, including those for food safety, vaccine and medical product safety, and infectious disease surveillance; (2) an information architecture that enables secure national access to electronic health records, yet does not require that data be sent to a centralized location for surveillance analysis; (3) an inference architecture that leverages advances in "big data" analytics and learning inference engines-a significant departure from the statistical process control paradigm that underpins nearly all current syndromic surveillance systems; and (4) an organizational architecture with a governance model aimed at establishing national biosurveillance as a critical part of the US national infrastructure. Although it will take many years to implement, and a national campaign of education and debate to acquire public buy-in for such a comprehensive system, the potential benefits warrant increased consideration by the US government.
Yan, Xinqiang; Zhang, Xiaoliang; Wei, Long; Xue, Rong
2015-01-01
Radio-frequency coil arrays based on the dipole antenna technique have recently been applied to ultrahigh field magnetic resonance (MR) imaging to obtain better signal-to-noise ratio (SNR) gain in deep areas of human tissue. However, the unique structure of dipole antennas makes it challenging to achieve sufficient electromagnetic decoupling among the dipole antenna elements, and no decoupling methods have so far been proposed for dipole antenna arrays in MR imaging. The recently developed magnetic wall (MW), or induced current elimination, decoupling technique has demonstrated its feasibility and robustness in designing microstrip transmission line arrays, L/C loop arrays and monopole arrays. In this study, we investigate the possibility and performance of the MW decoupling technique in dipole arrays for MR imaging at the ultrahigh field of 7T. To achieve this goal, a two-channel MW-decoupled dipole array was designed, constructed and analyzed experimentally through bench tests and MR imaging. Electromagnetic isolation between the two dipole elements was improved from about -3.6 dB (without any decoupling treatment) to -16.5 dB by using the MW decoupling method. MR images acquired from a water phantom using the MW-decoupled dipole array, together with geometry factor maps, were measured, calculated and compared with those acquired using the dipole array without decoupling treatment. The MW-decoupled dipole array demonstrated well-defined image profiles from each element and had better geometry factors than the array without decoupling treatment. The experimental results indicate that the MW decoupling technique might be a promising solution for reducing the electromagnetic coupling of dipole arrays in ultrahigh field MRI, consequently improving their performance in SNR and parallel imaging.
SpaceCubeX: A Framework for Evaluating Hybrid Multi-Core CPU FPGA DSP Architectures
NASA Technical Reports Server (NTRS)
Schmidt, Andrew G.; Weisz, Gabriel; French, Matthew; Flatley, Thomas; Villalpando, Carlos Y.
2017-01-01
The SpaceCubeX project is motivated by the need for high performance, modular, and scalable on-board processing to help scientists answer critical 21st century questions about global climate change, air quality, ocean health, and ecosystem dynamics, while adding new capabilities such as low-latency data products for extreme event warnings. These goals translate into on-board processing throughput requirements that are on the order of 100-1,000 times those of previous Earth Science missions for standard processing, compression, storage, and downlink operations. To study possible future architectures to achieve these performance requirements, the SpaceCubeX project provides an evolvable testbed and framework that enables a focused design space exploration of candidate hybrid CPU/FPGA/DSP processing architectures. The framework includes ArchGen, an architecture generator tool populated with candidate architecture components, performance models, and IP cores, that allows an end user to specify the type, number, and connectivity of a hybrid architecture. The framework requires minimal extensions to integrate new processors, such as the anticipated High Performance Spaceflight Computer (HPSC), reducing time to initiate benchmarking by months. To evaluate the framework, we leverage a wide suite of high performance embedded computing benchmarks and Earth science scenarios to ensure robust architecture characterization. We report on our project's Year 1 efforts and demonstrate the capabilities across four simulation testbed models: a baseline SpaceCube 2.0 system, a dual ARM A9 processor system, a hybrid quad ARM A53 and FPGA system, and a hybrid quad ARM A53 and DSP system.
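To make the "type, number, and connectivity" specification concrete, a declarative description of a candidate hybrid architecture might look like the sketch below. The schema and field names are invented for illustration and are not ArchGen's actual input format.

```python
# Hypothetical sketch of the kind of declarative description an architecture
# generator could consume: the type, number, and connectivity of processing
# elements in a hybrid CPU/FPGA/DSP testbed. The schema is invented for
# illustration and is not ArchGen's actual input format.
candidate_architecture = {
    "name": "quad_a53_plus_fpga",
    "elements": [
        {"type": "cpu",  "model": "ARM Cortex-A53", "count": 4},
        {"type": "fpga", "model": "generic-fabric", "count": 1},
    ],
    "links": [
        # (source, destination, interconnect)
        ("cpu", "fpga", "axi"),
    ],
}

def summarize(arch):
    cores = sum(e["count"] for e in arch["elements"] if e["type"] == "cpu")
    accel = [e["type"] for e in arch["elements"] if e["type"] != "cpu"]
    return f"{arch['name']}: {cores} CPU cores + accelerators {accel}, {len(arch['links'])} link(s)"

print(summarize(candidate_architecture))
```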
NASA Technical Reports Server (NTRS)
Berg, Melanie D.; LaBel, Kenneth A.
2018-01-01
The following are updated or new subjects added to the FPGA SEE Test Guidelines manual: academic versus mission specific device evaluation, single event latch-up (SEL) test and analysis, SEE response visibility enhancement during radiation testing, mitigation evaluation (embedded and user-implemented), unreliable design and its effects on SEE data, testing flushable architectures versus non-flushable architectures, intellectual property core (IP Core) test and evaluation (addresses embedded and user-inserted), heavy-ion energy and linear energy transfer (LET) selection, proton versus heavy-ion testing, fault injection, mean fluence to failure analysis, and mission specific system-level single event upset (SEU) response prediction. Most sections within the guidelines manual provide information regarding best practices for test structure and test system development. The scope of this manual addresses academic versus mission specific device evaluation and visibility enhancement in IP Core testing.
Bouallaga, I; Massicard, S; Yaniv, M; Thierry, F
2000-11-01
Recent studies have reported new mechanisms that mediate the transcriptional synergy of strong tissue-specific enhancers, involving the cooperative assembly of higher-order nucleoprotein complexes called enhanceosomes. Here we show that the HPV18 enhancer, which controls the epithelial-specific transcription of the E6 and E7 transforming genes, exhibits characteristic features of these structures. We used deletion experiments to show that a core enhancer element cooperates, in a specific helical phasing, with distant essential factors binding to the ends of the enhancer. This core sequence, binding a Jun B/Fra-2 heterodimer, cooperatively recruits the architectural protein HMG-I(Y) in a nucleoprotein complex, where they interact with each other. Therefore, in HeLa cells, HPV18 transcription seems to depend upon the assembly of an enhanceosome containing multiple cellular factors recruited by a core sequence interacting with AP1 and HMG-I(Y).
Atomic structure of the Y complex of the nuclear pore
Kelley, Kotaro; Knockenhauer, Kevin E.; Kabachinski, Greg; ...
2015-03-30
The nuclear pore complex (NPC) is the principal gateway for transport into and out of the nucleus. Selectivity is achieved through the hydrogel-like core of the NPC. The structural integrity of the NPC depends on ~15 architectural proteins, which are organized in distinct subcomplexes to form the >40-MDa ring-like structure. In this paper, we present the 4.1-Å crystal structure of a heterotetrameric core element ('hub') of the Y complex, the essential NPC building block, from Myceliophthora thermophila. Using the hub structure together with known Y-complex fragments, we built the entire ~0.5-MDa Y complex. Our data reveal that the conserved core of the Y complex has six rather than seven members. Finally, evolutionarily distant Y-complex assemblies share a conserved core that is very similar in shape and dimension, thus suggesting that there are closely related architectural codes for constructing the NPC in all eukaryotes.
New Scheduling Algorithms for Agile All-Photonic Networks
NASA Astrophysics Data System (ADS)
Mehri, Mohammad Saleh; Ghaffarpour Rahbar, Akbar
2017-12-01
An optical overlaid star network is a class of agile all-photonic networks that consists of one or more core node(s) at the center of the star network and a number of edge nodes around the core node. In this architecture, a core node may use a scheduling algorithm for transmission of traffic through the network. A core node is responsible for scheduling optical packets that arrive from edge nodes and switching them toward their destinations. Nowadays, most edge nodes use virtual output queue (VOQ) architecture for buffering client packets to achieve high throughput. This paper presents two efficient scheduling algorithms called discretionary iterative matching (DIM) and adaptive DIM. These schedulers find maximum matching in a small number of iterations and provide high throughput and incur low delay. The number of arbiters in these schedulers and the number of messages exchanged between inputs and outputs of a core node are reduced. We show that DIM and adaptive DIM can provide better performance in comparison with iterative round-robin matching with SLIP (iSLIP). SLIP means the act of sliding for a short distance to select one of the requested connections based on the scheduling algorithm.
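A single request-grant-accept round of iterative matching, the general scheme that iSLIP-style schedulers iterate over a set of VOQs, can be sketched as follows. The round-robin arbitration here is a generic illustration and is not the DIM or adaptive DIM algorithm proposed in the paper.

```python
# Generic request-grant-accept iterative matching for a VOQ crossbar,
# in the spirit of iSLIP; this is an illustration, not the paper's DIM scheme.
def iterative_match(requests, iterations=2):
    """requests[i][j] is True if input i has a cell queued for output j."""
    n = len(requests)
    grant_ptr = [0] * n            # per-output round-robin pointers
    accept_ptr = [0] * n           # per-input round-robin pointers
    matched_in, matched_out = {}, {}

    for _ in range(iterations):
        # Grant phase: each unmatched output picks one requesting, unmatched input.
        grants = {}
        for j in range(n):
            if j in matched_out:
                continue
            for k in range(n):
                i = (grant_ptr[j] + k) % n
                if i not in matched_in and requests[i][j]:
                    grants.setdefault(i, []).append(j)
                    break
        # Accept phase: each input accepts one granting output.
        for i, outs in grants.items():
            j = min(outs, key=lambda o: (o - accept_ptr[i]) % n)
            matched_in[i], matched_out[j] = j, i
            grant_ptr[j] = (i + 1) % n        # advance pointers only on acceptance
            accept_ptr[i] = (j + 1) % n
    return matched_in

voqs = [[True, False, True],
        [True, True, False],
        [False, False, True]]
print(iterative_match(voqs))       # {0: 0, 1: 1, 2: 2} for this request pattern
```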
Real Time Monitor of Grid job executions
NASA Astrophysics Data System (ADS)
Colling, D. J.; Martyniak, J.; McGough, A. S.; Křenek, A.; Sitera, J.; Mulač, M.; Dvořák, F.
2010-04-01
In this paper we describe the architecture and operation of the Real Time Monitor (RTM), developed by the Grid team in the HEP group at Imperial College London. This is arguably the most popular dissemination tool within the EGEE [1] Grid, having been used on many occasions including the GridFest and LHC inauguration events held at CERN in October 2008. The RTM gathers information from EGEE sites hosting Logging and Bookkeeping (LB) services. Information is cached locally at a dedicated server at Imperial College London and made available for clients to use in near real time. The system consists of three main components: the RTM server, the enquirer and an Apache web server which is queried by clients. The RTM server queries the LB servers at fixed time intervals, collecting job related information and storing this in a local database. Job related data includes not only job state (i.e. Scheduled, Waiting, Running or Done) along with timing information but also other attributes such as Virtual Organization and Computing Element (CE) queue, if known. The job data stored in the RTM database is read by the enquirer every minute and converted to an XML format which is stored on a web server. This decouples the RTM server database from the client, removing the bottleneck caused by many clients simultaneously accessing the database. This information can be visualized through either a 2D or 3D Java based client, with live job data either being overlaid on to a 2-dimensional map of the world or rendered in 3 dimensions over a globe map using OpenGL.
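The enquirer step described above, reading job records from the local database and publishing a static XML snapshot so that clients never query the database directly, can be illustrated with the following sketch. The table layout, field names, and output file are hypothetical, not the RTM's actual schema.

```python
# Sketch of an "enquirer"-style step: read job records from a local database
# and write a static XML snapshot for web clients, so clients never touch the
# database directly. Table and field names are hypothetical.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id TEXT, state TEXT, vo TEXT, ce TEXT)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", [
    ("job-001", "Running",   "cms",   "ce01.example.org"),
    ("job-002", "Scheduled", "atlas", "ce02.example.org"),
])

root = ET.Element("jobs")
for job_id, state, vo, ce in conn.execute("SELECT id, state, vo, ce FROM jobs"):
    ET.SubElement(root, "job", id=job_id, state=state, vo=vo, ce=ce)

ET.ElementTree(root).write("rtm_snapshot.xml", encoding="utf-8", xml_declaration=True)
print(ET.tostring(root, encoding="unicode"))
```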
Vinther, Joachim M; Nielsen, Anders B; Bjerring, Morten; van Eck, Ernst R H; Kentgens, Arno P M; Khaneja, Navin; Nielsen, Niels Chr
2012-12-07
A novel strategy for heteronuclear dipolar decoupling in magic-angle spinning solid-state nuclear magnetic resonance (NMR) spectroscopy is presented, which eliminates residual static high-order terms in the effective Hamiltonian originating from interactions between oscillating dipolar and anisotropic shielding tensors. The method, called refocused continuous-wave (rCW) decoupling, is systematically established by interleaving continuous wave decoupling with appropriately inserted rotor-synchronized high-power π refocusing pulses of alternating phases. The effect of the refocusing pulses in eliminating residual effects from dipolar coupling in heteronuclear spin systems is rationalized by effective Hamiltonian calculations to third order. In some variants the π pulse refocusing is supplemented by insertion of rotor-synchronized π/2 purging pulses to further reduce the residual dipolar coupling effects. Five different rCW decoupling sequences are presented and their performance is compared to state-of-the-art decoupling methods. The rCW decoupling sequences benefit from extreme broadbandedness, tolerance towards rf inhomogeneity, and improved potential for decoupling at relatively low average rf field strengths. In numerical simulations, the rCW schemes clearly reveal superior characteristics relative to the best decoupling schemes presented so far, which we to some extent also are capable of demonstrating experimentally. A major advantage of the rCW decoupling methods is that they are easy to set up and optimize experimentally.
GPU-completeness: theory and implications
NASA Astrophysics Data System (ADS)
Lin, I.-Jong
2011-01-01
This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We will define this class of algorithm as "GPU-Complete" and will postulate the necessary properties of the algorithms for admission into this class. We will also formally relate this algorithmic space to the space of imaging algorithms. This concept is based upon our experience in the print production area where GPUs (Graphic Processing Units) have shown a substantial cost/performance advantage within the context of HP-delivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for the CPU, language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for the GPU, video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. As GPUs establish themselves as a second computing architecture distinct from CPUs, their end-to-end system cost/performance advantage in certain parts of computation informs the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrates the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a well-defined subset of algorithms, GPU-Completeness is intended to connect the parallelism, algorithms and efficient architectures into a unified framework to show that multiple layers of parallel implementation are guided by the same underlying trade-off.
DDGIPS: a general image processing system in robot vision
NASA Astrophysics Data System (ADS)
Tian, Yuan; Ying, Jun; Ye, Xiuqing; Gu, Weikang
2000-10-01
Real-time image processing is the key task in robot vision. Because of hardware limitations, many algorithm-oriented firmware systems were designed in the past, but their architectures were not flexible enough to support a multi-algorithm development system. The rapid development of microelectronics has produced many high-performance DSP chips and high-density FPGA chips, which makes it possible to construct a more flexible architecture for real-time image processing systems. In this paper, a Double DSP General Image Processing System (DDGIPS) is presented. We construct a two-DSP, FPGA-based computational system with two TMS320C6201s. The TMS320C6x devices are fixed-point processors based on an advanced VLIW CPU with eight functional units, including two multipliers and six arithmetic logic units; these features make the C6x a good candidate for a general-purpose system. In our system, each of the two TMS320C6201s has a local memory space, and they also share a system memory space that enables them to intercommunicate and exchange data efficiently. At the same time, they can be directly interconnected in a star-shaped architecture. All of this is under the control of an FPGA group. As the core of the system, the FPGA plays a very important role: it takes charge of DSP control, DSP communication, memory access arbitration, and the communication between the system and the host machine. By reconfiguring the FPGA, all of the interconnections between the two DSPs, or between DSP and FPGA, can be changed. In this way, users can easily rebuild the real-time image processing system according to the data stream and the task of the application, gaining great flexibility.
Zhang, Zhen; Ma, Cheng; Zhu, Rong
2017-08-23
Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementations of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose an FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output (MIMO) real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas.
Zhang, Zhen; Zhu, Rong
2017-01-01
Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and have achieved remarkable success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementations of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have attracted many applications from scientists. In this paper, we propose an FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons in one core is variable rather than constant; second, the multi-core network can be scaled in various forms; third, the neuron addressing and computing processes are executed simultaneously. These traits make the processor more flexible and better suited to different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output (MIMO) real-time temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. FBGVNP provides a new scheme for building ANNs that is flexible, highly energy-efficient, and applicable in many areas. PMID:28832522
Muñoz-Amatriaín, María; Cuesta-Marcos, Alfonso; Endelman, Jeffrey B; Comadran, Jordi; Bonman, John M; Bockelman, Harold E; Chao, Shiaoman; Russell, Joanne; Waugh, Robbie; Hayes, Patrick M; Muehlbauer, Gary J
2014-01-01
New sources of genetic diversity must be incorporated into plant breeding programs if they are to continue increasing grain yield and quality, and tolerance to abiotic and biotic stresses. Germplasm collections provide a source of genetic and phenotypic diversity, but characterization of these resources is required to increase their utility for breeding programs. We used a barley SNP iSelect platform with 7,842 SNPs to genotype 2,417 barley accessions sampled from the USDA National Small Grains Collection of 33,176 accessions. Most of the accessions in this core collection are categorized as landraces or cultivars/breeding lines and were obtained from more than 100 countries. Both STRUCTURE and principal component analysis identified five major subpopulations within the core collection, mainly differentiated by geographical origin and spike row number (an inflorescence architecture trait). Different patterns of linkage disequilibrium (LD) were found across the barley genome and many regions of high LD contained traits involved in domestication and breeding selection. The genotype data were used to define 'mini-core' sets of accessions capturing the majority of the allelic diversity present in the core collection. These 'mini-core' sets can be used for evaluating traits that are difficult or expensive to score. Genome-wide association studies (GWAS) of 'hull cover', 'spike row number', and 'heading date' demonstrate the utility of the core collection for locating genetic factors determining important phenotypes. The GWAS results were referenced to a new barley consensus map containing 5,665 SNPs. Our results demonstrate that GWAS and high-density SNP genotyping are effective tools for plant breeders interested in accessing genetic diversity in large germplasm collections.
Architectural Theory in the Undergraduate Curriculum: A Pedagogical Alternative
ERIC Educational Resources Information Center
Smith, Korydon H.
2013-01-01
The study of architectural theory remains absent from many undergraduate design programs, or, if present, the structure of many curricula places "theory" as an autonomous, peripheral course. Theory, however, as it is in other disciplines, is the foundation of the discipline of architecture. To regain the importance and vitality of…
Dynamical decoupling of unbounded Hamiltonians
NASA Astrophysics Data System (ADS)
Arenz, Christian; Burgarth, Daniel; Facchi, Paolo; Hillier, Robin
2018-03-01
We investigate the possibility to suppress interactions between a finite dimensional system and an infinite dimensional environment through a fast sequence of unitary kicks on the finite dimensional system. This method, called dynamical decoupling, is known to work for bounded interactions, but physical environments such as bosonic heat baths are usually modeled with unbounded interactions; hence, here, we initiate a systematic study of dynamical decoupling for unbounded operators. We develop a sufficient decoupling criterion for arbitrary Hamiltonians and a necessary decoupling criterion for semibounded Hamiltonians. We give examples for unbounded Hamiltonians where decoupling works and the limiting evolution as well as the convergence speed can be explicitly computed. We show that decoupling does not always work for unbounded interactions and we provide both physically and mathematically motivated examples.
Deploying electromagnetic particle-in-cell (EM-PIC) codes on Xeon Phi accelerators boards
NASA Astrophysics Data System (ADS)
Fonseca, Ricardo
2014-10-01
The complexity of the phenomena involved in several relevant plasma physics scenarios, where highly nonlinear and kinetic processes dominate, makes purely theoretical descriptions impossible. Further understanding of these scenarios requires detailed numerical modeling, but fully relativistic particle-in-cell codes such as OSIRIS are computationally intensive. The quest towards Exaflop computer systems has led to the development of HPC systems based on add-on accelerator cards, such as GPGPUs and more recently the Xeon Phi accelerators that power the current number 1 system in the world. These cards, also referred to as Intel Many Integrated Core Architecture (MIC), offer peak theoretical performances of >1 TFlop/s for general purpose calculations in a single board, and are receiving significant attention as an attractive alternative to CPUs for plasma modeling. In this work we report on our efforts towards the deployment of an EM-PIC code on a Xeon Phi architecture system. We will focus on the parallelization and vectorization strategies followed, and present a detailed evaluation of code performance in comparison with the CPU code.
Building Automatic Grading Tools for Basic of Programming Lab in an Academic Institution
NASA Astrophysics Data System (ADS)
Harimurti, Rina; Iwan Nurhidayat, Andi; Asmunin
2018-04-01
The skill of computer programming is a core competency that must be mastered by students majoring in computer science. The best way to improve this skill is through practice: writing many programs to solve problems ranging from simple to complex. Checking and evaluating the results of student labs one by one takes hard work and a long time, especially when the number of students is large. Based on these constraints, we propose Automatic Grading Tools (AGT), an application that can evaluate and thoroughly check source code written in C and C++. The application architecture consists of students, a web-based application, compilers, and the operating system. Automatic Grading Tools (AGT) is implemented with an MVC architecture using open-source software: the Laravel framework version 5.4, PostgreSQL 9.6, Bootstrap 3.3.7, and the jQuery library. AGT has also been tested on real problems by submitting source code in C/C++ and then compiling it. The test results show that the AGT application runs well.
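A minimal sketch of the compile-and-test core of an automatic grader such as AGT is shown below. It is written in Python rather than the PHP/Laravel stack named above, and the function name, scoring rule and test-case format are assumptions for illustration only.

```python
import os
import subprocess
import tempfile

def grade_submission(source_path, test_cases, timeout=5):
    """Compile a C submission and run it against (stdin, expected_stdout) pairs.

    test_cases is a list of (stdin_text, expected_output_text) string pairs.
    Returns the fraction of test cases passed, or 0.0 on a compile error.
    """
    with tempfile.TemporaryDirectory() as workdir:
        binary = os.path.join(workdir, "solution")
        # use g++ instead of gcc for C++ submissions
        compile_cmd = ["gcc", source_path, "-O2", "-o", binary]
        if subprocess.run(compile_cmd, capture_output=True).returncode != 0:
            return 0.0  # compile error: no credit

        passed = 0
        for stdin_text, expected in test_cases:
            try:
                result = subprocess.run([binary], input=stdin_text,
                                        capture_output=True, text=True,
                                        timeout=timeout)
                if result.stdout.strip() == expected.strip():
                    passed += 1
            except subprocess.TimeoutExpired:
                pass  # treat a hung program as a failed case
        return passed / len(test_cases)
```

In a web-based grader this function would typically run inside a sandboxed worker invoked by the submission controller.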
Fault-Tolerant Software-Defined Radio on Manycore
NASA Technical Reports Server (NTRS)
Ricketts, Scott
2015-01-01
Software-defined radio (SDR) platforms generally rely on field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), but such architectures require significant software development. In addition, application demands for radiation mitigation and fault tolerance exacerbate programming challenges. MaXentric Technologies, LLC, has developed a manycore-based SDR technology that provides 100 times the throughput of conventional radiation-hardened general purpose processors. Manycore systems (30-100 cores and beyond) have the potential to provide high processing performance at error rates that are equivalent to current space-deployed uniprocessor systems. MaXentric's innovation is a highly flexible radio, providing over-the-air reconfiguration; adaptability; and uninterrupted, real-time, multimode operation. The technology is also compliant with NASA's Space Telecommunications Radio System (STRS) architecture. In addition to its many uses within NASA communications, the SDR can also serve as a highly programmable research-stage prototyping device for new waveforms and other communications technologies. It can also support noncommunication codes on its multicore processor, collocated with the communications workload, reducing the size, weight, and power of the overall system by aggregating processing jobs onto a single-board computer.
GPU Accelerated Vector Median Filter
NASA Technical Reports Server (NTRS)
Aras, Rifat; Shen, Yuzhong
2011-01-01
Noise reduction is an important step for most image processing tasks. For three-channel color images, a widely used technique is the vector median filter, in which the color values of pixels are treated as 3-component vectors. Vector median filters are computationally expensive: for a window size of n x n, each of the n^2 vectors has to be compared with the other n^2 - 1 vectors in terms of distance. General-purpose computation on graphics processing units (GPUs) is the paradigm of utilizing high-performance many-core GPU architectures for computation tasks that are normally handled by CPUs. In this work, NVIDIA's Compute Unified Device Architecture (CUDA) paradigm is used to accelerate vector median filtering, which, to the best of our knowledge, has never been done before. The performance of the GPU-accelerated vector median filter is compared to that of the CPU and MPI-based versions for different image and window sizes. Initial findings of the study showed a 100x performance improvement of the vector median filter implementation on GPUs over CPU implementations, and further speed-up is expected after more extensive optimization of the GPU algorithm.
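A reference (single-threaded) formulation of the vector median filter described above is sketched in NumPy below; it shows the n^2-vector distance comparison per window but none of the CUDA or MPI acceleration, and the function name is illustrative.

```python
import numpy as np

def vector_median_filter(image, n=3):
    """Reference vector median filter for an H x W x 3 color image.

    For each n x n window, the output pixel is the window vector with the
    smallest sum of Euclidean distances to every other vector in the window.
    """
    pad = n // 2
    padded = np.pad(image.astype(np.float64),
                    ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(image)
    H, W = image.shape[:2]
    for y in range(H):
        for x in range(W):
            window = padded[y:y + n, x:x + n].reshape(-1, 3)   # n^2 color vectors
            dists = np.linalg.norm(window[:, None, :] - window[None, :, :], axis=2)
            out[y, x] = window[dists.sum(axis=1).argmin()]      # vector median
    return out
```

The per-pixel work is independent, which is what makes the filter a natural fit for one-thread-per-pixel GPU mappings.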
A unified heteronuclear decoupling strategy for magic-angle-spinning solid-state NMR spectroscopy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Equbal, Asif; Bjerring, Morten; Nielsen, Niels Chr., E-mail: madhu@tifr.res.in, E-mail: ncn@inano.au.dk
2015-05-14
A unified strategy of two-pulse based heteronuclear decoupling for solid-state magic-angle spinning nuclear magnetic resonance is presented. The analysis presented here shows that different decoupling sequences like two-pulse phase-modulation (TPPM), X-inverse-X (XiX), and finite pulse refocused continuous wave (rCW^A) are basically specific solutions of a more generalized decoupling scheme which incorporates the concept of time-modulation along with phase-modulation. A plethora of other good decoupling conditions apart from the standard TPPM, XiX, and rCW^A decoupling conditions are available from the unified decoupling approach. The importance of combined time- and phase-modulation in order to achieve the best decoupling conditions is delineated. The consequences of different indirect dipolar interactions arising from cross terms comprising heteronuclear and homonuclear dipolar coupling terms, and also those between heteronuclear dipolar coupling and chemical-shift anisotropy terms, are presented in order to unfold the effects of anisotropic interactions under different decoupling conditions. Extensive numerical simulation results are corroborated with experiments on standard amino acids.
Position paper: the science of deep specification.
Appel, Andrew W; Beringer, Lennart; Chlipala, Adam; Pierce, Benjamin C; Shao, Zhong; Weirich, Stephanie; Zdancewic, Steve
2017-10-13
We introduce our efforts within the project 'The science of deep specification' to work out the key formal underpinnings of industrial-scale formal specifications of software and hardware components, anticipating a world where large verified systems are routinely built out of smaller verified components that are also used by many other projects. We identify an important class of specification that has already been used in a few experiments that connect strong component-correctness theorems across the work of different teams. To help popularize the unique advantages of that style, we dub it deep specification, and we say that it encompasses specifications that are rich, two-sided, formal and live (terms that we define in the article). Our core team is developing a proof-of-concept system (based on the Coq proof assistant) whose specification and verification work is divided across largely decoupled subteams at our four institutions, encompassing hardware microarchitecture, compilers, operating systems and applications, along with cross-cutting principles and tools for effective specification. We also aim to catalyse interest in the approach, not just by basic researchers but also by users in industry. This article is part of the themed issue 'Verified trustworthy software systems'. © 2017 The Author(s).
Intelligent deflection routing in buffer-less networks.
Haeri, Soroush; Trajković, Ljiljana
2015-02-01
Deflection routing is employed to ameliorate packet loss caused by contention in buffer-less architectures such as optical burst-switched networks. The main goal of deflection routing is to successfully deflect a packet based only on the limited knowledge that network nodes possess about their environment. In this paper, we present a framework that introduces intelligence to deflection routing (iDef). iDef decouples the design of the signaling infrastructure from the underlying learning algorithm. It consists of a signaling module and a decision-making module. The signaling module implements a feedback management protocol, while the decision-making module implements a reinforcement learning algorithm. We also propose several learning-based deflection routing protocols, implement them in iDef using the ns-3 network simulator, and compare their performance.
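As an illustration of how a decision-making module could sit behind iDef's signaling interface, the sketch below implements a simple Q-learning port chooser; the class, reward convention and parameters are hypothetical and are not taken from the paper, which evaluates its own learning-based protocols in ns-3.

```python
import random
from collections import defaultdict

class QDeflectionAgent:
    """Minimal Q-learning decision module for choosing a deflection port.

    The signaling layer is assumed to call `choose` when the preferred output
    port is busy, and later call `feedback` with a reward (e.g. +1 if the
    deflected packet reached its destination, -1 if it was dropped).
    """
    def __init__(self, ports, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.ports = ports
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)            # (destination, port) -> value

    def choose(self, destination):
        if random.random() < self.epsilon:     # explore occasionally
            return random.choice(self.ports)
        return max(self.ports, key=lambda p: self.q[(destination, p)])

    def feedback(self, destination, port, reward):
        best_next = max(self.q[(destination, p)] for p in self.ports)
        key = (destination, port)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```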
Sample Acquisition and Caching architecture for the Mars Sample Return mission
NASA Astrophysics Data System (ADS)
Zacny, K.; Chu, P.; Cohen, J.; Paulsen, G.; Craft, J.; Szwarc, T.
This paper presents a Mars Sample Return (MSR) Sample Acquisition and Caching (SAC) study developed for the three rover platforms: MER, MER+, and MSL. The study took into account 26 SAC requirements provided by the NASA Mars Exploration Program Office. For this SAC architecture, the reduction of mission risk was chosen by us as having greater priority than mass or volume. For this reason, we selected a “One Bit per Core” approach. The enabling technology for this architecture is Honeybee Robotics' “eccentric tubes” core breakoff approach. The breakoff approach allows the drill bits to be relatively small in diameter and in turn lightweight. Hence, the bits could be returned to Earth with the cores inside them with only a modest increase to the total returned mass, but a significant decrease in complexity. Having dedicated bits allows a reduction in the number of core transfer steps and actuators. It also alleviates the bit life problem, eliminates cross contamination, and aids in hermetic sealing. An added advantage is faster drilling time, lower power, lower energy, and lower Weight on Bit (which reduces Arm preload requirements). Drill bits are based on the BigTooth bit concept, which allows re-use of the same bit multiple times, if necessary. The proposed SAC consists of a 1) Rotary-Percussive Core Drill, 2) Bit Storage Carousel, 3) Cache, 4) Robotic Arm, and 5) Rock Abrasion and Brushing Bit (RABBit), which is deployed using the Drill. The system also includes PreView bits (for viewing of cores prior to caching) and Powder bits for acquisition of regolith or cuttings. The SAC total system mass is less than 22 kg for MER and MER+ size rovers and less than 32 kg for the MSL-size rover.
NASA Astrophysics Data System (ADS)
Olson, Richard F.
2013-05-01
Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real-world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, high-level parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
NASA Astrophysics Data System (ADS)
Fu, Liyue; Song, Aiguo
2018-02-01
In order to improve the measurement precision of 6-axis force/torque sensors for robots, a BP decoupling algorithm optimized by a genetic algorithm (the GA-BP algorithm) is proposed in this paper. The weights and thresholds of a BP neural network with a 6-10-6 topology are optimized by GA to decouple a six-axis force/torque sensor. Compared with traditional decoupling algorithms, namely calculating the pseudo-inverse of the calibration matrix and the classical BP algorithm, the decoupling results validate the good decoupling performance of the GA-BP algorithm, and the coupling errors are reduced.
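The GA-BP network itself requires a neural-network and genetic-algorithm toolchain, but the baseline it is compared against, decoupling via the pseudo-inverse of the calibration matrix, can be sketched directly. The snippet below is an illustrative least-squares formulation; the variable names and 6x6 matrix shape are assumptions consistent with a 6-axis sensor, not code from the paper.

```python
import numpy as np

def fit_linear_decoupling(V, F):
    """Least-squares (pseudo-inverse) decoupling matrix for a 6-axis sensor.

    V : (m, 6) raw sensor readings from m calibration runs
    F : (m, 6) known applied forces/torques for the same runs
    Returns D (6 x 6) such that F_hat = V @ D in a least-squares sense.
    """
    D, *_ = np.linalg.lstsq(V, F, rcond=None)
    return D

def decouple(V_raw, D):
    """Apply the decoupling matrix to new raw readings (rows of V_raw)."""
    return V_raw @ D
```

The GA-BP approach replaces this fixed linear map with a nonlinear network whose weights are seeded by a genetic search, which is how it can reduce the residual coupling errors further.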
Electronic structure of CdSe-ZnS 2D nanoplatelets
NASA Astrophysics Data System (ADS)
Cruguel, Hervé; Livache, Clément; Martinez, Bertille; Pedetti, Silvia; Pierucci, Debora; Izquierdo, Eva; Dufour, Marion; Ithurria, Sandrine; Aubin, Hervé; Ouerghi, Abdelkarim; Lacaze, Emmanuelle; Silly, Mathieu G.; Dubertret, Benoit; Lhuillier, Emmanuel
2017-04-01
Among colloidal nanocrystals, 2D nanoplatelets (NPLs) made of cadmium chalcogenides have led to especially well controlled optical features. However, the growth of core-shell heterostructures has so far been mostly focused on CdS shells, while more strongly confining shell materials are more promising for decoupling the emitting quantum states of the core from their external environment. Using k.p simulation, we demonstrate that a ZnS shell reduces by a factor of 10 the leakage of the wavefunction into the surrounding medium. Using X-ray photoemission (XPS), we confirm that the CdSe active layer is indeed unoxidized. Finally, we build an effective electronic spectrum for these CdSe/ZnS NPLs on an absolute energy scale, which is a critical set of parameters for the future integration of this material into optoelectronic devices. We determine the work function (WF) to be 4.47 eV while the material is behaving as an n-type semiconductor.
Towards Energy-Performance Trade-off Analysis of Parallel Applications
ERIC Educational Resources Information Center
Korthikanti, Vijay Anand Reddy
2011-01-01
Energy consumption by computer systems has emerged as an important concern, both at the level of individual devices (limited battery capacity in mobile systems) and at the societal level (the production of Green House Gases). In parallel architectures, applications may be executed on a variable number of cores and these cores may operate at…
Au@MnO2 core-shell nanomesh electrodes for transparent flexible supercapacitors.
Qiu, Tengfei; Luo, Bin; Giersig, Michael; Akinoglu, Eser Metin; Hao, Long; Wang, Xiangjun; Shi, Lin; Jin, Meihua; Zhi, Linjie
2014-10-29
A novel Au@MnO2 supercapacitor is presented. The sophisticated core-shell architecture combining an Au nanomesh core with a MnO2 shell on a flexible polymeric substrate is demonstrated as an electrode for high performance transparent flexible supercapacitors (TFSCs). Due to their unique structure, high areal/gravimetric capacitance and rate capability for TFSCs are achieved. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
2015-06-01
unit may setup and teardown the entire tactical infrastructure multiple times per day. This tactical network administrator training is a critical...language and runs on Linux and Unix based systems. All provisioning is based around the Nagios Core application, a powerful backend solution for network...start up a large number of virtual machines quickly. CORE supports the simulation of fixed and mobile networks. CORE is open-source, written in Python
Counterflow heat exchanger with core and plenums at both ends
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bejan, A.; Alalaimi, M.; Lorente, S.
2016-04-22
Here, this paper illustrates the morphing of flow architecture toward greater performance in a counterflow heat exchanger. The architecture consists of two plenums with a core of counterflow channels between them. Each stream enters one plenum and then flows in a channel that travels the core and crosses the second plenum. The volume of the heat exchanger is fixed while the volume fraction occupied by each plenum is variable. Performance is driven by two objectives simultaneously: low flow resistance and low thermal resistance. The analytical and numerical results show that the overall flow resistance is the lowest when the core is absent, and each plenum occupies half of the available volume and is oriented in counterflow with the other plenum. In this configuration, the thermal resistance also reaches its lowest value. These conclusions hold for fully developed laminar flow and turbulent flow through the core. The curve for effectiveness vs. number of heat transfer units (Ntu) is steeper (when Ntu < 1) than the classical curves for counterflow and crossflow.
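For reference, the effectiveness-NTU relation for an ideal counterflow heat exchanger, against which the curve above is compared, is the standard textbook result (not derived in the abstract):

```latex
\varepsilon_{\mathrm{counterflow}} =
\begin{cases}
\dfrac{1 - \exp\!\left[-N_{tu}\,(1 - C_r)\right]}
      {1 - C_r \exp\!\left[-N_{tu}\,(1 - C_r)\right]}, & C_r < 1,\\[1.5ex]
\dfrac{N_{tu}}{1 + N_{tu}}, & C_r = 1,
\end{cases}
\qquad C_r = \frac{C_{\min}}{C_{\max}} .
```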
Aho-Corasick String Matching on Shared and Distributed Memory Parallel Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel
String matching is at the core of many critical applications, including network intrusion detection systems, search engines, virus scanners, spam filters, DNA and protein sequencing, and data mining. For all of these applications string matching requires a combination of (sometimes all) the following characteristics: high and/or predictable performance, support for large data sets and flexibility of integration and customization. Many software based implementations targeting conventional cache-based microprocessors fail to achieve high and predictable performance requirements, while Field-Programmable Gate Array (FPGA) implementations and dedicated hardware solutions fail to support large data sets (dictionary sizes) and are difficult to integrate and customize. The advent of multicore, multithreaded, and GPU-based systems is opening the possibility for software based solutions to reach very high performance at a sustained rate. This paper compares several software-based implementations of the Aho-Corasick string searching algorithm for high performance systems. We discuss the implementation of the algorithm on several types of shared-memory high-performance architectures (Niagara 2, large x86 SMPs and Cray XMT), distributed memory with homogeneous processing elements (InfiniBand cluster of x86 multicores) and heterogeneous processing elements (InfiniBand cluster of x86 multicores with NVIDIA Tesla C10 GPUs). We describe in detail how each solution achieves the objectives of supporting large dictionaries, sustaining high performance, and enabling customization and flexibility using various data sets.
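For readers unfamiliar with the algorithm being benchmarked, a minimal serial Aho-Corasick implementation is sketched below; it illustrates the goto/failure/output automaton only and reflects none of the platform-specific optimizations studied in the paper.

```python
from collections import deque

def build_aho_corasick(patterns):
    """Build the goto/fail/output tables for Aho-Corasick multi-pattern matching."""
    goto, fail, out = [{}], [0], [set()]          # state 0 is the trie root
    for pat in patterns:                          # phase 1: trie construction
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    queue = deque(goto[0].values())               # phase 2: failure links via BFS
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f][ch] if ch in goto[f] and goto[f][ch] != t else 0
            out[t] |= out[fail[t]]
    return goto, fail, out

def ac_search(text, goto, fail, out):
    """Yield (end_index, pattern) for every occurrence of any pattern in text."""
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            yield i, pat

tables = build_aho_corasick(["he", "she", "his", "hers"])
print(sorted(ac_search("ushers", *tables)))   # [(3, 'he'), (3, 'she'), (5, 'hers')]
```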
IMPLICATIONS OF RAPID CORE ROTATION IN RED GIANTS FOR INTERNAL ANGULAR MOMENTUM TRANSPORT IN STARS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tayar, Jamie; Pinsonneault, Marc H., E-mail: tayar.1@osu.edu
2013-09-20
Core rotation rates have been measured for red giant stars using asteroseismology. These data, along with helioseismic measurements and open cluster spin-down studies, provide powerful clues about the nature and timescale for internal angular momentum transport in stars. We focus on two cases: the metal-poor red giant KIC 7341231 ("Otto") and intermediate-mass core helium burning stars. For both, we examine limiting case studies for angular momentum coupling between cores and envelopes under the assumption of rigid rotation on the main sequence. We discuss the expected pattern of core rotation as a function of mass and radius. In the case of Otto, strong post-main-sequence coupling is ruled out and the measured core rotation rate is in the range of 23-33 times the surface value expected from standard spin-down models. The minimum coupling timescale (0.17-0.45 Gyr) is significantly longer than that inferred for young open cluster stars. This implies ineffective internal angular momentum transport in early first ascent giants. By contrast, the core rotation rates of evolved secondary clump stars are found to be consistent with strong coupling given their rapid main-sequence rotation. An extrapolation to the white dwarf regime predicts rotation periods between 330 and 0.0052 days, depending on mass and decoupling time. We identify two key ingredients that explain these features: the presence of a convective core and inefficient angular momentum transport in the presence of larger mean molecular weight gradients. Observational tests that can disentangle these effects are discussed.
Observational constraints on neutron star crust-core coupling during glitches
NASA Astrophysics Data System (ADS)
Newton, W. G.; Berger, S.; Haskell, B.
2015-12-01
We demonstrate that observations of glitches in the Vela pulsar can be used to investigate the strength of the crust-core coupling in a neutron star and provide a powerful probe of the internal structure of neutron stars. We assume that glitch recovery is dominated by the torque exerted by the mutual friction-mediated recoupling of superfluid components of the core that were decoupled from the crust during the glitch. Then we use the observations of the recoveries from two recent glitches in the Vela pulsar to infer the fraction of the core that is coupled to the crust during the glitch. We then analyse whether crustal neutrons alone are sufficient to drive glitches in the Vela pulsar, taking into account crustal entrainment. We use two sets of neutron star equations of state (EOSs) which span crust and core consistently and cover a conservative range of the slope of the symmetry energy at saturation density 30 < L < 120 MeV. The two sets differ in the stiffness of the high density EOS. We find that for medium to stiff EOSs, observations imply >70 per cent of the moment of inertia of the core is coupled to the crust during the glitch, though for softer EOSs (L ≈ 30 MeV) as little as 5 per cent could be coupled. We find that only by extending the region where superfluid vortices are strongly pinned into the core by densities at least 0.016 fm⁻³ above the crust-core transition density does any EOS reproduce the observed glitch activity.
Space Generic Open Avionics Architecture (SGOAA) reference model technical guide
NASA Technical Reports Server (NTRS)
Wray, Richard B.; Stovall, John R.
1993-01-01
This report presents a full description of the Space Generic Open Avionics Architecture (SGOAA). The SGOAA consists of a generic system architecture for the entities in spacecraft avionics, a generic processing architecture, and a six class model of interfaces in a hardware/software system. The purpose of the SGOAA is to provide an umbrella set of requirements for applying the generic architecture interface model to the design of specific avionics hardware/software systems. The SGOAA defines a generic set of system interface points to facilitate identification of critical interfaces and establishes the requirements for applying appropriate low level detailed implementation standards to those interface points. The generic core avionics system and processing architecture models provided herein are robustly tailorable to specific system applications and provide a platform upon which the interface model is to be applied.
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nagasaka, Y; Matsuoka, S; Azad, A
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware-specific optimizations for multi- and many-core processors are lacking, and a detailed analysis of their performance under various use cases and matrices is not available. We first identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Differently from the literature, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search or triangle counting. Our hash-table and heap-based algorithms show significant speedups over existing libraries in the majority of the cases, while different algorithms dominate the other scenarios depending on matrix size, sparsity, compression factor and operation type. We summarize the in-depth evaluation results and provide a recipe for selecting the best SpGEMM algorithm for a target scenario. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.
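A serial sketch of the row-wise, hash-accumulator SpGEMM formulation referenced above is given below, with a Python dict standing in for the optimized hash table; this is a reference illustration of the technique, not the paper's KNL implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix, random as sparse_random

def spgemm_hash(A, B):
    """Row-wise SpGEMM C = A @ B using a per-row hash accumulator (a dict)."""
    indptr, indices, data = [0], [], []
    for i in range(A.shape[0]):
        acc = {}                                  # hash table: column index -> value
        for k_idx in range(A.indptr[i], A.indptr[i + 1]):
            k, a_ik = A.indices[k_idx], A.data[k_idx]
            for j_idx in range(B.indptr[k], B.indptr[k + 1]):
                j = B.indices[j_idx]
                acc[j] = acc.get(j, 0.0) + a_ik * B.data[j_idx]
        cols = sorted(acc)                        # sorted output columns (the optional step
        indices.extend(cols)                      # the paper shows can be skipped for speed)
        data.extend(acc[j] for j in cols)
        indptr.append(len(indices))
    return csr_matrix((data, indices, indptr), shape=(A.shape[0], B.shape[1]))

# quick check against SciPy's own SpGEMM
A = sparse_random(200, 300, density=0.02, format="csr")
B = sparse_random(300, 150, density=0.02, format="csr")
assert abs(spgemm_hash(A, B) - A @ B).max() < 1e-12
```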
DFT algorithms for bit-serial GaAs array processor architectures
NASA Technical Reports Server (NTRS)
Mcmillan, Gary B.
1988-01-01
Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.
DecouplingModes: Passive modes amplitudes
NASA Astrophysics Data System (ADS)
Shaw, J. Richard; Lewis, Antony
2018-01-01
DecouplingModes calculates the amplitude of the passive modes, which requires solving the Einstein equations on superhorizon scales sourced by the anisotropic stress from the magnetic fields (prior to neutrino decoupling), and the magnetic and neutrino stress (after decoupling). The code is available as a Mathematica notebook.
Thermal oxidation of nuclear graphite: A large scale waste treatment option.
Theodosiou, Alex; Jones, Abbie N; Marsden, Barry J
2017-01-01
This study has investigated the laboratory-scale thermal oxidation of nuclear graphite as a proof-of-concept for the treatment and decommissioning of reactor cores on a larger industrial scale. If shown to be effective, this technology could have promising international significance, with a considerable impact on the nuclear waste management problem currently facing many countries worldwide. The use of thermal treatment of such graphite waste is seen as advantageous since it will decouple the need for an operational Geological Disposal Facility (GDF). Particulate samples of Magnox Reactor Pile Grade-A (PGA) graphite were oxidised in both air and 60% O2, over the temperature range 400-1200°C. Oxidation rates were found to increase with temperature, with a particular rise between 700-800°C, suggesting a change in oxidation mechanism. A second increase in oxidation rate was observed between 1000-1200°C and was found to correspond to a large increase in the CO/CO2 ratio, as confirmed through gas analysis. Increasing the oxidant flow rate gave a linear increase in oxidation rate, up to a certain point, and maximum rates of 23.3 and 69.6 mg/min for air and 60% O2, respectively, were achieved at a flow of 250 ml/min and a temperature of 1000°C. These promising results show that large-scale thermal treatment could be a potential option for the decommissioning of graphite cores, although the design of the plant would need careful consideration in order to achieve optimum efficiency and throughput.
Thermal oxidation of nuclear graphite: A large scale waste treatment option
Jones, Abbie N.; Marsden, Barry J.
2017-01-01
This study has investigated the laboratory-scale thermal oxidation of nuclear graphite as a proof-of-concept for the treatment and decommissioning of reactor cores on a larger industrial scale. If shown to be effective, this technology could have promising international significance, with a considerable impact on the nuclear waste management problem currently facing many countries worldwide. The use of thermal treatment of such graphite waste is seen as advantageous since it will decouple the need for an operational Geological Disposal Facility (GDF). Particulate samples of Magnox Reactor Pile Grade-A (PGA) graphite were oxidised in both air and 60% O2, over the temperature range 400–1200°C. Oxidation rates were found to increase with temperature, with a particular rise between 700–800°C, suggesting a change in oxidation mechanism. A second increase in oxidation rate was observed between 1000–1200°C and was found to correspond to a large increase in the CO/CO2 ratio, as confirmed through gas analysis. Increasing the oxidant flow rate gave a linear increase in oxidation rate, up to a certain point, and maximum rates of 23.3 and 69.6 mg/min for air and 60% O2, respectively, were achieved at a flow of 250 ml/min and a temperature of 1000°C. These promising results show that large-scale thermal treatment could be a potential option for the decommissioning of graphite cores, although the design of the plant would need careful consideration in order to achieve optimum efficiency and throughput. PMID:28793326
Framework for the Parametric System Modeling of Space Exploration Architectures
NASA Technical Reports Server (NTRS)
Komar, David R.; Hoffman, Jim; Olds, Aaron D.; Seal, Mike D., II
2008-01-01
This paper presents a methodology for performing architecture definition and assessment prior to, or during, program formulation that utilizes a centralized, integrated architecture modeling framework operated by a small, core team of general space architects. This framework, known as the Exploration Architecture Model for IN-space and Earth-to-orbit (EXAMINE), enables: 1) a significantly larger fraction of an architecture trade space to be assessed in a given study timeframe; and 2) the complex element-to-element and element-to-system relationships to be quantitatively explored earlier in the design process. Discussion of the methodology advantages and disadvantages with respect to the distributed study team approach typically used within NASA to perform architecture studies is presented along with an overview of EXAMINE's functional components and tools. An example Mars transportation system architecture model is used to demonstrate EXAMINE's capabilities in this paper. However, the framework is generally applicable for exploration architecture modeling with destinations to any celestial body in the solar system.
Porting plasma physics simulation codes to modern computing architectures using the
NASA Astrophysics Data System (ADS)
Germaschewski, Kai; Abbott, Stephen
2015-11-01
Available computing power has continued to grow exponentially even after single-core performance saturated in the last decade. The increase has since been driven by more parallelism, both using more cores and having more parallelism in each core, e.g. in GPUs and Intel Xeon Phi. Adapting existing plasma physics codes is challenging, in particular as there is no single programming model that covers current and future architectures. We will introduce the open-source
YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lovergine, Silvia; Tumeo, Antonino; Villa, Oreste
Modern embedded systems include hundreds of cores. Because of the difficulty in providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism, while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets difficult to partition, unpredictable memory accesses, unbalanced control flow and fine grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expected performance. There is a growing gap between such complex and highly-parallel architectures and the high level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework for the automatic parallelization of irregular applications on modern MPSoCs based on LLVM. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.
His-Tag-Mediated Dimerization of Chemoreceptors Leads to Assembly of Functional Nanoarrays.
Haglin, Elizabeth R; Yang, Wen; Briegel, Ariane; Thompson, Lynmarie K
2017-11-07
Transmembrane chemotaxis receptors are found in bacteria in extended hexagonal arrays stabilized by the membrane and by cytosolic binding partners, the kinase CheA and coupling protein CheW. Models of array architecture and assembly propose receptors cluster into trimers of dimers that associate with one CheA dimer and two CheW monomers to form the minimal "core unit" necessary for signal transduction. Reconstructing in vitro chemoreceptor ternary complexes that are homogeneous and functional and exhibit native architecture remains a challenge. Here we report that His-tag-mediated receptor dimerization with divalent metals is sufficient to drive assembly of nativelike functional arrays of a receptor cytoplasmic fragment. Our results indicate receptor dimerization initiates assembly and precedes formation of ternary complexes with partial kinase activity. Restoration of maximal kinase activity coincides with a shift to larger complexes, suggesting that kinase activity depends on interactions beyond the core unit. We hypothesize that achieving maximal activity requires building core units into hexagons and/or coalescing hexagons into the extended lattice. Overall, the minimally perturbing His-tag-mediated dimerization leads to assembly of chemoreceptor arrays with native architecture and thus serves as a powerful tool for studying the assembly and mechanism of this complex and other multiprotein complexes.
Pi-Sat: A Low Cost Small Satellite and Distributed Spacecraft Mission System Test Platform
NASA Technical Reports Server (NTRS)
Cudmore, Alan
2015-01-01
Current technology and budget trends indicate a shift in satellite architectures from large, expensive single satellite missions, to small, low cost distributed spacecraft missions. At the center of this shift is the SmallSat/Cubesat architecture. The primary goal of the Pi-Sat project is to create a low cost, and easy to use Distributed Spacecraft Mission (DSM) test bed to facilitate the research and development of next-generation DSM technologies and concepts. This test bed also serves as a realistic software development platform for Small Satellite and Cubesat architectures. The Pi-Sat is based on the popular $35 Raspberry Pi single board computer featuring a 700 MHz ARM processor, 512MB of RAM, a flash memory card, and a wealth of IO options. The Raspberry Pi runs the Linux operating system and can easily run Code 582's Core Flight System flight software architecture. The low cost and high availability of the Raspberry Pi make it an ideal platform for a Distributed Spacecraft Mission and Cubesat software development. The Pi-Sat models currently include a Pi-Sat 1U Cube, a Pi-Sat Wireless Node, and a Pi-Sat Cubesat processor card. The Pi-Sat project takes advantage of many popular trends in the Maker community including low cost electronics, 3d printing, and rapid prototyping in order to provide a realistic platform for flight software testing, training, and technology development. The Pi-Sat has also provided fantastic hands on training opportunities for NASA summer interns and Pathways students.
NASA Astrophysics Data System (ADS)
Li, Ningzhi; Li, Shizhe; Shen, Jun
2017-06-01
In vivo 13C magnetic resonance spectroscopy (MRS) is a unique and effective tool for studying dynamic human brain metabolism and the cycling of neurotransmitters. One of the major technical challenges for in vivo 13C-MRS is the high radio frequency (RF) power necessary for heteronuclear decoupling. In the common practice of in vivo 13C-MRS, alkanyl carbons are detected in the spectral range of 10-65 ppm. The amplitude of decoupling pulses has to be significantly greater than the large one-bond 1H-13C scalar coupling (1JCH=125-145 Hz). Two main proton decoupling methods have been developed: broadband stochastic decoupling and coherent composite or adiabatic pulse decoupling (e.g., WALTZ); the latter is widely used because of its efficiency and superb performance under an inhomogeneous B1 field. Because the RF power required for proton decoupling increases quadratically with field strength, in vivo 13C-MRS using coherent decoupling is often limited to low magnetic fields (<= 4 Tesla (T)) to keep the local and averaged specific absorption rate (SAR) under the safety guidelines established by the International Electrotechnical Commission (IEC) and the US Food and Drug Administration (FDA). Alternately, carboxylic/amide carbons are coupled to protons via weak long-range 1H-13C scalar couplings, which can be decoupled using low RF power broadband stochastic decoupling. Recently, the carboxylic/amide 13C-MRS technique using low power random RF heteronuclear decoupling was safely applied to human brain studies at 7T. Here, we review the two major decoupling methods and the carboxylic/amide 13C-MRS with low power decoupling strategy. Further decreases in RF power deposition by frequency-domain windowing and time-domain random under-sampling are also discussed. Low RF power decoupling opens the possibility of performing in vivo 13C experiments of the human brain at very high magnetic fields (such as 11.7T), where signal-to-noise ratio as well as spatial and temporal spectral resolution are more favorable than at lower fields.
Space Generic Open Avionics Architecture (SGOAA) standard specification
NASA Technical Reports Server (NTRS)
Wray, Richard B.; Stovall, John R.
1993-01-01
The purpose of this standard is to provide an umbrella set of requirements for applying the generic architecture interface model to the design of a specific avionics hardware/software system. This standard defines a generic set of system interface points to facilitate identification of critical interfaces and establishes the requirements for applying appropriate low level detailed implementation standards to those interface points. The generic core avionics system and processing architecture models provided herein are robustly tailorable to specific system applications and provide a platform upon which the interface model is to be applied.
NASA Astrophysics Data System (ADS)
Zou, Liang; Fu, Zhuang; Zhao, YanZheng; Yang, JunYan
2010-07-01
This paper proposes a pipelined circuit architecture implemented in an FPGA, a very large scale integrated circuit (VLSI), which efficiently handles the real-time non-uniformity correction (NUC) algorithm for infrared focal plane arrays (IRFPA). Dual Nios II soft-core processors and a DSP with a 64+ core together constitute this imaging system. Each processor undertakes its own task, coordinating its work with the others. The system on programmable chip (SOPC) in the FPGA runs steadily at a global clock frequency of 96 MHz. Adequate timing margin lets the FPGA perform the NUC image pre-processing algorithm with ease, which provides a favorable guarantee for post-processing in the DSP. Meanwhile, this paper presents a hardware (HW) and software (SW) co-design in the FPGA. Thus, this architecture yields a multiprocessor image processing system and a smart solution that satisfies the system's performance requirements.
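The abstract does not spell out the NUC algorithm itself; a common choice for IRFPA pre-processing is two-point (gain/offset) correction, sketched below as a reference NumPy formulation. The two-point choice, calibration against flat-field frames at two blackbody temperatures, and the function names are assumptions for illustration, not details from the paper.

```python
import numpy as np

def two_point_nuc_calibration(frames_cold, frames_hot, t_cold, t_hot):
    """Per-pixel gain/offset from stacks of flat-field frames at two blackbody temperatures.

    frames_cold, frames_hot : (n_frames, H, W) raw detector frames
    t_cold, t_hot           : the two calibration temperatures (or radiances)
    """
    mean_cold = frames_cold.mean(axis=0)
    mean_hot = frames_hot.mean(axis=0)
    gain = (t_hot - t_cold) / (mean_hot - mean_cold)     # per-pixel gain
    offset = t_cold - gain * mean_cold                   # per-pixel offset
    return gain, offset

def apply_nuc(raw_frame, gain, offset):
    """Correct one raw frame: every pixel is mapped through its own gain/offset."""
    return gain * raw_frame + offset
```

In a pipeline like the one described, the per-pixel multiply-add of `apply_nuc` is exactly the kind of operation that maps naturally onto FPGA logic ahead of the DSP post-processing stage.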
Of spheres and squares: Can Sloterdijk help us rethink the architecture of climate science?
Skrydstrup, Martin
2016-12-01
This article explores how different visions and values of science translate into different architectural shapes. I bring Peter Sloterdijk's 'spherology' to bear on my ethnographic fieldwork at the NEEM ice core base in Greenland, a significant node in the global infrastructure of climate science. I argue that the visual form of the geodesic dome of the camp materializes specific values and visions of this branch of paleoclimate science, which I elaborate vis-a-vis the pragmatic claims of the scientists/designers and the particular architectural history of Danish ice core drilling in Greenland. I argue that this aesthetic history articulates with Buckminster Fuller's ideas of a 'new nature' and 'scalar connections' encapsulated in his geodesic form. Second, I argue that the aesthetic production of space in the camp replicates the modern distinction between science and society, in so far as the lab space is rectangular and the recreational space is spherical. Third, I argue that NEEM scientists and Sloterdijk are essentially engaged in a common project: the scientists work hard to align air bubbles in the cores with atmospheric fluctuations in the hemisphere on the evidentiary terrain of ice, and Sloterdijk attempts to connect micro-uteri with macro-uteri in an attempt to fundamentally rethink space. Fuller's notion of 'Spaceship Earth', appropriated by Sloterdijk in his thinking about anthropogenic climate change, lends itself well to capturing the scalar alignments and the isolated NEEM base - on a mission to save planet Earth. In conclusion, I argue that Sloterdijk's spherology may serve as a point of departure for rethinking the aesthetic grammar of the architecture of science.
Wang, Wenxiu; Huang, Ningsheng; Zhao, Daiqing
2014-01-01
The decoupling elasticity decomposition quantitative model of energy-related carbon emission in Guangdong is established based on the extended Kaya identity and Tapio decoupling model for the first time, to explore the decoupling relationship and its internal mechanism between energy-related carbon emission and economic growth in Guangdong. Main results are as follows. (1) Total production energy-related carbon emissions in Guangdong increase from 4128 × 10⁴ tC in 1995 to 14396 × 10⁴ tC in 2011. Decoupling elasticity values of energy-related carbon emission and economic growth increase from 0.53 in 1996 to 0.85 in 2011, and its decoupling state turns from weak decoupling in 1996–2004 to expansive coupling in 2005–2011. (2) Land economic output and energy intensity are the first inhibiting factor and the first promoting factor to energy-related carbon emission decoupling from economic growth, respectively. The development speeds of land urbanization and population urbanization, especially land urbanization, play decisive roles in the change of total decoupling elasticity values. (3) Guangdong can realize decoupling of energy-related carbon emission from economic growth effectively by adjusting the energy mix and industrial structure, coordinating the development speed of land urbanization and population urbanization effectively, and strengthening the construction of carbon sink. PMID:24782666
Wang, Wenxiu; Kuang, Yaoqiu; Huang, Ningsheng; Zhao, Daiqing
2014-01-01
The decoupling elasticity decomposition quantitative model of energy-related carbon emission in Guangdong is established based on the extended Kaya identity and Tapio decoupling model for the first time, to explore the decoupling relationship and its internal mechanism between energy-related carbon emission and economic growth in Guangdong. Main results are as follows. (1) Total production energy-related carbon emissions in Guangdong increase from 4128 × 10⁴ tC in 1995 to 14396 × 10⁴ tC in 2011. Decoupling elasticity values of energy-related carbon emission and economic growth increase from 0.53 in 1996 to 0.85 in 2011, and its decoupling state turns from weak decoupling in 1996-2004 to expansive coupling in 2005-2011. (2) Land economic output and energy intensity are the first inhibiting factor and the first promoting factor to energy-related carbon emission decoupling from economic growth, respectively. The development speeds of land urbanization and population urbanization, especially land urbanization, play decisive roles in the change of total decoupling elasticity values. (3) Guangdong can realize decoupling of energy-related carbon emission from economic growth effectively by adjusting the energy mix and industrial structure, coordinating the development speed of land urbanization and population urbanization effectively, and strengthening the construction of carbon sink.
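As context for both records above, the Tapio decoupling elasticity underlying this analysis is conventionally defined as the ratio of the percentage change in emissions to the percentage change in GDP; the formula and the 0.8/1.2 thresholds below follow Tapio's standard convention rather than being reproduced from the paper.

```latex
e_t = \frac{\%\Delta C}{\%\Delta \mathrm{GDP}}
    = \frac{(C_t - C_{t-1})/C_{t-1}}{(\mathrm{GDP}_t - \mathrm{GDP}_{t-1})/\mathrm{GDP}_{t-1}},
\qquad
\begin{cases}
0 \le e_t < 0.8 & \text{weak decoupling},\\
0.8 \le e_t \le 1.2 & \text{expansive coupling},
\end{cases}
```

where both emissions and GDP are growing. The reported elasticities of 0.53 (weak decoupling) and 0.85 (expansive coupling) fall in these bands.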
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmalz, Mark S
2011-07-24
Statement of Problem - The Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation G̲ for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G → G̲, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in the solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - The Department of Energy has many simulation codes that must compute faster to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion for high-performance computing systems.
NASA Astrophysics Data System (ADS)
Ahm, Anne-Sofie C.; Bjerrum, Christian J.; Hammarlund, Emma U.
2017-02-01
The Late Ordovician stratigraphic record integrates glacio-eustatic processes, water-column redox conditions and carbon cycle dynamics. This complex stratigraphic record, however, is dominated by deposits from epeiric seas that are susceptible to local physical and chemical processes decoupled from the open ocean. This study contributes a unique deep water basinal perspective to the Late Ordovician (Hirnantian) glacial record and the perturbations in seawater chemistry that may have contributed to the Hirnantian mass extinction event. We analyze recently drilled cores and outcrop samples from the upper Vinini Formation in central Nevada and report combined trace- and major element geochemistry, Fe speciation (Fe_py/Fe_HR and Fe_HR/Fe_T), and stable isotope chemostratigraphy (δ13C_org and δ34S_py). Measurements of paired samples from outcrop and core reveal that reactive Fe is preserved mainly as pyrite in core samples, while outcrop samples have been significantly altered as pyrite has been oxidized and remobilized by modern weathering processes. Fe speciation in the more pristine core samples indicates persistent deep water anoxia, at least locally through the Late Ordovician, in contrast to the prevailing interpretation of increased Hirnantian water column oxygenation in shallower environments. Deep water redox conditions were likely decoupled from shallower environments by a basinal shift in organic matter export driven by decreasing rates of organic matter degradation and decreasing shelf areas. The variable magnitude in the record of the Hirnantian carbon isotope excursion may be explained by this increased storage of isotopically light carbon in the deep ocean which, in combination with increased glacio-eustatic restriction, would strengthen lateral- and vertical gradients in seawater chemistry. We adopt multivariate statistical methods to deconstruct the spatial and temporal re-organization of seawater chemistry during the Hirnantian glaciation and attempt to isolate the latent magnitude and global perturbation in the carbon cycle. We speculate, using a two component mixing model and residual estimates from principal component analysis, that the secular open ocean Hirnantian C isotope excursion possibly amounts to only ∼ +1.5‰. Such an increase could be mechanistically driven by the combination of sea-level fall, persistent deep water anoxia, and cooler glacial temperatures that increased the organic carbon burial efficiency in the deeper basins.
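The "two component mixing model" mentioned above is not specified in the abstract; a generic isotope mass-balance form, stated here only as an assumption about what such a model looks like, is:

```latex
\delta^{13}\mathrm{C}_{\mathrm{mix}} = f\,\delta^{13}\mathrm{C}_{\mathrm{local}} + (1 - f)\,\delta^{13}\mathrm{C}_{\mathrm{open\ ocean}}, \qquad 0 \le f \le 1,
```

where f is the fraction of the locally influenced (epeiric or basinal) carbon pool contributing to a measured value.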
Collaboration pathway(s) using new tools for optimizing `operational' climate monitoring from space
NASA Astrophysics Data System (ADS)
Helmuth, Douglas B.; Selva, Daniel; Dwyer, Morgan M.
2015-09-01
Consistently collecting the earth's climate signatures remains a priority for world governments and international scientific organizations. Architecting a long-term solution requires transforming scientific missions into an optimized, robust 'operational' constellation that addresses the collective needs of policy makers, scientific communities and global academic users for trusted data. The application of new tools offers pathways for global architecture collaboration. Recent rule-based expert system (RBES) optimization modeling of the intended NPOESS architecture becomes a surrogate for global operational climate monitoring architecture(s). These rule-based system tools provide valuable insight for global climate architectures through comparison and evaluation of alternatives and the sheer range of trade space explored. Optimization of climate monitoring architecture(s) for a partial list of ECV (essential climate variables) is explored and described in detail, with dialogue on appropriate rule-based valuations. These optimization tools suggest global collaboration advantages and elicit responses from the audience and climate science community. This paper focuses on recent research exploring joint requirement implications of the high-profile NPOESS architecture and extends the research and tools to optimization for a climate-centric case study; it reflects work from the SPIE RS Conferences of 2013 and 2014, abridged for simplification [30, 32]. First, the heavily scrutinized NPOESS architecture inspired the research question of whether complexity (as a cost/risk factor) was overlooked when considering the benefits of aggregating different missions onto a single platform; years later the question has reversed, and agencies are now asking whether disaggregation is the answer. We discuss what some academic research suggests. Second, we adopt the GCOS requirements for earth climate observations via ECV (essential climate variables), many collected from space-based sensors, and accept their definitions of global coverage intended to ensure that the needs of major global and international organizations (UNFCCC and IPCC) are met as a core objective. How do new optimization tools such as rule-based engines (RBES) offer alternative methods of evaluating collaborative architectures and constellations, and what would the trade space of optimized operational climate monitoring architectures for ECV look like? Third, using the RBES tool kit (2014), we demonstrate a climate-centric rule-based decision engine that optimizes architectural trades of earth observation satellite systems, allowing comparison to existing architectures and providing insights for global collaborative architectures. How difficult is it to pull together an optimized climate case study, utilizing for example 12 climate-based instruments on multiple existing platforms and a nominal handful of orbits, for the best cost and performance benefits against the collection requirements of a representative set of ECV? How much effort and resources would an organization expect to invest to realize these analysis and utility benefits?
NASA Astrophysics Data System (ADS)
Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.
2018-05-01
Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7×) and Xeon Phi coprocessor (4.7-4.9×) compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.
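The solvers themselves are not reproduced in the abstract; the sketch below shows only the batching pattern it describes (many independent ODE systems advanced in one instruction stream), using a deliberately simplified fixed-step RK4 in NumPy rather than the paper's adaptive Rosenbrock/Runge-Kutta OpenCL kernels. The toy kinetics and sizes are assumptions for illustration.

```python
# Sketch of the "many independent ODEs in one instruction stream" idea:
# a fixed-step RK4 integrator vectorized across a batch of reactors with NumPy.
import numpy as np

def rk4_batched(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) for a batch of systems.
    y0 has shape (n_systems, n_species); every arithmetic operation below acts
    on the whole batch at once, which is what SIMD/SIMT execution exploits."""
    y = y0.copy()
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

if __name__ == "__main__":
    # Toy "kinetics": linear decay with a different rate constant per system.
    rates = np.linspace(0.5, 2.0, 1024)[:, None]
    f = lambda t, y: -rates * y
    y_final = rk4_batched(f, np.ones((1024, 1)), 0.0, 1.0, 100)
    print(y_final[:3, 0], np.exp(-rates[:3, 0]))  # close to the exact solution
```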
A framework for grand scale parallelization of the combined finite discrete element method in 2d
NASA Astrophysics Data System (ADS)
Lei, Z.; Rougier, E.; Knight, E. E.; Munjiza, A.
2014-09-01
Within the context of rock mechanics, the Combined Finite-Discrete Element Method (FDEM) has been applied to many complex industrial problems such as block caving, deep mining techniques (tunneling, pillar strength, etc.), rock blasting, seismic wave propagation, packing problems, dam stability, rock slope stability, rock mass strength characterization problems, etc. The reality is that most of these were accomplished in a 2D and/or single processor realm. In this work a hardware-independent FDEM parallelization framework has been developed using the Virtual Parallel Machine for FDEM (V-FDEM). With V-FDEM, a parallel FDEM software can be adapted to different parallel architecture systems ranging from just a few to thousands of cores.
An expanding universe of circadian networks in higher plants.
Pruneda-Paz, Jose L; Kay, Steve A
2010-05-01
Extensive circadian clock networks regulate almost every biological process in plants. Clock-controlled physiological responses are coupled with daily oscillations in environmental conditions resulting in enhanced fitness and growth vigor. Identification of core clock components and their associated molecular interactions has established the basic network architecture of plant clocks, which consists of multiple interlocked feedback loops. A hierarchical structure of transcriptional feedback overlaid with regulated protein turnover sets the pace of the clock and ultimately drives all clock-controlled processes. Although originally described as linear entities, increasing evidence suggests that many signaling pathways can act as both inputs and outputs within the overall network. Future studies will determine the molecular mechanisms involved in these complex regulatory loops. 2010 Elsevier Ltd. All rights reserved.
Earth-Base: A Free And Open Source, RESTful Earth Sciences Platform
NASA Astrophysics Data System (ADS)
Kishor, P.; Heim, N. A.; Peters, S. E.; McClennen, M.
2012-12-01
This presentation describes the motivation, concept, and architecture behind Earth-Base, a web-based, RESTful data-management, analysis and visualization platform for earth sciences data. Traditionally web applications have been built directly accessing data from a database using a scripting language. While such applications are great at bringing results to a wide audience, they are limited in scope to the imagination and capabilities of the application developer. Earth-Base decouples the data store from the web application by introducing an intermediate "data application" tier. The data application's job is to query the data store using self-documented, RESTful URIs, and send the results back formatted as JavaScript Object Notation (JSON). Decoupling the data store from the application allows virtually limitless flexibility in developing applications, whether web-based for human consumption or programmatic for machine consumption. It also allows outside developers to use the data in their own applications, potentially creating applications that the original data creator and app developer may not have even thought of. Standardized specifications for URI-based querying and JSON-formatted results make querying and developing applications easy. URI-based querying also allows utilizing distributed datasets easily. Companion mechanisms for querying data snapshots (aka time-travel), usage tracking and license management, and verification of semantic equivalence of data are also described. The latter promotes the "What You Expect Is What You Get" (WYEIWYG) principle that can aid in data citation and verification.
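As a hedged illustration of the "data application" tier described above, the following minimal Flask sketch exposes a RESTful URI that returns JSON. The route, field names and in-memory data are hypothetical stand-ins, not Earth-Base's actual API.

```python
# Minimal sketch of a "data application" tier: a RESTful URI returning JSON,
# decoupled from any particular web application that might consume it.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the data store the tier would normally query.
SAMPLES = [
    {"id": 1, "taxon": "Trilobita", "interval": "Ordovician"},
    {"id": 2, "taxon": "Ammonoidea", "interval": "Devonian"},
]

@app.route("/api/v1/samples")
def samples():
    # RESTful query via URI parameters, e.g. /api/v1/samples?interval=Ordovician
    interval = request.args.get("interval")
    rows = [s for s in SAMPLES if interval is None or s["interval"] == interval]
    # Results go back as JSON, so any web or programmatic client can consume them.
    return jsonify({"count": len(rows), "records": rows})

if __name__ == "__main__":
    app.run(port=5000)
```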
Multiple core computer processor with globally-accessible local memories
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shalf, John; Donofrio, David; Oliker, Leonid
A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores.
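A toy sketch of the globally-visible local memories described above follows; the window size, base address and decode rule are assumptions chosen only to illustrate how a global address can resolve to a particular core's memory, not the patented design.

```python
# Toy model (not the patented design) of the idea that each core's local memory
# is also visible at a fixed window of a single global address space.
LOCAL_MEM_SIZE = 64 * 1024          # assumed size of each core-local memory
GLOBAL_BASE = 0x8000_0000           # assumed base of the globally visible region

def decode_global_address(addr):
    """Map a global address to (owning memory index, offset within it)."""
    offset_from_base = addr - GLOBAL_BASE
    mem_index = offset_from_base // LOCAL_MEM_SIZE
    local_offset = offset_from_base % LOCAL_MEM_SIZE
    return mem_index, local_offset

if __name__ == "__main__":
    # A core anywhere on the NoC can address core 3's memory directly:
    print(decode_global_address(GLOBAL_BASE + 3 * LOCAL_MEM_SIZE + 0x10))  # (3, 16)
```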
Lemieux, Robert P
2004-01-01
By virtue of its spontaneous polarization (PS), a ferroelectric SmC* liquid crystal can be switched between two states corresponding to opposite molecular tilt orientations using an electric field, thus producing an ON-OFF light shutter between crossed polarizers. Considerable efforts have been made over the past decade to develop photonic FLC light shutters because of their potential uses in dynamic holography and optical data storage. The ON-OFF switching of a FLC light shutter can be triggered by light via a photoinversion of PS using a photochromic dopant. The spontaneous polarization is a chiral bulk property that can be left-handed (negative) or right-handed (positive), depending on the absolute configuration of the chiral component of the SmC* phase. In the approach described herein, the magnitude of PS is modulated via the photoisomerization of a chiral thioindigo dopant that undergoes a large increase in transverse dipole moment upon trans-cis photoisomerization. The sign of PS is photoinverted using an "ambidextrous" thioindigo dopant containing a chiral 2-octyloxy side chain that is coupled to the thioindigo core and induces a positive PS, and a chiral 2,3-difluorooctyloxy side chain that is decoupled from the core and induces a negative PS. In the trans form, the 2,3-difluorooctyloxy side chain predominates and the net PS induced by the dopant is negative. However, upon trans-cis-photoisomerization, the increase in transverse dipole moment of the 2-octyloxy/thioindigo unit raises its induced PS over that of the decoupled 2,3-difluorooctyloxy side chain, and thus inverts the net sign of PS induced by the dopant from negative to positive. Copyright 2004 The Japan Chemical Journal Forum and Wiley Periodicals, Inc.
EVIDENCE FOR CLUSTER TO CLUSTER VARIATIONS IN LOW-MASS STELLAR ROTATIONAL EVOLUTION
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coker, Carl T.; Pinsonneault, Marc; Terndrup, Donald M., E-mail: coker@astronomy.ohio-state.edu, E-mail: pinsono@astronomy.ohio-state.edu, E-mail: terndrup@astronomy.ohio-state.edu
2016-12-10
The concordance model for angular momentum evolution postulates that star-forming regions and clusters are an evolutionary sequence that can be modeled with assumptions about protostar–disk coupling, angular momentum loss from magnetized winds that saturates in a mass-dependent fashion at high rotation rates, and core-envelope decoupling for solar analogs. We test this approach by combining established data with the large h Per data set from the MONITOR project and new low-mass Pleiades data. We confirm prior results that young low-mass stars can be used to test star–disk coupling and angular momentum loss independent of the treatment of internal angular momentum transport. For slow rotators, we confirm the need for star–disk interactions to evolve the ONC to older systems, using h Per (age 13 Myr) as our natural post-disk case. There is no evidence for extremely long-lived disks as an alternative to core-envelope decoupling. However, our wind models cannot evolve rapid rotators from h Per to older systems consistently, and we find that this result is robust with respect to the choice of angular momentum loss prescription. We outline two possible solutions: either there is cosmic variance in the distribution of stellar rotation rates in different clusters or there are substantially enhanced torques in low-mass rapid rotators. We favor the former explanation and discuss observational tests that could be used to distinguish them. If the distribution of initial conditions depends on environment, models that test parameters by assuming a universal underlying distribution of initial conditions will need to be re-evaluated.
Surface Buildup Scenarios and Outpost Architectures for Lunar Exploration
NASA Technical Reports Server (NTRS)
Mazanek, Daniel D.; Troutman, Patrick A.; Culbert, Christopher J.; Leonard, Matthew J.; Spexarth, Gary R.
2009-01-01
The Constellation Program Architecture Team and the Lunar Surface Systems Project Office have developed an initial set of lunar surface buildup scenarios and associated polar outpost architectures, along with preliminary supporting element and system designs in support of NASA's Exploration Strategy. The surface scenarios are structured in such a way that outpost assembly can be suspended at any time to accommodate delivery contingencies or changes in mission emphasis. The modular nature of the architectures mitigates the impact of the loss of any one element and enhances the ability of international and commercial partners to contribute elements and systems. Additionally, the core lunar surface system technologies and outpost operations concepts are applicable to future Mars exploration. These buildup scenarios provide a point of departure for future trades and assessments of alternative architectures and surface elements.
Yang, Yunpeng; Zhang, Lu; Huang, He; Yang, Chen; Yang, Sheng; Gu, Yang; Jiang, Weihong
2017-01-24
Catabolite control protein A (CcpA) is the master regulator in Gram-positive bacteria that mediates carbon catabolite repression (CCR) and carbon catabolite activation (CCA), two fundamental regulatory mechanisms that enable competitive advantages in carbon catabolism. It is generally regarded that CcpA exerts its regulatory role by binding to a typical 14- to 16-nucleotide (nt) consensus site that is called a catabolite response element (cre) within the target regions. However, here we report a previously unknown noncanonical flexible architecture of the CcpA-binding site in solventogenic clostridia, providing new mechanistic insights into catabolite regulation. This novel CcpA-binding site, named cre_var, has a unique architecture that consists of two inverted repeats and an intervening spacer, all of which are variable in nucleotide composition and length, except for a 6-bp core palindromic sequence (TGTAAA/TTTACA). It was found that the length of the intervening spacer of cre_var can affect CcpA binding affinity, and moreover, the core palindromic sequence of cre_var is the key structure for regulation. Such a variable architecture of cre_var shows potential importance for CcpA's diverse and fine regulation. A total of 103 potential cre_var sites were discovered in solventogenic Clostridium acetobutylicum, of which 42 sites were picked out for electrophoretic mobility shift assays (EMSAs), and 30 sites were confirmed to be bound by CcpA. These 30 cre_var sites are associated with 27 genes involved in many important pathways. Also of significance, the cre_var sites are found to be widespread and function in a great number of taxonomically different Gram-positive bacteria, including pathogens, suggesting their global role in Gram-positive bacteria. In Gram-positive bacteria, the global regulator CcpA controls a large number of important physiological and metabolic processes. Although a typical consensus CcpA-binding site, cre, has been identified, it remains poorly explored for the diversity of CcpA-mediated catabolite regulation. Here, we discovered a novel flexible CcpA-binding site architecture (cre_var) that is highly variable in both length and base composition but follows certain principles, providing new insights into how CcpA can differentially recognize a variety of target genes to form a complicated regulatory network. A comprehensive search further revealed the wide distribution of cre_var sites in Gram-positive bacteria, indicating it may have a universal function. This finding is the first to characterize such a highly flexible transcription factor-binding site architecture, which would be valuable for deeper understanding of CcpA-mediated global catabolite regulation in bacteria. Copyright © 2017 Yang et al.
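As an illustrative aid only (not the authors' pipeline), the following sketch scans a sequence for the reported 6-bp core palindrome TGTAAA/TTTACA arranged as two repeats around a variable spacer; the 0-20 nt spacer bound is an assumption for demonstration.

```python
# Illustrative scan for the 6-bp core palindrome reported above, allowing a
# variable-length spacer between the two halves of the candidate site.
import re

# Either half of the core on the forward strand, separated by a 0-20 nt spacer
# (the spacer bound is an assumption chosen only for this example).
CRE_VAR_CORE = re.compile(r"(TGTAAA|TTTACA)[ACGT]{0,20}(TGTAAA|TTTACA)")

def find_candidate_sites(sequence):
    """Return (start, matched substring) for each candidate core arrangement."""
    return [(m.start(), m.group(0)) for m in CRE_VAR_CORE.finditer(sequence.upper())]

if __name__ == "__main__":
    demo = "ccgatTGTAAAacgtacgtTTTACAggct"
    print(find_candidate_sites(demo))   # [(5, 'TGTAAAACGTACGTTTTACA')]
```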
Michael R. Willig; Christopher P. Bloch; Steven J. Presley
2014-01-01
Climate-induced disturbances such as hurricanes affect the structure and functioning of many ecosystems, especially those in the Caribbean Basin, where effects are well documented with regard to biodiversity and biogeochemical dynamics. Because climate change will likely alter the frequency or intensity of such storms, it is increasingly important to understand the...
Decoupling Policy and Practice: How Life Scientists Respond to Ethics Education
ERIC Educational Resources Information Center
Smith-Doerr, Laurel
2008-01-01
Many graduate programmes in science now require courses in ethics. However, little is known about their reception or use. Using websites and interviews, this essay examines ethics requirements in the field of biosciences in three countries (the United States of America, the United Kingdom, and Italy) between 2000 and 2005. Evidence suggests that…
Local atmospheric decoupling in complex topography alters climate change impacts
Christopher Daly; David R. Conklin; Michael H. Unsworth
2009-01-01
Cold air drainage and pooling occur in many mountain valleys, especially at night and during winter. Local climate regimes associated with frequent cold air pooling have substantial impacts on species phenology, distribution, and diversity. However, little is known about how the degree and frequency of cold air drainage and pooling will respond to a changing climate....
Importance of acoustic shielding in sonochemistry.
van Iersel, Maikel M; Benes, Nieck E; Keurentjes, Jos T F
2008-04-01
It is well known that sonochemistry is less efficient at high acoustic intensities. Many authors have attributed this effect to decoupling losses and shielding of the acoustic wave. In this study we investigate both phenomena for a 20 kHz ultrasound field with an intensity ranging from 40 to 150 W/cm2. Visualization of the bubble cloud has demonstrated that the void fraction below the ultrasound horn increases more than proportionally with increasing power input. Nevertheless, the energy coupling between the horn and the liquid remains constant; this implies that decoupling losses are not reinforced for larger bubble clouds. On the contrary, microphone measurements have shown that due to the larger bubble cloud a substantial part of the supplied energy is lost at high power inputs. In striving towards more efficient sonochemistry, reduction of shielding appears as one of the major challenges.
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2014-01-01
With more and more workflow systems adopting cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy-to-extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
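The four heuristics are not spelled out in the abstract; the sketch below shows only a generic greedy earliest-finish-time assignment of workflow tasks to VM types, i.e. the flavor of performance/cost trade-off such algorithms balance. The task and VM parameters are hypothetical.

```python
# Generic greedy scheduler (illustrative only, not one of the paper's four
# WFaaS algorithms): assign each task to the VM that finishes it earliest,
# breaking ties on price, and accumulate the resulting cost.
def greedy_schedule(tasks, vm_types):
    """tasks: list of (name, workload) pairs; vm_types: list of dicts with
    'name', 'speed' (workload units/hour) and 'price' (cost/hour)."""
    schedule, total_cost = [], 0.0
    # Track when each provisioned VM becomes free (start with one VM per type).
    vms = [{"type": v, "free_at": 0.0} for v in vm_types]
    for name, workload in tasks:
        best = min(
            vms,
            key=lambda vm: (vm["free_at"] + workload / vm["type"]["speed"],
                            vm["type"]["price"]),
        )
        runtime = workload / best["type"]["speed"]
        start = best["free_at"]
        best["free_at"] = start + runtime
        total_cost += runtime * best["type"]["price"]
        schedule.append((name, best["type"]["name"], start, start + runtime))
    return schedule, total_cost

if __name__ == "__main__":
    vm_types = [{"name": "small", "speed": 1.0, "price": 0.1},
                {"name": "large", "speed": 4.0, "price": 0.5}]
    tasks = [("ingest", 4.0), ("align", 8.0), ("plot", 1.0)]
    print(greedy_schedule(tasks, vm_types))
```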
NASA Technical Reports Server (NTRS)
Lee, H.-W.; Lam, K. S.; Devries, P. L.; George, T. F.
1980-01-01
A new semiclassical decoupling scheme (the trajectory-based decoupling scheme) is introduced in a computational study of vibrational-to-electronic energy transfer for a simple model system that simulates collinear atom-diatom collisions. The probability of energy transfer (P) is calculated quasiclassically using the new scheme as well as quantum mechanically as a function of the atomic electronic-energy separation (lambda), with overall good agreement between the two sets of results. Classical mechanics with the new decoupling scheme is found to be capable of predicting resonance behavior whereas an earlier decoupling scheme (the coordinate-based decoupling scheme) failed. Interference effects are not exhibited in P vs lambda results.
Is propensity to obesity associated with the diurnal pattern of core body temperature?
Hynd, P I; Czerwinski, V H; McWhorter, T J
2014-02-01
Obesity affects more than half a billion people worldwide, but the underlying causes remain unresolved. It has been proposed that propensity to obesity may be associated with differences between individuals in metabolic efficiency and in the energy used for homeothermy. It has also been suggested that obese-prone individuals differ in their responsiveness to circadian rhythms. We investigated both these hypotheses by measuring the core body temperature at regular and frequent intervals over a diurnal cycle, using indigestible temperature loggers in two breeds of canines known to differ in propensity to obesity, but prior to divergence in fatness. Greyhounds (obesity-resistant) and Labradors (obesity-prone) were fed indigestible temperature loggers. Gastrointestinal temperature was recorded at 10-min intervals for the period of transit of the logger. Diet, body condition score, activity level and environment were similar for both groups. Energy digestibility was also measured. The mean core body temperature in obesity-resistant dogs (38.27 °C) was slightly higher (P<0.001) than in obesity-prone dogs (38.18 °C) and the former had a greater variation (P<0.001) in 24h circadian core temperature. There were no differences in diet digestibility. Canines differing in propensity to obesity, but prior to its onset, differed little in mean core temperature, supporting similar findings in already-obese and lean humans. Obese-prone dogs were less variable in daily core temperature fluctuations, suggestive of a degree of circadian decoupling.
Guo, Lifang; Tian, Minggang; Feng, Ruiqing; Zhang, Ge; Zhang, Ruoyao; Li, Xuechen; Liu, Zhiqiang; He, Xiuquan; Sun, Jing Zhi; Yu, Xiaoqiang
2018-04-04
Lipid droplets (LDs) with unique interfacial architecture not only play crucial roles in protecting a cell from lipotoxicity and lipoapoptosis but also closely relate with many diseases such as fatty liver and diabetes. Thus, as one of the important applied biomaterials, fluorescent probes with ultrahigh selectivity for in situ and high-fidelity imaging of LDs in living cells and tissues are critical to elucidate relevant physiological and pathological events as well as detect related diseases. However, available probes only utilizing LDs' waterless neutral cores but ignoring the unique phospholipid monolayer interfaces exhibit low selectivity. They cannot differentiate the neutral cores of LDs from other intracellular lipophilic microenvironments, which results in extensive cloud-like background noise and severely limits their bioapplications. Herein, to design LD probes with ultrahigh selectivity, the exceptional interfacial architecture of LDs is considered adequately and thus an interface-targeting strategy is proposed for the first time. According to the novel strategy, we have developed two amphipathic fluorescent probes (N-Cy and N-Py) by introducing different cations into a lipophilic fluorophore (nitrobenzoxadiazole (NBD)). Consequently, their cationic moiety precisely locates the interfaces through electrostatic interaction and simultaneously NBD entirely embeds into the waterless core via hydrophobic interaction. Thus, high-fidelity and background-free fluorescence imaging of LDs is realized in living cells in situ, as expected. Moreover, LDs in turbid tissues like skeletal muscle slices have been clearly imaged (up to 82 μm depth) by a two-photon microscope. Importantly, using N-Cy, we not only intuitively monitored the variations of LDs in number, size, and morphology but also clearly revealed their abnormity in hepatic tissues resulting from fatty liver. Therefore, these unique probes provide excellent imaging tools for elucidating LD-related physiological and pathological processes, and the interface-targeting strategy possesses universal significance for designing probes with ultrahigh selectivity.
NASA Astrophysics Data System (ADS)
Abraham, Ann Rose; Raneesh, B.; Das, Dipankar; Oluwafemi, Oluwatobi Samuel; Thomas, Sabu; Kalarikkal, Nandakumar
2018-04-01
The electric field control of magnetism in multiferroics is attractive for the realization of ultra-fast and miniaturized low power device applications like nonvolatile memories. Room temperature hybrid multiferroic heterostructures with core-shell (0-0) architecture (ferrite core and ferroelectric shell) were developed via a two-step method. High-Resolution Transmission Electron Microscopy (HRTEM) images confirm the core-shell structure. The temperature-dependent magnetization measurements and Mössbauer spectra reveal the superparamagnetic nature of the core-shell sample. The ferroelectric hysteresis loops reveal the leaky nature of the samples. The results indicate the promising applications of the samples for magneto-electric memories and spintronics.
NASA Astrophysics Data System (ADS)
Pignol, C.; Arnaud, F.; Godinho, E.; Galabertier, B.; Caillo, A.; Billy, I.; Augustin, L.; Calzas, M.; Rousseau, D. D.; Crosta, X.
2016-12-01
Managing scientific data is probably one of the most challenging issues in modern science. In paleosciences the question is made even more sensitive by the need to preserve and manage high value fragile geological samples: cores. Large international scientific programs, such as IODP or ICDP, led intense efforts to solve this problem and proposed detailed, high-standard work- and dataflows throughout core handling and curating. However, many paleoscience results derive from small-scale research programs in which data and sample management is too often handled only locally - when it is… In this paper we present a national effort led in France to develop an integrated system to curate ice and sediment cores. Under the umbrella of the national excellence equipment program CLIMCOR, we launched a reflection about core curating and the management of associated fieldwork data. Our aim was then to conserve all data from fieldwork in an integrated cyber-environment which will evolve toward laboratory-acquired data storage in the near future. To do so, our approach was conducted through an intimate relationship with field operators as well as laboratory core curators in order to propose user-oriented solutions. The national core curating initiative proposes a single web portal in which all teams can store their fieldwork data. This portal is used as a national hub to attribute IGSNs. For legacy samples, this requires the establishment of a dedicated core list with associated metadata. However, for forthcoming core data, we developed a mobile application to capture technical and scientific data directly in the field. This application is linked with a unique coring-tools library and is adapted to most coring devices (gravity, drilling, percussion etc.) including multiple sections and holes coring operations. Those field data can be uploaded automatically to the national portal, but also referenced through international standards (IGSN and INSPIRE) and displayed in international portals (currently, NOAA's IMLGS). In this paper, we present the architecture of the integrated system, future perspectives and the approach we adopted to reach our goals. We will also present our mobile application through didactic examples.
NASA Astrophysics Data System (ADS)
Morandage, Shehan; Schnepf, Andrea; Vanderborght, Jan; Javaux, Mathieu; Leitner, Daniel; Laloy, Eric; Vereecken, Harry
2017-04-01
Root traits are increasingly important in breeding of new crop varieties. E.g., longer and fewer lateral roots are suggested to improve drought resistance of wheat. Thus, detailed root architectural parameters are important. However, classical field sampling of roots only provides more aggregated information such as root length density (coring), root counts per area (trenches) or root arrival curves at certain depths (rhizotubes). We investigate the possibility of obtaining information about the root system architecture of plants from classical field-based root sampling schemes, based on sensitivity analysis and inverse parameter estimation. This methodology was developed based on a virtual experiment where a root architectural model was used to simulate root system development in a field, parameterized for winter wheat. This information provided the ground truth which is normally unknown in a real field experiment. The three sampling schemes coring, trenching, and rhizotubes were virtually applied and the aggregated information computed. The Morris OAT global sensitivity analysis method was then performed to determine the most sensitive parameters of the root architecture model for the three different sampling methods. The estimated means and standard deviations of the elementary effects of a total number of 37 parameters were evaluated. Upper and lower bounds of the parameters were obtained based on literature and published data of winter wheat root architectural parameters. Root length density profiles of coring, arrival curve characteristics observed in rhizotubes, and root counts in grids of the trench profile method were evaluated statistically to investigate the influence of each parameter using five different error functions. Number of branches, insertion angle, inter-nodal distance, and elongation rates are the most sensitive parameters, and the parameter sensitivity varies slightly with depth. Most parameters and their interactions with the other parameters show highly nonlinear effects on the model output. The most sensitive parameters will be subject to inverse estimation from the virtual field sampling data using the DREAMzs algorithm. The estimated parameters can then be compared with the ground truth in order to determine the suitability of the sampling schemes to identify specific traits or parameters of the root growth model.
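As a hedged sketch of the Morris one-at-a-time idea named above (a simplified radial variant applied to a placeholder model, not the root-architecture model itself), elementary effects can be screened as follows:

```python
# Simplified Morris-style one-at-a-time (OAT) elementary effects screening.
# The toy model and parameter bounds are placeholders for illustration only.
import numpy as np

def elementary_effects(f, lower, upper, n_trajectories=20, delta_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    k = len(lower)
    effects = [[] for _ in range(k)]
    for _ in range(n_trajectories):
        x = lower + rng.random(k) * (upper - lower) * (1 - delta_frac)
        base = f(x)
        for i in range(k):                      # perturb one parameter at a time
            step = delta_frac * (upper[i] - lower[i])
            x_pert = x.copy()
            x_pert[i] += step
            effects[i].append((f(x_pert) - base) / step)
    effects = np.array(effects)
    # mu* (mean absolute effect) and sigma, the usual Morris screening measures
    return np.abs(effects).mean(axis=1), effects.std(axis=1)

if __name__ == "__main__":
    toy_model = lambda p: p[0] + 10 * p[1] ** 2 + 0.1 * p[0] * p[2]
    mu_star, sigma = elementary_effects(toy_model, [0, 0, 0], [1, 1, 1])
    print(mu_star, sigma)   # parameter with index 1 dominates, as constructed
```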
Yuan, Yuliang; Wang, Weicheng; Yang, Jie; Tang, Haichao; Ye, Zhizhen; Zeng, Yujia; Lu, Jianguo
2017-10-10
Design of new materials with sophisticated nanostructure has been proven to be an efficient strategy to improve their properties in many applications. Herein, we demonstrate the successful combination of the highly electron-conductive material NiCo2O4 with the high-capacitance material MnMoO4 by forming a core-shell nanostructure. The NiCo2O4@MnMoO4 core-shell nanoarrays (CSNAs) electrode possesses a high capacitance of 1169 F g-1 (4.24 F cm-2) at a current density of 2.5 mA cm-2, obviously larger than the pristine NiCo2O4 electrode. The asymmetric supercapacitors (ASCs), assembled with NiCo2O4@MnMoO4 CSNAs as binder-free cathode and active carbon (AC) as anode, exhibit a high energy density of 15 Wh kg-1 and a high power density of 6734 W kg-1. Cycle performance of the NiCo2O4@MnMoO4 CSNAs//AC ASCs, conducted at a current density of 20 mA cm-2, remains at 96.45% of the initial capacitance after 10,000 cycles, demonstrating its excellent long-term cycle stability. Kinetically decoupled analysis reveals that the capacitive capacitance is dominant in the total capacitance of the NiCo2O4@MnMoO4 CSNAs electrode, which may be the reason for the ultra-long cycle stability of the ASCs. Our assembled button ASC can easily light up a red LED for 30 min and a green LED for 10 min after being charged for 30 s. The remarkable electrochemical performance of the NiCo2O4@MnMoO4 CSNAs//AC ASCs is attributed to its enhanced surface area, abundant electroactive sites, facile electrolyte infiltration into the 3D NiCo2O4@MnMoO4 nanoarrays and fast electron and ion transport path.
Organic Dots Based on AIEgens for Two-Photon Fluorescence Bioimaging.
Lou, Xiaoding; Zhao, Zujin; Tang, Ben Zhong
2016-12-01
Two-photon fluorescence imaging technique is a powerful bioanalytical approach in terms of high photostability, low photodamage, and high spatiotemporal resolution. Recently, fluorescent organic dots comprised of organic emissive cores and a polymeric matrix are emerging as promising contrast reagents for two-photon fluorescence imaging, owing to their numerous merits of high and tunable fluorescence, good biocompatibility, strong photobleaching resistance, and multiple surface functionality. The emissive core is crucial for organic dots to get high brightness, but many conventional chromophores often encounter a severe problem of fluorescence quenching when they form aggregates. To solve this problem, fluorogens featuring aggregation-induced emission (AIE) can fluoresce strongly in aggregates, and thus become ideal candidates for fluorescent organic dots. In addition, the two-photon absorption property of the dots can be readily improved by simply increasing the loading content of the AIE fluorogen (AIEgen). Hence, organic dots based on AIEgens have exhibited excellent performances in two-photon fluorescence in vitro cellular imaging, and in vivo vascular architecture visualization of mouse skin, muscle, brain and skull bone. In view of the rapid advances in this important research field, here, we highlight representative fluorescent organic dots with an emissive core of AIEgen aggregate, and discuss their great potential in bioimaging applications. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Pyrene-Labeled Amphiphiles: Dynamic And Structural Probes Of Membranes And Lipoproteins
NASA Astrophysics Data System (ADS)
Pownall, Henry J.; Homan, Reynold; Massey, John B.
1987-01-01
Lipids and proteins are important functional and structural components of living organisms. Although proteins are frequently found as soluble components of plasma or the cell cytoplasm, many lipids are much less soluble and separate into complex assemblies that usually contain proteins. Cell membranes and plasma lipoproteins are two important macro-molecular assemblies that contain both lipids and proteins. Cell membranes are composed of a variety of lipids and proteins that form an insoluble bilayer array that has relatively little curvature over distances of several nm. Plasma lipoproteins are different in that they are much smaller, water-soluble, and have highly curved surfaces. A model of a high density lipoprotein (HDL) is shown in Figure 1. This model (d ≈ 10 nm) contains a surface of polar lipids and proteins that surrounds a small core of insoluble lipids, mostly triglycerides and cholesteryl esters. The low density (LDL) (d ≈ 25 nm) and very low density (VLDL) (d ≈ 90 nm) lipoproteins have similar architectures, except the former has a cholesteryl ester core and the latter a core that is almost exclusively triglyceride (Figure 1). The surface proteins of HDL are amphiphilic and water soluble; the single protein of LDL is insoluble, whereas VLDL contains both soluble and insoluble proteins. The primary structures of all of these proteins are known.
Experimental evaluation of the impact of packet capturing tools for web services.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choe, Yung Ryn; Mohapatra, Prasant; Chuah, Chen-Nee
Network measurement is a discipline that provides the techniques to collect data that are fundamental to many branches of computer science. While many capturing tools and comparisons have been made available in the literature and elsewhere, the impact of these packet capturing tools on existing processes has not been thoroughly studied. While not a concern for collection methods in which dedicated servers are used, many usage scenarios of packet capturing now require the packet capturing tool to run concurrently with operational processes. In this work we perform experimental evaluations of the performance impact that packet capturing processes have on web-based services; in particular, we observe the impact on web servers. We find that packet capturing processes indeed impact the performance of web servers, but on a multi-core system the impact varies depending on whether the packet capturing and web hosting processes are co-located or not. In addition, the architecture and behavior of the web server and process scheduling is coupled with the behavior of the packet capturing process, which in turn also affects the web server's performance.
Further perspective on the theory of heteronuclear decoupling.
Skinner, Thomas E
2014-11-01
An exact general theory of heteronuclear decoupling is presented for spin-1/2 IS systems. RF irradiation applied to the I spins both modifies and generates additional couplings between states of the system. The recently derived equivalence between the dynamics of any N-level quantum system and a system of classical coupled harmonic oscillators makes explicit the exact physical couplings between states. Decoupling is thus more properly viewed as a complex intercoupling. The sign of antiphase magnetization plays a fundamental role in decoupling. A one-to-one correspondence is demonstrated between ±2SyIz and the sense of the S-spin coupling evolution. Magnetization Sx is refocused to obtain the desired decoupled state when ∫2SyIzdt=0. The exact instantaneous coupling at any time during the decoupling sequence is readily obtained in terms of the system states, showing that the creation of two-spin coherence is crucial for reducing the effective scalar coupling, as required for refocusing to occur. Representative examples from new aperiodic sequences as well as standard cyclic, periodic composite-pulse and adiabatic decoupling sequences illustrate the decoupling mechanism. The more general aperiodic sequences, obtained using optimal control, realize the potential inherent in the theory for significantly improved decoupling. Copyright © 2014 Elsevier Inc. All rights reserved.
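For readability, the refocusing condition quoted in the abstract can be restated in standard product-operator notation:

```latex
% Restatement of the condition stated above: the decoupled state S_x is
% recovered when the accumulated antiphase term integrates to zero over the
% decoupling sequence.
\int 2 S_{y} I_{z}\, \mathrm{d}t = 0
```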
Mahmood, Zohaib; McDaniel, Patrick; Guérin, Bastien; Keil, Boris; Vester, Markus; Adalsteinsson, Elfar; Wald, Lawrence L; Daniel, Luca
2016-07-01
In a coupled parallel transmit (pTx) array, the power delivered to a channel is partially distributed to other channels because of coupling. This power is dissipated in circulators, resulting in a significant reduction in power efficiency. In this study, a technique for designing robust decoupling matrices interfaced between the RF amplifiers and the coils is proposed. The decoupling matrices ensure that most forward power is delivered to the load without loss of encoding capabilities of the pTx array. The decoupling condition requires that the impedance matrix seen by the power amplifiers is a diagonal matrix whose entries match the characteristic impedance of the power amplifiers. In this work, the impedance matrix of the coupled coils is diagonalized by successive multiplication by its eigenvectors. A general design procedure and software are developed to generate automatically the hardware that implements diagonalization using passive components. The general design method is demonstrated by decoupling two example parallel transmit arrays. Our decoupling matrices achieve better than -20 dB decoupling in both cases. A robust framework for designing decoupling matrices for pTx arrays is presented and validated. The proposed decoupling strategy theoretically scales to any arbitrary number of channels. Magn Reson Med 76:329-339, 2016. © 2015 Wiley Periodicals, Inc.
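A small numerical sketch of the diagonalization step described above follows; a real symmetric 2x2 toy matrix stands in for the (generally complex) coil impedance matrix, and the passive-component hardware that realizes the transformation is not modeled.

```python
# Numerical sketch: multiplying a coupled impedance matrix by its eigenvectors
# yields a diagonal matrix. The 2x2 values are illustrative, not measured data.
import numpy as np

Z = np.array([[50.0, 12.0],     # diagonal: self impedance (ohms)
              [12.0, 50.0]])    # off-diagonal: mutual coupling between channels

eigvals, V = np.linalg.eigh(Z)  # orthonormal eigenvectors of the symmetric Z
Z_decoupled = V.T @ Z @ V       # transformed impedance seen by the amplifiers

print(np.round(Z_decoupled, 10))                   # off-diagonal terms vanish
print(np.allclose(Z_decoupled, np.diag(eigvals)))  # True
```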
Landslide Frequency and Failure Mechanisms at NE Gela Basin (Strait of Sicily)
NASA Astrophysics Data System (ADS)
Kuhlmann, J.; Asioli, A.; Trincardi, F.; Klügel, A.; Huhn, K.
2017-11-01
Despite intense research by both academia and industry, the parameters controlling slope stability at continental margins are often speculated upon. Lack of core recovery and age control on failed sediments prevents the assessment of failure timing/frequency and the role of prefailure architecture as shaped by paleoenvironmental changes. This study uses an integrated chronological framework from two boreholes and complementary ultrahigh-resolution acoustic profiling in order to assess (1) the frequency of submarine landsliding at the continental margin of NE Gela Basin and (2) the associated mechanisms of failure. Accurate age control was achieved through absolute radiocarbon dating and indirect dating relying on isotope stratigraphic and micropaleontological reconstructions. A total of nine major slope failure events have been recognized that occurred within the last 87 kyr (~10 kyr return frequency), though there is evidence for additional syndepositional, small-scale transport processes of lower volume. Preferential failure involves translational movement of mudflows along subhorizontal surfaces that are induced by sedimentological changes relating to prefailure stratal architecture. Along with sequence-stratigraphic boundaries reflecting paleoenvironmental fluctuations, recovered core material suggests that intercalated volcaniclastic layers are key to the basal confinement and lateral movement of these events in the study area. Another major predisposing factor is given by rapid loading of fine-grained homogenous strata and successive generation of excess pore pressure, as expressed by several fluid escape structures. Recurrent failure, however, requires repeated generation of favorable conditions, and seismic activity, though low compared to many other Mediterranean settings, is shown to represent a legitimate trigger mechanism.
The Housing Pattern and Entrepreneurship in Polish Suburban Landscape
NASA Astrophysics Data System (ADS)
Martyniuk-Peczek, Justyna; Peczek, Grzegorz; Martyniuk, Olga
2017-10-01
Housing stimulates the development of SMEs (small and medium enterprises) in the suburbs. The multidisciplinary research in the fields of urban planning and economics, carried out by the Authors, confirms this trend. The purpose of this paper is to present the multidisciplinary results of the research on the determinants of SME localization in the suburban areas of Gdansk, Gdynia and Sopot (the Metropolitan Area Gdansk-Gdynia-Sopot - MAGGS). Many researchers attach great significance to the term urban sprawl. Most authors agree that this phenomenon is multidimensional. It also varies in the global perspective. The conducted research showed that urban sprawl in Poland had a positive impact on the development of entrepreneurship, leading to a situation where the SME location quotient (LQ) in some suburban areas is higher in comparison to the core city itself. The communities characterized by an LQ significantly higher than in the core city have been identified by the Authors as 'entrepreneurship nests'. To address the research problem, a two-pronged research approach in the fields of urban and architectural design as well as economics was adopted. The character of the suburban landscape was determined by site analysis and through a study of the architectural form. The results confirmed that more than 80% of the parcels which encompass economic activity also exhibit a residential function. Our study confirms that urban sprawl, with its characteristic housing patterns, stimulates business activity in the suburbs. According to our results, this phenomenon is not only determined by financial factors, but also results from social and spatial reasons.
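The location quotient itself is not defined in the abstract; a standard textbook form, stated here as an assumption about the metric used (the paper's exact normalization may differ), is:

```latex
% Assumed textbook form of the location quotient:
% e_i = SMEs in community i, p_i = its reference base (e.g., population or
% total firms); E and P are the same quantities for the benchmark region
% (the core city or the whole metropolitan area).
\mathrm{LQ}_i = \frac{e_i / p_i}{E / P}
```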
A customizable class of colloidal-quantum-dot spasers and plasmonic amplifiers
Kress, Stephan J. P.; Cui, Jian; Rohner, Patrik; Kim, David K.; Antolinez, Felipe V.; Zaininger, Karl-Augustin; Jayanti, Sriharsha V.; Richner, Patrizia; McPeak, Kevin M.; Poulikakos, Dimos; Norris, David J.
2017-01-01
Colloidal quantum dots are robust, efficient, and tunable emitters now used in lighting, displays, and lasers. Consequently, when the spaser—a laser-like source of high-intensity, narrow-band surface plasmons—was first proposed, quantum dots were specified as the ideal plasmonic gain medium for overcoming the significant intrinsic losses of plasmons. Many subsequent spasers, however, have required a single material to simultaneously provide gain and define the plasmonic cavity, a design unable to accommodate quantum dots and other colloidal nanomaterials. In addition, these and other designs have been ill suited for integration with other elements in a larger plasmonic circuit, limiting their use. We develop a more open architecture that decouples the gain medium from the cavity, leading to a versatile class of quantum dot–based spasers that allow controlled generation, extraction, and manipulation of plasmons. We first create aberration-corrected plasmonic cavities with high quality factors at desired locations on an ultrasmooth silver substrate. We then incorporate quantum dots into these cavities via electrohydrodynamic printing or drop-casting. Photoexcitation under ambient conditions generates monochromatic plasmons (0.65-nm linewidth at 630 nm, Q ~ 1000) above threshold. This signal is extracted, directed through an integrated amplifier, and focused at a nearby nanoscale tip, generating intense electromagnetic fields. More generally, our device platform can be straightforwardly deployed at different wavelengths, size scales, and geometries on large-area plasmonic chips for fundamental studies and applications. PMID:28948219
Enjoying Sad Music: Paradox or Parallel Processes?
Schubert, Emery
2016-01-01
Enjoyment of negative emotions in music is seen by many as a paradox. This article argues that the paradox exists because it is difficult to view the process that generates enjoyment as being part of the same system that also generates the subjective negative feeling. Compensation theories explain the paradox as the compensation of a negative emotion by the concomitant presence of one or more positive emotions. But compensation brings us no closer to explaining the paradox because it does not explain how experiencing sadness itself is enjoyed. The solution proposed is that an emotion is determined by three critical processes, labeled motivational action tendency (MAT), subjective feeling (SF) and Appraisal. For many emotions the MAT and SF processes are coupled in valence. For example, happiness has positive MAT and positive SF, annoyance has negative MAT and negative SF. However, it is argued that in an aesthetic context, such as listening to music, emotion processes can become decoupled. The decoupling is controlled by the Appraisal process, which can assess whether the context of the sadness is real-life (where coupling occurs) or aesthetic (where decoupling can occur). In an aesthetic context sadness retains its negative SF but the aversive, negative MAT is inhibited, leaving sadness to still be experienced as a negatively valenced emotion, while contributing to the overall positive MAT. Individual differences, mood and previous experiences mediate the degree to which the aversive aspects of MAT are inhibited according to this Parallel Processing Hypothesis (PPH). The reasons for hesitancy in considering or testing the PPH, as well as the preponderance of research on sadness to the exclusion of other negative emotions, are discussed. PMID:27445752
SiC: An Agent Based Architecture for Preventing and Detecting Attacks to Ubiquitous Databases
NASA Astrophysics Data System (ADS)
Pinzón, Cristian; de Paz, Yanira; Bajo, Javier; Abraham, Ajith; Corchado, Juan M.
One of the main attacks on ubiquitous databases is the structured query language (SQL) injection attack, which causes severe damage both in the commercial aspect and in the user's confidence. This chapter proposes the SiC architecture as a solution to the SQL injection attack problem. This is a hierarchical distributed multiagent architecture, which involves an entirely new approach with respect to existing architectures for the prevention and detection of SQL injections. SiC incorporates a kind of intelligent agent, which integrates a case-based reasoning system. This agent, which is the core of the architecture, allows the application of detection techniques based on anomalies as well as those based on patterns, providing a great degree of autonomy, flexibility, robustness and dynamic scalability. The characteristics of the multiagent system allow the architecture to detect attacks from different types of devices, regardless of the physical location. The architecture has been tested on a medical database, guaranteeing safe access from various devices such as PDAs and notebook computers.
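The SiC agents themselves cannot be reconstructed from the abstract; the sketch below illustrates only the pattern-based side of detection it mentions, using a few generic injection signatures (the regexes are illustrative assumptions, not SiC's rule set).

```python
# Illustrative pattern-based check only; SiC combines this kind of signature
# matching with case-based anomaly detection inside a multiagent system,
# which is not reproduced here.
import re

SUSPICIOUS_PATTERNS = [
    r"(?i)\bunion\b.+\bselect\b",         # UNION-based extraction
    r"(?i)\bor\b\s+'?\d+'?\s*=\s*'?\d+",  # tautologies such as OR 1=1
    r"(?i)--|;\s*drop\b",                 # comment truncation / piggybacked DROP
]

def looks_like_sql_injection(user_input: str) -> bool:
    """Return True if the input matches a known injection signature."""
    return any(re.search(p, user_input) for p in SUSPICIOUS_PATTERNS)

if __name__ == "__main__":
    print(looks_like_sql_injection("jsmith"))                       # False
    print(looks_like_sql_injection("' OR 1=1 --"))                  # True
    print(looks_like_sql_injection("x'; DROP TABLE patients; --"))  # True
```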
A Real-Time Marker-Based Visual Sensor Based on a FPGA and a Soft Core Processor
Tayara, Hilal; Ham, Woonchul; Chong, Kil To
2016-01-01
This paper introduces a real-time marker-based visual sensor architecture for mobile robot localization and navigation. A hardware acceleration architecture for post video processing system was implemented on a field-programmable gate array (FPGA). The pose calculation algorithm was implemented in a System on Chip (SoC) with an Altera Nios II soft-core processor. For every frame, single pass image segmentation and Feature Accelerated Segment Test (FAST) corner detection were used for extracting the predefined markers with known geometries in FPGA. Coplanar PosIT algorithm was implemented on the Nios II soft-core processor supplied with floating point hardware for accelerating floating point operations. Trigonometric functions have been approximated using Taylor series and cubic approximation using Lagrange polynomials. Inverse square root method has been implemented for approximating square root computations. Real time results have been achieved and pixel streams have been processed on the fly without any need to buffer the input frame for further implementation. PMID:27983714
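As hedged Python counterparts of the approximations mentioned above (the real implementations run on the Nios II soft core; the term counts, initial guess and iteration counts here are generic choices, not taken from the paper):

```python
# Illustrative versions of the two approximation techniques named above:
# a truncated Taylor series for sine and Newton iteration for 1/sqrt(x).
def sin_taylor(x, terms=6):
    """Taylor series sin(x) = x - x^3/3! + x^5/5! - ... truncated at `terms`."""
    result, term = 0.0, x
    for n in range(terms):
        result += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return result

def inv_sqrt(x, iterations=5):
    """Approximate 1/sqrt(x) with Newton's method on f(y) = 1/y^2 - x."""
    y = 1.0 / (1.0 + x)          # crude but safe initial guess for x > 0
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

if __name__ == "__main__":
    import math
    print(sin_taylor(0.7), math.sin(0.7))     # close agreement near zero
    print(inv_sqrt(4.0), 1 / math.sqrt(4.0))  # converges toward 0.5
```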
Architecture of human translation initiation factor 3
Querol-Audi, Jordi; Sun, Chaomin; Vogan, Jacob M.; Smith, Duane; Gu, Yu; Cate, Jamie; Nogales, Eva
2013-01-01
SUMMARY Eukaryotic translation initiation factor 3 (eIF3) plays a central role in protein synthesis by organizing the formation of the 43S preinitiation complex. Using genetic tag visualization by electron microscopy, we reveal the molecular organization of ten human eIF3 subunits, including an octameric core. The structure of eIF3 bears a close resemblance to that of the proteasome lid, with a conserved spatial organization of eight core subunits containing PCI and MPN domains that coordinate functional interactions in both complexes. We further show that eIF3 subunits a and c interact with initiation factors eIF1 and eIF1A, which control the stringency of start codon selection. Finally, we find that subunit j, which modulates messenger RNA interactions with the small ribosomal subunit, makes multiple independent interactions with the eIF3 octameric core. These results highlight the conserved architecture of eIF3 and how it scaffolds key factors that control translation initiation in higher eukaryotes, including humans. PMID:23623729
Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aktulga, Hasan Metin; Coffman, Paul; Shan, Tzu-Ray
2015-12-01
Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.
A TMS320-based modem for the aeronautical-satellite core data service
NASA Astrophysics Data System (ADS)
Moher, Michael L.; Lodge, John H.
The International Civil Aviation Organization (ICAO) Future Air Navigation Systems (FANS) committee, the Airlines Electronics Engineering Committee (AEEC), and Inmarsat have been developing standards for an aeronautical satellite communications service. These standards encompass a satellite communications system architecture to provide comprehensive aeronautical communications services. Incorporated into the architecture is a core service capability, providing only low rate data communications, which all service providers and all aircraft earth terminals are required to support. In this paper an implementation of the physical layer of this standard for the low data rate core service is described. This is a completely digital modem (up to a low intermediate frequency). The implementation uses a single TMS320C25 chip for the transmit baseband functions of scrambling, encoding, interleaving, block formatting and modulation. The receiver baseband unit uses a dual processor configuration to implement the functions of demodulation, synchronization, de-interleaving, decoding and de-scrambling. The hardware requirements, the software structure and the algorithms of this implementation are described.
BAE Systems' 17μm LWIR camera core for civil, commercial, and military applications
NASA Astrophysics Data System (ADS)
Lee, Jeffrey; Rodriguez, Christian; Blackwell, Richard
2013-06-01
Seventeen (17) µm pixel Long Wave Infrared (LWIR) sensors based on vanadium oxide (VOx) micro-bolometers have been in full-rate production at BAE Systems' Night Vision Sensors facility in Lexington, MA for the past five years [1]. We introduce here a commercial camera core product, the Airia-M™ imaging module, in a VGA format that reads out in 30 and 60 Hz progressive modes. The camera core is architected to conserve power, with all-digital interfaces from the readout integrated circuit through video output. The architecture enables a variety of input/output interfaces, including Camera Link, USB 2.0, micro-display drivers, and optional RS-170 analog output supporting legacy systems. The modular board architecture of the electronics facilitates hardware upgrades, allowing us to capitalize on the latest high-performance, low-power electronics developed for mobile phones. Software and firmware are field upgradeable through a USB 2.0 port. The USB port also gives users access to up to 100 digitally stored (lossless) images.
Technology Challenges for Deep-Throttle Cryogenic Engines for Space Exploration
NASA Technical Reports Server (NTRS)
Brown, Kendall K.; Nelson, Karl W.
2005-01-01
Historically, cryogenic rocket engines have not been used for in-space applications due to their additional complexity, the mission need for high reliability, and the challenges of propellant boil-off. While the mission and vehicle architectures are not yet defined for the lunar and Martian robotic and human exploration objectives, cryogenic rocket engines offer the potential for higher performance and greater architecture/mission flexibility. In-situ cryogenic propellant production could enable a more robust exploration program by significantly reducing the propellant mass delivered to low Earth orbit, thus warranting the evaluation of cryogenic rocket engines versus the hypergolic bipropellant engines used in the Apollo program. A multi-use engine, one which can provide the functionality that separate engines provided in the Apollo mission architecture, is desirable for lunar and Mars exploration missions because it increases overall architecture effectiveness through commonality and modularity. The engine requirement derivation process must address each unique mission application and each unique phase within each mission. The resulting requirements must be addressed, including thrust level, performance, packaging, burn duration, number of operations, required impulses for each trajectory phase, operation after extended space or surface exposure, availability for inspection and maintenance, throttle range for planetary descent and ascent, acceleration limits, and many more. Within engine system studies, the system and component technology, capability, and risks must be evaluated, and a balance between the appropriate amounts of technology-push and technology-pull must be struck. This paper summarizes many of the key technology challenges associated with using high-performance cryogenic liquid propellant rocket engine systems and components in the exploration program architectures. The paper is divided into two areas. The first area describes how the mission requirements affect the engine system requirements and create system-level technology challenges. An engine system architecture for multiple applications, or a family of engines based upon a set of core technologies, design, and fabrication approaches, may reduce overall programmatic cost and risk. The engine system discussion also addresses the characterization of engine cycle figures of merit, configurations, and design approaches for some in-space vehicle alternatives under consideration. The second area evaluates the component-level technology challenges induced by the system requirements. Component technology issues are discussed addressing injector, thrust chamber, ignition system, turbopump assembly, and valve design for the challenging requirements of high reliability, robustness, fault tolerance, deep throttling, and reasonable performance (with respect to weight and specific impulse).
The past, present, and future of cognitive architectures.
Taatgen, Niels; Anderson, John R
2010-10-01
Cognitive architectures are theories of cognition that try to capture the essential representations and mechanisms that underlie cognition. Research in cognitive architectures has gradually moved from a focus on the functional capabilities of architectures to the ability to model the details of human behavior and, more recently, brain activity. Although there are many different architectures, they share many identical or similar mechanisms, permitting possible future convergence. In judging the quality of a particular cognitive model, it is pertinent to judge not just its fit to the experimental data but also its simplicity and ability to make predictions. Copyright © 2009 Cognitive Science Society, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadeghian, Hamed (Department of Precision and Microsystems Engineering, Delft University of Technology, Mekelweg 2, 2628 CD Delft); Herfst, Rodolf
We have developed a high-speed, miniature scanning probe microscope (MSPM) integrated with a positioning unit (PU) for accurately positioning the MSPM on a large substrate. This combination enables simultaneous, parallel operation of many units on a large sample for high-throughput measurements. The size of the MSPM is 19 × 45 × 70 mm³. It contains a one-dimensional flexure stage with counter-balanced actuation for vertical scanning, with a bandwidth of 50 kHz and a z-travel range of more than 2 μm. This stage is mechanically decoupled from the rest of the MSPM by suspending it on specific, dynamically determined points. The motion of the probe, which is mounted on top of the flexure stage, is measured by a very compact optical beam deflection (OBD). Thermal noise spectrum measurements of short cantilevers show a bandwidth of 2 MHz and a noise of less than 15 fm/√Hz. A fast approach and engagement of the probe to the substrate surface have been achieved by integrating a small stepper actuator and directly monitoring the cantilever response to the approaching surface. The PU has the same width as the MSPM (45 mm) and can position the MSPM to a pre-chosen position within an area of 275 × 30 mm² to within 100 nm accuracy in a few seconds. During scanning, the MSPM is detached from the PU, which is essential to eliminate mechanical vibration and drift from the relatively low-resonance-frequency, low-stiffness structure of the PU. Although the specific implementation of the MSPM we describe here has been developed as an atomic force microscope, the general architecture is applicable to any form of SPM. This high-speed MSPM is now being used in a parallel SPM architecture for inspection and metrology of large samples such as semiconductor wafers and masks.
Ares V Utilization in Support of a Human Mission to Mars
NASA Technical Reports Server (NTRS)
Holladay, J. B.; Jaap, J. P.; Pinson, R. M.; Creech, S. D.; Ryan, R. M.; Monk, T. S.; Baggett, K. E.; Runager, M. D.; Dux, I. J.; Hack, K. J.
2010-01-01
During the analysis cycles of Phase A-Cycle 3 (PA-C3) and the follow-on 8-week mini-cycle of PA-C3', the Ares V team assessed the Ares V PA-C3D configuration against the Mars Design Reference Mission as defined in the Constellation Architecture Requirements Document and further described in Mars Design Reference Architecture 5.0 (DRA 5.0), publicly released in July 2009. This analysis confirmed the ability to support the reference approach for the crewed Mars mission (the 7-launch nuclear thermal propulsion (NTP) architecture) as well as the reference chemical approach defined in DRA 5.0 (the 11- or 12-launch chemical propulsion module approach). Additional chemical propulsion options were defined that utilized further technology investments (primarily in-space cryogenic propellant transfer) and allowed the same mission to be accomplished with 9 launches rather than the 11 or 12 documented in DRA 5.0 and associated follow-on activities. This nine-launch chemical propulsion approach showed a unique ability to decouple the architecture from major technological developments (such as zero-boiloff technology or the development of NTP stages) and allowed for a relaxing of the infrastructure investments required to support a very rapid launch rate (30-day launch spacing, as documented in DRA 5.0). As an enhancing capability, it also shows promise in allowing for and incorporating the development of a commercial market for cryogenic propellant delivery on orbit, without placing such development on the critical path of beyond-low-Earth-orbit exploration. The ability of Ares V to support all of the aforementioned options, and key forward work required to fully understand the complexities and challenges presented by the Mars mission, are further documented herein.
Understanding the I/O Performance Gap Between Cori KNL and Haswell
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Jialin; Koziol, Quincey; Tang, Houjun
2017-05-01
The Cori system at NERSC has two compute partitions with different CPU architectures: a 2,004-node Haswell partition and a 9,688-node KNL partition, which ranked as the 5th fastest supercomputer on the November 2016 Top 500 list. The compute partitions share a common storage configuration, and understanding the I/O performance gap between them is important, not only to NERSC/LBNL users and other national labs, but also to the relevant hardware vendors and software developers. In this paper, we analyze single-core and single-node I/O performance comprehensively on the Haswell and KNL partitions and identify the major bottlenecks, which include CPU frequencies and memory copy performance. We also extend our performance tests to multi-node I/O and reveal the I/O cost differences caused by network latency, buffer size, and communication cost. Overall, we have developed a strong understanding of the I/O gap between Haswell and KNL nodes, and the lessons learned from this exploration will guide us in designing optimal I/O solutions in the many-core era.
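A minimal single-process sketch of the kind of sequential I/O measurement discussed is shown below. It is not the authors' benchmark suite; the file path, transfer size, and block size are arbitrary assumptions.

```python
import os
import time

def io_bandwidth(path: str, size_mb: int = 256, block_kb: int = 1024):
    """Return (write_MBps, read_MBps) for one process doing sequential I/O."""
    block = os.urandom(block_kb * 1024)
    n_blocks = (size_mb * 1024) // block_kb

    t0 = time.perf_counter()
    with open(path, 'wb', buffering=0) as f:
        for _ in range(n_blocks):
            f.write(block)
        os.fsync(f.fileno())              # include the flush to storage
    write_bw = size_mb / (time.perf_counter() - t0)

    t0 = time.perf_counter()
    with open(path, 'rb', buffering=0) as f:
        while f.read(block_kb * 1024):
            pass
    read_bw = size_mb / (time.perf_counter() - t0)

    os.remove(path)
    return write_bw, read_bw
```

On a shared file system the read phase may be served from the page cache unless the file exceeds memory or the cache is dropped, so single-node numbers need the same caveats the paper discusses.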
Yang, Zunxian; Lv, Jun; Pang, Haidong; Yan, Wenhuan; Qian, Kun; Guo, Tailiang; Guo, Zaiping
2015-12-01
Carbon nanotubes (CNTs)/MnOx-carbon hybrid nanofibers have been successfully synthesized by the combination of a liquid chemical redox reaction (LCRR) and a subsequent carbonization heat treatment. The nanostructures exhibit a unique one-dimensional core/shell architecture, with one-dimensional CNTs encapsulated inside and a MnOx-carbon composite nanoparticle layer on the outside. The particular porous characteristics with many meso/micro holes/pores, the highly conductive one-dimensional CNT core, as well as the encapsulating carbon matrix on the outside of the MnOx nanoparticles, lead to excellent electrochemical performance of the electrode. The CNTs/MnOx-carbon hybrid nanofibers exhibit a high initial reversible capacity of 762.9 mAh g−1, a high reversible specific capacity of 560.5 mAh g−1 after 100 cycles, and excellent cycling stability and rate capability, with a specific capacity of 396.2 mAh g−1 when cycled at a current density of 1000 mA g−1, indicating that the CNTs/MnOx-carbon hybrid nanofibers are a promising anode candidate for Li-ion batteries.
Lahiri, A; Roy, Abhijit Guha; Sheet, Debdoot; Biswas, Prabir Kumar
2016-08-01
Automated segmentation of retinal blood vessels in label-free fundus images plays a pivotal role in computer-aided diagnosis of ophthalmic pathologies, viz., diabetic retinopathy, hypertensive disorders, and cardiovascular diseases. The challenge remains active in medical image analysis research due to the varied distribution of blood vessels, which manifest variations in the dimensions of their physical appearance against a noisy background. In this paper we formulate the segmentation challenge as a classification task. Specifically, we employ unsupervised hierarchical feature learning using an ensemble of two levels of sparsely trained denoising stacked autoencoders. First-level training with bootstrap samples ensures decoupling, and a second-level ensemble formed from different network architectures ensures architectural revision. We show that ensemble training of autoencoders fosters diversity in the learned dictionary of visual kernels for vessel segmentation. A softmax classifier is used for fine-tuning each member autoencoder, and multiple strategies are explored for two-level fusion of ensemble members. On the DRIVE dataset, we achieve a maximum average accuracy of 95.33% with an impressively low standard deviation of 0.003 and a Kappa agreement coefficient of 0.708. Comparison with other major algorithms substantiates the high efficacy of our model.
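A loose sketch of the two ideas named above (bootstrap-trained denoising autoencoders plus probability-averaging fusion) is given below using scikit-learn. It is not the authors' network; the hidden sizes, noise level, classifier choice, and fusion rule are assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LogisticRegression

def encode(dae, patches):
    """Hidden-layer activations of the trained autoencoder (ReLU by default)."""
    return np.maximum(patches @ dae.coefs_[0] + dae.intercepts_[0], 0.0)

def train_dae_ensemble(patches, labels, hidden_sizes=(64, 128), noise=0.1, seed=0):
    """Each member: a denoising autoencoder fit on a bootstrap sample, then a
    softmax (logistic) classifier on its hidden-layer features."""
    rng = np.random.default_rng(seed)
    members = []
    for h in hidden_sizes:
        idx = rng.integers(0, len(patches), len(patches))      # bootstrap sample
        x = patches[idx]
        dae = MLPRegressor(hidden_layer_sizes=(h,), max_iter=300)
        dae.fit(x + noise * rng.standard_normal(x.shape), x)   # denoising objective
        clf = LogisticRegression(max_iter=1000)
        clf.fit(encode(dae, patches), labels)                  # vessel vs. background
        members.append((dae, clf))
    return members

def ensemble_vessel_probability(members, patches):
    """One simple fusion strategy: average the members' vessel probabilities."""
    return np.mean([clf.predict_proba(encode(dae, patches))[:, 1]
                    for dae, clf in members], axis=0)
```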
NASA Astrophysics Data System (ADS)
Sullivan, Christopher James
Weak interactions involving atomic nuclei are critical components in a broad range of astrophysical phenomena. As allowed Gamow-Teller transitions are the primary path through which weak interactions in nuclei operate in astrophysical contexts, the constraint of these nuclear transitions is an important goal of nuclear astrophysics. In this work, the charged-current nuclear weak interaction known as electron capture is studied in the context of stellar core-collapse supernovae (CCSNe). Specifically, the sensitivity of the core-collapse and early post-bounce phases of CCSNe to nuclear electron capture rates is examined. Electron capture rates are adjusted by factors consistent with uncertainties indicated by comparing theoretical rates to those deduced from charge-exchange and beta-decay measurements. With the aid of such sensitivity studies, the diverse role of electron capture on thousands of nuclear species is constrained to a few tens of nuclei near N ≈ 50 and A ≈ 80, which dictate the primary response of CCSNe to nuclear electron capture. As electron capture is shown to be a leading-order uncertainty during the core-collapse phase of CCSNe, future experimental and theoretical efforts should seek to constrain the rates of nuclei in this region. Furthermore, neutral-current neutrino-nucleus interactions in the tens-of-MeV energy range are important in a variety of astrophysical environments, including core-collapse supernovae as well as the synthesis of some of the solar system's rarest elements. Estimates for inelastic neutrino scattering on nuclei are also important for the construction of neutrino detectors aimed at detecting astrophysical neutrinos. Due to the small cross sections involved, direct measurements are rare and have only been performed on a few nuclei. For this reason, indirect measurements provide a unique opportunity to constrain the nuclear transition strength needed to infer inelastic neutrino-nucleus cross sections. Herein, the (6Li, 6Li‧) inelastic scattering reaction at 100 MeV/u is shown to indirectly select the transitions relevant for inelastic neutrino-nucleus scattering. Specifically, the probe's unique selectivity for isovector spin-transfer excitations (ΔS = 1, ΔT = 1, ΔTz = 0) is demonstrated, thereby allowing the extraction of Gamow-Teller transition strength in the inelastic channel. Finally, the development and performance of a newly established technique for the subfield of artificial intelligence known as neuroevolution is described. While separate from the physics discussed above, these algorithmic advancements seek to improve the adoption of machine learning in the scientific domain by enabling neuroevolution to take advantage of modern heterogeneous compute architectures. Because the evolution of neural network populations offloads the choice of specific details about the neural networks to an evolutionary search algorithm, neuroevolution can increase the accessibility of machine learning. However, the evolution of neural networks through parameter and structural space presents a novel divergence problem when mapping the evaluation of these networks to many-core architectures. The principal focus of the algorithmic optimizations described herein is on improving the feed-forward evaluation time when tens to hundreds of thousands of heterogeneous neural networks are evaluated concurrently.
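For the neuroevolution portion, the snippet below illustrates batched feed-forward evaluation of a population of networks that happen to share one topology; it is precisely the heterogeneous-topology case, where per-genome shapes differ, that creates the divergence problem the dissertation addresses. The genome layout, activation, and names are illustrative assumptions, not the described technique.

```python
import numpy as np

def evaluate_population(genomes, inputs):
    """Evaluate P two-layer networks on a batch of inputs in one vectorized pass.

    genomes: list of dicts with 'w1' (n_in, n_hid), 'b1' (n_hid,),
             'w2' (n_hid, n_out), 'b2' (n_out,); inputs: (batch, n_in).
    Returns an array of shape (P, batch, n_out)."""
    w1 = np.stack([g['w1'] for g in genomes])
    b1 = np.stack([g['b1'] for g in genomes])
    w2 = np.stack([g['w2'] for g in genomes])
    b2 = np.stack([g['b2'] for g in genomes])
    hidden = np.tanh(np.einsum('bi,pij->pbj', inputs, w1) + b1[:, None, :])
    return np.einsum('pbj,pjk->pbk', hidden, w2) + b2[:, None, :]
```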
Layer-by-Layer Self-Assembly of Plexcitonic Nanoparticles
2013-08-12
...nitrate, trisodium citrate tribasic dihydrate, sodium poly(styrene sulfonate) (PSS, MW ~70,000), poly(diallyldimethylammonium chloride) (PDADMAC) ... Abstract: Colloidal suspensions of multilayer nanoparticles composed of a silver core, a polyelectrolyte spacer layer (inner shell), and a J-aggregate ... multilayer architecture served as a framework for examining the coupling of the localized surface plasmon resonance exhibited by the silver core with ...
A Diversified Investment Strategy Using Autonomous Agents
NASA Astrophysics Data System (ADS)
Barbosa, Rui Pedro; Belo, Orlando
In a previously published article, we presented an architecture for implementing agents with the ability to trade autonomously in the Forex market. At the core of this architecture is an ensemble of classification and regression models that is used to predict the direction of the price of a currency pair. In this paper, we will describe a diversified investment strategy consisting of five agents which were implemented using that architecture. By simulating trades with 18 months of out-of-sample data, we will demonstrate that data mining models can produce profitable predictions, and that the trading risk can be diminished through investment diversification.
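A bare-bones sketch of how such an agent could fuse its classifiers and regressors into a trade signal, and how several agents could be combined into a diversified portfolio, is shown below. The voting rule, threshold, and data structures are our assumptions, not the published architecture.

```python
import numpy as np

def agent_signal(classifiers, regressors, features, threshold=0.0005):
    """Return +1 (long), -1 (short), or 0 (flat) for one currency pair."""
    up_fraction = np.mean([clf.predict(features)[0] for clf in classifiers])
    expected_move = np.mean([reg.predict(features)[0] for reg in regressors])
    if up_fraction > 0.5 and expected_move > threshold:
        return 1
    if up_fraction < 0.5 and expected_move < -threshold:
        return -1
    return 0

def portfolio_signals(agents, features_by_pair):
    """Diversification: each agent trades its own instrument independently."""
    return {pair: agent_signal(a['classifiers'], a['regressors'], features_by_pair[pair])
            for pair, a in agents.items()}
```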
Zhao, Yongli; Chen, Zhendong; Zhang, Jie; Wang, Xinbo
2016-07-25
Driven by the forthcoming 5G mobile communications, the all-IP architecture of mobile core networks, i.e., the evolved packet core (EPC) proposed by 3GPP, has been greatly challenged by users' demands for higher data rates and more reliable end-to-end connections, as well as by operators' demands for low operational cost. These challenges can potentially be met by software-defined optical networking (SDON), which enables dynamic resource allocation according to user requirements. In this article, a novel network architecture for the mobile core network is proposed based on SDON. A software-defined network (SDN) controller is designed to realize coordinated control over the different entities in EPC networks. We analyze the requirements of the EPC-lightpath (EPCL) in the data plane and propose an optical switch load balancing (OSLB) algorithm for resource allocation in the optical layer. The procedure for establishing and adjusting EPCLs is demonstrated on an SDON-based EPC testbed with an extended OpenFlow protocol. We also evaluate the OSLB algorithm through simulation in terms of bandwidth blocking ratio, traffic load distribution, and resource utilization ratio, compared with link-based load balancing (LLB) and MinHops algorithms.
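The OSLB algorithm itself is not specified in the abstract, but a greedy load-balancing rule in the same spirit (choose the candidate lightpath whose most loaded optical switch is least loaded) can be sketched as follows; the data structures, units of load, and tie-breaking are our assumptions.

```python
def assign_epcl(switch_load, candidate_paths):
    """Pick a path for a new EPC-lightpath so that the bottleneck switch load
    is minimized, then record the added load on every switch along the path."""
    def bottleneck(path):
        return max(switch_load[s] for s in path)

    best = min(candidate_paths, key=bottleneck)
    for s in best:
        switch_load[s] += 1
    return best

# Example: three switches, two candidate paths for the new lightpath.
load = {'S1': 2, 'S2': 5, 'S3': 1}
print(assign_epcl(load, [['S1', 'S2'], ['S1', 'S3']]))   # -> ['S1', 'S3']
```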