Lee, Chankyun; Cao, Xiaoyuan; Yoshikane, Noboru; Tsuritani, Takehiro; Rhee, June-Koo Kevin
2015-10-19
The feasibility of software-defined optical networking (SDON) for practical application critically depends on the scalability of centralized control performance. In this paper, highly scalable routing and wavelength assignment (RWA) algorithms are investigated on an OpenFlow-based SDON testbed for a proof-of-concept demonstration. Efficient RWA algorithms are proposed that achieve high network capacity at reduced computation cost, a significant attribute for a scalable centrally controlled SDON. The proposed heuristic RWA algorithms differ in the order in which requests are processed and in the procedures for routing table updates. Combined with a shortest-path-based routing algorithm, a hottest-request-first processing policy that considers demand intensity and end-to-end distance information offers both the highest network throughput and acceptable computation scalability. We further investigate the trade-off between network throughput and computation complexity in the routing-table update procedure through a simulation study.
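The abstract leaves the heuristic's details out; as a rough illustration, the sketch below implements first-fit wavelength assignment over Dijkstra shortest paths with requests processed hottest-first, where "hotness" is assumed to be demand intensity times hop distance. The names and the scoring rule are illustrative assumptions, not the paper's exact policy.

```python
# Hypothetical sketch of a hottest-request-first RWA heuristic:
# shortest-path routing plus first-fit wavelength assignment.
import heapq

def shortest_path(adj, src, dst):
    """Dijkstra over an adjacency dict {node: {nbr: length}}."""
    dist, prev, seen = {src: 0.0}, {}, set()
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if dst not in dist:
        return None
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

def rwa_hottest_first(adj, requests, n_wavelengths):
    """requests: list of (src, dst, demand). Returns {(src, dst): (path, wl)}."""
    # "Hotness" here is demand x hop count -- an assumed proxy for the
    # paper's combination of demand intensity and end-to-end distance.
    scored = []
    for s, d, demand in requests:
        p = shortest_path(adj, s, d)
        if p:
            scored.append((demand * (len(p) - 1), s, d, p))
    scored.sort(key=lambda r: r[0], reverse=True)   # hottest request first
    used, assignment = set(), {}
    for _, s, d, path in scored:
        links = list(zip(path, path[1:]))
        for wl in range(n_wavelengths):             # first-fit wavelength scan
            if all((u, v, wl) not in used for u, v in links):
                used.update((u, v, wl) for u, v in links)
                assignment[(s, d)] = (path, wl)
                break
    return assignment
```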
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers" and "Develop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications." The discussion of Scalable High Performance Computing reports on three objectives: validate, assess the scalability of, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) problems; and investigate and develop a high-order Reynolds-averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhance an electromagnetics code (CHARGE) to effectively model antenna problems; apply lessons learned in the high-order/spectral solution of swirling 3D jets to the electromagnetics project; transition a high-order fluids code, FDL3DI, to solve Maxwell's equations using compact differencing; develop and demonstrate improved radiation-absorbing boundary conditions for high-order CEM; and extend the high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
NASA Technical Reports Server (NTRS)
Kikuchi, Hideaki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Shimojo, Fuyuki; Saini, Subhash
2003-01-01
Scalability of a low-cost, Intel Xeon-based, multi-Teraflop Linux cluster is tested for two high-end scientific applications: classical atomistic simulation based on the molecular dynamics method and quantum mechanical calculation based on density functional theory. These scalable parallel applications use space-time multiresolution algorithms and feature computational-space decomposition, wavelet-based adaptive load balancing, and space-filling-curve-based data compression for scalable I/O. Comparative performance tests are performed on a 1,024-processor Linux cluster and a conventional higher-end parallel supercomputer, the 1,184-processor IBM SP4. The results show that the performance of the Linux cluster is comparable to that of the SP4. We also study various effects on performance, such as the sharing of memory and L2 cache among processors.
High-speed and high-fidelity system and method for collecting network traffic
Weigle, Eric H. [Los Alamos, NM]
2010-08-24
A system is provided for the high-speed and high-fidelity collection of network traffic. The system can collect traffic at gigabit-per-second (Gbps) speeds, scale to terabit-per-second (Tbps) speeds, and support additional functions such as real-time network intrusion detection. The present system uses a dedicated operating system for traffic collection to maximize efficiency, scalability, and performance. A scalable infrastructure and apparatus for the present system is provided by splitting the work performed on one host onto multiple hosts. The present system simultaneously addresses the issues of scalability, performance, cost, and adaptability with respect to network monitoring, collection, and other network tasks. In addition to high-speed and high-fidelity network collection, the present system provides a flexible infrastructure to perform virtually any function at high speeds such as real-time network intrusion detection and wide-area network emulation for research purposes.
Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications
NASA Astrophysics Data System (ADS)
Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei
2007-04-01
In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Using the PE rings efficiently and an intelligent memory-interleaving organization, the efficiency of the architecture can be increased. Moreover, using efficient on-chip memories and a data management technique can effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core in multimedia system-on-chip applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shamis, Pavel; Graham, Richard L; Gorentla Venkata, Manjunath
The scalability and performance of collective communication operations limit the scalability and performance of many scientific applications. This paper presents two new blocking and nonblocking Broadcast algorithms, named Cheetah, for communicators with arbitrary communication topology, and studies their performance. These algorithms benefit from increased concurrency and a reduced memory footprint, making them suitable for use on large-scale systems. Measuring small, medium, and large data Broadcasts on a Cray XT5 using 24,576 MPI processes, the Cheetah algorithms outperform the native MPI on that system by 51%, 69%, and 9%, respectively, at the same process count. These results demonstrate an algorithmic approach to the implementation of this important class of collective communications that is high performing, scalable, and uses resources in a scalable manner.
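The Cheetah algorithms themselves are not described here in enough detail to reproduce; as a stand-in, the sketch below shows the classic binomial-tree broadcast over MPI point-to-point messages (via mpi4py), the textbook pattern that such hierarchical, low-memory-footprint broadcasts build on.

```python
# Illustrative binomial-tree broadcast over MPI point-to-point calls.
# This is the textbook pattern hierarchical broadcasts generalize,
# not the Cheetah implementation itself.
from mpi4py import MPI

def binomial_bcast(comm, buf, root=0):
    size = comm.Get_size()
    rank = (comm.Get_rank() - root) % size   # rank relative to the root
    mask = 1
    # Receive phase: wait for the message from the subtree parent.
    while mask < size:
        if rank & mask:
            src = (rank - mask + root) % size
            buf = comm.recv(source=src, tag=0)
            break
        mask <<= 1
    # Send phase: forward to children at decreasing strides.
    mask >>= 1
    while mask > 0:
        if rank + mask < size:
            dst = (rank + mask + root) % size
            comm.send(buf, dest=dst, tag=0)
        mask >>= 1
    return buf

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    data = {"payload": 42} if comm.Get_rank() == 0 else None
    data = binomial_bcast(comm, data)
    print(comm.Get_rank(), data)
```

Run with, e.g., `mpiexec -n 8 python bcast.py`; each rank receives the payload in O(log p) message rounds rather than p - 1 sends from the root.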
Modeling Cardiac Electrophysiology at the Organ Level in the Peta FLOPS Computing Age
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mitchell, Lawrence; Bishop, Martin; Hoetzl, Elena
2010-09-30
Despite a steep increase in available compute power, in-silico experimentation with highly detailed models of the heart remains challenging due to the high computational cost involved. It is hoped that next-generation high performance computing (HPC) resources will lead to significant reductions in execution times and enable a new class of in-silico applications. However, performance gains on these new platforms can only be achieved by engaging a much larger number of compute cores, necessitating strongly scalable numerical techniques. So far, strong scalability has been demonstrated only for a moderate number of cores, orders of magnitude below the range required to achieve the desired performance boost. In this study, the strong scalability of techniques currently used to solve the bidomain equations is investigated. Benchmark results suggest that scalability is limited to 512-4096 cores within the range of relevant problem sizes, even when systems are carefully load-balanced and advanced I/O strategies are employed.
Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy
NASA Astrophysics Data System (ADS)
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli
2014-03-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable fast turn-around times, which are often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that needs to be processed. For this purpose, we previously developed a software platform for high-performance 3D medical image processing, called the HPC 3D-MIP platform, which employs increasingly available and affordable commodity computing systems such as multicore, cluster, and cloud computing systems. To achieve scalable high-performance computing, the platform employed size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D-MIP algorithms, supported task scheduling for efficient load distribution and balancing, and consisted of layered parallel software libraries that allow image processing applications to share common functionalities. We evaluated the performance of the HPC 3D-MIP platform by applying it to computationally intensive processes in virtual colonoscopy. Experimental results showed a 12-fold performance improvement on a workstation with 12-core CPUs over the original sequential implementation of the processes, indicating the efficiency of the platform. Analysis of performance scalability based on Amdahl's law for symmetric multicore chips showed the potential for high performance scalability of the HPC 3D-MIP platform when a larger number of cores is available.
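The scalability analysis the authors invoke is Amdahl's law; a minimal worked form of the speedup bound, with the abstract's 12-core figure plugged in and an assumed parallel fraction f, looks like this:

```python
# Amdahl's-law speedup bound: S(n) = 1 / ((1 - f) + f / n),
# where f is the parallelizable fraction and n the core count.
def amdahl_speedup(f, n):
    return 1.0 / ((1.0 - f) + f / n)

# The abstract reports a 12-fold improvement on 12 cores, which the
# bound only permits as f -> 1; with an assumed f = 0.99:
print(round(amdahl_speedup(0.99, 12), 2))    # ~10.81x on 12 cores
print(round(amdahl_speedup(0.99, 128), 2))   # ~56.39x on 128 cores
```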
NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
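The per-photon kernel that such GPU/OpenCL codes run across millions of threads can be sketched serially; the toy below assumes an infinite homogeneous medium, isotropic scattering, and made-up optical coefficients, and is not the authors' OpenCL implementation.

```python
# Toy Monte Carlo photon transport: each photon performs an exponential
# random walk with absorption weighting -- the per-photon kernel that
# GPU/OpenCL MC codes parallelize. Coefficients are assumed values.
import numpy as np

def mc_photons(n_photons=1000, mua=1.0, mus=10.0, seed=1):
    rng = np.random.default_rng(seed)
    mut = mua + mus                        # total interaction coeff. (1/mm)
    albedo = mus / mut
    absorbed = 0.0
    for _ in range(n_photons):
        direction = np.array([0.0, 0.0, 1.0])
        pos, weight = np.zeros(3), 1.0
        while weight > 1e-4:               # terminate exhausted photons
            step = -np.log(1.0 - rng.random()) / mut   # free path length
            pos = pos + step * direction
            absorbed += weight * (1.0 - albedo)        # deposit absorbed part
            weight *= albedo
            cos_t = 2.0 * rng.random() - 1.0           # isotropic scattering
            phi = 2.0 * np.pi * rng.random()
            sin_t = np.sqrt(1.0 - cos_t * cos_t)
            direction = np.array([sin_t * np.cos(phi),
                                  sin_t * np.sin(phi), cos_t])
    return absorbed / n_photons

print(mc_photons())   # ~1.0: essentially all photon weight is absorbed
```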
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...
2017-09-21
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems, or linearized systems for non-linear problems, with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel, and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. Although these algorithms exhibit excellent scalability, significant algorithmic and implementation challenges remain in extending them to solve extreme-scale stochastic systems on emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolution in both the spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain-level local systems through multi-level iterative solvers. We also use parallel sparse matrix-vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation with a spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.
A scalable healthcare information system based on a service-oriented architecture.
Yang, Tzu-Hsiang; Sun, Yeali S; Lai, Feipei
2011-06-01
Many existing healthcare information systems are composed of a number of heterogeneous systems and face the important issue of system scalability. This paper first describes the comprehensive healthcare information systems used at National Taiwan University Hospital (NTUH) and then presents a service-oriented architecture (SOA)-based healthcare information system (HIS) built on the HL7 service standard. The proposed architecture focuses on system scalability in terms of both hardware and software. Moreover, we describe how scalability is implemented through rightsizing, service groups, databases, and hardware. Although SOA-based systems sometimes display poor performance, our performance evaluation of the SOA-based HIS shows average response times for the outpatient, inpatient, and emergency HL7 Central systems of 0.035, 0.04, and 0.036 s, respectively. The outpatient, inpatient, and emergency WebUI average response times are 0.79, 1.25, and 0.82 s. The rightsizing project and our evaluation results provide evidence that SOA can deliver system scalability and sustainability in a highly demanding healthcare information system.
NASA Technical Reports Server (NTRS)
Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas
2000-01-01
An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.
Novel Scalable 3-D MT Inverse Solver
NASA Astrophysics Data System (ADS)
Kuvshinov, A. V.; Kruglyakov, M.; Geraskin, A.
2016-12-01
We present a new, robust and fast, three-dimensional (3-D) magnetotelluric (MT) inverse solver. As a forward modelling engine, the highly scalable solver extrEMe [1] is used. The (regularized) inversion is based on an iterative gradient-type optimization (quasi-Newton method) and exploits an adjoint-source approach for fast calculation of the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT responses (single-site and/or inter-site), and supports massive parallelization. Different parallelization strategies implemented in the code allow for optimal usage of available computational resources for a given problem setup. To parameterize the inverse domain, a mask approach is implemented, meaning that one can merge any subset of forward modelling cells in order to account for the (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance, and scalability of the code. In particular, our computational experiments, carried out on platforms ranging from modern laptops to high-performance clusters, demonstrate practically linear scalability of the code up to thousands of nodes. 1. Kruglyakov, M., A. Geraskin, A. Kuvshinov, 2016. Novel accurate and scalable 3-D MT forward solver based on a contracting integral equation method, Computers and Geosciences, in press.
NASA Astrophysics Data System (ADS)
Yan, Beichuan; Regueiro, Richard A.
2018-02-01
A three-dimensional (3D) discrete element method (DEM) code for simulating complex-shaped granular particles is parallelized using the message passing interface (MPI). The concepts of link blocks, ghost/border layers, and migration layers are put forward for the design of the parallel algorithm, and a theoretical function for 3D DEM scalability and memory usage is derived. Many performance-critical implementation details are managed optimally to achieve high performance and scalability, such as minimizing communication overhead, maintaining dynamic load balance, handling particle migration across block borders, transmitting C++ dynamic objects of particles between MPI processes efficiently, and eliminating redundant contact information between adjacent MPI processes. The code executes on multiple US Department of Defense (DoD) supercomputers and is tested on up to 2,048 compute nodes simulating 10 million three-axis ellipsoidal particles. Performance analyses of the code, including speedup, efficiency, scalability, and granularity across five orders of magnitude of simulation scale (number of particles), demonstrate high speedup and excellent scalability. It is also discovered that communication time is a decreasing function of the number of compute nodes in strong-scaling measurements. The code's capability of simulating a large number of complex-shaped particles on modern supercomputers will be of value both in laboratory studies of the micromechanical properties of granular materials and in many realistic engineering applications involving granular materials.
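The ghost/border-layer concept follows the standard halo-exchange pattern; a minimal 1-D sketch in Python with mpi4py (the actual code is C++ and 3D) might look like the following, with the payload and layer width as assumptions.

```python
# 1-D halo (ghost-layer) exchange sketch: each rank owns a block of
# particles and mirrors a border layer to its neighbors -- the pattern
# the link-block/ghost-layer design generalizes to 3-D.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Particles owned by this rank (toy payload for illustration).
owned = [(rank, i) for i in range(4)]
border_left, border_right = owned[:1], owned[-1:]   # one-particle borders

# sendrecv avoids deadlock: ship right border right, receive left ghosts.
ghost_left = comm.sendrecv(border_right, dest=right, source=left)
ghost_right = comm.sendrecv(border_left, dest=left, source=right)

# PROC_NULL endpoints yield None: domain boundaries have no ghosts.
print(rank, "ghosts:", ghost_left, ghost_right)
```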
Volume-scalable high-brightness three-dimensional visible light source
Subramania, Ganapathi; Fischer, Arthur J; Wang, George T; Li, Qiming
2014-02-18
A volume-scalable, high-brightness, electrically driven visible light source comprises a three-dimensional photonic crystal (3DPC) comprising one or more direct bandgap semiconductors. The improved light emission performance of the invention is achieved based on the enhancement of radiative emission of light emitters placed inside a 3DPC due to the strong modification of the photonic density-of-states engendered by the 3DPC.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karthik, Rajasekar
2014-01-01
In this paper, an architecture for building a Scalable And Mobile Environment for High-Performance Computing with spatial capabilities, called SAME4HPC, is described using cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, a significant number of low-end and older devices rely heavily on the server or cloud infrastructure to do the heavy lifting. Our architecture aims to support both types of devices to provide high performance and a rich user experience. A cloud infrastructure consisting of OpenStack with Ubuntu, GeoServer, and high-performance JavaScript frameworks is among the key open-source and industry-standard components adopted in this architecture.
A scalable infrastructure for CMS data analysis based on OpenStack Cloud and Gluster file system
NASA Astrophysics Data System (ADS)
Toor, S.; Osmani, L.; Eerola, P.; Kraemer, O.; Lindén, T.; Tarkoma, S.; White, J.
2014-06-01
The challenge of providing a resilient and scalable computational and data management solution for massive scale research environments requires continuous exploration of new technologies and techniques. In this project the aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis. The infrastructure is based on OpenStack components for structuring a private Cloud with the Gluster File System. We integrate the state-of-the-art Cloud technologies with the traditional Grid middleware infrastructure. Our test results show that the adopted approach provides a scalable and resilient solution for managing resources without compromising on performance and high availability.
Li, Chen; Zhang, Xiong; Wang, Kai; Sun, Xianzhong; Liu, Guanghua; Li, Jiangtao; Tian, Huanfang; Li, Jianqi; Ma, Yanwei
2017-02-01
An ultrafast self-propagating high-temperature synthesis technique offers scalable routes for the fabrication of mesoporous graphene directly from CO₂. Due to the excellent electrical conductivity and high ion-accessible surface area, supercapacitor electrodes based on the obtained graphene exhibit superior energy and power performance. The capacitance retention is higher than 90% after one million charge/discharge cycles.
Cao, Xuan; Chen, Haitian; Gu, Xiaofei; Liu, Bilu; Wang, Wenli; Cao, Yu; Wu, Fanqi; Zhou, Chongwu
2014-12-23
Semiconducting single-wall carbon nanotubes are very promising materials in printed electronics due to their excellent mechanical and electrical properties, outstanding printability, and great potential for flexible electronics. Nonetheless, developing scalable and low-cost approaches for manufacturing fully printed high-performance single-wall carbon nanotube thin-film transistors remains a major challenge. Here we report that screen printing, which is a simple, scalable, and cost-effective technique, can be used to produce both rigid and flexible thin-film transistors using separated single-wall carbon nanotubes. Our fully printed top-gated nanotube thin-film transistors on rigid and flexible substrates exhibit decent performance, with mobility up to 7.67 cm² V⁻¹ s⁻¹, on/off ratios of 10⁴-10⁵, minimal hysteresis, and low operation voltage (<10 V). In addition, outstanding mechanical flexibility of the printed nanotube thin-film transistors (bent to a radius of curvature down to 3 mm) and the capability to drive organic light-emitting diodes have been demonstrated. Given the high performance of the fully screen-printed single-wall carbon nanotube thin-film transistors, we believe screen printing stands as a low-cost, scalable, and reliable approach to manufacture high-performance nanotube thin-film transistors for application in display electronics. Moreover, this technique may be used to fabricate thin-film transistors based on other materials for large-area flexible macroelectronics and low-cost display electronics.
Tezaur, Irina K.; Tuminaro, Raymond S.; Perego, Mauro; ...
2015-01-01
We examine the scalability of the recently developed Albany/FELIX finite-element based code for the first-order Stokes momentum balance equations for ice flow. We focus our analysis on the performance of two possible preconditioners for the iterative solution of the sparse linear systems that arise from the discretization of the governing equations: (1) a preconditioner based on the incomplete LU (ILU) factorization, and (2) a recently-developed algebraic multigrid (AMG) preconditioner constructed using the idea of semi-coarsening. A strong scalability study on a realistic, high-resolution Greenland ice sheet problem reveals that, for a given number of processor cores, the AMG preconditioner results in faster linear solve times but the ILU preconditioner exhibits better scalability. In addition, a weak scalability study is performed on a realistic, moderate-resolution Antarctic ice sheet problem, a substantial fraction of which contains floating ice shelves, making it fundamentally different from the Greenland ice sheet problem. We show that as the problem size increases, the performance of the ILU preconditioner deteriorates whereas the AMG preconditioner maintains scalability. This is because the linear systems are extremely ill-conditioned in the presence of floating ice shelves, and the ill-conditioning has a greater negative effect on the ILU preconditioner than on the AMG preconditioner.
A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data
NASA Astrophysics Data System (ADS)
Li, Z.; Hodgson, M.; Li, W.
2016-12-01
Light detection and ranging (LiDAR) technologies have proven efficient for quickly obtaining very detailed Earth surface data over large spatial extents. Such data are important for scientific discoveries in the Earth and ecological sciences and for natural disaster and environmental applications. However, handling LiDAR data poses major geoprocessing challenges due to its data and computational intensity. Previous studies achieved notable success in the parallel processing of LiDAR data to address these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for specific algorithms. We developed a general-purpose scalable framework, coupled with a sophisticated data decomposition and parallelization strategy, to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerant Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, the framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework are evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) handles massive LiDAR data more efficiently than standalone tools, and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides a valuable reference for developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.
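At its core, the tile-based spatial index reduces to mapping each point to a tile key so tiles can be distributed and processed independently; a minimal sketch of that decomposition step follows (the tile size and key scheme are assumed choices, not the prototype's exact design).

```python
# Minimal tile-based decomposition for LiDAR points: map each (x, y, z)
# point to a tile key so tiles can be stored as distributed-file-system
# blocks and processed in parallel.
from collections import defaultdict

def tile_key(x, y, tile_size):
    return (int(x // tile_size), int(y // tile_size))

def decompose(points, tile_size=100.0):
    """points: iterable of (x, y, z). Returns {tile_key: [points]}."""
    tiles = defaultdict(list)
    for x, y, z in points:
        tiles[tile_key(x, y, tile_size)].append((x, y, z))
    return tiles

pts = [(12.5, 7.0, 101.2), (150.0, 40.0, 98.7), (160.1, 44.2, 97.9)]
for key, members in sorted(decompose(pts).items()):
    print(key, len(members))   # (0, 0) 1 / (1, 0) 2
```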
Wafer-scalable high-performance CVD graphene devices and analog circuits
NASA Astrophysics Data System (ADS)
Tao, Li; Lee, Jongho; Li, Huifeng; Piner, Richard; Ruoff, Rodney; Akinwande, Deji
2013-03-01
Graphene field-effect transistors (GFETs) will serve as essential components for functional modules such as amplifiers and frequency doublers in analog circuits. The performance of these modules is directly related to the mobility of charge carriers in GFETs, which this study greatly improves. Low-field electrostatic measurements show field mobility values up to 12,000 cm²/Vs at ambient conditions with our newly developed scalable CVD graphene. For both hole and electron transport, the fabricated GFETs offer substantial amplification for small and large signals at quasi-static frequencies, limited only by external capacitances at high frequencies. GFETs biased at the peak transconductance point featured high small-signal gain with eventual output power compression, similar to conventional transistor amplifiers. GFETs operating around the Dirac voltage afforded positive conversion gain for the first time, to our knowledge, in experimental graphene frequency doublers. This work suggests a realistic prospect for high-performance linear and non-linear analog circuits based on the unique electron-hole symmetry and fast transport now accessible in wafer-scalable CVD graphene. *Support from an NSF CAREER award (ECCS-1150034) and the W. M. Keck Foundation is appreciated.
Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob
2013-01-01
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes, neural connectivity maps of the brain, using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems, reads to parallel disk arrays and writes to solid-state storage, to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
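One standard way to "partition a spatial index" across cluster nodes, as the abstract describes, is a Morton (Z-order) key over voxel coordinates; the sketch below is an illustrative scheme, not necessarily the openconnecto.me implementation.

```python
# Morton (Z-order) key for 3-D voxel coordinates: interleaving the bits
# of (x, y, z) yields a 1-D key whose contiguous ranges preserve spatial
# locality -- a common way to partition a spatial index across nodes.
def morton3d(x, y, z, bits=10):
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

def node_for(x, y, z, n_nodes, bits=10):
    # Contiguous key ranges map to cluster nodes.
    return morton3d(x, y, z, bits) * n_nodes >> (3 * bits)

print(morton3d(1, 0, 0), morton3d(0, 1, 0), morton3d(0, 0, 1))  # 1 2 4
print(node_for(512, 512, 512, 8))   # voxel -> owning node index (7)
```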
Low-power, transparent optical network interface for high bandwidth off-chip interconnects.
Liboiron-Ladouceur, Odile; Wang, Howard; Garg, Ajay S; Bergman, Keren
2009-04-13
The recent emergence of multicore architectures and chip multiprocessors (CMPs) has accelerated the bandwidth requirements in high-performance processors for both on-chip and off-chip interconnects. For next generation computing clusters, the delivery of scalable power efficient off-chip communications to each compute node has emerged as a key bottleneck to realizing the full computational performance of these systems. The power dissipation is dominated by the off-chip interface and the necessity to drive high-speed signals over long distances. We present a scalable photonic network interface approach that fully exploits the bandwidth capacity offered by optical interconnects while offering significant power savings over traditional E/O and O/E approaches. The power-efficient interface optically aggregates electronic serial data streams into a multiple WDM channel packet structure at time-of-flight latencies. We demonstrate a scalable optical network interface with 70% improvement in power efficiency for a complete end-to-end PCI Express data transfer.
High-frequency self-aligned graphene transistors with transferred gate stacks.
Cheng, Rui; Bai, Jingwei; Liao, Lei; Zhou, Hailong; Chen, Yu; Liu, Lixin; Lin, Yung-Chen; Jiang, Shan; Huang, Yu; Duan, Xiangfeng
2012-07-17
Graphene has attracted enormous attention for radio-frequency transistor applications because of its exceptional high carrier mobility, high carrier saturation velocity, and large critical current density. Herein we report a new approach for the scalable fabrication of high-performance graphene transistors with transferred gate stacks. Specifically, arrays of gate stacks are first patterned on a sacrificial substrate, and then transferred onto arbitrary substrates with graphene on top. A self-aligned process, enabled by the unique structure of the transferred gate stacks, is then used to position precisely the source and drain electrodes with minimized access resistance or parasitic capacitance. This process has therefore enabled scalable fabrication of self-aligned graphene transistors with unprecedented performance including a record-high cutoff frequency up to 427 GHz. Our study defines a unique pathway to large-scale fabrication of high-performance graphene transistors, and holds significant potential for future application of graphene-based devices in ultra-high-frequency circuits.
Performance prediction: A case study using a multi-ring KSR-1 machine
NASA Technical Reports Server (NTRS)
Sun, Xian-He; Zhu, Jianping
1995-01-01
While computers with tens of thousands of processors have successfully delivered high performance for solving some so-called 'grand challenge' applications, the notion of scalability is becoming an important metric in the evaluation of parallel machine architectures and algorithms. In this study, the prediction of scalability and its application are carefully investigated. A simple formula is presented to show the relation between scalability, single-processor computing power, and degradation of parallelism. A case study is conducted on a multi-ring KSR-1 shared virtual memory machine. Experimental and theoretical results show that the influence of topology variation of an architecture is predictable. Therefore, the performance of an algorithm on a sophisticated, hierarchical architecture can be predicted, and the best algorithm-machine combination can be selected for a given application.
Scalable synthesis of nano-silicon from beach sand for long cycle life Li-ion batteries.
Favors, Zachary; Wang, Wei; Bay, Hamed Hosseini; Mutlu, Zafer; Ahmed, Kazi; Liu, Chueh; Ozkan, Mihrimah; Ozkan, Cengiz S
2014-07-08
Herein, porous nano-silicon has been synthesized via a highly scalable, heat scavenger-assisted magnesiothermic reduction of beach sand. This environmentally benign, highly abundant, and low-cost SiO₂ source allows for the production of nano-silicon at the industrial level with excellent electrochemical performance as an anode material for Li-ion batteries. The addition of NaCl as an effective heat scavenger for the highly exothermic magnesium reduction process promotes the formation of an interconnected 3D network of nano-silicon with a thickness of 8-10 nm. Carbon-coated nano-silicon electrodes achieve remarkable electrochemical performance, with a capacity of 1024 mAh g⁻¹ at 2 A g⁻¹ after 1000 cycles.
High Performance Computing Multicast
2012-02-01
responsiveness, first-tier applications often implement replicated in-memory key-value stores, using them to store state or to cache data from services... A... alternative that replicates data, combines agreement on update ordering with amnesia freedom, and supports both good scalability and fast response.
Ultrascale collaborative visualization using a display-rich global cyberinfrastructure.
Jeong, Byungil; Leigh, Jason; Johnson, Andrew; Renambot, Luc; Brown, Maxine; Jagodic, Ratko; Nam, Sungwon; Hur, Hyejung
2010-01-01
The scalable adaptive graphics environment (SAGE) is high-performance graphics middleware for ultrascale collaborative visualization using a display-rich global cyberinfrastructure. Dozens of sites worldwide use this cyberinfrastructure middleware, which connects high-performance-computing resources over high-speed networks to distributed ultraresolution displays.
Design and implementation of scalable tape archiver
NASA Technical Reports Server (NTRS)
Nemoto, Toshihiro; Kitsuregawa, Masaru; Takagi, Mikio
1996-01-01
In order to reduce costs, computer manufacturers try to use commodity parts as much as possible. Mainframes using proprietary processors are being replaced by high-performance RISC microprocessor-based workstations, which are in turn being replaced by the commodity microprocessors used in personal computers. Highly reliable disks for mainframes are also being replaced by disk arrays, which are complexes of disk drives. In this paper we try to clarify the feasibility of a large-scale tertiary storage system composed of 8-mm tape archivers utilizing robotics. In the near future, the 8-mm tape archiver will be widely used and become a commodity part, since the recent rapid growth of multimedia applications requires much larger storage than disk drives can provide. We designed a scalable tape archiver that connects as many 8-mm tape archivers (element archivers) as possible. In the scalable archiver, robotics can mechanically exchange a cassette tape between two adjacent element archivers, so a large scalable archiver can be built inexpensively. In addition, a sophisticated migration mechanism distributes frequently accessed tapes (hot tapes) evenly among all of the element archivers, which improves throughput considerably. Even when some tape drives fail, the system dynamically redistributes hot tapes to the other element archivers that have live tape drives. Several kinds of specially tailored huge archivers are on the market; however, the 8-mm tape scalable archiver could replace them. To maintain high performance in spite of high access locality when a large number of archivers are attached to the scalable archiver, it is necessary to scatter frequently accessed cassettes among the element archivers and to use the tape drives efficiently. For this purpose, we introduce two cassette migration algorithms: foreground migration and background migration. Background migration transfers cassettes between element archivers to redistribute frequently accessed cassettes, thus balancing the load of each archiver; it occurs while the robotics are idle. Both migration algorithms are based on the access frequency and space utilization of each element archiver. By normalizing these parameters according to the number of drives in each element archiver, it is possible to maintain high performance even if some tape drives fail. We found that foreground migration is efficient at reducing access response time, while background migration makes it possible to track transitions in spatial access locality quickly.
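The abstract specifies only that both migration algorithms are driven by access frequency and space utilization, normalized by drive count; a toy background-migration step consistent with that description (but otherwise invented) could look like this:

```python
# Toy background-migration step: when the robotics are idle, move the
# hottest cassette from the busiest element archiver to the least-loaded
# one. The load metric (access frequency normalized by drive count) is
# an assumed reading of the abstract, not the paper's algorithm.
def pick_migration(archivers):
    """archivers: {name: {"drives": int, "accesses": {cassette: count}}}."""
    def load(a):
        return sum(a["accesses"].values()) / max(a["drives"], 1)
    src = max(archivers.values(), key=load)
    dst = min(archivers.values(), key=load)
    if src is dst or not src["accesses"]:
        return None
    hot = max(src["accesses"], key=src["accesses"].get)
    return hot, src, dst

archivers = {
    "A": {"drives": 2, "accesses": {"t1": 40, "t2": 35}},
    "B": {"drives": 2, "accesses": {"t3": 5}},
}
move = pick_migration(archivers)
if move:
    hot, src, dst = move
    dst["accesses"][hot] = src["accesses"].pop(hot)  # migrate the cassette
    print("migrated", hot)
```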
Scalable Production Method for Graphene Oxide Water Vapor Separation Membranes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fifield, Leonard S.; Shin, Yongsoon; Liu, Wei
Membranes for selective water vapor separation were assembled from graphene oxide suspensions using techniques compatible with high-volume industrial production. The large-diameter graphene oxide flake suspensions were synthesized from graphite via relatively efficient chemical oxidation steps, with attention paid to maintaining flake size and achieving high graphene oxide concentrations. Graphene oxide membranes produced using scalable casting methods exhibited water vapor flux and water/nitrogen selectivity meeting or exceeding that of membranes produced using vacuum-assisted laboratory techniques. (PNNL-SA-117497)
Scalable cloud without dedicated storage
NASA Astrophysics Data System (ADS)
Batkovich, D. V.; Kompaniets, M. V.; Zarochentsev, A. K.
2015-05-01
We present a prototype of a scalable computing cloud. It is intended to be deployed on a cluster without separate dedicated storage; the dedicated storage is replaced by distributed software storage, and all cluster nodes are used both as computing nodes and as storage nodes. This solution increases utilization of cluster resources as well as improving the fault tolerance and performance of the distributed storage. Another advantage of this solution is high scalability with relatively low initial and maintenance costs. The solution is built from open-source components such as OpenStack and Ceph.
Embedded DCT and wavelet methods for fine granular scalable video: analysis and comparison
NASA Astrophysics Data System (ADS)
van der Schaar-Mitrea, Mihaela; Chen, Yingwei; Radha, Hayder
2000-04-01
Video transmission over bandwidth-varying networks is becoming increasingly important due to emerging applications such as streaming of video over the Internet. The fundamental obstacle in designing such systems resides in the varying characteristics of the Internet (i.e., bandwidth variations and packet-loss patterns). In MPEG-4, a new SNR scalability scheme, called Fine-Granular-Scalability (FGS), is currently under standardization; it is able to adapt in real time (i.e., at transmission time) to Internet bandwidth variations. The FGS framework consists of a non-scalable motion-predicted base layer and an intra-coded fine-granular scalable enhancement layer. For example, the base layer can be coded using a DCT-based, MPEG-4 compliant, highly efficient video compression scheme. Subsequently, the difference between the original and decoded base layer is computed, and the resulting FGS residual signal is intra-frame coded with an embedded scalable coder. In order to achieve high coding efficiency when compressing the FGS enhancement layer, it is crucial to analyze the nature and characteristics of residual signals common to the SNR scalability framework (including FGS). In this paper, we present a thorough analysis of SNR residual signals by evaluating their statistical properties, compaction efficiency, and frequency characteristics. The signal analysis revealed that the energy compaction of the DCT and wavelet transforms is limited and that the frequency characteristics of SNR residual signals decay rather slowly. Moreover, the blockiness artifacts of the low bit-rate coded base layer result in artificial high frequencies in the residual signal. Subsequently, a variety of wavelet and embedded DCT coding techniques applicable to the FGS framework are evaluated and their results are interpreted based on the identified signal properties. As expected from the theoretical signal analysis, the rate-distortion performances of the embedded wavelet and DCT-based coders are very similar. However, improved results can be obtained for the wavelet coder by deblocking the base layer prior to the FGS residual computation. Based on the theoretical analysis and our measurements, we conclude that for an optimal complexity versus coding-efficiency trade-off, only a limited wavelet decomposition (e.g., two stages) needs to be performed for the FGS residual signal. We also observed that the good rate-distortion performance of a coding technique for a certain image type (e.g., natural still images) does not necessarily translate into similarly good performance for signals with different visual characteristics and statistical properties.
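The FGS structure described above, a base layer plus an embedded intra-coded residual, can be sketched numerically: compute the residual against the decoded base layer and emit it as bit planes that can be truncated at transmission time. The quantizer and layout below are assumptions, not the MPEG-4 syntax.

```python
# Sketch of the FGS idea: the residual between the original and the
# decoded base layer is coded as embedded bit planes, most significant
# first, so the stream can be cut at any rate.
import numpy as np

original = np.array([[37, 18], [-5, 92]], dtype=np.int16)
base_decoded = np.array([[32, 16], [0, 80]], dtype=np.int16)

residual = original - base_decoded            # FGS enhancement signal
mag, sign = np.abs(residual), np.sign(residual)

planes = []
for b in range(int(mag.max()).bit_length() - 1, -1, -1):
    planes.append((mag >> b) & 1)             # most significant plane first

# Decoding the first k planes yields a coarse-to-fine approximation.
for k in range(1, len(planes) + 1):
    approx = np.zeros_like(mag)
    for b, plane in zip(range(len(planes) - 1, -1, -1), planes[:k]):
        approx |= plane << b
    print(k, "planes:", (sign * approx).tolist())
```

With all planes decoded the residual is reconstructed exactly; truncating after k planes is what lets the server match the instantaneous channel rate.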
Performance Models for the Spike Banded Linear System Solver
Manguoglu, Murat; Saied, Faisal; Sameh, Ahmed; ...
2011-01-01
With the availability of large-scale parallel platforms comprised of tens of thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing the performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing the performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners compared to state-of-the-art ILU-family preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, and (iii) we show the excellent prediction capabilities of our model, based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations. All of our results are validated on diverse heterogeneous multiclusters, platforms for which performance prediction is particularly challenging. Finally, we use our model to predict the scalability of the Spike algorithm up to 65,536 cores. This paper extends the results presented at the Ninth International Symposium on Parallel and Distributed Computing.
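A pseudo-analytical performance model of the kind described, an analytical term structure whose coefficients are parameterized from measured runs, can be illustrated as follows; the functional form, coefficients, and measurements are all assumed for the example, not the paper's calibrated model.

```python
# Hypothetical pseudo-analytical model of parallel solve time:
# T(n, p) = a*n/p + b*log2(p) + c, i.e. work split across p cores plus
# a logarithmic reduction phase plus fixed overhead, with coefficients
# fitted from measured runs. All numbers here are invented.
import numpy as np

def model_time(n, p, a, b, c):
    return a * n / p + b * np.log2(p) + c

# Fit coefficients from assumed (n, p, time) measurements.
runs = np.array([(1e6, 16, 1.50), (1e6, 64, 0.66), (4e6, 64, 1.60)])
n, p, t = runs[:, 0], runs[:, 1], runs[:, 2]
A = np.column_stack([n / p, np.log2(p), np.ones_like(n)])
a, b, c = np.linalg.lstsq(A, t, rcond=None)[0]

# Extrapolate to an unmeasured configuration, as such models are used
# to argue scalability beyond the tested core counts.
print(round(model_time(4e6, 256, a, b, c), 2))
```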
NASA Astrophysics Data System (ADS)
MacDonald, B.; Finot, M.; Heiken, B.; Trowbridge, T.; Ackler, H.; Leonard, L.; Johnson, E.; Chang, B.; Keating, T.
2009-08-01
Skyline Solar Inc. has developed a novel silicon-based PV system to simultaneously reduce energy cost and improve the scalability of solar energy. The system achieves high gain through a combination of high capacity factor and optical concentration. The design approach drives innovation not only into the details of the system hardware but also into manufacturing- and deployment-related costs and bottlenecks. The result of this philosophy is a modular PV system whose manufacturing strategy relies only on currently existing silicon solar cell, module, reflector, and aluminum parts supply chains, as well as turnkey PV module production lines and metal fabrication industries that already exist at enormous scale. Furthermore, with a high-gain system design, the generating capacity of all components is multiplied, leading to a rapidly scalable system. The product design and commercialization strategy cooperate synergistically to promise a dramatically lower levelized cost of energy (LCOE) with substantially lower risk relative to materials-intensive innovations. In this paper, we present the key design aspects of Skyline's system, including the optical, mechanical, and thermal components, revealing its ease of scalability, low cost, and high performance. Additionally, we present performance and reliability results for modules and the system, using ASTM and UL/IEC methodologies.
Design and evaluation of Nemesis, a scalable, low-latency, message-passing communication subsystem.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buntinas, D.; Mercier, G.; Gropp, W.
2005-12-02
This paper presents a new low-level communication subsystem called Nemesis. Nemesis has been designed and implemented to be scalable and efficient both for intranode communication using shared memory and for internode communication using high-performance networks, and it is natively multimethod-enabled. Nemesis has been integrated into MPICH2 as a CH3 channel and delivers better performance than other dedicated communication channels in MPICH2. Furthermore, the resulting MPICH2 architecture outperforms other MPI implementations in point-to-point benchmarks.
k-RP*_s: A scalable distributed data structure for high-performance multi-attribute access
DOE Office of Scientific and Technical Information (OSTI.GOV)
Litwin, W.; Neimat, M.A.
k-RP*_s is a new data structure for scalable multicomputer files with multi-attribute (k-d) keys. We discuss the k-RP*_s file evolution and search algorithms. Performance analysis shows that a k-RP*_s file can be much larger and orders of magnitude faster than a traditional k-d file. The speed-up is especially important for range and partial-match searches, which are often impractical with traditional k-d files. This opens up a new perspective for many applications.
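The abstract gives no algorithmic detail for k-RP*_s; as a hedge, the toy below shows the general mechanism that makes k-d distributed files fast for range and partial-match search: records hash to buckets by per-dimension splits, and a query visits only intersecting buckets. The split rule is an illustrative assumption, not the k-RP*_s protocol.

```python
# Toy k-d distributed file: records map to buckets via per-dimension
# split points, and a range query visits only intersecting buckets --
# the property that makes range/partial-match search cheap.
def bucket_of(point, splits):
    """splits: per-dimension sorted cut points; returns a bucket id tuple."""
    return tuple(sum(c <= v for c in cuts) for v, cuts in zip(point, splits))

def range_query(buckets, splits, lo, hi):
    results = []
    blo, bhi = bucket_of(lo, splits), bucket_of(hi, splits)
    for bid, points in buckets.items():   # visit intersecting buckets only
        if all(l <= b <= h for b, l, h in zip(bid, blo, bhi)):
            results += [p for p in points
                        if all(l <= v <= h for v, l, h in zip(p, lo, hi))]
    return results

splits = ([50], [50])                     # one cut per dimension -> 4 buckets
buckets = {}
for p in [(10, 10), (60, 20), (70, 80), (30, 55)]:
    buckets.setdefault(bucket_of(p, splits), []).append(p)

print(range_query(buckets, splits, (0, 0), (55, 60)))  # [(10, 10), (30, 55)]
```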
NASA Astrophysics Data System (ADS)
Yan, Hui; Wang, K. G.; Jones, Jim E.
2016-06-01
A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new kinetics of phase coarsening is found in the region of ultrahigh volume fraction. The parallel implementation is capable of harnessing the greater computing power available from high-performance architectures, and the parallelized code enables an increase in three-dimensional simulation system size up to a 512³ grid cube. With the parallelized code, practical runtimes can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high-resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis of speedup and scalability is presented, showing good scalability that improves with increasing problem size. In addition, a model for predicting runtime is developed, which shows good agreement with the actual runtimes of numerical tests.
Scalable creation of gold nanostructures on high performance engineering polymeric substrate
NASA Astrophysics Data System (ADS)
Jia, Kun; Wang, Pan; Wei, Shiliang; Huang, Yumin; Liu, Xiaobo
2017-12-01
The article reveals a facile protocol for the scalable production of gold nanostructures on a high performance engineering thermoplastic substrate made of polyarylene ether nitrile (PEN) for the first time. First, gold thin films with thicknesses of 2 nm, 4 nm, and 6 nm were evaporated onto a spin-coated PEN substrate on a glass slide under vacuum. Next, the as-evaporated samples were thermally annealed around the glass transition temperature of the PEN substrate, creating gold nanostructures with island-like morphology. Moreover, it was found that the initial gold evaporation thickness and annealing atmosphere played an important role in determining the morphology and plasmonic properties of the formulated Au nanoparticles (NPs). Interestingly, we discovered that isotropic Au NPs can be easily fabricated on a freestanding PEN substrate prepared by a cost-effective polymer solution casting method. More specifically, monodispersed Au nanospheres with an average size of ∼60 nm were obtained after annealing a PEN casting substrate covered with a 4 nm gold film at 220 °C for 2 h in oxygen. Therefore, the scalable production of Au NPs with controlled morphology on PEN substrates would open the way for the development of robust flexible nanosensors and optical devices using high performance engineering polyarylene ethers.
Scalable UWB photonic generator based on the combination of doublet pulses.
Moreno, Vanessa; Rius, Manuel; Mora, José; Muriel, Miguel A; Capmany, José
2014-06-30
We propose and experimentally demonstrate a scalable and reconfigurable optical scheme to generate high-order UWB pulses. First, various ultra-wideband doublets are created through a process of phase-to-intensity conversion by means of phase modulation and a dispersive medium. In a second stage, the doublets are combined in an optical processing unit that allows the reconfiguration of high-order UWB pulses. Experimental results in both the time and frequency domains are presented, showing good performance in terms of fractional bandwidth and spectral efficiency.
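The combination stage can be sketched numerically: delayed, weighted Gaussian doublets summed into a higher-order pulse. The widths, delays, and weights below are assumed values, not the experimental settings.

```python
# Numerical sketch of shaping a higher-order UWB pulse by combining
# delayed, weighted Gaussian doublets -- the operation the optical
# processing unit performs. All parameter values are assumptions.
import numpy as np

t = np.linspace(-1e-9, 1e-9, 2001)           # 2 ns observation window
tau = 80e-12                                  # doublet width (assumed)

def doublet(t, t0):
    """Gaussian doublet (first derivative of a Gaussian) centered at t0."""
    x = (t - t0) / tau
    return -x * np.exp(-0.5 * x * x)

# Two inverted, delayed doublets combine into a higher-order pulse,
# pushing energy toward higher frequencies within the UWB mask.
pulse = doublet(t, -60e-12) - doublet(t, 60e-12)
print("peak amplitude:", round(float(np.max(np.abs(pulse))), 3))
```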
A scalable silicon photonic chip-scale optical switch for high performance computing systems.
Yu, Runxiang; Cheung, Stanley; Li, Yuliang; Okamoto, Katsunari; Proietti, Roberto; Yin, Yawei; Yoo, S J B
2013-12-30
This paper discusses the architecture and provides performance studies of a silicon photonic chip-scale optical switch for scalable interconnect networks in high performance computing systems. The proposed switch exploits optical wavelength parallelism and the wavelength routing characteristics of an Arrayed Waveguide Grating Router (AWGR) to allow contention resolution in the wavelength domain. Simulation results from a cycle-accurate network simulator indicate that, even with only two transmitter/receiver pairs per node, the switch exhibits lower end-to-end latency and higher throughput at high (>90%) input loads compared with electronic switches. On the device integration level, we propose to integrate all the components (ring modulators, photodetectors, and AWGR) on a CMOS-compatible silicon photonic platform to ensure a compact, energy-efficient, and cost-effective device. We successfully demonstrate proof-of-concept routing functions on an 8 × 8 prototype fabricated using foundry services provided by OpSIS-IME.
A scalable SIMD digital signal processor for high-quality multifunctional printer systems
NASA Astrophysics Data System (ADS)
Kang, Hyeong-Ju; Choi, Yongwoo; Kim, Kimo; Park, In-Cheol; Kim, Jung-Wook; Lee, Eul-Hwan; Gahang, Goo-Soo
2005-01-01
This paper describes a high-performance, scalable SIMD digital signal processor (DSP) developed for multifunctional printer systems. The DSP supports a variable number of datapaths to cover a wide range of performance requirements while maintaining a RISC-like pipeline structure. Many special instructions suitable for image processing algorithms are included in the DSP. Quad/dual instructions are introduced for 8-bit or 16-bit data, and bit-field extraction/insertion instructions are supported to process various data types. Conditional instructions are supported to deal with complex relative conditions efficiently. In addition, an intelligent DMA block is integrated to align data while it is being read. Experimental results show that the proposed DSP outperforms a high-end printer-system DSP by at least a factor of two.
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As the NSF has indicated, "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is that third pillar for GIScience and the geosciences. With the exponential growth of geodata, the challenge of scalable and high-performance computing for big data analytics becomes urgent, because many research activities are constrained by software or tools that simply cannot complete the computation process. Heterogeneous geodata integration and analytics further magnify the complexity and the operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) architecture and the Graphics Processing Unit (GPU), together with advanced computing technologies, provide promising solutions that employ massive parallelism and hardware resources to achieve scalability and high performance for data-intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying solutions in massively parallel computing environments to achieve scalable, high-performance data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise of achieving scalability and high performance by exploiting the task- and data-level parallelism that is not supported by conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data, as demonstrated by our prior work, although the potential of such advanced infrastructure remains largely unexplored in this domain. In this presentation, our prior and ongoing initiatives are summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs, and MICs, to accelerate geocomputation in different applications.
Many-core graph analytics using accelerated sparse linear algebra routines
NASA Astrophysics Data System (ADS)
Kozacik, Stephen; Paolini, Aaron L.; Fox, Paul; Kelmelis, Eric
2016-05-01
Graph analytics is a key component in identifying emerging trends and threats in many real-world applications. Large-scale graph analytics frameworks provide a convenient and highly scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. Another model of graph computation has emerged that promises improved performance and scalability by using abstract linear algebra operations as the basis for graph analysis, as laid out by the GraphBLAS standard. By using sparse linear algebra as the basis, existing highly efficient algorithms can be adapted to perform computations on the graph. This approach, however, is often less intuitive to graph analytics experts, who are accustomed to vertex-centric APIs such as Giraph, GraphX, and Tinkerpop. We are developing an implementation of the high-level operations supported by these APIs in terms of linear algebra operations. This implementation is backed by many-core implementations of the fundamental GraphBLAS operations required, and offers the advantages of both the intuitive programming model of a vertex-centric API and the performance of a sparse linear algebra implementation. This technology can reduce the number of nodes required, as well as the run-time for a graph analysis problem, enabling customers to perform more complex analysis with less hardware at lower cost. All of this can be accomplished without requiring the customer to make any changes to their analytics code, thanks to the compatibility with existing graph APIs.
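To make the linear-algebra formulation concrete, here is a minimal sketch of breadth-first search expressed as repeated sparse matrix-vector products, with scipy standing in for a many-core GraphBLAS backend; all names are illustrative, and the sketch is not the product described above.

```python
# BFS as repeated SpMV: the frontier is a vector, one hop is one product.
import numpy as np
from scipy.sparse import csr_matrix

def bfs_levels(adj: csr_matrix, source: int) -> np.ndarray:
    n = adj.shape[0]
    levels = np.full(n, -1)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    level = 0
    while frontier.any():
        levels[frontier] = level
        # one sparse matrix-vector product finds neighbors of the frontier
        reached = adj.T.dot(frontier.astype(np.int8)) > 0
        frontier = reached & (levels == -1)   # drop already-visited vertices
        level += 1
    return levels

# 4-node path graph 0-1-2-3
rows, cols = [0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]
A = csr_matrix((np.ones(6, dtype=np.int8), (rows, cols)), shape=(4, 4))
print(bfs_levels(A, 0))  # [0 1 2 3]
```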
From Sensor Networks to Internet of Things. Bluetooth Low Energy, a Standard for This Evolution
Hortelano, Diego; Olivares, Teresa; Ruiz, M. Carmen; Garrido-Hidalgo, Celia; López, Vicente
2017-01-01
Current sensor networks need to be improved and updated to satisfy new essential requirements of the Internet of Things, where cutting-edge applications will appear. These requirements are: total coverage, zero failures (high performance), scalability, and sustainability (hardware and software). We evaluate Bluetooth Low Energy as a wireless transmission technology and as the ideal candidate for these improvements, due to its low power consumption, its low-cost radio chips, and its ability to communicate with users directly through their smartphones or smartbands. However, this technology is relatively recent, and standard network topologies are not able to fulfil its new requirements. To address these shortcomings, implementing more flexible topologies (such as the mesh topology) is of great interest. After studying it in depth, we have identified certain weaknesses: for example, specific devices are needed to provide network scalability, and one is forced to choose between high performance and sustainability. In this paper, after presenting the studies carried out on these technologies, we propose a new packet format and a new BLE mesh topology with two different configurations: Individual Mesh and Collaborative Mesh. Our results show how this topology improves scalability, sustainability, coverage, and performance. PMID:28216560
Large-scale parallel genome assembler over cloud computing environment.
Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong
2017-06-01
The size of high-throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications have started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay-as-you-go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model, and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability, with competitive assembly quality, compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure rather than a traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of the traditional HPC cluster.
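For readers unfamiliar with de Bruijn graph assembly, the core data structure can be sketched in a few lines. This local, single-machine toy is only an illustrative stand-in for GiGA's distributed Giraph implementation, which shards k-mers across compute nodes.

```python
# Minimal de Bruijn graph construction from reads: nodes are (k-1)-mers,
# edges connect overlapping k-mers.
from collections import defaultdict

def de_bruijn(reads, k):
    graph = defaultdict(set)  # (k-1)-mer -> set of successor (k-1)-mers
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

reads = ["ACGTAC", "GTACGT"]
for node, succs in sorted(de_bruijn(reads, 4).items()):
    print(node, "->", sorted(succs))
```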
Hybrid-optimization strategy for the communication of large-scale Kinetic Monte Carlo simulation
NASA Astrophysics Data System (ADS)
Wu, Baodong; Li, Shigang; Zhang, Yunquan; Nie, Ningming
2017-02-01
The parallel Kinetic Monte Carlo (KMC) algorithm based on domain decomposition has been widely used in large-scale physical simulations. However, the communication overhead of the parallel KMC algorithm is critical and severely degrades the overall performance and scalability. In this paper, we present a hybrid optimization strategy to reduce the communication overhead of parallel KMC simulations. We first propose a communication aggregation algorithm to reduce the total number of messages and eliminate communication redundancy. Then, we utilize shared memory to reduce the memory copy overhead of intra-node communication. Finally, we optimize the communication scheduling using neighborhood collective operations. We demonstrate the scalability and high performance of our hybrid optimization strategy by both theoretical and experimental analysis. Results show that the optimized KMC algorithm exhibits better performance and scalability than the well-known open-source library SPPARKS. On a 32-node Xeon E5-2680 cluster (640 cores in total), the optimized algorithm reduces the communication time by 24.8% compared with SPPARKS.
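The communication-aggregation idea can be sketched independently of MPI: instead of sending each boundary event as its own message, buffer events per neighbor rank and flush them as one message per neighbor. This is a pure-Python illustration of the pattern, not the paper's implementation; all names are invented.

```python
# Aggregate many small per-event sends into one message per neighbor.
from collections import defaultdict

class AggregatingSender:
    def __init__(self, send_fn):
        self.buffers = defaultdict(list)  # neighbor rank -> pending events
        self.send_fn = send_fn            # e.g. an MPI send underneath

    def post(self, neighbor, event):
        self.buffers[neighbor].append(event)

    def flush(self):
        for neighbor, events in self.buffers.items():
            if events:
                self.send_fn(neighbor, events)  # one message, many events
        self.buffers.clear()

sender = AggregatingSender(lambda n, evs: print(f"to rank {n}: {len(evs)} events"))
for i in range(5):
    sender.post(neighbor=i % 2, event=("site_update", i))
sender.flush()  # two messages instead of five
```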
High-performance multiprocessor architecture for a 3-D lattice gas model
NASA Technical Reports Server (NTRS)
Lee, F.; Flynn, M.; Morf, M.
1991-01-01
The lattice gas method has recently emerged as a promising discrete particle simulation method in areas such as fluid dynamics. We present ALGE, a very high-performance, scalable multiprocessor architecture proposed for the simulation of a realistic 3-D lattice gas model, Henon's 24-bit FCHC isometric model. Each of these VLSI processors is as powerful as a CRAY-2 for this application. ALGE is scalable in the sense that it achieves linear speedup for both fixed and increasing problem sizes as processors are added. The core computation of a lattice gas model consists of many repetitions of two alternating phases: particle collision and propagation. Functional decomposition by symmetry group and virtual move are the respective keys to efficient implementation of collision and propagation.
NASA Astrophysics Data System (ADS)
Jamroz, Benjamin F.; Klöfkorn, Robert
2016-08-01
The scalability of computational applications on current and next-generation supercomputers is increasingly limited by the cost of inter-process communication. We implement non-blocking asynchronous communication in the High-Order Methods Modeling Environment for the time integration of the hydrostatic fluid equations using both the spectral-element and discontinuous Galerkin methods. This allows the overlap of computation with communication, effectively hiding some of the costs of communication. A novel aspect of our approach is that it allows some data movement to be performed during the asynchronous communication even in the absence of other computations. This method produces significant performance and scalability gains in large-scale simulations.
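The overlap technique itself is generic and can be sketched with non-blocking MPI calls. The snippet below uses mpi4py for brevity (HOMME is not a Python code), and the ring-exchange pattern and buffer sizes are illustrative assumptions.

```python
# Overlap interior computation with an asynchronous halo exchange.
# Run with e.g.: mpiexec -n 4 python overlap.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

send_buf = np.full(4, rank, dtype=np.float64)
recv_buf = np.empty(4, dtype=np.float64)

# post the non-blocking boundary exchange first...
reqs = [comm.Isend(send_buf, dest=right),
        comm.Irecv(recv_buf, source=left)]

# ...then do interior work while messages are in flight
interior = np.arange(1000, dtype=np.float64).sum()

MPI.Request.Waitall(reqs)      # boundary data is now available
boundary = recv_buf.sum()
print(f"rank {rank}: interior={interior:.0f}, boundary={boundary:.0f}")
```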
NASA Astrophysics Data System (ADS)
Riera-Palou, Felip; den Brinker, Albertus C.
2007-12-01
This paper introduces a new audio and speech broadband coding technique based on the combination of a pulse excitation coder and a standardized parametric coder, namely the MPEG-4 high-quality parametric coder. After presenting a series of enhancements to regular pulse excitation (RPE) to make it suitable for the modeling of broadband signals, it is shown how pulse and parametric coding complement each other and how they can be merged to yield a layered, bit-stream-scalable coder able to operate at different points in the quality/bit-rate plane. The performance of the proposed coder is evaluated in a listening test. The major result is that the extra functionality of bit-stream scalability does not come at the price of reduced performance, since the coder is competitive with standardized coders (MP3, AAC, SSC).
High performance data transfer
NASA Astrophysics Data System (ADS)
Cottrell, R.; Fang, C.; Hanushevsky, A.; Kreuger, W.; Yang, W.
2017-10-01
The exponentially increasing need for high-speed data transfer is driven by big data and cloud computing, together with the needs of data-intensive science, High Performance Computing (HPC), defense, the oil and gas industry, etc. We report on the Zettar ZX software, developed since 2013 to meet these growing needs by providing high-performance data transfer and encryption in a scalable, balanced, easy-to-deploy-and-use way while minimizing power and space utilization. In collaboration with several commercial vendors, Proofs of Concept (PoC) consisting of clusters have been put together using off-the-shelf components to test ZX's scalability and its ability to balance services across multiple cores and links. The PoCs are based on SSD flash storage managed by a parallel file system; each cluster occupies 4 rack units. Using the PoCs, we have achieved almost 200 Gb/s memory-to-memory between clusters over two 100 Gb/s links, and 70 Gb/s parallel-file-to-parallel-file with encryption over a 5000-mile 100 Gb/s link.
Numerical simulation on a straight-bladed vertical axis wind turbine with auxiliary blade
NASA Astrophysics Data System (ADS)
Li, Y.; Zheng, Y. F.; Feng, F.; He, Q. B.; Wang, N. X.
2016-08-01
To improve the starting performance of the straight-bladed vertical axis wind turbine (SB-VAWT) at low wind speed, and its output characteristics at high wind speed, a flexible, scalable auxiliary vane mechanism was designed and installed in the rotor of an SB-VAWT in this study. This new vertical axis wind turbine is a lift-drag combination wind turbine. At low rotational speed the flexible blade expands and the driving force comes mainly from drag; at higher speed the blade retracts and the driving force comes primarily from lift. To study the effects of the flexible, scalable auxiliary module on the performance of the SB-VAWT and to find its best parameters, computational fluid dynamics (CFD) calculations were carried out. The results show that the flexible, scalable blades automatically expand and retract with rotational speed. The moment coefficient at low tip speed ratios increased substantially, and it also improved over certain ranges of high tip speed ratios.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin
The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules, an in-memory data store, with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.
Heat-treated stainless steel felt as scalable anode material for bioelectrochemical systems.
Guo, Kun; Soeriyadi, Alexander H; Feng, Huajun; Prévoteau, Antonin; Patil, Sunil A; Gooding, J Justin; Rabaey, Korneel
2015-11-01
This work reports a simple and scalable method to convert stainless steel (SS) felt into an effective anode for bioelectrochemical systems (BESs) by means of heat treatment. X-ray photoelectron spectroscopy and cyclic voltammetry elucidated that the heat treatment generated an iron oxide rich layer on the SS felt surface. The iron oxide layer dramatically enhanced electroactive biofilm formation on the SS felt surface in BESs. Consequently, the sustained current densities achieved on the treated electrodes (1 cm²) were around 1.5±0.13 mA/cm², seven times higher than on the untreated electrodes (0.22±0.04 mA/cm²). To test the scalability of this material, the heat-treated SS felt was scaled up to 150 cm², and a similar current density (1.5 mA/cm²) was achieved on the larger electrode. The low cost, straightforwardness of the treatment, high conductivity, and high bioelectrocatalytic performance make heat-treated SS felt a scalable anodic material for BESs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Janjusic, Tommy; Kartsaklis, Christos
Memory scalability is an enduring problem and bottleneck that plagues many parallel codes. Parallel codes designed for High Performance Systems are typically designed over the span of several, and in some instances 10+, years. As a result, optimization practices which were appropriate for earlier systems may no longer be valid and thus require careful optimization consideration. Specifically, parallel codes whose memory footprint is a function of their scalability must be carefully considered for future exa-scale systems. In this paper we present a methodology and tool to study the memory scalability of parallel codes. Using our methodology we evaluate an application's memory footprint as a function of scalability, which we coined "memory efficiency", and describe our results. In particular, using our in-house tools we can pinpoint the specific application components which contribute to the application's overall memory footprint (application data structures, libraries, etc.).
The novel high-performance 3-D MT inverse solver
NASA Astrophysics Data System (ADS)
Kruglyakov, Mikhail; Geraskin, Alexey; Kuvshinov, Alexey
2016-04-01
We present a novel, robust, scalable, and fast 3-D magnetotelluric (MT) inverse solver. The solver is written in a multi-language paradigm to make it as efficient, readable, and maintainable as possible. Separation-of-concerns and single-responsibility principles run through the implementation of the solver. As a forward modelling engine, a modern scalable solver, extrEMe, based on a contracting integral equation approach, is used. An iterative gradient-type (quasi-Newton) optimization scheme searches for the (regularized) inverse problem solution, and an adjoint-source approach is used to calculate the gradient of the misfit efficiently. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT response, and supports massive parallelization. Moreover, different parallelization strategies implemented in the code allow optimal usage of available computational resources for a given problem statement. To parameterize the inverse domain, so-called mask parameterization is implemented, meaning that one can merge any subset of forward modelling cells in order to account for the (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance, and scalability of the code. In particular, our computational experiments, carried out on platforms ranging from modern laptops to the Piz Daint HPC system (the 6th-ranked supercomputer in the world), demonstrate practically linear scalability of the code up to thousands of nodes.
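The optimization loop described above can be illustrated in miniature: a quasi-Newton optimizer driven by a misfit function whose gradient is computed via the adjoint of the forward operator, so gradients cost about one extra forward solve rather than one per parameter. The toy linear forward model below is an assumption for illustration, not the solver's integral-equation engine.

```python
# Quasi-Newton (L-BFGS) inversion with an adjoint-style gradient.
import numpy as np
from scipy.optimize import minimize

d_obs = np.array([1.0, 2.0, 3.0])
G = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])   # toy linear forward operator

def misfit_and_gradient(m):
    r = G @ m - d_obs              # forward modelling + residual
    return 0.5 * (r @ r), G.T @ r  # gradient via the adjoint operator G^T

result = minimize(misfit_and_gradient, x0=np.zeros(3),
                  jac=True, method="L-BFGS-B")
print(result.x)  # recovered model parameters
```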
A distributed infrastructure for publishing VO services: an implementation
NASA Astrophysics Data System (ADS)
Cepparo, Francesco; Scagnetto, Ivan; Molinaro, Marco; Smareglia, Riccardo
2016-07-01
This contribution describes both the design and the implementation details of a new solution for publishing VO services, highlighting its maintainable, distributed, modular, and scalable architecture. Indeed, the new publisher is multithreaded and multiprocess. Multiple instances of the modules can run on different machines to ensure high performance and high availability, both for the service interface modules and for the back-end data access modules. The system uses message passing to let its components communicate through an AMQP message broker, which can itself be distributed to provide better scalability and availability.
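As an illustration of the AMQP pattern, a minimal sketch with the pika client and a local RabbitMQ broker is shown below; the queue name, host, and message payload are invented for the example and are not taken from the paper.

```python
# Minimal AMQP message passing between two modules via a broker.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="vo.requests")

# an interface module publishes a request for a back-end data-access module
channel.basic_publish(exchange="", routing_key="vo.requests",
                      body=b"cone-search ra=10.5 dec=-2.3 sr=0.1")

# the back-end module consumes it (polling form, for brevity)
method, properties, body = channel.basic_get(queue="vo.requests", auto_ack=True)
print("back-end received:", body)
connection.close()
```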
Jin, Yan; Tan, Yingling; Hu, Xiaozhen; Zhu, Bin; Zheng, Qinghui; Zhang, Zijiao; Zhu, Guoying; Yu, Qian; Jin, Zhong; Zhu, Jia
2017-05-10
Alloy anodes with high theoretical capacity show great potential for next-generation advanced lithium-ion batteries. Even though the huge volume change during lithium insertion and extraction leads to severe problems, such as pulverization and an unstable solid-electrolyte interphase (SEI), various nanostructures including nanoparticles, nanowires, and porous networks can address these challenges to improve electrochemical performance. However, the complex and expensive fabrication process hinders the widespread application of nanostructured alloy anodes, which generates an urgent demand for low-cost, scalable processes to fabricate building blocks with fine control of size, morphology, and porosity. Here, we demonstrate a scalable and low-cost process to produce a porous yin-yang hybrid composite anode with graphene coating through high-energy ball-milling and selective chemical etching. With void space to buffer the expansion, the produced functional electrodes demonstrate stable cycling performance of 910 mAh g−1 over 600 cycles at a rate of 0.5C for Si-graphene "yin" particles and 750 mAh g−1 over 300 cycles at 0.2C for Sn-graphene "yang" particles. We thereby open up a new approach to fabricate alloy anode materials at low cost, with low energy consumption, and at large scale. This type of porous silicon or tin composite with graphene coating can also potentially play a significant role in thermoelectric and optoelectronic applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Petrini, Fabrizio; Nieplocha, Jarek; Tipparaju, Vinod
2006-04-15
In this paper we will present a new technology that we are currently developing within the SFT: Scalable Fault Tolerance FastOS project, which seeks to implement fault tolerance at the operating system level. Major design goals include dynamic reallocation of resources to allow continuing execution in the presence of hardware failures, very high scalability, high efficiency (low overhead), and transparency, requiring no changes to user applications. Our technology is based on a global coordination mechanism, which enforces transparent recovery lines in the system, and TICK, a lightweight, incremental checkpointing software architecture implemented as a Linux kernel module. TICK is completely user-transparent and does not require any changes to user code or system libraries; it is highly responsive: an interrupt, such as a timer interrupt, can trigger a checkpoint in as little as 2.5 μs; and it supports incremental and full checkpoints with minimal overhead, less than 6% with full checkpointing to disk performed as frequently as once per minute.
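The incremental checkpointing idea can be sketched in user space by saving only blocks whose content changed since the previous checkpoint. TICK does this transparently in the kernel with page-level tracking; the block hashing below is only an illustrative stand-in under that assumption.

```python
# Incremental checkpointing in miniature: persist only changed blocks.
import hashlib

BLOCK = 4096

def checkpoint(data: bytes, prev_hashes: dict) -> tuple[dict, dict]:
    new_hashes, delta = {}, {}
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        h = hashlib.sha256(block).hexdigest()
        new_hashes[i] = h
        if prev_hashes.get(i) != h:
            delta[i] = block          # only changed blocks are saved
    return new_hashes, delta

state = bytearray(4 * BLOCK)
hashes, delta = checkpoint(bytes(state), {})
print("full checkpoint blocks:", len(delta))     # 4
state[5] = 0xFF                                  # dirty one block
hashes, delta = checkpoint(bytes(state), hashes)
print("incremental blocks:", len(delta))         # 1
```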
Performance and scalability evaluation of "Big Memory" on Blue Gene Linux.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoshii, K.; Iskra, K.; Naik, H.
2011-05-01
We address memory performance issues observed in Blue Gene Linux and discuss the design and implementation of 'Big Memory', an alternative, transparent memory space introduced to eliminate the memory performance issues. We evaluate the performance of Big Memory using custom memory benchmarks, the NAS Parallel Benchmarks, and the Parallel Ocean Program, at a scale of up to 4,096 nodes. We find that Big Memory successfully resolves the performance issues normally encountered in Blue Gene Linux. For the ocean simulation program, we even find that Linux with Big Memory provides better scalability than does the lightweight compute node kernel designed solely for high-performance applications. Originally intended exclusively for compute node tasks, our new memory subsystem dramatically improves the performance of certain I/O node applications as well. We demonstrate this performance using the central processor of the LOw Frequency ARray (LOFAR) radio telescope as an example.
Scalable nuclear density functional theory with Sky3D
NASA Astrophysics Data System (ADS)
Afibuzzaman, Md; Schuetrumpf, Bastian; Aktulga, Hasan Metin
2018-02-01
In nuclear astrophysics, quantum simulations of large inhomogeneous dense systems, as they appear in the crusts of neutron stars, present great challenges. The number of particles in a simulation with periodic boundary conditions is strongly limited by the immense computational cost of the quantum methods. In this paper, we describe techniques for an efficient and scalable parallel implementation of Sky3D, a nuclear density functional theory solver that operates on an equidistant grid. The presented techniques allow Sky3D to achieve good scaling and high performance on a large number of cores, as demonstrated through detailed performance analysis on a Cray XC40 supercomputer.
Final report for “Extreme-scale Algorithms and Solver Resilience”
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, William Douglas
2017-06-30
This is a joint project with principal investigators at Oak Ridge National Laboratory, Sandia National Laboratories, the University of California at Berkeley, and the University of Tennessee. Our part of the project involves developing performance models for highly scalable algorithms and the development of latency-tolerant iterative methods. During this project, we extended our performance models for the Multigrid method for solving large systems of linear equations and conducted experiments with highly scalable variants of conjugate gradient methods that avoid blocking synchronization. In addition, we worked with the other members of the project on alternative techniques for resilience and reproducibility. We also presented an alternative approach for reproducible dot-products in parallel computations that performs almost as well as the conventional approach by separating the order of computation from the details of the decomposition of vectors across the processes.
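The reproducible dot-product idea can be sketched as follows: partial sums are formed over chunks whose boundaries are fixed in the global index space, and the final reduction always proceeds in canonical chunk order, so the result no longer depends on how the vectors happen to be decomposed across processes. This serial sketch illustrates the ordering principle only; it is not the project's implementation.

```python
# Order-independent dot product via globally fixed chunk boundaries.
import numpy as np

def reproducible_dot(x, y, chunk=64):
    # chunk boundaries live in the GLOBAL index space, so every process
    # layout produces the same partials, reduced in the same order
    partials = [np.dot(x[i:i + chunk], y[i:i + chunk])
                for i in range(0, len(x), chunk)]
    return sum(partials)  # fixed left-to-right reduction order

rng = np.random.default_rng(0)
x, y = rng.standard_normal(10_000), rng.standard_normal(10_000)

# a naive parallel dot depends on where the data was split:
a = np.dot(x[:7000], y[:7000]) + np.dot(x[7000:], y[7000:])
b = np.dot(x[:3000], y[:3000]) + np.dot(x[3000:], y[3000:])
print(a == b)                 # often False in floating point
print(reproducible_dot(x, y)) # same bits for any decomposition
```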
Diskless supercomputers: Scalable, reliable I/O for the Tera-Op technology base
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Ousterhout, John K.; Patterson, David A.
1993-01-01
Computing is seeing an unprecedented improvement in performance; over the last five years there has been an order-of-magnitude improvement in the speeds of workstation CPUs. At least another order of magnitude seems likely in the next five years, to machines with 500 MIPS or more. The goal of the ARPA Teraop program is to realize even larger, more powerful machines, executing as many as a trillion operations per second. Unfortunately, we have seen no comparable breakthroughs in I/O performance; the speeds of I/O devices and the hardware and software architectures for managing them have not changed substantially in many years. We have completed a program of research to demonstrate hardware and software I/O architectures capable of supporting the kinds of internetworked 'visualization' workstations and supercomputers that will appear in the mid 1990s. The project had three overall goals: high performance, high reliability, and a scalable, multipurpose system.
Kosa, Gergely; Vuoristo, Kiira S; Horn, Svein Jarle; Zimmermann, Boris; Afseth, Nils Kristian; Kohler, Achim; Shapaval, Volha
2018-06-01
Recent developments in molecular biology and metabolic engineering have resulted in a large increase in the number of strains that need to be tested, positioning high-throughput screening of microorganisms as an important step in bioprocess development. Scalability is crucial for performing reliable screening of microorganisms. Most scalability studies from microplate screening systems to controlled stirred-tank bioreactors have so far been performed with unicellular microorganisms. We have compared cultivation of industrially relevant oleaginous filamentous fungi and a microalga in a Duetz-microtiter plate system with benchtop and pre-pilot bioreactors. Maximal glucose consumption rate, biomass concentration, lipid content of the biomass, and biomass and lipid yield values showed good scalability for the filamentous fungi Mucor circinelloides (less than 20% differences) and Mortierella alpina (less than 30% differences). Maximal glucose consumption and biomass production rates were identical for Crypthecodinium cohnii in the microtiter plate and the benchtop bioreactor. Most likely due to the shear-stress sensitivity of this microalga in stirred bioreactors, biomass concentration and lipid content of the biomass were significantly higher in the microtiter plate system than in the benchtop bioreactor. Still, the fermentation results obtained in the Duetz-microtiter plate system for Crypthecodinium cohnii are encouraging compared to what has been reported in the literature. Good reproducibility (coefficient of variation less than 15% for biomass growth, glucose consumption, lipid content, and pH) was achieved in the Duetz-microtiter plate system for Mucor circinelloides and Crypthecodinium cohnii. Mortierella alpina cultivation reproducibility might be improved with inoculation optimization. In conclusion, we have demonstrated the suitability of the Duetz-microtiter plate system for reproducible, scalable, and cost-efficient high-throughput screening of oleaginous microorganisms.
The high performance parallel algorithm for Unified Gas-Kinetic Scheme
NASA Astrophysics Data System (ADS)
Li, Shiyi; Li, Qibing; Fu, Song; Xu, Jinxiu
2016-11-01
A high performance parallel algorithm for UGKS is developed to simulate three-dimensional internal and external flows on arbitrary grid systems. The physical domain and velocity domain are divided into different blocks and distributed according to a two-dimensional Cartesian topology, with intra-communicators in the physical domain for data exchange and intra-communicators in the velocity domain for the sum reduction of moment integrals. Numerical results for three-dimensional cavity flow and flow past a sphere agree well with results from existing studies and validate the applicability of the algorithm. The scalability of the algorithm is tested on both small (1-16) and large (729-5832) processor counts. The measured speed-up is nearly linear and the efficiency is thus around 1, which reveals the good scalability of the present algorithm.
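A minimal mpi4py sketch of such a two-dimensional process topology, with separate sub-communicators for the physical and velocity domains, is given below; the grid shape, buffer contents, and all names are illustrative assumptions, not the paper's code.

```python
# 2-D Cartesian topology: (physical blocks) x (velocity blocks).
# Run with e.g.: mpiexec -n 4 python topo.py
from mpi4py import MPI
import numpy as np

world = MPI.COMM_WORLD
dims = MPI.Compute_dims(world.Get_size(), 2)       # [phys, velo]
cart = world.Create_cart(dims, periods=[False, False])
phys_comm = cart.Sub([True, False])   # same velocity block: halo exchange
velo_comm = cart.Sub([False, True])   # same physical block: moment reduction

# moment integral: partial sums over local velocity cells, reduced along
# the velocity dimension only
local_moment = np.array([1.0 * world.Get_rank()])
moment = np.empty(1)
velo_comm.Allreduce(local_moment, moment, op=MPI.SUM)
print(f"rank {world.Get_rank()}: moment={moment[0]}")
```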
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zheng, Shili, E-mail: slzheng@ipe.ac.cn; Wang, Xinran; Yan, Hong
2016-09-15
Highlights: • Nanostructured Na1.08V6O15 was synthesized through an additive-free sol-gel process. • The prepared Na1.08V6O15 demonstrated high capacity and sufficient cycling stability. • The reaction temperature was optimized to allow scalable Na1.08V6O15 fabrication. - Abstract: Developing high-capacity cathode materials with feasibility and scalability is still challenging for lithium-ion batteries (LIBs). In this study, a high-capacity ternary sodium vanadate compound, nanostructured NaV6O15, was synthesized template-free through a sol-gel process with high producing efficiency. The as-prepared sample was systematically post-treated at different temperatures, and the post-annealing temperature was found to determine the cycling stability and capacity of NaV6O15. The well-crystallized sample exhibited good electrochemical performance with a high specific capacity of 302 mAh g−1 when cycled at a current density of 0.03 mA g−1. Its relatively long-term cycling stability was characterized by the cell performance under a current density of 1 A g−1, delivering a reversible capacity of 118 mAh g−1 after 300 cycles with 79% capacity retention and nearly 100% coulombic efficiency, all demonstrating the significant promise of the proposed strategy for large-scale synthesis of NaV6O15 as a high-capacity, high-energy-density cathode for LIBs.
Scalable Optical-Fiber Communication Networks
NASA Technical Reports Server (NTRS)
Chow, Edward T.; Peterson, John C.
1993-01-01
Scalable arbitrary fiber extension network (SAFEnet) is conceptual fiber-optic communication network passing digital signals among variety of computers and input/output devices at rates from 200 Mb/s to more than 100 Gb/s. Intended for use with very-high-speed computers and other data-processing and communication systems in which message-passing delays must be kept short. Inherent flexibility makes it possible to match performance of network to computers by optimizing configuration of interconnections. In addition, interconnections made redundant to provide tolerance to faults.
Multi-Purpose, Application-Centric, Scalable I/O Proxy Application
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, M. C.
2015-06-15
MACSio is a Multi-purpose, Application-Centric, Scalable I/O proxy application. It is designed to support a number of goals with respect to parallel I/O performance testing and benchmarking, including the ability to test and compare various I/O libraries and I/O paradigms, to predict the scalable performance of real applications, and to help identify where improvements in I/O performance can be made within the HPC I/O software stack.
Facile electrodeposition of reduced graphene oxide hydrogels for high-performance supercapacitors
NASA Astrophysics Data System (ADS)
Pham, Viet Hung; Gebre, Tesfaye; Dickerson, James H.
2015-03-01
We report both a facile, scalable method to prepare reduced graphene oxide hydrogels through the electrodeposition of graphene oxide and its use as an electrode for high-performance supercapacitors. Such systems exhibited specific capacitances of 147 and 223 F g-1 at a current density of 10 A g-1 when using H2SO4 and H2SO4 + hydroquinone redox electrolytes, respectively.
Scalable, full-colour and controllable chromotropic plasmonic printing
Xue, Jiancai; Zhou, Zhang-Kai; Wei, Zhiqiang; Su, Rongbin; Lai, Juan; Li, Juntao; Li, Chao; Zhang, Tengwei; Wang, Xue-Hua
2015-01-01
Plasmonic colour printing has drawn wide attention as a promising candidate for the next-generation colour-printing technology. However, an efficient approach to realize full colour and scalable fabrication is still lacking, which prevents plasmonic colour printing from practical applications. Here we present a scalable and full-colour plasmonic printing approach by combining conjugate twin-phase modulation with a plasmonic broadband absorber. More importantly, our approach also demonstrates controllable chromotropic capability, that is, the ability of reversible colour transformations. This chromotropic capability affords enormous potential in building functionalized prints for anticounterfeiting, special labels, and high-density data encryption storage. With such excellent performance in functional colour applications, this colour-printing approach could pave the way for plasmonic colour printing in real-world commercial use. PMID:26567803
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Ruiyi; Das, Suprem R; Jeong, Changwook
Transparent conducting electrodes (TCEs) require high transparency and low sheet resistance for applications in photovoltaics, photodetectors, flat panel displays, touch screen devices, and imagers. Indium tin oxide (ITO) and other transparent conductive oxides have been used, and provide a baseline sheet resistance (RS) vs. transparency (T) relationship. Several alternative material systems have been investigated. The development of high-performance hybrid structures provides a route towards robust, scalable, and low-cost approaches for realizing high-performance TCEs.
Kidambi, Piran R; Mariappan, Dhanushkodi D; Dee, Nicholas T; Vyatskikh, Andrey; Zhang, Sui; Karnik, Rohit; Hart, A John
2018-03-28
Scalable, cost-effective synthesis and integration of graphene is imperative to realize large-area applications such as nanoporous atomically thin membranes (NATMs). Here, we report a scalable route to the production of NATMs via high-speed, continuous synthesis of large-area graphene by roll-to-roll chemical vapor deposition (CVD), combined with casting of a hierarchically porous polymer support. To begin, we designed and built a two-zone roll-to-roll graphene CVD reactor, which sequentially exposes the moving foil substrate to annealing and growth atmospheres, with a sharp, isothermal transition between the zones. The configurational flexibility of the reactor design allows for a detailed evaluation of key parameters affecting graphene quality, and of the trade-offs to be considered for high-rate roll-to-roll graphene manufacturing. With this system, we achieve synthesis of uniform high-quality monolayer graphene (ID/IG < 0.065) at speeds ≥5 cm/min. NATMs fabricated from the optimized graphene, via polymer casting and postprocessing, show size-selective molecular transport with performance comparable to that of membranes made from conventionally synthesized graphene. Therefore, this work establishes the feasibility of a scalable manufacturing process for NATMs, for applications including protein desalting and small-molecule separations.
Hossain, Mozakkar; Kumar, Gundam Sandeep; Barimar Prabhava, S N; Sheerin, Emmet D; McCloskey, David; Acharya, Somobrata; Rao, K D M; Boland, John J
2018-05-22
Optically transparent photodetectors are crucial in next-generation optoelectronic applications including smart windows and transparent image sensors. Designing photodetectors with high transparency, photoresponsivity, and robust mechanical flexibility remains a significant challenge, as is managing the inevitable trade-off between high transparency and strong photoresponse. Here we report a scalable method to produce flexible crystalline Si nanostructured wire (NW) networks fabricated from silicon-on-insulator (SOI) with seamless junctions and highly responsive porous Si segments that combine to deliver exceptional performance. These networks show high transparency (∼92% at 550 nm), broadband photodetection (350 to 950 nm) with excellent responsivity (25 A/W), optical response time (0.58 ms), and mechanical flexibility (1000 cycles). Temperature-dependent photocurrent measurements indicate the presence of localized electronic states in the porous Si segments, which play a crucial role in light harvesting and photocarrier generation. The scalable low-cost approach based on SOI has the potential to deliver new classes of flexible optoelectronic devices, including next-generation photodetectors and solar cells.
Kim, Haegyeom; Lim, Hee-Dae; Kim, Sung-Wook; Hong, Jihyun; Seo, Dong-Hwa; Kim, Dae-chul; Jeon, Seokwoo; Park, Sungjin; Kang, Kisuk
2013-01-01
High-performance and cost-effective rechargeable batteries are key to the success of electric vehicles and large-scale energy storage systems. Extensive research has focused on the development of (i) new high-energy electrodes that can store more lithium or (ii) high-power nano-structured electrodes hybridized with carbonaceous materials. However, the current status of lithium batteries based on redox reactions of heavy transition metals still remains far below the demands required for the proposed applications. Herein, we present a novel approach using tunable functional groups on graphene nano-platelets as redox centers. The electrode can deliver high capacity of ~250 mAh g−1, power of ~20 kW kg−1 in an acceptable cathode voltage range, and provide excellent cyclability up to thousands of repeated charge/discharge cycles. The simple, mass-scalable synthetic route for the functionalized graphene nano-platelets proposed in this work suggests that the graphene cathode can be a promising new class of electrode. PMID:23514953
DISP: Optimizations towards Scalable MPI Startup
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fu, Huansong; Pophale, Swaroop S; Gorentla Venkata, Manjunath
2016-01-01
Despite the popularity of MPI for high performance computing, the startup of MPI programs faces a scalability challenge as both the execution time and memory consumption increase drastically at scale. We have examined this problem using the collective modules of Cheetah and Tuned in Open MPI as representative implementations. Previous improvements for collectives have focused on algorithmic advances and hardware off-load. In this paper, we examine the startup cost of the collective module within a communicator and explore various techniques to improve its efficiency and scalability. Accordingly, we have developed a new scalable startup scheme with three internal techniques, namely Delayed Initialization, Module Sharing and Prediction-based Topology Setup (DISP). Our DISP scheme greatly benefits the collective initialization of the Cheetah module. At the same time, it helps boost the performance of non-collective initialization in the Tuned module. We evaluate the performance of our implementation on the Titan supercomputer at ORNL with up to 4096 processes. The results show that our delayed initialization can speed up the startup of Tuned and Cheetah by an average of 32.0% and 29.2%, respectively, our module sharing can reduce the memory consumption of Tuned and Cheetah by up to 24.1% and 83.5%, respectively, and our prediction-based topology setup can speed up the startup of Cheetah by up to 80%.
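The delayed-initialization technique generalizes beyond MPI. The toy class below defers an expensive per-communicator setup until the first collective call, which is the essence of the idea; it is a sketch, not Open MPI code, and all names are invented.

```python
# Lazy (delayed) initialization: pay the setup cost only on first use.
class CollectiveModule:
    def __init__(self, comm_size):
        self.comm_size = comm_size
        self._topology = None          # not built at startup

    @property
    def topology(self):
        if self._topology is None:     # built on first use only
            print("building topology (expensive, done once)")
            self._topology = list(range(self.comm_size))
        return self._topology

    def allreduce(self, value):
        return sum(value for _ in self.topology)  # placeholder collective

mod = CollectiveModule(comm_size=4)
print("startup done, no topology built yet")
print(mod.allreduce(1))   # triggers the one-time setup
print(mod.allreduce(1))   # reuses it
```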
AsyncStageOut: Distributed user data management for CMS Analysis
NASA Astrophysics Data System (ADS)
Riahi, H.; Wildish, T.; Ciangottini, D.; Hernández, J. M.; Andreeva, J.; Balcas, J.; Karavakis, E.; Mascheroni, M.; Tanasijczuk, A. J.; Vaandering, E. W.
2015-12-01
AsyncStageOut (ASO) is a new component of the distributed data analysis system of CMS, CRAB, designed for managing users' data. It addresses a major weakness of the previous model, namely that mass storage of output data was part of the job execution resulting in inefficient use of job slots and an unacceptable failure rate at the end of the jobs. ASO foresees the management of up to 400k files per day of various sizes, spread worldwide across more than 60 sites. It must handle up to 1000 individual users per month, and work with minimal delay. This creates challenging requirements for system scalability, performance and monitoring. ASO uses FTS to schedule and execute the transfers between the storage elements of the source and destination sites. It has evolved from a limited prototype to a highly adaptable service, which manages and monitors the user file placement and bookkeeping. To ensure system scalability and data monitoring, it employs new technologies such as a NoSQL database and re-uses existing components of PhEDEx and the FTS Dashboard. We present the asynchronous stage-out strategy and the architecture of the solution we implemented to deal with those issues and challenges. The deployment model for the high availability and scalability of the service is discussed. The performance of the system during the commissioning and the first phase of production are also shown, along with results from simulations designed to explore the limits of scalability.
The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences
USDA-ARS?s Scientific Manuscript database
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning m...
Pak, JuGeon; Park, KeeHyun
2012-01-01
We propose a smart medication dispenser with a high degree of scalability and remote manageability. We construct the dispenser with an extensible hardware architecture to achieve scalability, and we install an agent program in it to achieve remote manageability. The dispenser operates as follows: when the real-time clock reaches the predetermined medication time and the user presses the dispense button at that time, the predetermined medication is dispensed from the medication dispensing tray (MDT). In the proposed dispenser, the medication for each patient is stored in an MDT. One smart medication dispenser normally contains one MDT; however, the dispenser can be extended to include more MDTs in order to support multiple users with one dispenser. For remote management, the proposed dispenser transmits the medication status and the system configuration to the monitoring server. In the case of a specific event such as a shortage of medication, memory overload, a software error, or non-adherence, the event is transmitted immediately. All these operations are performed automatically, without the intervention of patients, through the agent program installed in the dispenser. Results of implementation and verification show that the proposed dispenser operates normally and suitably performs the management operations requested by the medication monitoring server.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2013-11-01
The proliferation of GPS-enabled devices and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high-performance spatial queries on large volumes of data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution, as existing approaches exhibit scalability limitations and efficiency bottlenecks for large-scale spatial applications. In this demonstration, we present Hadoop-GIS, a scalable and high-performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data- and space-based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries and submitted through a command line/web interface for execution. In parallel to our system demonstration, we explain the system architecture and the details of how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real-world use cases: large-scale pathology analytical imaging, and geo-spatial data warehousing.
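The partition-then-query pattern at the heart of such systems can be sketched without Hadoop: map each object to the grid tile (space-based partition) containing it, then evaluate the spatial predicate per tile in a reduce phase so tiles can run in parallel. Tile size, the box query, and all names below are illustrative, not Hadoop-GIS internals.

```python
# Space-partitioned spatial query in map/reduce style.
from collections import defaultdict

def map_phase(points, tile_size=10.0):
    for pid, (x, y) in points:
        tile = (int(x // tile_size), int(y // tile_size))
        yield tile, (pid, x, y)        # key by containing tile

def reduce_phase(grouped, query_box):
    x0, y0, x1, y1 = query_box
    for tile, objs in grouped.items(): # each tile reducible in parallel
        for pid, x, y in objs:
            if x0 <= x <= x1 and y0 <= y <= y1:
                yield pid

points = [("a", (3.0, 4.0)), ("b", (15.0, 2.0)), ("c", (12.0, 18.0))]
grouped = defaultdict(list)
for tile, obj in map_phase(points):    # shuffle/group step
    grouped[tile].append(obj)
print(sorted(reduce_phase(grouped, (0, 0, 13, 20))))  # ['a', 'c']
```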
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework.
Lewis, Steven; Csordas, Attila; Killcoyne, Sarah; Hermjakob, Henning; Hoopmann, Michael R; Moritz, Robert L; Deutsch, Eric W; Boyle, John
2012-12-05
For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin
2013-01-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth, due to the massive amount of data that need to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, cluster, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10-fold performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
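The block-volume pattern can be sketched with a worker pool: split the volume into blocks, process blocks independently, and reassemble the result. The per-block filter, block count, and names below are placeholders under that assumption, not the platform's API.

```python
# Block-decomposed 3-D volume processing with a process pool.
import numpy as np
from multiprocessing import Pool

def process_block(args):
    (z0, z1), volume = args
    return z0, volume[z0:z1] * 2.0     # placeholder per-block filter

if __name__ == "__main__":
    volume = np.random.rand(64, 32, 32)
    nblocks = 4
    bounds = [(i * 16, (i + 1) * 16) for i in range(nblocks)]
    with Pool(nblocks) as pool:
        parts = pool.map(process_block, [(b, volume) for b in bounds])
    # reassemble blocks in z order
    parts.sort(key=lambda p: p[0])
    result = np.concatenate([p for _, p in parts], axis=0)
    print(result.shape)  # (64, 32, 32)
```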
Scuba: scalable kernel-based gene prioritization.
Zampieri, Guido; Tran, Dinh Van; Donini, Michele; Navarin, Nicolò; Aiolli, Fabio; Sperduti, Alessandro; Valle, Giorgio
2018-01-25
The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability. We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large-scale predictions are required. Importantly, it is able to efficiently deal both with a large number of candidate genes and with an arbitrary number of data sources. As a direct consequence of this scalability, Scuba also integrates a new, efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when the input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba.
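The core multiple-kernel idea can be sketched in a few lines: per-source Gram matrices are combined with weights, and candidates are ranked by combined similarity to the known disease genes. In Scuba the weights come from semi-supervised margin-distribution optimization; the fixed weights and random features below are purely illustrative.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gram matrix of a Gaussian (RBF) kernel."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X_expr = rng.random((6, 5))    # hypothetical source 1: expression features
X_ppi = rng.random((6, 8))     # hypothetical source 2: interaction profiles

weights = [0.7, 0.3]           # learned in Scuba; fixed here for illustration
K = weights[0] * rbf_kernel(X_expr, 1.0) + weights[1] * rbf_kernel(X_ppi, 0.5)

seeds = [0, 1]                           # indices of known disease genes
scores = K[:, seeds].mean(axis=1)        # combined similarity to the seeds
print(np.argsort(-scores))               # candidate genes, best first
```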
Efficient scalable solid-state neutron detector.
Moses, Daniel
2015-06-01
We report on a scalable solid-state neutron detector system that is specifically designed to yield high thermal-neutron detection sensitivity. The basic detector unit in this system is made of a ⁶Li foil coupled to two crystalline silicon diodes. The theoretical intrinsic efficiency of a detector unit is 23.8%, and that of a detector element comprising a stack of five detector units is 60%. Based on the measured performance of this detector unit, a detector system comprising a planar array of detector elements, scaled to an effective area of 0.43 m², is estimated to yield the minimum absolute efficiency required of radiological portal monitors used in homeland security.
Highly flexible electronics from scalable vertical thin film transistors.
Liu, Yuan; Zhou, Hailong; Cheng, Rui; Yu, Woojong; Huang, Yu; Duan, Xiangfeng
2014-03-12
Flexible thin-film transistors (TFTs) are of central importance for diverse electronic and particularly macroelectronic applications. Current TFTs using organic or inorganic thin-film semiconductors are usually limited by either poor electrical performance or insufficient mechanical flexibility. Here, we report a new design of highly flexible vertical TFTs (VTFTs) with superior electrical performance and mechanical robustness. By using graphene as a work-function-tunable contact for an amorphous indium gallium zinc oxide (IGZO) thin film, the vertical current flow across the graphene-IGZO junction can be effectively modulated by an external gate potential to enable VTFTs with a highest on-off ratio exceeding 10⁵. The unique vertical transistor architecture readily enables ultrashort-channel devices with very high drive current and exceptional mechanical flexibility. With large-area graphene and IGZO thin films available, our strategy is intrinsically scalable for large-scale integration of VTFT arrays and logic circuits, opening up a new pathway to highly flexible macroelectronics.
Security and Cloud Outsourcing Framework for Economic Dispatch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi
2017-04-24
The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm for solving these problems consists of in-house high-performance computing infrastructures, which have the drawbacks of high capital expenditure, maintenance burden, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains: the highly confidential grid data are susceptible to potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for the Economic Dispatch (ED) linear programming application. The security framework transforms the ED linear program into a confidentiality-preserving linear program that masks both the data and the problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the performance gains and costs outperform those of the in-house infrastructure.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Barton
2014-06-30
Peta-scale computing environments pose significant challenges for both system and application developers, and addressing them requires more than simply scaling up existing tera-scale solutions. Performance analysis tools play an important role in gaining this understanding, but previous monolithic tools with fixed feature sets have not sufficed. Instead, this project worked on the design, implementation, and evaluation of a general, flexible tool infrastructure supporting the construction of performance tools as "pipelines" of high-quality tool building blocks. These tool building blocks provide common performance tool functionality and are designed for scalability, lightweight data acquisition and analysis, and interoperability. For this project, we built on Open|SpeedShop, a modular and extensible open-source performance analysis tool set. The design and implementation of such a general and reusable infrastructure targeted at petascale systems required us to address several challenging research issues. All components needed to be designed for scale, a task made more difficult by the need to provide general modules. The infrastructure needed to support online data aggregation to cope with the large amounts of performance and debugging data. We needed to be able to map any combination of tool components to each target architecture. And we needed to design interoperable tool APIs and workflows that were concrete enough to support the required functionality, yet provide the necessary flexibility to address a wide range of tools. A major result of this project is the ability to use this scalable infrastructure to quickly create tools that match a machine architecture with a performance problem that needs to be understood. Another benefit is the ability for application engineers to use the highly scalable, interoperable version of Open|SpeedShop, which is reassembled from the tool building blocks into a flexible, multi-user set of tools. This set of tools is targeted at Office of Science Leadership Class computer systems and selected Office of Science application codes. We describe the contributions made by the team at the University of Wisconsin. The project built on the efforts in Open|SpeedShop funded by DOE/NNSA and the DOE/NNSA Tri-Lab community, extended Open|SpeedShop to the Office of Science Leadership Class Computing Facilities, and addressed new challenges found on these cutting-edge systems. Work done under this project at Wisconsin can be divided into two categories: new algorithms and techniques for debugging, and foundation infrastructure work on our Dyninst binary analysis and instrumentation toolkits and MRNet scalability infrastructure.
Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin; ...
2016-10-06
The increasing volume of scientific data and the limited scalability and performance of storage systems are currently a significant limitation for the productivity of scientific workflows running on both high-performance computing (HPC) and cloud platforms. Better integration of storage systems and workflow engines is clearly needed to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. The experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.
High-fidelity cluster state generation for ultracold atoms in an optical lattice.
Inaba, Kensuke; Tokunaga, Yuuki; Tamaki, Kiyoshi; Igeta, Kazuhiro; Yamashita, Makoto
2014-03-21
We propose a method for generating high-fidelity multipartite spin entanglement of ultracold atoms in an optical lattice in a short operation time and in a scalable manner, which is suitable for measurement-based quantum computation. To perform the desired operations based on the perturbative spin-spin interactions, we propose to actively utilize the extra degrees of freedom (DOFs) usually neglected in the perturbative treatment but included in the Hubbard Hamiltonian of atoms, such as (pseudo-)charge and orbital DOFs. Our method simultaneously achieves high fidelity, short operation time, and scalability by overcoming the following fundamental problem: enhancing the interaction strength to shorten the operation time breaks the perturbative condition of the interaction and inevitably induces unwanted correlations among the spin and extra DOFs.
pcircle - A Suite of Scalable Parallel File System Tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
WANG, FEIYI
2015-10-01
Most file-system software is written for conventional local file systems; it is serialized and cannot take advantage of a large-scale parallel file system. The "pcircle" software builds on the ubiquitous MPI in cluster computing environments and the "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular, it implements parallel data copying and parallel data checksumming, with advanced features such as asynchronous progress reporting, checkpoint and restart, and integrity checking.
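The parallel-checksumming part can be approximated in plain Python; this sketch uses a process pool as a stand-in for pcircle's MPI ranks and omits the work-stealing, checkpointing, and progress reporting.

```python
import hashlib
import os
from concurrent.futures import ProcessPoolExecutor

def checksum(path, bufsize=1 << 20):
    """Streamed per-file SHA-1 so large files never sit fully in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return path, h.hexdigest()

def walk(root):
    for dirpath, _, names in os.walk(root):
        for name in names:
            yield os.path.join(dirpath, name)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for path, digest in pool.map(checksum, walk(".")):
            print(digest, path)
```

In pcircle itself, the tree is traversed cooperatively and idle ranks steal work items from busy ones, which keeps the load balanced on deep or skewed directory trees.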
Wang, Chunya; Zhang, Mingchao; Xia, Kailun; Gong, Xueqin; Wang, Huimin; Yin, Zhe; Guan, Baolu; Zhang, Yingying
2017-04-19
The prosperous development of stretchable electronics poses a great demand for stretchable conductive materials that maintain their electrical conductivity under tensile strain. Previously reported strategies to obtain stretchable conductors usually involve complex structure-fabrication processes or the use of high-cost nanomaterials. It remains a great challenge to produce stretchable and conductive materials via a scalable and cost-effective process. Herein, a readily scalable pyrolysis strategy is developed for the fabrication of intrinsically stretchable and conductive textiles, utilizing low-cost and mass-produced weft-knitted textiles as raw materials. Due to the intrinsic stretchability of the weft-knitted structure and the excellent mechanical and electrical properties of the as-obtained carbonized fibers, the resulting flexible and durable textile can sustain tensile strains up to 125% while keeping a stable electrical conductivity (as shown by a Modal-based textile), thus ensuring its applications in elastic electronics. For demonstration purposes, stretchable supercapacitors and wearable thermal-therapy devices that showed stable performance under tensile strain have been fabricated. Considering the simplicity and large scalability of the process, the low cost and mass production of the raw materials, and the superior performance of the as-obtained elastic and conductive textile, this strategy should contribute to the development and industrial production of wearable electronics.
Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application
NASA Technical Reports Server (NTRS)
Liu, Zhuo; Wang, Bin; Wang, Teng; Tian, Yuan; Xu, Cong; Wang, Yandong; Yu, Weikuan; Cruz, Carlos A.; Zhou, Shujia; Clune, Tom;
2013-01-01
Exascale computing systems will soon emerge, posing great challenges given the huge gap between computing and I/O performance. Many large-scale scientific applications play an important role in our daily life. The huge amounts of data generated by such applications require highly parallel and efficient I/O management policies. In this paper, we adopt a mission-critical scientific application, GEOS-5, as a case study to profile and analyze the communication and I/O issues that prevent applications from fully utilizing the underlying parallel storage systems. Through detailed architectural and experimental characterization, we observe that current legacy I/O schemes incur significant network communication overheads and are unable to fully parallelize the data access, thus degrading applications' I/O performance and scalability. To address these inefficiencies, we redesign its I/O framework along with a set of parallel I/O techniques to achieve high scalability and performance. Evaluation results on the NASA Discover cluster show that our optimization of GEOS-5 with ADIOS has led to significant performance improvements compared to the original GEOS-5 implementation.
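The general pattern behind such a redesign, where every rank writes its own slice of a shared dataset instead of funneling data through a single writer, looks like the following with mpi4py and parallel HDF5. This is a stand-in illustration of the pattern; GEOS-5's actual optimization uses ADIOS.

```python
# Run as: mpiexec -n 4 python write_parallel.py
# Requires h5py built against a parallel (MPI) HDF5 library.
import numpy as np
import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
n_local = 1_000_000                       # elements owned by each rank
n_total = n_local * comm.size

with h5py.File("field.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("temperature", (n_total,), dtype="f4")
    start = comm.rank * n_local           # this rank's hyperslab offset
    dset[start:start + n_local] = np.random.rand(n_local).astype("f4")
```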
DOE Office of Scientific and Technical Information (OSTI.GOV)
Malony, Allen D; Shende, Sameer
This is the final progress report for the FastOS (Phase 2) (FastOS-2) project with Argonne National Laboratory and the University of Oregon (UO). The project started at UO on July 1, 2008 and ran until April 30, 2010, at which time a six-month no-cost extension began. The FastOS-2 work at UO delivered excellent results in all research work areas: scalable parallel monitoring; kernel-level performance measurement; parallel I/O system measurement; large-scale and hybrid application performance measurement; online scalable performance data reduction and analysis; and binary instrumentation.
Ceramic high temperature receiver design and tests
NASA Technical Reports Server (NTRS)
Davis, S. B.
1982-01-01
The High Temperature Solar Thermal Receiver, which was tested at Edwards AFB, CA during the winter of 1980-1981, evolved from technologies developed over a five-year period of work. An earlier receiver was tested at the Army Solar Furnace at White Sands, NM in 1976; it operated successfully at 1768 deg F and showed thermal efficiencies of 85%. The results were sufficiently promising to lead ERDA to fund our development and test of a 250 kW receiver to measure the efficiency of an open cavity receiver atop a central tower of a heliostat field. This receiver was required to be designed to be scalable to 10, 50, and 100 MW-electric sizes to show applicability to central power tower receivers. That receiver employed rectangular silicon carbide panels and vertical stanchions to achieve scalability. The construction was shown to be fully scalable, and the receiver was operated at temperatures up to 2000 deg F to achieve the performance goals of the experiment during tests at the GIT advanced components test facility during the fall of 1978.
Performance comparison of leading image codecs: H.264/AVC Intra, JPEG2000, and Microsoft HD Photo
NASA Astrophysics Data System (ADS)
Tran, Trac D.; Liu, Lijie; Topiwala, Pankaj
2007-09-01
This paper provides a detailed rate-distortion performance comparison between JPEG2000, Microsoft HD Photo, and H.264/AVC High Profile 4:4:4 I-frame coding for high-resolution still images and high-definition (HD) 1080p video sequences. This work is an extension of our previous comparative studies published in earlier SPIE conferences [1, 2]. Here we further optimize all three codecs for compression performance. Coding simulations are performed on a set of large-format color images captured from mainstream digital cameras and on 1080p HD video sequences commonly used for H.264/AVC standardization work. Overall, our experimental results show that all three codecs offer very similar coding performance at the high-quality, high-resolution setting. Differences tend to be data-dependent: JPEG2000 with its wavelet technology tends to be the best performer on smooth spatial data; H.264/AVC High Profile with advanced spatial prediction modes tends to cope best with more complex visual content; Microsoft HD Photo tends to be the most consistent across the board. For the still-image data sets, JPEG2000 offers the best R-D performance gains (around 0.2 to 1 dB in peak signal-to-noise ratio) over H.264/AVC High Profile intra coding and Microsoft HD Photo. For the 1080p video data set, all three codecs offer very similar coding performance. As in [1, 2], we consider neither scalability nor complexity in this study (JPEG2000 operates in a non-scalable, but optimal-performance mode).
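For reference, the PSNR figure quoted above is computed from the mean squared error against the uncompressed original; a small sketch for 8-bit images:

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit images."""
    diff = reference.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

A 0.2 to 1 dB PSNR gain at a fixed bit rate, as reported for JPEG2000 on the still-image sets, corresponds to a small but consistent quality edge.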
Madaria, Anuj R; Yao, Maoqing; Chi, Chunyung; Huang, Ningfeng; Lin, Chenxi; Li, Ruijuan; Povinelli, Michelle L; Dapkus, P Daniel; Zhou, Chongwu
2012-06-13
Vertically aligned, catalyst-free semiconducting nanowires hold great potential for photovoltaic applications, in which achieving scalable synthesis and optimized optical absorption simultaneously is critical. Here, we report combining nanosphere lithography (NSL) and selected area metal-organic chemical vapor deposition (SA-MOCVD) for the first time for scalable synthesis of vertically aligned gallium arsenide nanowire arrays, and surprisingly, we show that such nanowire arrays with patterning defects due to NSL can be as good as highly ordered nanowire arrays in terms of optical absorption and reflection. Wafer-scale patterning for nanowire synthesis was done using a polystyrene nanosphere template as a mask. Nanowires grown from substrates patterned by NSL show similar structural features to those patterned using electron beam lithography (EBL). Reflection of photons from the NSL-patterned nanowire array was used as a measure of the effect of defects present in the structure. Experimentally, we show that GaAs nanowires as short as 130 nm show reflection of <10% over the visible range of the solar spectrum. Our results indicate that a highly ordered nanowire structure is not necessary: despite the "defects" present in NSL-patterned nanowire arrays, their optical performance is similar to "defect-free" structures patterned by more costly, time-consuming EBL methods. Our scalable approach for synthesis of vertical semiconducting nanowires can have application in high-throughput and low-cost optoelectronic devices, including solar cells.
Electrohydrodynamic printing for scalable MoS2 flake coating: application to gas sensing device
NASA Astrophysics Data System (ADS)
Lim, Sooman; Cho, Byungjin; Bae, Jaehyun; Kim, Ah Ra; Lee, Kyu Hwan; Kim, Se Hyun; Hahm, Myung Gwan; Nam, Jaewook
2016-10-01
Scalable sub-micrometer molybdenum disulfide (MoS2) flake films with highly uniform coverage were created using a systematic approach. An electrohydrodynamic (EHD) printing process realized a remarkably uniform distribution of exfoliated MoS2 flakes on desired substrates. In combination with a fast-evaporating dispersion medium and an optimal choice of operating parameters, EHD printing can produce a film rapidly on a substrate without excessive agglomeration or cluster formation, which can be problems in previously reported liquid-based continuous film methods. The printing of exfoliated MoS2 flakes enabled the fabrication of a gas sensor with high performance and reproducibility for NO2 and NH3.
NASA Astrophysics Data System (ADS)
Liu, Simin; Cai, Yijin; Zhao, Xiao; Liang, Yeru; Zheng, Mingtao; Hu, Hang; Dong, Hanwu; Jiang, Sanping; Liu, Yingliang; Xiao, Yong
2017-08-01
The development of a facile and scalable synthesis process for the fabrication of nanoporous carbon materials with large specific surface areas, well-defined nanostructure, and high electrochemical activity is critical for high-performance energy storage applications. The key issue is the delicate balance between ultrahigh surface area and a highly porous but interconnected nanostructure. Here, we demonstrate the fabrication of a new sulfur-doped nanoporous carbon sphere (S-NCS) with an ultrahigh surface area of up to 3357 m2 g-1 via high-temperature hydrothermal carbonization and a subsequent KOH activation process. The as-prepared S-NCS, which integrates the advantages of an ultrahigh porous structure, a well-defined nanospherical morphology, and heteroatom modification, displays excellent electrochemical performance. The best performance is obtained with the S-NCS prepared by the hydrothermal carbonization of sublimed sulfur and glucose, S-NCS-4, reaching a high specific capacitance (405 F g-1 at a current density of 0.5 A g-1) and outstanding cycle stability. Moreover, the symmetric supercapacitor assembled from S-NCS-4 displays a superior energy density of 53.5 Wh kg-1 at a power density of 74.2 W kg-1 in 1.0 M LiPF6 EC/DEC. The synthesis method is simple and scalable, providing a new route to prepare highly porous, heteroatom-doped nanoporous carbon spheres for high-performance energy storage applications.
Shen, Daozhi; Zou, Guisheng; Liu, Lei; Zhao, Wenzheng; Wu, Aiping; Duley, Walter W; Zhou, Y Norman
2018-02-14
Miniaturization of energy storage devices can significantly decrease the overall size of electronic systems. However, this miniaturization is limited by the reduction of electrode dimensions and the reproducible transfer of small electrolyte drops. This paper first reports a simple, scalable direct-writing method for the production of ultraminiature microsupercapacitor (MSC) electrodes, based on femtosecond-laser-reduced graphene oxide (fsrGO) interlaced pads. These pads, separated by 2 μm spacing, are 100 μm long and 8 μm wide. A second stage involves the accurate transfer of an electrolyte microdroplet onto each individual electrode, which avoids any interference of the electrolyte with other electronic components. Abundant in-plane mesopores in fsrGO induced by the fs laser, together with ultrashort interelectrode spacing, enable the MSCs to exhibit a high specific capacitance (6.3 mF cm-2 and 105 F cm-3) and ∼100% retention after 1000 cycles. An all-graphene resistor-capacitor (RC) filter is also constructed by combining the MSC with a fsrGO resistor, which is confirmed to exhibit highly enhanced performance characteristics. This new hybrid technique combining fs-laser direct writing and precise microdroplet transfer enables scalable production of ultraminiature MSCs, which is believed to be significant for the practical application of MSCs in microelectronic systems.
NASA Astrophysics Data System (ADS)
Huang, T.; Alarcon, C.; Quach, N. T.
2014-12-01
Capture, curation, and analysis are the typical activities performed at any given Earth science data center. Modern data management systems must be adaptable to heterogeneous science data formats, scalable to meet the mission's quality-of-service requirements, and able to manage the life cycle of any given science data product. Designing a scalable data management system doesn't happen overnight. It takes countless hours of refining, refactoring, retesting, and re-architecting. The Horizon data management and workflow framework, developed at the Jet Propulsion Laboratory, is a portable, scalable, and reusable framework for developing high-performance data management and product generation workflow systems to automate data capture, data curation, and data analysis activities. NASA's Physical Oceanography Distributed Active Archive Center (PO.DAAC)'s Data Management and Archive System (DMAS) is its core data infrastructure, handling the capture and distribution of hundreds of thousands of satellite observations each day, around the clock. DMAS is an application of the Horizon framework. The NASA Global Imagery Browse Services (GIBS) is NASA's Earth Observing System Data and Information System (EOSDIS) solution for making high-resolution global imagery available to the science communities. The Imagery Exchange (TIE), an application of the Horizon framework, is a core subsystem of GIBS responsible for automating data capture and imagery generation in support of EOSDIS's 12 distributed active archive centers and 17 Science Investigator-led Processing Systems (SIPS). This presentation discusses our ongoing effort in refining, refactoring, retesting, and re-architecting the Horizon framework to enable data-intensive science and its applications.
The Design of a Fault-Tolerant COTS-Based Bus Architecture for Space Applications
NASA Technical Reports Server (NTRS)
Chau, Savio N.; Alkalai, Leon; Tai, Ann T.
2000-01-01
The high-performance, scalability and miniaturization requirements together with the power, mass and cost constraints mandate the use of commercial-off-the-shelf (COTS) components and standards in the X2000 avionics system architecture for deep-space missions. In this paper, we report our experiences and findings on the design of an IEEE 1394 compliant fault-tolerant COTS-based bus architecture. While the COTS standard IEEE 1394 adequately supports power management, high performance and scalability, its topological criteria impose restrictions on fault tolerance realization. To circumvent the difficulties, we derive a "stack-tree" topology that not only complies with the IEEE 1394 standard but also facilitates fault tolerance realization in a spaceborne system with limited dedicated resource redundancies. Moreover, by exploiting pertinent standard features of the 1394 interface which are not purposely designed for fault tolerance, we devise a comprehensive set of fault detection mechanisms to support the fault-tolerant bus architecture.
Evaluation of the Huawei UDS cloud storage system for CERN specific data
NASA Astrophysics Data System (ADS)
Zotes Resines, M.; Heikkila, S. S.; Duellmann, D.; Adde, G.; Toebbicke, R.; Hughes, J.; Wang, L.
2014-06-01
Cloud storage is an emerging architecture aiming to provide increased scalability and access performance compared to more traditional solutions. CERN is evaluating this promise using Huawei UDS and OpenStack SWIFT storage deployments, focusing on the needs of high-energy physics. Both deployed setups implement S3, one of the protocols that are emerging as a standard in the cloud storage market. A set of client machines is used to generate I/O load patterns to evaluate the storage system performance. The presented read and write test results indicate scalability from both metadata and data perspectives. Further, the Huawei UDS cloud storage is shown to be able to recover from a major failure involving the loss of 16 disks. Both cloud storage systems are finally demonstrated to function as back-end storage for a filesystem, which is used to deliver high-energy physics software.
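Because both deployments speak S3, a load generator can be as simple as a boto3 loop per client machine; the endpoint and credentials below are placeholders, and the real tests coordinated many such clients.

```python
import time
import boto3

# Hypothetical endpoint and credentials for an S3-compatible store.
s3 = boto3.client("s3", endpoint_url="http://s3.example.cern.ch",
                  aws_access_key_id="KEY", aws_secret_access_key="SECRET")

payload = b"x" * (4 << 20)                 # one 4 MiB object
t0 = time.monotonic()
for i in range(100):
    s3.put_object(Bucket="bench", Key=f"obj-{i:05d}", Body=payload)
elapsed = time.monotonic() - t0
print(f"{100 * len(payload) / elapsed / 2**20:.1f} MiB/s aggregate write")
```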
An Ephemeral Burst-Buffer File System for Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Teng; Moody, Adam; Yu, Weikuan
BurstFS is a distributed file system for node-local burst buffers on high-performance computing systems. BurstFS presents a shared file system space across the burst buffers so that applications that use shared files can access the highly scalable burst buffers without modification.
Performance-scalable volumetric data classification for online industrial inspection
NASA Astrophysics Data System (ADS)
Abraham, Aby J.; Sadki, Mustapha; Lea, R. M.
2002-03-01
Non-intrusive inspection and non-destructive testing of manufactured objects with complex internal structures typically requires the enhancement, analysis and visualization of high-resolution volumetric data. Given the increasing availability of fast 3D scanning technology (e.g. cone-beam CT), enabling on-line detection and accurate discrimination of components or sub-structures, the inherent complexity of classification algorithms inevitably leads to throughput bottlenecks. Indeed, whereas typical inspection throughput requirements range from 1 to 1000 volumes per hour, depending on density and resolution, current computational capability is one to two orders-of-magnitude less. Accordingly, speeding up classification algorithms requires both reduction of algorithm complexity and acceleration of computer performance. A shape-based classification algorithm, offering algorithm complexity reduction, by using ellipses as generic descriptors of solids-of-revolution, and supporting performance-scalability, by exploiting the inherent parallelism of volumetric data, is presented. A two-stage variant of the classical Hough transform is used for ellipse detection and correlation of the detected ellipses facilitates position-, scale- and orientation-invariant component classification. Performance-scalability is achieved cost-effectively by accelerating a PC host with one or more COTS (Commercial-Off-The-Shelf) PCI multiprocessor cards. Experimental results are reported to demonstrate the feasibility and cost-effectiveness of the data-parallel classification algorithm for on-line industrial inspection applications.
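The ellipse-detection stage can be reproduced in miniature with scikit-image's Hough ellipse transform; the synthetic slice and parameter values below are illustrative only.

```python
import numpy as np
from skimage.draw import ellipse_perimeter
from skimage.feature import canny
from skimage.transform import hough_ellipse

# Synthetic 2D slice containing one elliptical cross-section.
img = np.zeros((120, 120), dtype=np.uint8)
rr, cc = ellipse_perimeter(60, 60, 25, 40)
img[rr, cc] = 255

edges = canny(img, sigma=2.0)
candidates = hough_ellipse(edges, accuracy=20, threshold=50, min_size=30)
if candidates.size:
    candidates.sort(order="accumulator")       # strongest candidate last
    _, yc, xc, a, b, _ = candidates[-1]
    print(f"centre=({yc:.0f},{xc:.0f}), half-axes=({a:.0f},{b:.0f})")
```

Correlating detected ellipses across slices is what yields the position-, scale- and orientation-invariant classification; the data-parallel step distributes slices or sub-volumes across the PCI multiprocessor cards.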
Akama, Toshiki; Okita, Wakana; Nagai, Reito; Li, Chao; Kaneko, Toshiro; Kato, Toshiaki
2017-09-20
Few-layered transition metal dichalcogenides (TMDs) are known as true two-dimensional materials, with excellent semiconducting properties and strong light-matter interaction. Thus, TMDs are attractive materials for semitransparent and flexible solar cells for use in various applications. However, despite recent progress, the development of a scalable method to fabricate semitransparent and flexible solar cells with mono- or few-layered TMDs remains a crucial challenge. Here, we show easy and scalable fabrication of a few-layered TMD solar cell using a Schottky-type configuration to obtain a power conversion efficiency (PCE) of approximately 0.7%, the highest value reported with few-layered TMDs. Clear power generation was also observed for devices fabricated on a large SiO2 substrate and on a flexible substrate, demonstrating that our method has high potential for scalable production. In addition, systematic investigation revealed that the PCE and external quantum efficiency (EQE) strongly depend on the type of photogenerated excitons (A, B, and C) because of their different carrier dynamics. Because high solar cell performance along with excellent scalability can be achieved through the proposed process, our fabrication method will contribute to accelerating the industrial use of TMDs in semitransparent and flexible solar cells.
Space Situational Awareness Data Processing Scalability Utilizing Google Cloud Services
NASA Astrophysics Data System (ADS)
Greenly, D.; Duncan, M.; Wysack, J.; Flores, F.
Space Situational Awareness (SSA) is a fundamental and critical component of current space operations. The term SSA encompasses the awareness, understanding and predictability of all objects in space. As the population of orbital space objects and debris increases, the number of collision avoidance maneuvers grows and prompts the need for accurate and timely process measures. The SSA mission continually evolves toward near real-time assessment and analysis, demanding higher processing capabilities. By conventional methods, meeting these demands requires the integration of new hardware to keep pace with the growing complexity of maneuver planning algorithms. SpaceNav has implemented a highly scalable architecture that tracks satellites and debris by utilizing powerful virtual machines on the Google Cloud Platform. SpaceNav algorithms for processing CDMs outpace conventional means. A robust processing environment for tracking data, collision avoidance maneuvers and various other aspects of SSA can be created and deleted on demand. The migration of SpaceNav tools and algorithms into the Google Cloud Platform will be discussed, along with the trials and tribulations involved. Information will be shared on how and why certain cloud products were used, as well as the integration techniques that were implemented. Key items to be presented are:
1. Scientific algorithms and SpaceNav tools integrated into a scalable architecture: a) maneuver planning; b) parallel processing; c) Monte Carlo simulations; d) optimization algorithms; e) software application development/integration on the Google Cloud Platform.
2. Compute Engine processing: a) Application Engine automated processing; b) performance testing and performance scalability; c) Cloud MySQL databases and database scalability; d) cloud data storage; e) redundancy and availability.
NASA Astrophysics Data System (ADS)
Fan, Li-Zhen; Chen, Tian-Tian; Song, Wei-Li; Li, Xiaogang; Zhang, Shichao
2015-10-01
Supercapacitors fabricated from 3D porous carbon frameworks, such as graphene- and carbon nanotube (CNT)-based aerogels, have been highly attractive due to their various advantages. However, their high cost and insufficient yield have inhibited their large-scale application. Here we demonstrate a facile and easily scalable approach for the large-scale preparation of novel 3D nitrogen-containing porous carbon frameworks from ultralow-cost commercial cotton. Electrochemical measurements suggest that the optimal nitrogen-containing cotton-derived carbon frameworks, with a high nitrogen content (12.1 mol%) and a low surface area of 285 m2 g-1, present high specific capacitances of 308 and 200 F g-1 in KOH electrolyte at current densities of 0.1 and 10 A g-1, respectively, with very limited capacitance loss over 10,000 cycles in both aqueous and gel electrolytes. Moreover, the electrode exhibits a capacitance of up to 220 F g-1 at 0.1 A g-1 and excellent flexibility (with negligible capacitance loss under different bending angles) in the polyvinyl alcohol/KOH gel electrolyte. The observed excellent performance competes well with that of electrodes made from similar 3D frameworks formed by graphene or CNTs. Therefore, this ultralow-cost and simple strategy demonstrates great potential for the scalable production of high-performance carbon-based supercapacitors in industry.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qiaoling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2016-01-01
The proliferation of GPS-enabled devices and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support for high-performance spatial queries on large volumes of data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution, as existing approaches exhibit scalability limitations and efficiency bottlenecks for large-scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high-performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data- and space-based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries and submitted through a command-line/web interface for execution. In parallel with our system demonstration, we explain the system architecture and the details of how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real-world use cases: large-scale pathology analytical imaging, and geospatial data warehousing. PMID:27617325
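The implicit parallelization rests on space-based partitioning: the map side keys each geometry by grid cell, and the reduce side evaluates the spatial predicate within a cell. Below is a toy single-process rendering of that idea (it ignores geometries that straddle cell borders, a case Hadoop-GIS handles explicitly); all names and data are invented.

```python
from collections import defaultdict

CELL = 10.0  # grid cell size used for space-based partitioning

def cell_of(x, y):
    return (int(x // CELL), int(y // CELL))

def map_points(points):
    """Map: key every point by the grid cell that contains it."""
    parts = defaultdict(list)
    for pid, x, y in points:
        parts[cell_of(x, y)].append((pid, x, y))
    return parts

def reduce_cell(pts, eps=1.0):
    """Reduce: within one cell, report point pairs closer than eps."""
    return [(p[0], q[0]) for i, p in enumerate(pts) for q in pts[i + 1:]
            if (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2 <= eps ** 2]

pts = [("a", 1.0, 1.2), ("b", 1.5, 1.0), ("c", 25.0, 3.0)]
for cell, members in map_points(pts).items():
    print(cell, reduce_cell(members))
```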
Kim, Ho-Sup; Oh, Sang-Soo; Ha, Hong-Soo; Youm, Dojun; Moon, Seung-Hyun; Kim, Jung Ho; Dou, Shi Xue; Heo, Yoon-Uk; Wee, Sung-Hun; Goyal, Amit
2014-01-01
Long-length, high-temperature superconducting (HTS) wires capable of carrying high critical current, Ic, are required for a wide range of applications. Here, we report extremely high-performance HTS wires based on 5 μm thick SmBa2Cu3O7−δ (SmBCO) single-layer films on textured metallic templates. SmBCO layer wires over 20 meters long were deposited by a cost-effective, scalable co-evaporation process using a batch-type drum in a dual chamber. All deposition parameters influencing the composition, phase, and texture of the films were optimized via a unique combinatorial method that is broadly applicable for co-evaporation of other promising complex materials containing several cations. Thick SmBCO layers deposited under optimized conditions exhibit excellent cube-on-cube epitaxy. Such excellent structural epitaxy over the entire thickness results in exceptionally high Ic performance, with average Ic over 1,000 A/cm-width for the entire 22 meter long wire and maximum Ic over 1,500 A/cm-width for a short 12 cm long tape. The Ic values reported in this work are the highest values ever reported from any length of cuprate-based HTS wire or conductor. PMID:24752189
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Plaza, Javier; Paz, Abel
2010-10-01
Latest generation remote sensing instruments (called hyperspectral imagers) are now able to generate hundreds of images, corresponding to different wavelength channels, for the same area on the surface of the Earth. In previous work, we have reported that the scalability of parallel processing algorithms dealing with these high-dimensional data volumes is affected by the amount of data to be exchanged through the communication network of the system. However, large messages are common in hyperspectral imaging applications since processing algorithms are pixel-based, and each pixel vector to be exchanged through the communication network is made up of hundreds of spectral values. Thus, decreasing the amount of data to be exchanged could improve the scalability and parallel performance. In this paper, we propose a new framework based on intelligent utilization of wavelet-based data compression techniques for improving the scalability of a standard hyperspectral image processing chain on heterogeneous networks of workstations. This type of parallel platform is quickly becoming a standard in hyperspectral image processing due to the distributed nature of collected hyperspectral data as well as its flexibility and low cost. Our experimental results indicate that adaptive lossy compression can lead to improvements in the scalability of the hyperspectral processing chain without sacrificing analysis accuracy, even at sub-pixel precision levels.
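A minimal sketch of the idea, shrinking each pixel vector with a lossy wavelet threshold before it is exchanged, using PyWavelets (the paper's actual compressor and parameters may differ):

```python
import numpy as np
import pywt

def compress(pixel_vector, wavelet="db2", keep=0.1):
    """Keep only the largest `keep` fraction of wavelet coefficients."""
    coeffs = pywt.wavedec(pixel_vector, wavelet, level=3)
    flat = np.concatenate(coeffs)
    thresh = np.quantile(np.abs(flat), 1.0 - keep)
    return [pywt.threshold(c, thresh, mode="hard") for c in coeffs]

def decompress(coeffs, wavelet="db2"):
    return pywt.waverec(coeffs, wavelet)

band_values = np.random.rand(224)            # one 224-band pixel vector
approx = decompress(compress(band_values))[:224]
print(np.max(np.abs(approx - band_values)))  # check the reconstruction error
```

Only the retained (nonzero) coefficients and their positions need to cross the network, which is where the message-size savings come from.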
Highly aligned arrays of high aspect ratio barium titanate nanowires via hydrothermal synthesis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowland, Christopher C.; Zhou, Zhi; Malakooti, Mohammad H.
2015-06-01
We report on the development of a hydrothermal synthesis procedure that results in the growth of highly aligned arrays of high-aspect-ratio barium titanate nanowires. Using a multiple-step, scalable hydrothermal reaction, a textured titanium dioxide film is deposited on titanium foil, upon which highly aligned nanowires are grown via homoepitaxy and converted to barium titanate. Scanning electron microscope images clearly illustrate the effect the textured film has on the degree of orientation of the nanowires. The alignment of the nanowires is quantified by calculating the Herman's orientation factor, which reveals a 58% improvement in orientation compared to growth in the absence of the textured film. The ferroelectric properties of barium titanate, combined with the development of this scalable growth procedure, provide a powerful route towards increasing the efficiency and performance of nanowire-based devices in future real-world applications such as sensing and power harvesting.
A New, Scalable and Low Cost Multi-Channel Monitoring System for Polymer Electrolyte Fuel Cells.
Calderón, Antonio José; González, Isaías; Calderón, Manuel; Segura, Francisca; Andújar, José Manuel
2016-03-09
In this work a new, scalable and low-cost multi-channel monitoring system for Polymer Electrolyte Fuel Cells (PEFCs) has been designed, constructed and experimentally validated. The developed monitoring system performs non-intrusive voltage measurement of each individual cell of a PEFC stack and is scalable, in the sense that it is capable of carrying out measurements in stacks of 1 to 120 cells (from watts to kilowatts). The developed system comprises two main subsystems: hardware devoted to data acquisition (DAQ) and software devoted to real-time monitoring. The DAQ subsystem is based on the low-cost open-source platform Arduino, and the real-time monitoring subsystem has been developed using the high-level graphical language NI LabVIEW. Such integration can be considered a novelty in the scientific literature on PEFC monitoring systems. An original amplifying and multiplexing board has been designed to increase the Arduino input port availability. Data storage and real-time monitoring are performed through an easy-to-use interface. Graphical and numerical visualization allows continuous tracking of cell voltage. Scalability, flexibility, ease of use, versatility and low cost are the main features of the proposed approach. The system is described and experimental results are presented. These results demonstrate its suitability for monitoring the voltage in a PEFC at cell level.
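The paper's monitoring front-end is LabVIEW, but the host side of such a system can be sketched in a few lines of Python with pyserial; the serial port name and the one-line-of-comma-separated-voltages framing below are assumptions for illustration, not the authors' protocol.

```python
import serial  # pyserial

# Assumed framing: the Arduino prints one comma-separated line of
# per-cell voltages (in volts) per sampling period, e.g. "0.71,0.69,...".
with serial.Serial("/dev/ttyACM0", 9600, timeout=2) as port:
    for _ in range(10):
        line = port.readline().decode("ascii", errors="ignore").strip()
        if not line:
            continue
        volts = [float(v) for v in line.split(",")]
        worst = min(range(len(volts)), key=volts.__getitem__)
        print(f"{len(volts)} cells, weakest cell #{worst + 1}: {volts[worst]:.3f} V")
```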
[ANTHROPOMETRIC PROPORTIONALITY METHOD ELECTION IN A SPORT POPULATION; COMPARISON OF THREE METHODS].
Almagià, Atilio; Araneda, Alberto; Sánchez, Javier; Sánchez, Patricio; Zúñiga, Maximiliano; Plaza, Paula
2015-09-01
The application of proportionality models based on ideal proportions would have a great impact on high-performance sports, since the best athletes tend to resemble each other anthropometrically. The objective of this study was to compare the following anthropometric proportionality methods: Phantom, Combined and Scalable, in male champion university Chilean soccer players in 2012 and 2013, using South American professional soccer players as the criterion population, in order to find the most appropriate proportionality method for sports populations. The measurement of 22 kinanthropometric variables was performed, according to the ISAK protocol, on a sample of 13 members of the men's soccer team of the Pontificia Universidad Católica de Valparaíso. The Z-values of the anthropometric variables for each method were obtained using their respective equations. A similar trend was observed among the three methods. Significant differences (p < 0.05) were found in some Z-values of the Scalable and Combined methods compared to the Phantom method. No significant differences were observed between the results obtained by the Combined and Scalable methods, except in wrist, thigh and hip perimeters. It is more appropriate to use the Scalable method over the Combined and Phantom methods for the comparison of Z-values of kinanthropometric variables in athletes of the same discipline.
Reversible wavelet filter banks with side informationless spatially adaptive low-pass filters
NASA Astrophysics Data System (ADS)
Abhayaratne, Charith
2011-07-01
Wavelet transforms that have an adaptive low-pass filter are useful in applications that require the signal singularities, sharp transitions, and image edges to be left intact in the low-pass signal. In scalable image coding, the spatial resolution scalability is achieved by reconstructing the low-pass signal subband, which corresponds to the desired resolution level, and discarding other high-frequency wavelet subbands. In such applications, it is vital to have low-pass subbands that are not affected by smoothing artifacts associated with low-pass filtering. We present the mathematical framework for achieving 1-D wavelet transforms that have a spatially adaptive low-pass filter (SALP) using the prediction-first lifting scheme. The adaptivity decisions are computed using the wavelet coefficients, and no bookkeeping is required for the perfect reconstruction. Then, 2-D wavelet transforms that have a spatially adaptive low-pass filter are designed by extending the 1-D SALP framework. Because the 2-D polyphase decompositions are used in this case, the 2-D adaptivity decisions are made nonseparable as opposed to the separable 2-D realization using 1-D transforms. We present examples using the 2-D 5/3 wavelet transform and their lossless image coding and scalable decoding performances in terms of quality and resolution scalability. The proposed 2-D-SALP scheme results in better performance compared to the existing adaptive update lifting schemes.
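For orientation, the fixed (non-adaptive) reversible 5/3 lifting step that such schemes build on fits in a few lines; the SALP transform replaces the update (low-pass) step with a spatially adaptive one driven by the wavelet coefficients themselves. This sketch assumes an even-length integer signal.

```python
def lift_53_forward(x):
    """Reversible integer 5/3 lifting on an even-length list of ints:
    predict odd samples from even neighbours, then update the evens."""
    s, d = list(x[0::2]), list(x[1::2])
    # Predict: detail = odd sample minus the mean of its even neighbours
    # (indices clamped at the right edge, i.e. symmetric extension).
    d = [d[n] - ((s[n] + s[min(n + 1, len(s) - 1)]) >> 1) for n in range(len(d))]
    # Update: low-pass = even sample plus a rounded quarter of the details.
    s = [s[n] + ((d[max(n - 1, 0)] + d[n] + 2) >> 2) for n in range(len(s))]
    return s, d  # approximation (low-pass) and detail (high-pass) subbands

lo, hi = lift_53_forward([10, 12, 14, 200, 16, 18, 20, 22])
print(lo, hi)  # the spike at 200 leaks into the low-pass band here
```

Running it shows the non-adaptive low-pass picking up the spike, which is exactly the smoothing artifact that an adaptive low-pass filter is designed to avoid.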
Evaluation of 3D printed anatomically scalable transfemoral prosthetic knee.
Ramakrishnan, Tyagi; Schlafly, Millicent; Reed, Kyle B
2017-07-01
This case study compares a transfemoral amputee's gait while using the existing Ossur Total Knee 2000 and our novel 3D-printed anatomically scalable transfemoral prosthetic knee. The anatomically scalable transfemoral prosthetic knee is 3D printed from a carbon-fiber and nylon composite and has a gear-mesh coupling with a hard-stop, weight-actuated locking mechanism aided by a cross-linked four-bar spring mechanism. This design can be scaled using the anatomical dimensions of a human femur and tibia to provide a unique fit for each user. The transfemoral amputee who was tested is high functioning and walked on the Computer Assisted Rehabilitation Environment (CAREN) at a self-selected pace. The motion capture and force data collected showed distinct differences in the gait dynamics. The data were used to compute the Combined Gait Asymmetry Metric (CGAM), whose scores revealed that the gait on the Ossur Total Knee was more asymmetric overall than on the anatomically scalable transfemoral prosthetic knee. The anatomically scalable transfemoral prosthetic knee had higher peak knee flexion, which caused a large step-time asymmetry. This made walking on the anatomically scalable transfemoral prosthetic knee more strenuous, due to the compensatory movements needed to adapt to the different dynamics. This can be overcome by tuning the cross-linked spring mechanism to better emulate the dynamics of the subject. The subject stated that the knee would be good for daily use and has the potential to be adapted as a running knee.
Towards a large-scale scalable adaptive heart model using shallow tree meshes
NASA Astrophysics Data System (ADS)
Krause, Dorian; Dickopf, Thomas; Potse, Mark; Krause, Rolf
2015-10-01
Electrophysiological heart models are sophisticated computational tools that place high demands on the computing hardware due to the high spatial resolution required to capture the steep depolarization front. To address this challenge, we present a novel scheme for resolving the depolarization front accurately using adaptivity in space. Our adaptive scheme is based on locally structured meshes. These tensor meshes in space are organized in a parallel forest of trees, which allows us to resolve complicated geometries and to realize high variations in the local mesh sizes with a minimal memory footprint in the adaptive scheme. We discuss both a non-conforming mortar element approximation and a conforming finite element space, and present an efficient technique for the assembly of the respective stiffness matrices using matrix representations of the inclusion operators into the product space on the so-called shallow tree meshes. We analyzed the parallel performance and scalability for a two-dimensional ventricle slice as well as for a full large-scale heart model. Our results demonstrate that the method has good performance and high accuracy.
Lin, Yuanjing; Gao, Yuan; Fan, Zhiyong
2017-11-01
Planar supercapacitors with high flexibility, desirable operation safety, and high performance are considered attractive candidates to serve as energy-storage devices for portable and wearable electronics. Here, a scalable and printable technique is adopted to construct novel and unique hierarchical nanocoral structures as the interdigitated electrodes on flexible substrates. The as-fabricated flexible all-solid-state planar supercapacitors with nanocoral structures achieve an areal capacitance of up to 52.9 mF cm-2, which is 2.5 times that of devices without nanocoral structures; this figure of merit is among the highest in the literature for this category of devices. More interestingly, thanks to the inkjet-printing technique, excellent versatility in artistic electrode-pattern design is achieved. In particular, working supercapacitors with artistically designed patterns are demonstrated. Meanwhile, the high scalability of this printable method is also demonstrated by the fabrication of large-sized artistic supercapacitors serving as energy-storage devices in a wearable self-powered system as a proof of concept.
Hierarchical nanostructured conducting polymer hydrogel with high electrochemical activity
Pan, Lijia; Yu, Guihua; Zhai, Dongyuan; Lee, Hye Ryoung; Zhao, Wenting; Liu, Nian; Wang, Huiliang; Tee, Benjamin C.-K.; Shi, Yi; Cui, Yi; Bao, Zhenan
2012-01-01
Conducting polymer hydrogels represent a unique class of materials that synergizes the advantageous features of hydrogels and organic conductors and have been used in many applications such as bioelectronics and energy storage devices. They are often synthesized by polymerizing conductive polymer monomer within a nonconducting hydrogel matrix, resulting in deterioration of their electrical properties. Here, we report a scalable and versatile synthesis of multifunctional polyaniline (PAni) hydrogel with excellent electronic conductivity and electrochemical properties. With high surface area and three-dimensional porous nanostructures, the PAni hydrogels demonstrated potential as high-performance supercapacitor electrodes with high specific capacitance (∼480 F·g-1), unprecedented rate capability, and cycling stability (∼83% capacitance retention after 10,000 cycles). The PAni hydrogels can also function as the active component of glucose oxidase sensors with fast response time (∼0.3 s) and superior sensitivity (∼16.7 μA·mM-1). The scalable synthesis and excellent electrode performance of the PAni hydrogel make it an attractive candidate for bioelectronics and future-generation energy storage electrodes. PMID:22645374
Advanced technologies for scalable ATLAS conditions database access on the grid
NASA Astrophysics Data System (ADS)
Basset, R.; Canali, L.; Dimitrov, G.; Girone, M.; Hawkings, R.; Nevski, P.; Valassi, A.; Vaniachine, A.; Viegas, F.; Walker, R.; Wong, A.
2010-04-01
During massive data reprocessing operations, an ATLAS Conditions Database application must support concurrent access from numerous ATLAS data processing jobs running on the Grid. By simulating realistic workflows, ATLAS database scalability tests provided feedback for Conditions DB software optimization and allowed precise determination of required distributed database resources. In distributed data processing one must take into account the chaotic nature of Grid computing, characterized by peak loads that can be much higher than average access rates. To validate database performance at peak loads, we tested database scalability at very high concurrent job rates. This was achieved through coordinated database stress tests performed in a series of ATLAS reprocessing exercises at the Tier-1 sites. The goal of the database stress tests is to detect scalability limits of the hardware deployed at the Tier-1 sites, so that server overload conditions can be safely avoided in a production environment. Our analysis of server performance under stress tests indicates that Conditions DB data access is limited by the disk I/O throughput. An unacceptable side effect of the disk I/O saturation is a degradation of the WLCG 3D Services that update Conditions DB data at all ten ATLAS Tier-1 sites using the technology of Oracle Streams. To avoid such bottlenecks, we prototyped and tested a novel approach for database peak load avoidance in Grid computing. Our approach is based on the proven idea of pilot job submission on the Grid: instead of the actual query, an ATLAS utility library sends a pilot query to the database server first.
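The pilot-query idea reduces to a simple pattern: probe the server with a cheap query and submit the expensive one only if the probe returns quickly. The Python sketch below is a minimal illustration of that pattern under stated assumptions; the timeout value, the sqlite3 stand-in for the Oracle-backed Conditions DB, and the function names are invented, not the ATLAS utility library's API.

```python
import sqlite3  # in-memory stand-in for the Oracle-backed Conditions DB
import time

PILOT_TIMEOUT_S = 2.0  # assumed overload threshold; tune per site

def run_with_pilot(conn, real_query, pilot_query="SELECT 1"):
    """Send a cheap pilot query first; submit the expensive query only
    if the server answers promptly, otherwise back off instead of
    adding load to an already saturated server."""
    start = time.monotonic()
    conn.execute(pilot_query).fetchone()
    latency = time.monotonic() - start
    if latency > PILOT_TIMEOUT_S:
        raise RuntimeError(f"pilot latency {latency:.2f}s; deferring job")
    return conn.execute(real_query).fetchall()

conn = sqlite3.connect(":memory:")
print(run_with_pilot(conn, "SELECT 42"))  # [(42,)]
```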
Scalable fabrication of perovskite solar cells
Li, Zhen; Klein, Talysa R.; Kim, Dong Hoe; ...
2018-03-27
Perovskite materials use earth-abundant elements, have low formation energies for deposition, and are compatible with roll-to-roll and other high-volume manufacturing techniques. These features make perovskite solar cells (PSCs) suitable for terawatt-scale energy production with low production costs and low capital expenditure. Demonstrations of performance comparable to that of other thin-film photovoltaics (PVs) and improvements in laboratory-scale cell stability have recently made scale-up of this PV technology an intense area of research focus. Here, we review recent progress and challenges in scaling up PSCs and related efforts to enable the terawatt-scale manufacturing and deployment of this PV technology. We discuss common device and module architectures, scalable deposition methods, and progress in the scalable deposition of perovskite and charge-transport layers. We also provide an overview of device and module stability, module-level characterization techniques, and techno-economic analyses of perovskite PV modules.
Dense, Efficient Chip-to-Chip Communication at the Extremes of Computing
ERIC Educational Resources Information Center
Loh, Matthew
2013-01-01
The scalability of CMOS technology has driven computation into a diverse range of applications across the power consumption, performance and size spectra. Communication is a necessary adjunct to computation, and whether this is to push data from node-to-node in a high-performance computing cluster or from the receiver of wireless link to a neural…
A complexity-scalable software-based MPEG-2 video encoder.
Chen, Guo-bin; Lu, Xin-ning; Wang, Xing-guo; Liu, Ji-lin
2004-05-01
With the development of general-purpose processors (GPPs) and video signal processing algorithms, it is possible to implement a software-based real-time video encoder on a GPP, and its low cost and easy upgrade attract developers' interest in transferring video encoding from specialized hardware to more flexible software. In this paper, the encoding structure is first set up to support complexity scalability; then many high-performance algorithms are applied to the key time-consuming modules in the coding process; finally, at the programming level, processor characteristics are considered to improve data-access efficiency and processing parallelism. Other programming methods, such as lookup tables, are adopted to reduce the computational complexity. Simulation results showed that these ideas could not only improve the global performance of video coding but also provide great flexibility in complexity regulation.
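As one concrete illustration of the lookup-table idea, per-sample arithmetic in a hot loop can be replaced by a table precomputed over the whole input range. This minimal NumPy sketch uses an assumed quantization step and 8-bit inputs; it is an illustration of the technique, not the paper's encoder code.

```python
import numpy as np

QP = 16  # assumed quantization step; illustrative only

# Precompute the quantizer output for every possible 8-bit input once,
# so the per-sample work in the inner coding loop is a single table read
# instead of a division.
LUT = np.array([round(v / QP) for v in range(256)], dtype=np.int16)

def quantize_block(block):
    """block: uint8 ndarray of samples; returns LUT-quantized values."""
    return LUT[block]

block = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
print(quantize_block(block))
```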
A novel multiple description scalable coding scheme for mobile wireless video transmission
NASA Astrophysics Data System (ADS)
Zheng, Haifeng; Yu, Lun; Chen, Chang Wen
2005-03-01
We propose in this paper a novel multiple description scalable coding (MDSC) scheme based on the in-band motion compensated temporal filtering (IBMCTF) technique in order to achieve high video coding performance and robust video transmission. The input video sequence is first split into equal-sized groups of frames (GOFs). Within a GOF, each frame is hierarchically decomposed by the discrete wavelet transform. Since there is a direct relationship between wavelet coefficients and what they represent in the image content after wavelet decomposition, we are able to reorganize the spatial orientation trees to generate multiple bit-streams and employ the SPIHT algorithm to achieve high coding efficiency. We have shown that multiple bit-stream transmission is very effective in combating error propagation in both Internet video streaming and mobile wireless video. Furthermore, we adopt the IBMCTF scheme to remove redundancy between inter-frames along the temporal direction using motion compensated temporal filtering; thus, high coding performance and flexible scalability can be provided by this scheme. In order to make compressed video resilient to channel errors and to guarantee robust video transmission over mobile wireless channels, we add redundancy to each bit-stream and apply an error concealment strategy for lost motion vectors. Unlike traditional multiple description schemes, the integration of these techniques enables us to generate more than two bit-streams, which may be more appropriate for multiple-antenna transmission of compressed video. Simulation results on standard video sequences have shown that the proposed scheme provides a flexible tradeoff between coding efficiency and error resilience.
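To make the multiple-description idea concrete, here is a toy Python sketch that splits one frame's wavelet coefficients into two descriptions by alternating columns, so that either description alone still reconstructs a coarse frame. It uses PyWavelets (assumed available) and is only a schematic stand-in for the paper's spatial-orientation-tree reorganization and SPIHT coding.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def two_descriptions(frame):
    """Toy split of one frame's wavelet coefficients into two
    descriptions by alternating columns; either description alone
    still reconstructs a coarse version of the frame."""
    cA, (cH, cV, cD) = pywt.dwt2(frame, "haar")
    descriptions = []
    for start in (0, 1):  # description 0 keeps even columns, 1 keeps odd
        mask = np.zeros(cA.shape, dtype=bool)
        mask[:, start::2] = True
        keep = lambda c: np.where(mask, c, 0.0)
        descriptions.append((keep(cA), (keep(cH), keep(cV), keep(cD))))
    return descriptions

frame = np.random.rand(64, 64)
d0, d1 = two_descriptions(frame)
coarse = pywt.idwt2(d0, "haar")  # still usable if description 1 is lost
```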
2011-06-01
Fragmented conference-proceedings record (2011 DoD High Performance Computing Modernization Program Users Group Conference, HPCMP UGC 2011). The surviving snippets describe the Web-based AGeS system as a computationally efficient and scalable system for high-throughput genome analysis, and an initial approach to protecting web services by making them more resilient to attack using autonomic computing techniques.
Disparity: scalable anomaly detection for clusters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Desai, N.; Bradshaw, R.; Lusk, E.
2008-01-01
In this paper, we describe disparity, a tool that performs parallel, scalable anomaly detection for clusters. Disparity uses basic statistical methods and scalable reduction operations to perform data reduction on client nodes and uses these results to locate node anomalies. We discuss the implementation of disparity and present results of its use on a SiCortex SC5832 system.
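The statistical core of such a tool fits in a few lines: combine per-node metrics with scalable reductions, then flag nodes far from the cluster mean. Below is a minimal mpi4py sketch of that pattern; the metric, threshold, and function name are illustrative, and this is not the disparity source.

```python
from mpi4py import MPI
import numpy as np

def find_anomalies(local_metric, threshold=3.0):
    """Each rank contributes one scalar (e.g., a node load average).
    Two allreduce-style reductions yield the cluster mean and standard
    deviation; ranks deviating by more than `threshold` sigmas are
    flagged locally, so no central node sees the raw data."""
    comm = MPI.COMM_WORLD
    n = comm.Get_size()
    s = comm.allreduce(local_metric, op=MPI.SUM)
    ss = comm.allreduce(local_metric ** 2, op=MPI.SUM)
    mean = s / n
    std = max(np.sqrt(max(ss / n - mean ** 2, 0.0)), 1e-12)
    return abs(local_metric - mean) / std > threshold

# Run under mpirun; each rank prints whether it looks anomalous.
print(MPI.COMM_WORLD.Get_rank(), find_anomalies(np.random.rand()))
```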
Perspective: The future of quantum dot photonic integrated circuits
NASA Astrophysics Data System (ADS)
Norman, Justin C.; Jung, Daehwan; Wan, Yating; Bowers, John E.
2018-03-01
Direct epitaxial integration of III-V materials on Si offers substantial manufacturing cost and scalability advantages over heterogeneous integration. The challenge is that epitaxial growth introduces high densities of crystalline defects that limit device performance and lifetime. Quantum dot lasers, amplifiers, modulators, and photodetectors epitaxially grown on Si are showing promise for achieving low-cost, scalable integration with silicon photonics. The unique electrical confinement properties of quantum dots provide reduced sensitivity to the crystalline defects that result from III-V/Si growth, while their unique gain dynamics show promise for improved performance and new functionalities relative to their quantum well counterparts in many devices. Clear advantages of using quantum dot active layers for lasers and amplifiers on and off Si have already been demonstrated, and results for quantum dot based photodetectors and modulators look promising. Laser performance on Si is improving rapidly with continuous-wave threshold currents below 1 mA, injection efficiencies of 87%, and output powers of 175 mW at 20 °C. 1500-h reliability tests at 35 °C showed an extrapolated mean-time-to-failure of more than ten million hours. This represents a significant stride toward efficient, scalable, and reliable III-V lasers on on-axis Si substrates for photonic integrated circuits that are fully compatible with complementary metal-oxide-semiconductor (CMOS) foundries.
MOLAR: Modular Linux and Adaptive Runtime Support for HEC OS/R Research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frank Mueller
2009-02-05
MOLAR is a multi-institution research effort that concentrates on adaptive, reliable, and efficient operating and runtime system solutions for ultra-scale high-end scientific computing on the next generation of supercomputers. This research addresses the challenges outlined by the FAST-OS (forum to address scalable technology for runtime and operating systems) and HECRTF (high-end computing revitalization task force) activities by providing a modular Linux and adaptable runtime support for high-end computing operating and runtime systems. The MOLAR research has the following goals to address these issues. (1) Create a modular and configurable Linux system that allows customized changes based on the requirements of the applications, runtime systems, and cluster management software. (2) Build runtime systems that leverage the OS modularity and configurability to improve efficiency, reliability, scalability, and ease-of-use, and provide support to legacy and promising programming models. (3) Advance computer reliability, availability, and serviceability (RAS) management systems to work cooperatively with the OS/R to identify and preemptively resolve system issues. (4) Explore the use of advanced monitoring and adaptation to improve application performance and predictability of system interruptions. The overall goal of the research conducted at NCSU is to develop scalable algorithms for high availability without single points of failure and without single points of control.
Performances of the PIPER scalable child human body model in accident reconstruction
Giordano, Chiara; Kleiven, Svein
2017-01-01
Human body models (HBMs) have the potential to provide significant insights into the pediatric response to impact. This study describes a scalable/posable approach to performing child accident reconstructions using the Position and Personalize Advanced Human Body Models for Injury Prediction (PIPER) scalable child HBM of different ages and in different positions obtained with the PIPER tool. Overall, the PIPER scalable child HBM predicted reasonably well the injury severity and location for the children involved in real-life crash scenarios documented in the medical records. The developed methodology and workflow are essential for future work to determine child injury tolerances based on the full Child Advanced Safety Project for European Roads (CASPER) accident reconstruction database. With the workflow presented in this study, the open-source PIPER scalable HBM combined with the PIPER tool is also foreseen to have implications for improved safety designs for better protection of children in traffic accidents. PMID:29135997
Progressive Dictionary Learning with Hierarchical Predictive Structure for Scalable Video Coding.
Dai, Wenrui; Shen, Yangmei; Xiong, Hongkai; Jiang, Xiaoqian; Zou, Junni; Taubman, David
2017-04-12
Dictionary learning has emerged as a promising alternative to the conventional hybrid coding framework. However, the rigid structure of sequential training and prediction degrades its performance in scalable video coding. This paper proposes a progressive dictionary learning framework with a hierarchical predictive structure for scalable video coding, especially in the low-bitrate region. For pyramidal layers, sparse representation based on a spatio-temporal dictionary is adopted to improve the coding efficiency of enhancement layers (ELs) with a guarantee of reconstruction performance. The overcomplete dictionary is trained to adaptively capture local structures along motion trajectories as well as exploit the correlations between neighboring layers of resolutions. Furthermore, progressive dictionary learning is developed to enable scalability in the temporal domain and restrict error propagation in a closed-loop predictor. Under the hierarchical predictive structure, online learning is leveraged to guarantee the training and prediction performance with an improved convergence rate. To accommodate the state-of-the-art scalable extension of H.264/AVC and the latest HEVC, standardized codec cores are utilized to encode the base and enhancement layers. Experimental results show that the proposed method outperforms the latest SHVC and HEVC simulcast over extensive test sequences with various resolutions.
HACC: Extreme Scaling and Performance Across Diverse Architectures
NASA Astrophysics Data System (ADS)
Habib, Salman; Morozov, Vitali; Frontiere, Nicholas; Finkel, Hal; Pope, Adrian; Heitmann, Katrin
2013-11-01
Supercomputing is evolving towards hybrid and accelerator-based architectures with millions of cores. The HACC (Hardware/Hybrid Accelerated Cosmology Code) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. We demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining unprecedented levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.
Block-based scalable wavelet image codec
NASA Astrophysics Data System (ADS)
Bao, Yiliang; Kuo, C.-C. Jay
1999-10-01
This paper presents a high-performance block-based wavelet image coder which is designed to be of very low implementational complexity yet with rich features. In this image coder, the Dual-Sliding Wavelet Transform (DSWT) is first applied to image data to generate wavelet coefficients in fixed-size blocks. Here, a block consists only of wavelet coefficients from a single subband. The coefficient blocks are directly coded with the Low Complexity Binary Description (LCBiD) coefficient coding algorithm. Each block is encoded using binary context-based bitplane coding. No parent-child correlation is exploited in the coding process. There is also no intermediate buffering needed between DSWT and LCBiD. The compressed bit stream generated by the proposed coder is both SNR and resolution scalable, as well as highly resilient to transmission errors. Both DSWT and LCBiD process the data in blocks whose size is independent of the size of the original image. This gives more flexibility in the implementation. The codec has very good coding performance even when the block size is 16×16.
Scalability enhancement of AODV using local link repairing
NASA Astrophysics Data System (ADS)
Jain, Jyoti; Gupta, Roopam; Bandhopadhyay, T. K.
2014-09-01
Dynamic changes in the topology of an ad hoc network make it difficult to design an efficient routing protocol. Scalability of an ad hoc network is also one of the important criteria of research in this field. Most research works in ad hoc networks focus on routing and medium access protocols and produce simulation results for limited-size networks. Ad hoc on-demand distance vector (AODV) is one of the best reactive routing protocols. In this article, modified routing protocols based on local link repairing of AODV are proposed. A method of finding alternate routes to the next-to-next node is proposed in case of link failure. These protocols are beacon-less, meaning the periodic hello message is removed from basic AODV to improve scalability. A few control packet formats have been changed to accommodate the suggested modification. The proposed protocols are simulated to investigate scalability performance and compared with the basic AODV protocol. From the simulation results, it is clear that the scalability of the network improves because of the local link repairing method. We have tested the protocols for different terrain areas with approximately constant node densities and different traffic loads.
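As a schematic of the local-repair idea, each routing entry can carry an alternate next hop toward the next-to-next node, so a broken link is bypassed locally instead of triggering a full route discovery flood. The table fields and functions in this Python sketch are hypothetical, kept only to illustrate the control flow of the proposed modification, not the AODV packet formats.

```python
# Toy routing table: for each destination, the next hop, the hop after
# it (next-to-next), and a precomputed alternate next hop that also
# reaches the next-to-next node.
routes = {
    "D": {"next": "B", "next_next": "C", "alt_next": "E"},
}

def forward(dest, link_up):
    """Return the hop to use toward `dest`; `link_up(node)` reports
    whether the link to a neighbor is currently alive."""
    entry = routes[dest]
    if link_up(entry["next"]):
        return entry["next"]
    if link_up(entry["alt_next"]):
        # Local repair: splice in the alternate hop toward the
        # next-to-next node without a network-wide route request.
        entry["next"] = entry["alt_next"]
        return entry["next"]
    raise LookupError("local repair failed; fall back to route discovery")

print(forward("D", lambda n: n != "B"))  # link to B is down -> "E"
```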
Scalable 2D Mesoporous Silicon Nanosheets for High-Performance Lithium-Ion Battery Anode.
Chen, Song; Chen, Zhuo; Xu, Xingyan; Cao, Chuanbao; Xia, Min; Luo, Yunjun
2018-03-01
Constructing unique mesoporous 2D Si nanostructures to shorten the lithium-ion diffusion pathway, facilitate interfacial charge transfer, and enlarge the electrode-electrolyte interface offers exciting opportunities for future high-performance lithium-ion batteries. However, simultaneous realization of 2D and mesoporous structures in a Si material is quite difficult due to its non-van der Waals structure. Here, the coexistence of mesoporous and 2D ultrathin nanosheets in Si anodes, with a considerably high surface area (381.6 m2 g-1), is successfully achieved by a scalable and cost-efficient method. After being encapsulated with a homogeneous carbon layer, the Si/C nanocomposite anodes achieve outstanding reversible capacity, high cycle stability, and excellent rate capability. In particular, the reversible capacity reaches 1072.2 mA h g-1 at 4 A g-1 even after 500 cycles. The obvious enhancements can be attributed to the synergistic effect between the unique 2D mesoporous nanostructure and the carbon encapsulation. Furthermore, full-cell evaluations indicate that the unique Si/C nanostructures have great potential in next-generation lithium-ion batteries. These findings not only greatly improve the electrochemical performance of the Si anode, but also shed some light on designing unique nanomaterials for various energy devices. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Leverage Hadoop framework for large scale clinical informatics applications.
Dong, Xiao; Bahroos, Neil; Sadhu, Eugene; Jackson, Tommie; Chukhman, Morris; Johnson, Robert; Boyd, Andrew; Hynes, Denise
2013-01-01
In this manuscript, we present our experiences using the Apache Hadoop framework for high-data-volume and computationally intensive applications, and discuss some best-practice guidelines in a clinical informatics setting. There are three main aspects to our approach: (a) process and integrate diverse, heterogeneous data sources using standard Hadoop programming tools and customized MapReduce programs; (b) after fine-grained aggregate results are obtained, perform data analysis using the Mahout data mining library; (c) leverage the column-oriented features in HBase for patient-centric modeling and complex temporal reasoning. This framework provides a scalable solution to meet the rapidly increasing, imperative "Big Data" needs of clinical and translational research. The intrinsic advantages of fault tolerance, high availability, and scalability of the Hadoop platform make these applications readily deployable in an enterprise-level cluster environment.
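The map/aggregate pattern behind points (a) and (b) can be illustrated with a tiny in-memory stand-in: a map phase emits key-value pairs from heterogeneous records, and a reduce phase performs the fine-grained aggregation. The field names and diagnosis-code filter below are invented for illustration; a real deployment would run these phases as Hadoop MapReduce jobs over distributed storage.

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit (patient_id, 1) for each qualifying encounter.
    The dx_code filter (ICD-10 E11.*, type 2 diabetes) is illustrative."""
    for rec in records:
        if rec.get("dx_code", "").startswith("E11"):
            yield rec["patient_id"], 1

def reduce_phase(pairs):
    """Reduce step: per-patient aggregation of the emitted pairs."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

records = [{"patient_id": "p1", "dx_code": "E11.9"},
           {"patient_id": "p1", "dx_code": "E11.0"},
           {"patient_id": "p2", "dx_code": "I10"}]
print(reduce_phase(map_phase(records)))  # {'p1': 2}
```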
Lee, Tae Hoon; Kim, Kwanpyo; Kim, Gwangwoo; ...
2017-02-27
Organic field-effect transistors have attracted much attention because of their potential use in low-cost, large-area, flexible electronics. High-performance organic transistors require a low density of grain boundaries in their organic films and a decrease in the charge trap density at the semiconductor–dielectric interface for efficient charge transport. In this respect, the role of the dielectric material is crucial because it primarily determines the growth of the film and the interfacial trap density. Here, we demonstrate the use of chemical vapor-deposited hexagonal boron nitride (CVD h-BN) as a scalable growth template/dielectric for high-performance organic field-effect transistors. The field-effect transistors based on C60 films grown on single-layer CVD h-BN exhibit an average mobility of 1.7 cm2 V-1 s-1 and a maximal mobility of 2.9 cm2 V-1 s-1 with on/off ratios of 10^7. The structural and morphology analysis shows that the epitaxial, two-dimensional growth of C60 on CVD h-BN is mainly responsible for the superior charge transport behavior. In conclusion, we believe that CVD h-BN can serve as a growth template for various organic semiconductors, allowing the development of large-area, high-performance flexible electronics.
Advanced Dynamically Adaptive Algorithms for Stochastic Simulations on Extreme Scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiu, Dongbin
2017-03-03
The focus of the project is the development of mathematical methods and high-performance computational tools for stochastic simulations, with a particular emphasis on computations at extreme scales. The core of the project revolves around the design of highly efficient and scalable numerical algorithms that can adaptively and accurately resolve stochastic problems in high-dimensional spaces with limited smoothness, even those containing discontinuities.
Automation of multi-agent control for complex dynamic systems in heterogeneous computational network
NASA Astrophysics Data System (ADS)
Oparin, Gennady; Feoktistov, Alexander; Bogdanova, Vera; Sidorov, Ivan
2017-01-01
The rapid progress of high-performance computing entails new challenges related to solving large scientific problems in various subject domains in a heterogeneous distributed computing environment (e.g., a network, Grid system, or Cloud infrastructure). Specialists in the field of parallel and distributed computing pay special attention to the scalability of applications for problem solving. Effective management of a scalable application in a heterogeneous distributed computing environment is still a non-trivial issue, and control systems that operate in networks are especially affected by it. We propose a new approach to multi-agent management of scalable applications in a heterogeneous computational network. The fundamentals of our approach are the integrated use of conceptual programming, simulation modeling, network monitoring, multi-agent management, and service-oriented programming. We developed a special framework for automation of problem solving. Advantages of the proposed approach are demonstrated on the example of parametric synthesis of a static linear regulator for complex dynamic systems. Benefits of the scalable application for solving this problem include automated multi-agent control of such systems in parallel mode with various degrees of detail.
A repeatable and scalable fabrication method for sharp, hollow silicon microneedles
NASA Astrophysics Data System (ADS)
Kim, H.; Theogarajan, L. S.; Pennathur, S.
2018-03-01
Scalability and manufacturability are impeding the mass commercialization of microneedles in the medical field. Specifically, microneedle geometries need to be sharp, beveled, and completely controllable, which is difficult to achieve with microelectromechanical fabrication techniques. In this work, we performed a parametric study using silicon etch chemistries to optimize the fabrication of scalable and manufacturable beveled hollow silicon microneedles. We theoretically verified our parametric results with diffusion-reaction equations and created a design guideline for a varied set of microneedles (80-160 µm needle base width, 100-1000 µm pitch, 40-50 µm inner bore diameter, and 150-350 µm height) to show the repeatability, scalability, and manufacturability of our process. As a result, hollow silicon microneedles of any dimensions can be fabricated with less than 2% non-uniformity across a wafer and 5% deviation between different processes. The key to achieving such high uniformity and consistency is a non-agitated HF-HNO3 bath, silicon nitride masks, and surrounding silicon filler materials with well-defined dimensions. Our proposed method is non-labor-intensive, well described by theory, and straightforward for wafer-scale mass production, opening doors to a plethora of potential medical and biosensing applications.
NEXUS Scalable and Distributed Next-Generation Avionics Bus for Space Missions
NASA Technical Reports Server (NTRS)
He, Yutao; Shalom, Eddy; Chau, Savio N.; Some, Raphael R.; Bolotin, Gary S.
2011-01-01
A paper discusses NEXUS, a common, next-generation avionics interconnect that is transparently compatible with wired, fiber-optic, and RF physical layers; provides a flexible, scalable, packet switched topology; is fault-tolerant with sub-microsecond detection/recovery latency; has scalable bandwidth from 1 Kbps to 10 Gbps; has guaranteed real-time determinism with sub-microsecond latency/jitter; has built-in testability; features low power consumption (< 100 mW per Gbps); is lightweight with about a 5,000-logic-gate footprint; and is implemented in a small Bus Interface Unit (BIU) with reconfigurable back-end providing interface to legacy subsystems. NEXUS enhances a commercial interconnect standard, Serial RapidIO, to meet avionics interconnect requirements without breaking the standard. This unified interconnect technology can be used to meet performance, power, size, and reliability requirements of all ranges of equipment, sensors, and actuators at chip-to-chip, board-to-board, or box-to-box boundary. Early results from in-house modeling activity of Serial RapidIO using VisualSim indicate that the use of a switched, high-performance avionics network will provide a quantum leap in spacecraft onboard science and autonomy capability for science and exploration missions.
Algorithmic Coordination in Robotic Networks
2010-11-29
Fragmented report abstract. The surviving snippets describe designing and analyzing algorithms with appropriate performance, robustness, and scalability properties for various task allocation, surveillance, and information-gathering applications in robotic networks, including distributed target-assignment algorithms based on classic auction algorithms in static networks; the remainder of the record is truncated.
High-performance metadata indexing and search in petascale data storage systems
NASA Astrophysics Data System (ADS)
Leung, A. W.; Shao, M.; Bisson, T.; Pasupathy, S.; Miller, E. L.
2008-07-01
Large-scale storage systems used for scientific applications can store petabytes of data and billions of files, making the organization and management of data in these systems a difficult, time-consuming task. The ability to search file metadata in a storage system can address this problem by allowing scientists to quickly navigate experiment data and code while allowing storage administrators to gather the information they need to properly manage the system. In this paper, we present Spyglass, a file metadata search system that achieves the scalability existing file metadata search tools lack by exploiting storage system properties. In doing so, Spyglass can achieve search performance up to several thousand times faster than existing database solutions. We show that Spyglass enables important functionality that can aid data management for scientists and storage administrators.
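The payoff of metadata indexing over brute-force scanning can be seen in a toy inverted index keyed on metadata attributes: a query intersects a few posting sets instead of walking billions of files. This Python sketch is illustrative only and far simpler than Spyglass's partitioned, storage-aware design; all names are invented.

```python
from collections import defaultdict

class MetadataIndex:
    """Toy inverted index over file metadata attributes."""

    def __init__(self):
        self.postings = defaultdict(set)  # (attribute, value) -> paths

    def add(self, path, owner, ext, size_bytes):
        # One posting list per (attribute, value) pair.
        self.postings[("owner", owner)].add(path)
        self.postings[("ext", ext)].add(path)
        self.postings[("size_mb", size_bytes // 2**20)].add(path)

    def query(self, **attrs):
        """Intersect posting lists, e.g. query(owner='alice', ext='.h5')."""
        sets = [self.postings.get(item, set()) for item in attrs.items()]
        return set.intersection(*sets) if sets else set()

idx = MetadataIndex()
idx.add("/data/run1.h5", "alice", ".h5", 3 * 2**20)
idx.add("/data/run2.log", "alice", ".log", 2**10)
print(idx.query(owner="alice", ext=".h5"))  # {'/data/run1.h5'}
```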
High-Speed Data Recorder for Space, Geodesy, and Other High-Speed Recording Applications
NASA Technical Reports Server (NTRS)
Taveniku, Mikael
2013-01-01
A high-speed data recorder and replay equipment has been developed for reliable high-data-rate recording to disk media. It solves problems with slow or faulty disks, multiple disk insertions, high-altitude operation, reliable performance using COTS hardware, and long-term maintenance and upgrade path challenges. The current-generation data recorders used within the VLBI community are aging, special-purpose machines that are both slow (they do not meet today's requirements) and very expensive to maintain and operate. Furthermore, they are not easily upgraded to take advantage of commercial technology development, and are not scalable to the multiple tens of Gbit/s data rates required by new applications. The innovation provides a software-defined, high-speed data recorder that is scalable with technology advances in the commercial space. It maximally utilizes current technologies without being locked to a particular hardware platform. The innovation also provides a cost-effective way of streaming large amounts of data from sensors to disk, enabling many applications to store raw sensor data and perform post- and signal processing offline. This recording system will be applicable to many applications needing real-world, high-speed data collection, including electronic warfare, software-defined radar, signal history storage of multispectral sensors, development of autonomous vehicles, and more.
TriG: Next Generation Scalable Spaceborne GNSS Receiver
NASA Technical Reports Server (NTRS)
Tien, Jeffrey Y.; Okihiro, Brian Bachman; Esterhuizen, Stephan X.; Franklin, Garth W.; Meehan, Thomas K.; Munson, Timothy N.; Robison, David E.; Turbiner, Dmitry; Young, Lawrence E.
2012-01-01
TriG is the next-generation NASA scalable space GNSS science receiver. It will track all GNSS and additional signals (i.e., GPS, GLONASS, Galileo, Compass, and DORIS). Its scalable 3U architecture is fully software- and firmware-reconfigurable, enabling optimization to meet specific mission requirements. The TriG GNSS EM is currently undergoing testing and is expected to complete full performance testing later this year.
A Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jack Dongarra; Shirley Moore; Bart Miller, Jeffrey Hollingsworth
2005-03-15
The purpose of this project was to build an extensible cross-platform infrastructure to facilitate the development of accurate and portable performance analysis tools for current and future high performance computing (HPC) architectures. Major accomplishments include tools and techniques for multidimensional performance analysis, as well as improved support for dynamic performance monitoring of multithreaded and multiprocess applications. Previous performance tool development has been limited by the burden of having to re-write a platform-dependent low-level substrate for each architecture/operating system pair in order to obtain the necessary performance data from the system. Manual interpretation of performance data is not scalable for large-scale, long-running applications. The infrastructure developed by this project provides a foundation for building portable and scalable performance analysis tools, with the end goal being to provide application developers with the information they need to analyze, understand, and tune the performance of terascale applications on HPC architectures. The backend portion of the infrastructure provides runtime instrumentation capability and access to hardware performance counters, with thread-safety for shared memory environments and a communication substrate to support instrumentation of multiprocess and distributed programs. Front-end interfaces provide tool developers with a well-defined, platform-independent set of calls for requesting performance data. End-user tools have been developed that demonstrate runtime data collection, on-line and off-line analysis of performance data, and multidimensional performance analysis. The infrastructure is based on two underlying performance instrumentation technologies: the PAPI cross-platform library interface to hardware performance counters and the cross-platform Dyninst library interface for runtime modification of executable images. The Paradyn and KOJAK projects have made use of this infrastructure to build performance measurement and analysis tools that scale to long-running programs on large parallel and distributed systems and that automate much of the search for performance bottlenecks.
NASA Astrophysics Data System (ADS)
Rana, Moumita; Arora, Gunjan; Gautam, Ujjal K.
2015-02-01
Highly stable, cost-effective electrocatalysts facilitating oxygen reduction are crucial for the commercialization of membrane-based fuel cell and battery technologies. Herein, we demonstrate that protein-rich soya chunks with a high content of N, S and P atoms are an excellent precursor for heteroatom-doped highly graphitized carbon materials. The materials are nanoporous, with a surface area exceeding 1000 m2 g-1, and they are tunable in doping quantities. These materials exhibit highly efficient catalytic performance toward oxygen reduction reaction (ORR) with an onset potential of -0.045 V and a half-wave potential of -0.211 V (versus a saturated calomel electrode) in a basic medium, which is comparable to commercial Pt catalysts and is better than other recently developed metal-free carbon-based catalysts. These exhibit complete methanol tolerance and a performance degradation of merely ˜5% as compared to ˜14% for a commercial Pt/C catalyst after continuous use for 3000 s at the highest reduction current. We found that the fraction of graphitic N increases at a higher graphitization temperature, leading to the near complete reduction of oxygen. It is believed that due to the easy availability of the precursor and the possibility of genetic engineering to homogeneously control the heteroatom distribution, the synthetic strategy is easily scalable, with further improvement in performance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ye, Long; Xiong, Yuan; Zhang, Qianqian
The commercialization of nonfullerene organic solar cells (OSCs) relies critically on the response under typical operating conditions (for instance, temperature and humidity) and the ability to scale up. Despite the rapid increase in the power conversion efficiency (PCE) of spin-coated devices fabricated in a protective atmosphere, the device efficiencies of printed nonfullerene OSC devices made by blade-coating are still lower than 6%. This slow progress significantly limits the practical printing of high-performance nonfullerene OSCs. Here, a new and stable nonfullerene combination is introduced by pairing a commercially available nonfluorinated acceptor, IT-M, with the polymeric donor FTAZ. Efficiencies of over 12% can be achieved in spin-coated FTAZ:IT-M devices using a single halogen-free solvent. More importantly, chlorine-free, in-air blade-coating of FTAZ:IT-M is able to yield a PCE of nearly 11%, despite a humidity of ~50%. X-ray scattering results reveal that large π-π coherence lengths, a high degree of face-on orientation with respect to the substrate, and small domain spacings of ~20 nm are closely correlated with such high device performance. Our material system and approach yield the highest reported performance for nonfullerene OSC devices made by a coating technique approximating scalable fabrication methods and hold great promise for the development of low-cost, low-toxicity, and high-efficiency OSCs by high-throughput production.
2011-01-01
Fragmented abstract on scalable atmospheric model time integrators (keywords: atmospheric models, time integrators, MPI, scalability, performance). The surviving snippets present performance statistics to explain scalability behavior and describe spectral-element basis functions constructed as tensor products of Lagrange polynomials, ψ_i(x) = h_α(ξ) ⊗ h_β(η) ⊗ h_γ(ζ), coupled across inter-element boundaries.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
NASA Astrophysics Data System (ADS)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.; Porter, D.; O'Neill, B. J.; Nolting, C.; Edmon, P.; Donnert, J. M. F.; Jones, T. W.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
A Rich Metadata Filesystem for Scientific Data
ERIC Educational Resources Information Center
Bui, Hoang
2012-01-01
As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide both data archival services and rich metadata, and cleanly integrate with large scale computing resources. ROARS is a hybrid approach to distributed storage that provides…
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.
Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel
2013-08-01
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS in query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive.
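The spatial-partitioning step can be pictured with a toy grid partitioner: each object is assigned to a tile, tiles become independent map tasks, and objects straddling tile borders are the "boundary objects" whose results later need amending. The sketch below uses points only and invented names; Hadoop-GIS's partitioning and its RESQUE engine are far more capable.

```python
def grid_partition(objects, cell=0.5):
    """Assign (id, x, y) points to grid tiles so each tile can be
    processed by an independent map task. Geometries that straddle a
    tile border would be replicated into every overlapping tile and
    de-duplicated afterwards (boundary-object handling)."""
    tiles = {}
    for oid, x, y in objects:
        key = (int(x // cell), int(y // cell))
        tiles.setdefault(key, []).append(oid)
    return tiles

points = [("a", 0.1, 0.2), ("b", 0.9, 0.1), ("c", 0.95, 0.15)]
print(grid_partition(points))  # {(0, 0): ['a'], (1, 0): ['b', 'c']}
```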
Scalable Multiprocessor for High-Speed Computing in Space
NASA Technical Reports Server (NTRS)
Lux, James; Lang, Minh; Nishimoto, Kouji; Clark, Douglas; Stosic, Dorothy; Bachmann, Alex; Wilkinson, William; Steffke, Richard
2004-01-01
A report discusses the continuing development of a scalable multiprocessor computing system for hard real-time applications aboard a spacecraft. "Hard real-time applications" signifies applications, like real-time radar signal processing, in which the data to be processed are generated at hundreds of pulses per second, each pulse requiring millions of arithmetic operations. In these applications, the digital processors must be tightly integrated with analog instrumentation (e.g., radar equipment), and data input/output must be synchronized with the analog instrumentation, controlled to within fractions of a microsecond. The scalable multiprocessor is a cluster of identical commercial-off-the-shelf generic DSP (digital-signal-processing) computers plus generic interface circuits, including analog-to-digital converters, all controlled by software. The processors are computers interconnected by high-speed serial links. Performance can be increased by adding hardware modules and correspondingly modifying the software. Work is distributed among the processors in a parallel or pipeline fashion by means of a flexible master/slave control and timing scheme. Each processor operates under its own local clock; synchronization is achieved by broadcasting master time signals to all the processors, which compute offsets between the master clock and their local clocks.
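The broadcast-offset synchronization described above reduces to a small amount of per-processor state. The Python sketch below shows that state under the simplifying assumption that broadcast latency is negligible; the class and method names are invented for illustration, not taken from the report.

```python
import time

class SlaveClock:
    """Each processor runs on its local clock; a broadcast master time
    signal lets it compute and apply an offset so that all processors
    agree on a common timeline (broadcast latency is ignored here)."""

    def __init__(self):
        self.offset = 0.0

    def on_master_broadcast(self, master_time):
        # Offset = master's notion of "now" minus our local "now".
        self.offset = master_time - time.monotonic()

    def now(self):
        return time.monotonic() + self.offset

clock = SlaveClock()
clock.on_master_broadcast(1000.0)   # master says it is t = 1000.0
print(abs(clock.now() - 1000.0) < 0.01)
```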
CAM-SE: A scalable spectral element dynamical core for the Community Atmosphere Model.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dennis, John; Edwards, Jim; Evans, Kate J
2012-01-01
The Community Atmosphere Model (CAM) version 5 includes a spectral element dynamical core option from NCAR's High-Order Method Modeling Environment. It is a continuous Galerkin spectral finite element method designed for fully unstructured quadrilateral meshes. The current configurations in CAM are based on the cubed-sphere grid. The main motivation for including a spectral element dynamical core is to improve the scalability of CAM by allowing quasi-uniform grids for the sphere that do not require polar filters. In addition, the approach provides other state-of-the-art capabilities such as improved conservation properties. Spectral elements are used for the horizontal discretization, while most other aspects of the dynamical core are a hybrid of well-tested techniques from CAM's finite volume and global spectral dynamical core options. Here we first give an overview of the spectral element dynamical core as used in CAM. We then give scalability and performance results from CAM running with three different dynamical core options within the Community Earth System Model, using a pre-industrial time-slice configuration. We focus on high-resolution simulations of 1/4 degree, 1/8 degree, and T340 spectral truncation.
The P-Mesh: A Commodity-based Scalable Network Architecture for Clusters
NASA Technical Reports Server (NTRS)
Nitzberg, Bill; Kuszmaul, Chris; Stockdale, Ian; Becker, Jeff; Jiang, John; Wong, Parkson; Tweten, David (Technical Monitor)
1998-01-01
We designed a new network architecture, the P-Mesh, which combines the scalability and fault resilience of a torus with the performance of a switch. We compare the scalability, performance, and cost of the hub, switch, torus, tree, and P-Mesh architectures. The latter three are capable of scaling to thousands of nodes; however, the torus has severe performance limitations with that many processors. The tree and P-Mesh have similar latency, bandwidth, and bisection bandwidth, but the P-Mesh outperforms the switch architecture (a lower bound for tree performance) on 16-node NAS Parallel Benchmark tests by up to 23%, and costs 40% less. Further, the P-Mesh has better fault resilience characteristics. The P-Mesh architecture trades increased management overhead for lower cost, and is a good bridging technology while the price of tree uplinks is expensive.
Scalable graphene aptasensors for drug quantification
NASA Astrophysics Data System (ADS)
Vishnubhotla, Ramya; Ping, Jinglei; Gao, Zhaoli; Lee, Abigail; Saouaf, Olivia; Vrudhula, Amey; Johnson, A. T. Charlie
2017-11-01
Simpler and more rapid approaches for therapeutic drug-level monitoring are highly desirable to enable use at the point-of-care. We have developed an all-electronic approach for detection of the HIV drug tenofovir based on scalable fabrication of arrays of graphene field-effect transistors (GFETs) functionalized with a commercially available DNA aptamer. The shift in the Dirac voltage of the GFETs varied systematically with the concentration of tenofovir in deionized water, with a detection limit less than 1 ng/mL. Tests against a set of negative controls confirmed the specificity of the sensor response. This approach offers the potential for further development into a rapid and convenient point-of-care tool with clinically relevant performance.
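A systematic Dirac-voltage shift versus concentration is exactly what a calibration curve needs. The sketch below fits a simple shift-versus-log-concentration line and inverts it for unknown samples; the data points are invented for illustration, and the logarithmic model is an assumption of this sketch, not the paper's reported fit.

```python
import numpy as np

# Hypothetical calibration points: tenofovir concentration (ng/mL)
# versus measured Dirac-voltage shift (mV). Values are illustrative.
conc = np.array([1.0, 10.0, 100.0, 1000.0])
shift = np.array([4.8, 9.9, 15.2, 20.1])

# Fit shift = a * log10(conc) + b as a simple calibration model.
a, b = np.polyfit(np.log10(conc), shift, 1)

def concentration_from_shift(v_mv):
    """Invert the fitted line to estimate concentration in ng/mL."""
    return 10 ** ((v_mv - b) / a)

print(round(concentration_from_shift(12.5), 1))  # mid-curve reading
```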
Collective Framework and Performance Optimizations to Open MPI for Cray XT Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ladd, Joshua S; Gorentla Venkata, Manjunath; Shamis, Pavel
2011-01-01
The performance and scalability of collective operations play a key role in the performance and scalability of many scientific applications. Within the Open MPI code base we have developed a general-purpose hierarchical collective operations framework called Cheetah, and applied it at large scale on the Oak Ridge Leadership Computing Facility's (OLCF) Jaguar platform, obtaining better performance and scalability than the native MPI implementation. This paper discusses Cheetah's design and implementation, and optimizations to the framework for Cray XT5 platforms. Our results show that Cheetah's Broadcast and Barrier perform better than the native MPI implementation. For medium data, Cheetah's Broadcast outperforms the native MPI implementation by 93% at a 49,152-process problem size. For small and large data, it outperforms the native MPI implementation by 10% and 9%, respectively, at a 24,576-process problem size. Cheetah's Barrier performs 10% better than the native MPI implementation at a 12,288-process problem size.
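Hierarchical collectives of this kind split each operation into an inter-node stage among per-node leaders and an intra-node stage among local ranks. The mpi4py sketch below shows that two-level structure for a broadcast; it assumes the root is world rank 0 and is a schematic of the idea, not the Cheetah implementation.

```python
from mpi4py import MPI

def hierarchical_bcast(data, root=0):
    """Two-level broadcast: the root sends to one leader per node
    (inter-node stage), then each leader broadcasts within its node
    (intra-node stage). Assumes `root` is world rank 0 so that it is
    also rank 0 of both derived communicators."""
    world = MPI.COMM_WORLD
    # Group ranks that share a node.
    node = world.Split_type(MPI.COMM_TYPE_SHARED)
    # One leader per node (local rank 0) joins the leaders communicator.
    color = 0 if node.Get_rank() == 0 else MPI.UNDEFINED
    leaders = world.Split(color, world.Get_rank())
    if leaders != MPI.COMM_NULL:
        data = leaders.bcast(data, root=0)  # inter-node stage
    return node.bcast(data, root=0)         # intra-node stage

# Run under mpirun: every rank ends up with the root's payload.
print(hierarchical_bcast({"step": 1} if MPI.COMM_WORLD.Get_rank() == 0 else None))
```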
The architecture of the High Performance Storage System (HPSS)
NASA Technical Reports Server (NTRS)
Teaff, Danny; Watson, Dick; Coyne, Bob
1994-01-01
The rapid growth in the size of datasets has caused a serious imbalance in I/O and storage system performance and functionality relative to application requirements and the capabilities of other system components. The High Performance Storage System (HPSS) is a scalable, next-generation storage system that will meet the functionality and performance requirements of large-scale scientific and commercial computing environments. Our goal is to improve the performance and capacity of storage by two orders of magnitude or more over what is available in the general or mass marketplace today. We are also providing corresponding improvements in architecture and functionality. This paper describes the architecture and functionality of HPSS.
Wavelet-based scalable L-infinity-oriented compression.
Alecu, Alin; Munteanu, Adrian; Cornelis, Jan P H; Schelkens, Peter
2006-09-01
Among the different classes of coding techniques proposed in the literature, predictive schemes have proven their outstanding performance in near-lossless compression. However, these schemes are incapable of providing embedded L-infinity-oriented compression, or, at most, provide a very limited number of potential L-infinity bit-stream truncation points. We propose a new multidimensional wavelet-based L-infinity-constrained scalable coding framework that generates a fully embedded L-infinity-oriented bit stream and that retains the coding performance and all the scalability options of state-of-the-art L2-oriented wavelet codecs. Moreover, our codec instantiation of the proposed framework clearly outperforms JPEG2000 in the L-infinity coding sense.
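The connection between bitplane embedding and L-infinity control can be seen in a two-line operation: truncating the k least-significant bitplanes of integer coefficients bounds every sample's error by 2^k - 1, so each additional bitplane kept in the stream tightens the guarantee. A minimal NumPy sketch of that observation (not the proposed codec) follows.

```python
import numpy as np

def truncate_bitplanes(coeffs, k):
    """Drop the k least-significant bitplanes of integer coefficients.
    The per-sample reconstruction error is then bounded by 2**k - 1,
    which is what makes bitplane-embedded streams natural carriers of
    L-infinity (maximum-error) guarantees."""
    q = np.asarray(coeffs, dtype=np.int64)
    return np.sign(q) * ((np.abs(q) >> k) << k)

c = np.array([-137, 42, 5, 1023])
approx = truncate_bitplanes(c, 3)
print(np.max(np.abs(c - approx)) <= 2**3 - 1)  # True
```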
Developing a scalable inert gas ion thruster
NASA Technical Reports Server (NTRS)
James, E.; Ramsey, W.; Steiner, G.
1982-01-01
Analytical studies to identify and then design a high performance scalable ion thruster operating with either argon or xenon for use in large space systems are presented. The magnetoelectrostatic containment concept is selected for its efficient ion generation capabilities. The iterative nature of the bounding magnetic fields allows the designer to scale both the diameter and length, so that the thruster can be adapted to spacecraft growth over time. Three different thruster assemblies (conical, hexagonal and hemispherical) are evaluated for a 12 cm diameter thruster and performance mapping of the various thruster configurations shows that conical discharge chambers produce the most efficient discharge operation, achieving argon efficiencies of 50-80% mass utilization at 240-310 eV/ion and xenon efficiencies of 60-97% at 240-280 eV/ion. Preliminary testing of the large 30 cm thruster, using argon propellant, indicates a 35% improvement over the 12 cm thruster in mass utilization efficiency. Since initial performance is found to be better than projected, a larger 50 cm thruster is already in the development stage.
GraphMeta: Managing HPC Rich Metadata in Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dai, Dong; Chen, Yong; Carns, Philip
High-performance computing (HPC) systems face increasingly critical metadata management challenges, especially in the approaching exascale era. These challenges arise not only from exploding metadata volumes, but also from increasingly diverse metadata, which contains data provenance and arbitrary user-defined attributes in addition to traditional POSIX metadata. This 'rich' metadata is becoming critical to supporting advanced data management functionality such as data auditing and validation. In our prior work, we identified a graph-based model as a promising solution to uniformly manage HPC rich metadata due to its flexibility and generality. However, at the same time, graph-based HPC rich metadata management also introduces significant challenges to the underlying infrastructure. In this study, we first identify the challenges the underlying infrastructure must meet to support scalable, high-performance rich metadata management. Based on that, we introduce GraphMeta, a graph-based engine designed for this use case. It achieves performance scalability by introducing a new graph partitioning algorithm and a write-optimal storage engine. We evaluate GraphMeta under both synthetic and real HPC metadata workloads, compare it with other approaches, and demonstrate its advantages in terms of efficiency and usability for rich metadata management in HPC systems.
Status Report on NEAMS PROTEUS/ORIGEN Integration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wieselquist, William A
2016-02-18
The US Department of Energy's Nuclear Energy Advanced Modeling and Simulation (NEAMS) Program has contributed significantly to the development of the PROTEUS neutron transport code at Argonne National Laboratory and to the Oak Ridge Isotope Generation and Depletion (ORIGEN) depletion/decay code at Oak Ridge National Laboratory. PROTEUS's key capability is its efficient and scalable (up to hundreds of thousands of cores) neutron transport solver on general, unstructured, three-dimensional finite-element-type meshes. The scalability and mesh generality enable the transfer of neutron and power distributions to other codes in the NEAMS toolkit for advanced multiphysics analysis. Recently, ORIGEN has received considerable modernization to provide the high-performance depletion/decay capability within the NEAMS toolkit. This work presents a description of the initial integration of ORIGEN in PROTEUS, mainly performed during FY 2015, with minor updates in FY 2016.
A high performance linear equation solver on the VPP500 parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi
1994-12-31
This paper describes the implementation of two high-performance linear equation solvers developed for the Fujitsu VPP500, a distributed-memory parallel supercomputer system. The solvers take advantage of the key architectural features of the VPP500: (1) scalability up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS with 100 processors on the LINPACK Highly Parallel Computing benchmark.
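The abstract names blocked LU decomposition as the core of the general solver but gives no implementation detail. The following minimal NumPy sketch of a right-looking blocked LU factorization (single node, no pivoting, with an assumed tunable block size nb) only illustrates the structure that such solvers distribute across processors; it is not the Fujitsu code.

```python
import numpy as np

def blocked_lu(A, nb=64):
    """In-place right-looking blocked LU without pivoting (L unit lower, U upper)."""
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # 1) Unblocked LU of the current panel A[k:n, k:e].
        for j in range(k, e):
            A[j+1:n, j] /= A[j, j]
            A[j+1:n, j+1:e] -= np.outer(A[j+1:n, j], A[j, j+1:e])
        # 2) Forward substitution for the block row: U12 = L11^{-1} A[k:e, e:n].
        for j in range(k, e):
            A[j+1:e, e:n] -= np.outer(A[j+1:e, j], A[j, e:n])
        # 3) Rank-nb trailing-submatrix update: the matrix-multiply-rich step
        #    that vectorizes well and dominates the parallel work.
        A[e:n, e:n] -= A[e:n, k:e] @ A[k:e, e:n]

# Quick check against NumPy's solver on a diagonally dominant matrix
# (safe without pivoting).
rng = np.random.default_rng(0)
n = 256
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
LU = A.copy()
blocked_lu(LU, nb=32)
L = np.tril(LU, -1) + np.eye(n)
U = np.triu(LU)
x = np.linalg.solve(U, np.linalg.solve(L, b))
assert np.allclose(A @ x, b)
```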
Integration of an intelligent systems behavior simulator and a scalable soldier-machine interface
NASA Astrophysics Data System (ADS)
Johnson, Tony; Manteuffel, Chris; Brewster, Benjamin; Tierney, Terry
2007-04-01
As the Army's Future Combat Systems (FCS) introduce emerging technologies and new force structures to the battlefield, soldiers will increasingly face new challenges in workload management. The next generation warfighter will be responsible for effectively managing robotic assets in addition to performing other missions. Studies of future battlefield operational scenarios involving the use of automation, including the specification of existing and proposed technologies, will provide significant insight into potential problem areas regarding soldier workload. The US Army Tank Automotive Research, Development, and Engineering Center (TARDEC) is currently executing an Army technology objective program to analyze and evaluate the effect of automated technologies and their associated control devices with respect to soldier workload. The Human-Robotic Interface (HRI) Intelligent Systems Behavior Simulator (ISBS) is a human performance measurement simulation system that allows modelers to develop constructive simulations of military scenarios with various deployments of interface technologies in order to evaluate operator effectiveness. One such interface is TARDEC's Scalable Soldier-Machine Interface (SMI). The scalable SMI provides a configurable machine interface application that is capable of adapting to several hardware platforms by recognizing the physical space limitations of the display device. This paper describes the integration of the ISBS and Scalable SMI applications, which will ultimately benefit both systems. The ISBS will be able to use the Scalable SMI to visualize the behaviors of virtual soldiers performing HRI tasks, such as route planning, and the scalable SMI will benefit from stimuli provided by the ISBS simulation environment. The paper describes the background of each system and details of the system integration approach.
NASA Astrophysics Data System (ADS)
Xiao, Fei; Yang, Shengxiong; Zhang, Zheye; Liu, Hongfang; Xiao, Junwu; Wan, Lian; Luo, Jun; Wang, Shuai; Liu, Yunqi
2015-03-01
We report a scalable and modular method for preparing a new type of sandwich-structured graphene-based nanohybrid paper and explore its practical application as a high-performance electrode in flexible supercapacitors. The freestanding and flexible graphene paper was first fabricated by a highly reproducible printing technique and a bubbling delamination method, by which the area and thickness of the graphene paper can be freely adjusted over a wide range. The as-prepared graphene paper combines high electrical conductivity (340 S cm-1), light weight (1 mg cm-2), and excellent mechanical properties. To improve its supercapacitive properties, we prepared a unique sandwich-structured graphene/polyaniline/graphene paper by in situ electropolymerization of porous polyaniline nanomaterials on graphene paper, followed by wrapping an ultrathin graphene layer on its surface. This design strategy not only circumvents the low energy-storage capacity of the double-layer capacitance of graphene paper, but also enhances the rate performance and cycling stability of the porous polyaniline. The as-obtained all-solid-state symmetric supercapacitor exhibits high energy density, high power density, excellent cycling stability, and exceptional mechanical flexibility, demonstrating its potential for flexible energy-related devices and wearable electronics.
A look at scalable dense linear algebra libraries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dongarra, J.J.; Van de Geijn, R.A.; Walker, D.W.
1992-01-01
We discuss the essential design features of a library of scalable software for performing dense linear algebra computations on distributed memory concurrent computers. The square block scattered decomposition is proposed as a flexible and general-purpose way of decomposing most, if not all, dense matrix problems. An object-oriented interface to the library permits more portable applications to be written, and is easy to learn and use, since details of the parallel implementation are hidden from the user. Experiments on the Intel Touchstone Delta system with a prototype code that uses the square block scattered decomposition to perform LU factorization are presented and analyzed. It was found that the code was both scalable and efficient, performing at about 14 GFLOPS (double precision) for the largest problem considered.
Implementation of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.
Implementation of BT, SP, LU, and FT of NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Schultz, Matthew; Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry
2000-01-01
A number of Java features make it an attractive but debatable choice for High Performance Computing. We have implemented the single-structured-grid benchmarks BT, SP, LU, and FT in Java. The performance and scalability of the Java code show that significant improvements in Java compiler technology and in Java thread implementation are necessary for Java to compete with Fortran in HPC applications.
A Numerical Study of Scalable Cardiac Electro-Mechanical Solvers on HPC Architectures
Colli Franzone, Piero; Pavarino, Luca F.; Scacchi, Simone
2018-01-01
We introduce and study some scalable domain decomposition preconditioners for cardiac electro-mechanical 3D simulations on parallel HPC (High Performance Computing) architectures. The electro-mechanical model of the cardiac tissue is composed of four coupled sub-models: (1) the static finite elasticity equations for the transversely isotropic deformation of the cardiac tissue; (2) the active tension model describing the dynamics of the intracellular calcium, cross-bridge binding and myofilament tension; (3) the anisotropic Bidomain model describing the evolution of the intra- and extra-cellular potentials in the deforming cardiac tissue; and (4) the ionic membrane model describing the dynamics of ionic currents, gating variables, ionic concentrations and stretch-activated channels. This strongly coupled electro-mechanical model is discretized in time with a splitting semi-implicit technique and in space with isoparametric finite elements. The resulting scalable parallel solver is based on Multilevel Additive Schwarz preconditioners for the solution of the Bidomain system and on BDDC preconditioned Newton-Krylov solvers for the non-linear finite elasticity system. The results of several 3D parallel simulations show the scalability of both linear and non-linear solvers and their application to the study of both physiological excitation-contraction cardiac dynamics and re-entrant waves in the presence of different mechano-electrical feedbacks. PMID:29674971
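The elasticity subproblem above is solved with preconditioned Newton-Krylov iterations. As a hedged illustration of that pattern only, on a toy 1D reaction-diffusion residual rather than the cardiac model, and without the BDDC preconditioner, SciPy's newton_krylov can be used:

```python
import numpy as np
from scipy.optimize import newton_krylov

# Toy stand-in for a nonlinear residual: F(u) = A u + u**3 - f = 0, with A
# a 1D finite-difference Laplacian. This only illustrates the Newton-Krylov
# pattern named in the abstract, not the cardiac solver itself.
n = 100
h = 1.0 / (n + 1)
main = 2.0 * np.ones(n) / h**2
off = -1.0 * np.ones(n - 1) / h**2
f = np.ones(n)

def residual(u):
    Au = main * u
    Au[:-1] += off * u[1:]   # superdiagonal contribution
    Au[1:] += off * u[:-1]   # subdiagonal contribution
    return Au + u**3 - f

u = newton_krylov(residual, np.zeros(n), method='lgmres', f_tol=1e-8)
print("residual norm:", np.linalg.norm(residual(u)))
```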
Photonics and other approaches to high speed communications
NASA Technical Reports Server (NTRS)
Maly, Kurt
1992-01-01
Our research group of four faculty and about 10-15 graduate students has been actively involved in the development of computer communication networks for the past five years; many of its members have been involved in related research for much longer. The overall research goal is to extend network performance to higher data rates, to improve protocol performance at most ISO layers, and to improve network operational performance. We briefly state our research goals, then discuss our research accomplishments and direct the reader's attention to attached and/or published papers covering the following topics: scalable parallel communications; high-performance interconnection between high-data-rate networks; and a simple, effective media access protocol for integrated, high-data-rate networks.
NASA Astrophysics Data System (ADS)
Rastogi, Richa; Londhe, Ashutosh; Srivastava, Abhishek; Sirasala, Kirannmayi M.; Khonde, Kiran
2017-03-01
In this article, a new scalable 3D Kirchhoff depth migration algorithm is presented on a state-of-the-art multicore-CPU-based cluster. Parallelization of 3D Kirchhoff depth migration is challenging because of its high demand for compute time, memory, storage, and I/O, along with the need to manage these resources effectively. The most resource-intensive modules of the algorithm are the traveltime calculation and the migration summation, which exhibit an inherent trade-off between compute time and the other resources. The parallelization strategy of the algorithm largely depends on the storage of the calculated traveltimes and the mechanism for feeding them to the migration process. The presented work extends our previous work, in which a 3D Kirchhoff depth migration application for a multicore-CPU-based parallel system was developed. Recently, we have improved the parallel performance of this application by redesigning the parallelization approach. The new algorithm can efficiently migrate both prestack and poststack 3D data. It offers the flexibility to migrate a large number of traces within the available node memory and with minimal storage, I/O, and inter-node communication requirements. The resulting application is tested using 3D Overthrust data on PARAM Yuva II, a Xeon E5-2670-based multicore CPU cluster with 16 cores/node and 64 GB shared memory. The parallel performance of the algorithm is studied through different numerical experiments, and the scalability results show a striking improvement over the previous version. An impressive 49.05X speedup with 76.64% efficiency is achieved for 3D prestack data and a 32.00X speedup with 50.00% efficiency for 3D poststack data, using 64 nodes. The results also demonstrate the effectiveness and robustness of the improved algorithm, with high scalability and efficiency on a multicore CPU cluster.
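The quoted efficiencies are consistent with the usual definition, efficiency = speedup / number of workers; a two-line check (assuming the single-node run as the baseline):

```python
def parallel_efficiency(speedup, workers):
    """Efficiency = speedup / worker count, with the single-node run as baseline."""
    return speedup / workers

# Figures reported in the abstract, on 64 nodes:
print(parallel_efficiency(49.05, 64))  # ~0.766 -> 76.64% (prestack)
print(parallel_efficiency(32.00, 64))  # 0.50   -> 50.00% (poststack)
```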
NASA Astrophysics Data System (ADS)
Al Hadhrami, Tawfik; Nightingale, James M.; Wang, Qi; Grecos, Christos
2014-05-01
In emergency situations, the ability to remotely monitor unfolding events using high-quality video feeds will significantly improve the incident commander's understanding of the situation and thereby aid effective decision making. This paper presents a novel, adaptive video monitoring system for emergency situations where the normal communications network infrastructure has been severely impaired or is no longer operational. The proposed scheme, operating over a rapidly deployable wireless mesh network, supports real-time video feeds between first responders, forward operating bases, and primary command and control centers. Video feeds captured on portable devices carried by first responders and by static visual sensors are encoded in H.264/SVC, the scalable extension to H.264/AVC, allowing efficient, standards-based temporal, spatial, and quality scalability of the video. A three-tier video delivery system is proposed, which balances the need to avoid overuse of mesh nodes with the operational requirements of the emergency management team. In the first tier, the video feeds are delivered at a low spatial and temporal resolution employing only the base layer of the H.264/SVC video stream. Routing in this mode is designed to employ all nodes across the entire mesh network. In the second tier, whenever operational considerations require that commanders or operators focus on a particular video feed, a 'fidelity control' mechanism at the monitoring station sends control messages to the routing and scheduling agents in the mesh network, which increase the quality of the received picture using SNR scalability while conserving bandwidth by maintaining a low frame rate. In this mode, routing decisions are based on reliable packet delivery, with the most reliable routes being used to deliver the base and lower enhancement layers; as fidelity is increased and more scalable layers are transmitted, they are assigned to routes in descending order of reliability. The third tier of video delivery transmits a high-quality video stream including all available scalable layers using the most reliable routes through the mesh network, ensuring the highest possible video quality. The proposed scheme is implemented in a proven simulator, and the performance of the proposed system is numerically evaluated through extensive simulations. We further present an in-depth analysis of the proposed solutions and potential approaches towards supporting high-quality visual communications in such a demanding context.
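As a rough sketch of the three-tier policy described above (hypothetical names and layer descriptors, not the authors' implementation), the tier-to-layer mapping could look like:

```python
def select_svc_layers(tier, available_layers):
    """Hypothetical mapping from the paper's three delivery tiers to the
    H.264/SVC layers forwarded through the mesh (a sketch, not the authors' code).

    tier 1: base layer only (low spatial/temporal resolution, all routes used).
    tier 2: base plus SNR-enhancement layers at a low frame rate.
    tier 3: every available scalable layer over the most reliable routes.
    """
    base, enhancements = available_layers[0], available_layers[1:]
    if tier == 1:
        return [base]
    if tier == 2:
        snr = [layer for layer in enhancements if layer["type"] == "snr"]
        return [base] + snr
    return list(available_layers)

layers = [{"id": 0, "type": "base"},
          {"id": 1, "type": "snr"},
          {"id": 2, "type": "temporal"},
          {"id": 3, "type": "spatial"}]
print([layer["id"] for layer in select_svc_layers(2, layers)])  # [0, 1]
```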
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2002-01-01
The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique for solving sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0)-preconditioned CG (PCG) using different programming paradigms and architectures. Results show that, for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems; cache reuse may be more important than reducing communication; it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution; and a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread-level parallelism.
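For reference, the unpreconditioned CG iteration under study is the textbook algorithm below (a minimal NumPy sketch; the paper's parallel versions distribute the matrix-vector product and the dot products, which is exactly where ordering and partitioning matter):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, maxiter=1000):
    """Textbook unpreconditioned CG for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))  # ~[0.0909, 0.6364]
```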
The Efficiency and the Scalability of an Explicit Operator on an IBM POWER4 System
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We present an evaluation of the efficiency and the scalability of an explicit CFD operator on an IBM POWER4 system. The POWER4 architecture exhibits a common trend in HPC architectures: boosting CPU processing power by increasing the number of functional units, while hiding the latency of memory access by increasing the depth of the memory hierarchy. The overall machine performance depends on the ability of the caches-buses-fabric-memory to feed the functional units with the data to be processed. In this study we evaluate the efficiency and scalability of one explicit CFD operator on an IBM POWER4. This operator performs computations at the points of a Cartesian grid and involves a few dozen floating point numbers and on the order of 100 floating point operations per grid point. The computations in all grid points are independent. Specifically, we estimate the efficiency of the RHS operator (from the NPB SP benchmark) on a single processor as the observed/peak performance ratio. We then estimate the scalability of the operator on a single chip (2 CPUs), a single MCM (8 CPUs), 16 CPUs, and the whole machine (32 CPUs), and repeat the same measurements for a cache-optimized version of the RHS operator. For our measurements we use the HPM (Hardware Performance Monitor) counters available on the POWER4, which allow us to analyze the obtained performance results.
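The single-processor efficiency metric is the observed/peak performance ratio; a small sketch with purely illustrative numbers (the abstract reports on the order of 100 floating point operations per grid point; the grid size, runtime, and peak figure below are assumptions, not values from the paper):

```python
def observed_peak_ratio(points, flops_per_point, seconds, peak_gflops):
    """Efficiency metric used in the study: observed GFLOP/s over peak."""
    observed_gflops = points * flops_per_point / seconds / 1e9
    return observed_gflops / peak_gflops

# Hypothetical example: a 64^3 grid, 100 flops/point, 0.01 s per sweep,
# against an assumed nominal peak for a single CPU.
print(observed_peak_ratio(64**3, 100, 0.01, 5.2))  # ~0.50
```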
Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters
ERIC Educational Resources Information Center
Younge, Andrew J.
2016-01-01
With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for…
ERIC Educational Resources Information Center
Robinson, William S.; Buntrock, LeAnn M.
2011-01-01
Turning around chronically low-performing schools is challenging work requiring fundamental rethinking of the change process, and a systemic rather than school-by-school approach. Without a doubt, high-impact school leaders are critical to turnaround success, and pockets of success around the country demonstrate this. However, transformational and…
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2012-01-01
The simulation was performed on 64K cores of Intrepid, running at 0.25 simulated-years-per-day and taking 25 million core-hours. This is the first simulation using both the CAM5 physics and the highly scalable spectral element dynamical core. The animation of Total Precipitable Water clearly shows hurricanes developing in the Atlantic and Pacific.
Zou, Lei; Lai, Yanqing; Hu, Hongxing; Wang, Mengran; Zhang, Kai; Zhang, Peng; Fang, Jing; Li, Jie
2017-10-12
A facile and scalable method is realized for the in situ synthesis of N/S co-doped 3D porous carbon nanosheet networks (NSPCNNs) as anode materials for sodium-ion batteries. During the synthesis, NaCl is used as a template to prepare the porous carbon nanosheet networks. In the resultant architecture, the unique 3D porous structure ensures a large specific surface area and fast diffusion paths for both electrons and ions. In addition, the N/S doping produces abundant defects, increased interlayer spacings, more active sites, and high electronic conductivity. The obtained products deliver a high specific capacity and excellent long-term cycling performance; specifically, a capacity of 336.2 mA h g-1 at 0.05 A g-1, remaining as high as 214.9 mA h g-1 after 2000 charge/discharge cycles at 0.5 A g-1. This material has great prospects for future applications in scalable, low-cost, and environmentally friendly sodium-ion batteries. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Nonepitaxial Thin-Film InP for Scalable and Efficient Photocathodes.
Hettick, Mark; Zheng, Maxwell; Lin, Yongjing; Sutter-Fella, Carolin M; Ager, Joel W; Javey, Ali
2015-06-18
To date, some of the highest performance photocathodes of a photoelectrochemical (PEC) cell have been shown with single-crystalline p-type InP wafers, exhibiting half-cell solar-to-hydrogen conversion efficiencies of over 14%. However, the high cost of single-crystalline InP wafers may present a challenge for future large-scale industrial deployment. Analogous to solar cells, a thin-film approach could address the cost challenges by utilizing the benefits of the InP material while decreasing the use of expensive materials and processes. Here, we demonstrate this approach, using the newly developed thin-film vapor-liquid-solid (TF-VLS) nonepitaxial growth method combined with an atomic-layer deposition protection process to create thin-film InP photocathodes with large grain size and high performance, in the first reported solar device configuration generated by materials grown with this technique. Current-voltage measurements show a photocurrent (29.4 mA/cm²) and onset potential (630 mV) approaching single-crystalline wafers and an overall power conversion efficiency of 11.6%, making TF-VLS InP a promising photocathode for scalable and efficient solar hydrogen generation.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
Theoretical and Empirical Analysis of a Spatial EA Parallel Boosting Algorithm.
Kamath, Uday; Domeniconi, Carlotta; De Jong, Kenneth
2018-01-01
Many real-world problems involve massive amounts of data. Under these circumstances learning algorithms often become prohibitively expensive, making scalability a pressing issue to be addressed. A common approach is to perform sampling to reduce the size of the dataset and enable efficient learning. Alternatively, one customizes learning algorithms to achieve scalability. In either case, the key challenge is to obtain algorithmic efficiency without compromising the quality of the results. In this article we discuss a meta-learning algorithm (PSBML) that combines concepts from spatially structured evolutionary algorithms (SSEAs) with concepts from ensemble and boosting methodologies to achieve the desired scalability property. We present both theoretical and empirical analyses which show that PSBML preserves a critical property of boosting, specifically, convergence to a distribution centered around the margin. We then present additional empirical analyses showing that this meta-level algorithm provides a general and effective framework that can be used in combination with a variety of learning classifiers. We perform extensive experiments to investigate the trade-off achieved between scalability and accuracy, and robustness to noise, on both synthetic and real-world data. These empirical results corroborate our theoretical analysis, and demonstrate the potential of PSBML in achieving scalability without sacrificing accuracy.
Schute, Kai; Rose, Marcus
2015-10-26
A metal-free route for the synthesis of hyper-cross-linked polymers (HCP) based on Brønsted acids such as trifluoromethanesulfonic acid as well as H2SO4 is reported. It is an improved method compared to conventional synthesis strategies that use stoichiometric amounts of metal-based Lewis acids such as FeCl3. The resulting high-performance adsorbents exhibit a permanent porosity with high specific surface areas up to 1842 m² g⁻¹. Easy scalability of the HCP synthesis is proven on the multi-gram scale. All chemo-physical properties are preserved. Water-vapor adsorption shows that the resulting materials exhibit an even more pronounced hydrophobicity compared to the conventionally prepared materials. The reduced surface polarity enhances the selectivity in the liquid-phase adsorption of the biogenic platform chemical 5-hydroxymethylfurfural. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Crescentini, Marco; Thei, Frederico; Bennati, Marco; Saha, Shimul; de Planque, Maurits R R; Morgan, Hywel; Tartagni, Marco
2015-06-01
Lipid bilayer membrane (BLM) arrays are required for high-throughput analysis, for example, drug screening or advanced DNA sequencing. Complex microfluidic devices are being developed, but these are restricted in terms of array size and structure or have integrated electronic sensing with limited noise performance. We present a compact and scalable multichannel electrophysiology platform based on a hybrid approach that combines integrated state-of-the-art microelectronics with low-cost disposable fluidics, providing a platform for high-quality parallel single ion channel recording. Specifically, we have developed a new integrated circuit amplifier based on a novel noise cancellation scheme that eliminates flicker noise derived from devices under test and amplifiers. The system is demonstrated through the simultaneous recording of ion channel activity from eight bilayer membranes. The platform is scalable and could be extended to much larger array sizes, limited only by electronic data decimation and communication capabilities.
Parallel Climate Data Assimilation PSAS Package
NASA Technical Reports Server (NTRS)
Ding, Hong Q.; Chan, Clara; Gennery, Donald B.; Ferraro, Robert D.
1996-01-01
We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to a 512-node Intel Paragon. The equation solver achieves a sustained 18 Gflops. As a result, we achieved an unprecedented 100-fold solution time reduction on the Intel Paragon parallel platform over the Cray C90. This not only meets and exceeds the DAO time requirements, but also significantly enlarges the window of exploration in climate data assimilation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhu, Kai; Kim, Donghoe; Whitaker, James B
Rapid development of perovskite solar cells (PSCs) during the past several years has made this photovoltaic (PV) technology a serious contender for potential large-scale deployment on the terawatt scale in the PV market. To successfully transition PSC technology from the laboratory to industry scale, substantial efforts need to focus on scalable fabrication of high-performance perovskite modules with minimum negative environmental impact. Here, we provide an overview of the current research and our perspective regarding PSC technology toward future large-scale manufacturing and deployment. Several key challenges discussed are (1) a scalable process for large-area perovskite module fabrication; (2) less hazardous chemical routes for PSC fabrication; and (3) suitable perovskite module designs for different applications.
Hierarchical porous NiCo2O4 nanowires for high-rate supercapacitors.
Jiang, Hao; Ma, Jan; Li, Chunzhong
2012-05-11
We demonstrate a simple and scalable strategy for synthesizing hierarchical porous NiCo2O4 nanowires which exhibit a high specific capacitance of 743 F g⁻¹ at 1 A g⁻¹ with excellent rate performance (78.6% capacity retention at 40 A g⁻¹) and cycling stability (only 6.2% loss after 3000 cycles). This journal is © The Royal Society of Chemistry 2012
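For context, gravimetric capacitance figures such as 743 F g⁻¹ are conventionally extracted from a galvanostatic discharge as C = I·Δt/(m·ΔV); a small check with illustrative numbers chosen to be consistent with the reported value (not measurements from the paper):

```python
def specific_capacitance(current_a, discharge_time_s, mass_g, delta_v):
    """Gravimetric capacitance from galvanostatic discharge: C = I*dt/(m*dV), in F/g."""
    return current_a * discharge_time_s / (mass_g * delta_v)

# Illustrative: a 1 mg electrode discharged at 1 mA (i.e., 1 A/g) over a
# 1 V window in 743 s corresponds to the reported capacitance.
print(specific_capacitance(1e-3, 743.0, 1e-3, 1.0))  # 743.0 F/g
```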
Engineering PFLOTRAN for Scalable Performance on Cray XT and IBM BlueGene Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mills, Richard T; Sripathi, Vamsi K; Mahinthakumar, Gnanamanika
We describe PFLOTRAN - a code for simulation of coupled hydro-thermal-chemical processes in variably saturated, non-isothermal, porous media - and the approaches we have employed to obtain scalable performance on some of the largest scale supercomputers in the world. We present detailed analyses of I/O and solver performance on Jaguar, the Cray XT5 at Oak Ridge National Laboratory, and Intrepid, the IBM BlueGene/P at Argonne National Laboratory, that have guided our choice of algorithms.
NASA Technical Reports Server (NTRS)
Spence, Brian; White, Steve; Schmid, Kevin; Douglas, Mark
2012-01-01
The Flexible Array Concentrator Technology (FACT) is a lightweight, high-performance reflective concentrator blanket assembly that can be used on flexible solar array blankets. The FACT concentrator replaces every other row of solar cells on a solar array blanket, significantly reducing the cost of the array. The modular design is highly scalable for the array system designer, and exhibits compact stowage, good off-pointing acceptance, and mass/cost savings. The assembly's relatively low concentration ratio, accompanied by a large radiative area, provides for a low cell operating temperature, and eliminates many of the thermal problems inherent in high-concentration-ratio designs. Unlike other reflector technologies, the FACT concentrator modules function on both z-fold and rolled flexible solar array blankets, as well as rigid array systems. Mega-ROSA (Mega Roll-Out Solar Array) is a new, highly modularized and extremely scalable version of ROSA that provides immense power level range capability from 100 kW to several MW in size. Mega-ROSA will enable extremely high-power spacecraft and SEP-powered missions, including space-tug and large-scale planetary science and lunar/asteroid exploration missions. Mega-ROSA's inherent broad power scalability is achieved while retaining ROSA's solar array performance metrics and mission-enabling features for lightweight, compact stowage volume and affordability. This innovation will enable future ultra-high-power missions through low cost (25 to 50% cost savings, depending on PV and blanket technology), light weight, high specific power (greater than 200 to 400 Watts per kilogram BOL (beginning-of-life) at the wing level, depending on PV and blanket technology), compact stowage volume (greater than 50 kilowatts per cubic meter for very large arrays), high reliability, platform simplicity (low failure modes), high deployed strength/stiffness when scaled to huge sizes, and high-voltage operation capability. Mega-ROSA is adaptable to all photovoltaic and concentrator flexible blanket technologies, and can readily accommodate standard multijunction and emerging ultra-lightweight IMM (inverted metamorphic) photovoltaic flexible blanket assemblies, as well as ENTECH's Stretched Lens Array (SLA) and DSS's (Deployable Space Systems) FACT, which allows for cost reduction at the array level.
High dimensional biological data retrieval optimization with NoSQL technology.
Wang, Shicai; Pandis, Ioannis; Wu, Chao; He, Sijin; Johnson, David; Emam, Ibrahim; Guitton, Florian; Guo, Yike
2014-01-01
High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, queries against relational databases for hundreds of different patients' gene expression records are slow. Non-relational data models, such as the key-value model implemented in NoSQL databases, promise to be more performant solutions. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. In this paper we introduce a new data model better suited to high-dimensional data storage and querying, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set taken from NCBI GEO concerning Multiple Myeloma. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase over MongoDB. The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that demonstrated in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.
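A minimal illustration of the key-value idea (a hypothetical row-key layout, not the paper's actual tranSMART/HBase schema): one composite row key per (patient, probe) lets a prefix scan retrieve all of a patient's expression records without relational joins.

```python
# In-memory stand-in for a BigTable/HBase-style store (illustrative only).
store = {}

def put_expression(patient_id, probe_id, value):
    row_key = f"{patient_id}:{probe_id}"       # composite row key
    store[row_key] = {"expr:value": value}     # column family "expr"

def scan_patient(patient_id):
    """Prefix scan: the key-value analogue of 'all rows for this patient'."""
    prefix = f"{patient_id}:"
    return {k: v for k, v in store.items() if k.startswith(prefix)}

put_expression("GSM0001", "AFFX-1", 7.31)
put_expression("GSM0001", "AFFX-2", 5.02)
put_expression("GSM0002", "AFFX-1", 6.88)
print(scan_patient("GSM0001"))   # both records for patient GSM0001
```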
A vanadium-doped ZnO nanosheets-polymer composite for flexible piezoelectric nanogenerators
NASA Astrophysics Data System (ADS)
Shin, Sung-Ho; Kwon, Yang Hyeog; Lee, Min Hyung; Jung, Joo-Yun; Seol, Jae Hun; Nah, Junghyo
2016-01-01
We report high performance flexible piezoelectric nanogenerators (PENGs) by employing vanadium (V)-doped ZnO nanosheets (NSs) and the polydimethylsiloxane (PDMS) composite structure. The V-doped ZnO NSs were synthesized to overcome the inherently low piezoelectric properties of intrinsic ZnO. Ferroelectric phase transition induced in the V-doped ZnO NSs contributed to significantly improve the performance of the PENGs after the poling process. Consequently, the PENGs exhibited high output voltage and current up to ~32 V and ~6.2 μA, respectively, under the applied strain, which are sufficient to directly turn on a number of light emitting diodes (LEDs). The composite approach for PENG fabrication is scalable, robust, and reproducible during periodic bending/releasing over extended cycles. The approach introduced here extends the performance limits of ZnO-based PENGs and demonstrates their potential as energy harvesting devices. Electronic supplementary information (ESI) available. See DOI: 10.1039/c5nr07185b
Towards Scalable Strain Gauge-Based Joint Torque Sensors.
Khan, Hamza; D'Imperio, Mariapaola; Cannella, Ferdinando; Caldwell, Darwin G; Cuschieri, Alfred; Semini, Claudio
2017-08-18
During recent decades, strain gauge-based joint torque sensors have been commonly used to provide high-fidelity torque measurements in robotics. Although measurement of joint torque/force is often required in engineering research and development, the gluing and wiring of strain gauges used as torque sensors pose difficulties during integration within the restricted space available in small joints. The problem is compounded by the need for a scalable geometric design to measure joint torque. In this communication, we describe a novel design of a strain gauge-based mono-axial torque sensor referred to as the square-cut torque sensor (SCTS), the significant features of which are a high degree of linearity, symmetry, and high scalability in terms of both size and measuring range. Most importantly, the SCTS provides easy access for gluing and wiring of the strain gauges on the sensor surface despite the limited available space. We demonstrated that the SCTS was better in terms of symmetry (clockwise and counterclockwise rotation) and more linear. These capabilities have been shown through finite element modeling (ANSYS) confirmed by observed data obtained in load testing experiments. The high performance of the SCTS was confirmed by studies involving changes in size, material, and/or wing width and thickness. Finally, we demonstrated that the SCTS can be successfully implemented inside the hip joints of the miniaturized hydraulically actuated quadruped robot MiniHyQ. This communication is based on work presented at the 18th International Conference on Climbing and Walking Robots (CLAWAR).
Scalable Unix commands for parallel processors : a high-performance implementation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ong, E.; Lusk, E.; Gropp, W.
2001-06-22
We describe a family of MPI applications we call the Parallel Unix Commands. These commands are natural parallel versions of common Unix user commands such as ls, ps, and find, together with a few similar commands particular to the parallel environment. We describe the design and implementation of these programs and present some performance results on a 256-node Linux cluster. The Parallel Unix Commands are open source and freely available.
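A toy analogue of a parallel ls in the spirit of these commands, written with mpi4py rather than the authors' C/MPI code (the filename pls.py is hypothetical): each rank lists a directory on its own node, and rank 0 gathers and prints the merged result.

```python
# Run with e.g.: mpiexec -n 4 python pls.py /tmp
import os
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD
path = sys.argv[1] if len(sys.argv) > 1 else "."

# Each rank produces its node-local directory listing.
local = (MPI.Get_processor_name(), sorted(os.listdir(path)))

# Rank 0 gathers all listings and prints one line per host.
gathered = comm.gather(local, root=0)
if comm.rank == 0:
    for host, entries in gathered:
        print(f"{host}: {' '.join(entries)}")
```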
Implementation of NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Schultz, Matthew; Jin, Hao-Qiang; Yan, Jerry
2000-01-01
A number of features make Java an attractive but debatable choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvements in Java compiler technology and in Java thread implementation would move Java closer to Fortran in the competition for CFD applications.
Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.
Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele
2015-01-01
Technological advances in Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings lead to an ever-increasing amount of raw data being generated. Arrays with hundreds to a few thousand electrodes are slowly seeing widespread use, and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings, there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance-critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
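Spike detection, one of the pre-processing steps the tool parallelizes, is commonly done by threshold crossing against a robust noise estimate; a minimal single-channel NumPy sketch of that generic baseline method (not this tool's algorithm) is below. Channels are independent, so the same function can be mapped over hundreds of electrodes in parallel.

```python
import numpy as np

def detect_spikes(trace, fs, thresh_sd=5.0):
    """Threshold-crossing spike detector for one MEA channel.

    Uses the median-absolute-deviation noise estimate common in the
    literature and a 1 ms dead time between detected events.
    """
    sigma = np.median(np.abs(trace)) / 0.6745          # robust noise SD
    crossings = np.flatnonzero(trace < -thresh_sd * sigma)
    if crossings.size == 0:
        return crossings
    # Keep only the first sample of each crossing (1 ms refractory window).
    keep = np.insert(np.diff(crossings) > fs * 1e-3, 0, True)
    return crossings[keep]

rng = np.random.default_rng(1)
fs = 25_000.0
trace = rng.normal(0, 10e-6, int(fs))      # 1 s of noise, in volts
trace[[5_000, 12_000]] -= 120e-6           # two injected "spikes"
print(detect_spikes(trace, fs))            # approximately [5000, 12000]
```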
McrEngine: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression
Islam, Tanzima Zerin; Mohror, Kathryn; Bagchi, Saurabh; ...
2013-01-01
High performance computing (HPC) systems use checkpoint-restart to tolerate failures. Typically, applications store their states in checkpoints on a parallel file system (PFS). As applications scale up, checkpoint-restart incurs high overheads due to contention for PFS resources. The high overheads force large-scale applications to reduce checkpoint frequency, which means more compute time is lost in the event of failure. We alleviate this problem through a scalable checkpoint-restart system, mcrEngine. McrEngine aggregates checkpoints from multiple application processes with knowledge of the data semantics available through widely-used I/O libraries, e.g., HDF5 and netCDF, and compresses them. Our novel scheme improves the compressibility of checkpoints by up to 115% over simple concatenation and compression. Our evaluation with large-scale application checkpoints shows that mcrEngine reduces checkpointing overhead by up to 87% and restart overhead by up to 62% over a baseline with no aggregation or compression.
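The intuition behind data-aware aggregation can be shown with a toy example (illustrative only, not mcrEngine's HDF5/netCDF-aware scheme): grouping the same variable from every process before compressing places similar bytes within the compressor's window, whereas naive per-rank concatenation pushes them too far apart to match.

```python
import gzip
import random

random.seed(0)
# Two "variables" shared across ranks; each rank's copy differs only by a
# tiny rank-specific header byte.
temp_base = bytes(random.randrange(256) for _ in range(30_000))
vel_base = bytes(random.randrange(256) for _ in range(30_000))
ckpts = [(bytes([i]) + temp_base, bytes([i]) + vel_base) for i in range(8)]

# Naive: concatenate whole checkpoints rank by rank (similar blocks end up
# ~60 KB apart, beyond DEFLATE's 32 KB window, so little cross-rank matching).
naive = b"".join(t + v for t, v in ckpts)
# Data-aware: group like-named variables first (similar blocks are adjacent).
aware = b"".join(t for t, _ in ckpts) + b"".join(v for _, v in ckpts)

print(len(gzip.compress(naive)), len(gzip.compress(aware)))  # aware << naive
```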
High-performance graphene-based supercapacitors made by a scalable blade-coating approach
NASA Astrophysics Data System (ADS)
Wang, Bin; Liu, Jinzhang; Mirri, Francesca; Pasquali, Matteo; Motta, Nunzio; Holmes, John W.
2016-04-01
Graphene oxide (GO) sheets can form liquid crystals (LCs) in their aqueous dispersions, which become more viscous as the LC character strengthens. In this work we combine the viscous LC-GO solution with the blade-coating technique to make GO films for constructing graphene-based supercapacitors in a scalable way. Reduced GO (rGO) films are prepared by wet chemical methods, using either hydrazine (HZ) or hydroiodic acid (HI). Solid-state supercapacitors with rGO films as electrodes and highly conductive carbon nanotube films as current collectors are fabricated, and the capacitive properties of the different rGO films are compared. It is found that the HZ-rGO film is superior to the HI-rGO film in achieving high capacitance, owing to the 3D structure of graphene sheets in the electrode. Compared to a gelled electrolyte, the use of liquid electrolyte (H2SO4) can further increase the capacitance to 265 F per gram (corresponding to 52 mF per cm2) of the HZ-rGO film.
Zhang, Zailei; Wang, Yanhong; Ren, Wenfeng; Tan, Qiangqiang; Chen, Yunfa; Li, Hong; Zhong, Ziyi; Su, Fabing
2014-05-12
Despite the promising application of porous Si-based anodes in future Li-ion batteries, the large-scale synthesis of these materials remains a great challenge. A scalable synthesis of porous Si materials is presented via the Rochow reaction, which is commonly used to produce organosilane monomers in the chemical industry. Commercial Si microparticles react with gaseous CH3Cl over various Cu-based catalyst particles, creating macropores within the unreacted Si, accompanied by carbon deposition, to generate porous Si/C composites. Taking advantage of the interconnected porous structure and the conductive carbon coating obtained after simple post-treatment, these composites as anodes exhibit high reversible capacity and long cycle life. It is expected that, by integrating the organosilane synthesis process and controlling the reaction conditions, the manufacture of porous Si-based anodes on an industrial scale is highly possible. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Scalable Emergency Response System for Oceangoing Assets. Final Summary Report
2009-01-20
Developed for platform defense of submarines and surface ships at sea and in port, the iRobot High Speed UUV enables rapid transit with respect to a ...
Azizi, Amin; Gadinski, Matthew R; Li, Qi; AlSaud, Mohammed Abu; Wang, Jianjun; Wang, Yi; Wang, Bo; Liu, Feihua; Chen, Long-Qing; Alem, Nasim; Wang, Qing
2017-09-01
Polymer dielectrics are the materials of choice for power electronics and pulsed-power applications. However, their relatively low operating temperatures significantly limit their use in harsh-environment energy storage devices, e.g., automobile and aerospace power systems. Herein, hexagonal boron nitride (h-BN) films are prepared by chemical vapor deposition (CVD) and readily transferred onto polyetherimide (PEI) films. Greatly improved performance in terms of discharged energy density and charge-discharge efficiency is achieved in the PEI sandwiched with CVD-grown h-BN films at elevated temperatures when compared to neat PEI films and other high-temperature polymer and nanocomposite dielectrics. Notably, the h-BN-coated PEI films are capable of operating with >90% charge-discharge efficiencies and delivering high energy densities, i.e., 1.2 J cm⁻³, even at a temperature close to the glass transition temperature of the polymer (217 °C), where pristine PEI almost fails. Outstanding cyclability and dielectric stability over 55,000 consecutive charge-discharge cycles are demonstrated in the h-BN-coated PEI at high temperatures. The work demonstrates a general and scalable pathway to enable high-temperature capacitive energy applications of a wide range of engineering polymers and also offers an efficient method for the synthesis and transfer of 2D nanomaterials at the scale demanded for applications. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
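The reported quantities follow the usual definitions: discharged energy density is the integral of v(t)·i(t) over the discharge per unit volume, and charge-discharge efficiency is the discharged-to-charged energy ratio. A sketch with toy waveforms (all numbers illustrative, not the paper's measurement data):

```python
import numpy as np

def energy_density_j_per_cm3(v, i, dt, volume_cm3):
    """Discharged energy density: integral of v(t)*i(t) dt per unit volume."""
    return np.trapz(v * i, dx=dt) / volume_cm3

t = np.linspace(0.0, 1e-3, 1001)            # 1 ms discharge (toy)
v = 400.0 * (1.0 - t / 1e-3)                # linearly decaying voltage (toy)
i = 1e-3 * (1.0 - t / 1e-3)                 # proportional current (toy)
print(energy_density_j_per_cm3(v, i, t[1] - t[0], 1e-4))  # ~1.33 J/cm^3
```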
Performance and Scalability of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.
Santhanagopalan, Sunand; Balram, Anirudh; Meng, Dennis Desheng
2013-03-26
It is commonly perceived that reduction-oxidation (redox) capacitors have to sacrifice power density to achieve higher energy density than carbon-based electric double layer capacitors. In this work, we report the synergetic advantages of combining the high crystallinity of hydrothermally synthesized α-MnO2 nanorods with alignment for high performance redox capacitors. Such an approach is enabled by high voltage electrophoretic deposition (HVEPD) technology which can obtain vertically aligned nanoforests with great process versatility. The scalable nanomanufacturing process is demonstrated by roll-printing an aligned forest of α-MnO2 nanorods on a large flexible substrate (1 inch by 1 foot). The electrodes show very high power density (340 kW/kg at an energy density of 4.7 Wh/kg) and excellent cyclability (over 92% capacitance retention over 2000 cycles). Pretreatment of the substrate and use of a conductive holding layer have also been shown to significantly reduce the contact resistance between the aligned nanoforests and the substrates. High areal specific capacitances of around 8500 μF/cm² have been obtained for each electrode with a two-electrode device configuration. Over 93% capacitance retention was observed when the cycling current densities were increased from 0.25 to 10 mA/cm², indicating high rate capabilities of the fabricated electrodes and resulting in the very high attainable power density. The high performance of the electrodes is attributed to the crystallographic structure, 1D morphology, aligned orientation, and low contact resistance.
NASA Technical Reports Server (NTRS)
Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad
1994-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.
D'Aiuto, Leonardo; Zhi, Yun; Kumar Das, Dhanjit; Wilcox, Madeleine R; Johnson, Jon W; McClain, Lora; MacDonald, Matthew L; Di Maio, Roberto; Schurdak, Mark E; Piazza, Paolo; Viggiano, Luigi; Sweet, Robert; Kinchington, Paul R; Bhattacharjee, Ayantika G; Yolken, Robert; Nimgaonkar, Vishwajit L
2014-01-01
Induced pluripotent stem cell (iPSC)-based technologies offer an unprecedented opportunity to perform high-throughput screening of novel drugs for neurological and neurodegenerative diseases. Such screenings require a robust and scalable method for generating large numbers of mature, differentiated neuronal cells. Currently available methods based on differentiation of embryoid bodies (EBs) or directed differentiation of adherent culture systems are either expensive or are not scalable. We developed a protocol for large-scale generation of neural stem cells (NSCs)/early neural progenitor cells (eNPCs) and their differentiation into neurons. Our scalable protocol allows robust and cost-effective generation of NSCs/eNPCs from iPSCs. Following culture in neurobasal medium supplemented with B27 and BDNF, NSCs/eNPCs differentiate predominantly into vesicular glutamate transporter 1 (VGLUT1) positive neurons. Targeted mass spectrometry analysis demonstrates that iPSC-derived neurons express ligand-gated channels and other synaptic proteins, and whole-cell patch-clamp experiments indicate that these channels are functional. The robust and cost-effective differentiation protocol described here for large-scale generation of NSCs/eNPCs and their differentiation into neurons paves the way for automated high-throughput screening of drugs for neurological and neurodegenerative diseases.
Equalizer: a scalable parallel rendering framework.
Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato
2009-01-01
Continuing improvements in CPU and GPU performance as well as increasing multi-core processor and cluster-based parallelism demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic enough to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems, ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture and the basic API, discuss its advantages over previous approaches, and present example configurations, usage scenarios and scalability results.
High-performance, scalable optical network-on-chip architectures
NASA Astrophysics Data System (ADS)
Tan, Xianfang
The rapid advance of technology enables a large number of processing cores to be integrated into a single chip, called a Chip Multiprocessor (CMP) or Multiprocessor System-on-Chip (MPSoC). The on-chip interconnection network, which is the communication infrastructure for these processing cores, plays a central role in a many-core system. With the continuously increasing complexity of many-core systems, traditional metallic-wire electronic networks-on-chip (NoC) have become a bottleneck because of unacceptable latency in data transmission and extremely high on-chip energy consumption. Optical networks-on-chip (ONoC) have been proposed as a promising alternative to electronic NoC, with the benefits of optical signaling such as extremely high bandwidth, negligible latency, and low power consumption. This dissertation focuses on the design of high-performance, scalable ONoC architectures, and its contributions are highlighted as follows: 1. A micro-ring resonator (MRR)-based Generic Wavelength-routed Optical Router (GWOR) is proposed, together with a method for constructing GWORs of any size. GWOR is a scalable non-blocking ONoC architecture with a simple structure, low cost and high power efficiency compared to existing ONoC designs. 2. To expand the bandwidth and improve the fault tolerance of the GWOR, a redundant GWOR architecture is designed by cascading different types of GWORs into one network. 3. A redundant GWOR built with MRR-based comb switches is proposed. Replacing the general MRRs with comb switches expands the bandwidth while keeping the topology of the GWOR unchanged. 4. A butterfly fat tree (BFT)-based hybrid optoelectronic NoC (HONoC) architecture is developed, in which GWORs are used for global communication and electronic routers are used for local communication. The proposed HONoC uses fewer electronic routers and links than its electronic BFT-based NoC counterpart, combining the advantages of GWORs for optical communication with those of the BFT for non-uniform traffic and three-dimensional (3D) implementation. 5. A cycle-accurate NoC simulator is developed to evaluate the performance of the proposed HONoC architectures. It is a comprehensive platform that can simulate both electronic and optical NoCs. HONoC architectures of different sizes are evaluated in terms of throughput, latency and energy dissipation. Simulation results confirm that HONoC achieves good network performance with lower power consumption.
NASA Astrophysics Data System (ADS)
Burnett, W.
2016-12-01
The Department of Defense's (DoD) High Performance Computing Modernization Program (HPCMP) provides high performance computing to address the most significant challenges in computational resources, software application support and nationwide research and engineering networks. Today, the HPCMP has a critical role in ensuring the National Earth System Prediction Capability (N-ESPC) achieves initial operational status in 2019. A 2015 study commissioned by the HPCMP found that N-ESPC computational requirements will exceed interconnect bandwidth capacity due to the additional load from data assimilation and from passing data between ensemble codes. Memory bandwidth and I/O bandwidth will continue to be significant bottlenecks for the scalability of the Navy's Hybrid Coordinate Ocean Model (HYCOM), by far the major driver of computing resource requirements in the N-ESPC. The study also found that few of the N-ESPC model developers have detailed plans to ensure their respective codes scale through 2024. Three HPCMP initiatives are designed to directly address and support these issues: Productivity Enhancement, Technology Transfer and Training (PETTT), the HPCMP Applications Software Initiative (HASI), and Frontier Projects. PETTT supports code conversion by providing assistance, expertise and training in scalable and high-end computing architectures. HASI addresses the continuing need for modern application software that executes effectively and efficiently on next-generation high-performance computers. Frontier Projects enable research and development that could not be achieved using typical HPCMP resources by providing multi-disciplinary teams access to exceptional amounts of high performance computing resources. Finally, the Navy's DoD Supercomputing Resource Center (DSRC) currently operates a 6-Petabyte system, of which Naval Oceanography receives 15% of operational use, or approximately 1 Petabyte of processing capability. The DSRC will provide the DoD with the future computing assets to initially operate the N-ESPC in 2019. This talk will further describe how DoD's HPCMP will ensure N-ESPC becomes operational, efficiently and effectively, using next-generation high performance computing.
Ogi, Jun; Kato, Yuri; Matoba, Yoshihisa; Yamane, Chigusa; Nagahata, Kazunori; Nakashima, Yusaku; Kishimoto, Takuya; Hashimoto, Shigeki; Maari, Koichi; Oike, Yusuke; Ezaki, Takayuki
2017-12-19
A 24-μm-pitch microelectrode array (MEA) with 6912 readout channels at 12 kHz and 23.2-μV rms random noise is presented. The aim is to reduce noise in a "highly scalable" MEA with a complementary metal-oxide-semiconductor integrated circuit (CMOS-MEA), in which a large number of readout channels and a high electrode density can be expected. Despite the small dimensions and the simplicity of the in-pixel circuit required for the high electrode density and the relatively large number of readout channels of the prototype CMOS-MEA chip developed in this work, the on-chip noise is successfully reduced to less than half that reported in a previous work for a device of similar in-pixel circuit simplicity and readout channel count. Further, action potentials were clearly observed from cardiomyocytes using the CMOS-MEA. These results indicate the high scalability of the CMOS-MEA. The highly scalable CMOS-MEA provides high-spatial-resolution mapping of cell action potentials, and this mapping can aid understanding of complex activities in cells, including neuron network activities.
Field of genes: using Apache Kafka as a bioinformatic data repository.
Lawlor, Brendan; Lynch, Richard; Mac Aogáin, Micheál; Walsh, Paul
2018-04-01
Bioinformatic research is increasingly dependent on large-scale datasets, accessed either from private or public repositories. An example of a public repository is National Center for Biotechnology Information's (NCBI's) Reference Sequence (RefSeq). These repositories must decide in what form to make their data available. Unstructured data can be put to almost any use but are limited in how access to them can be scaled. Highly structured data offer improved performance for specific algorithms but limit the wider usefulness of the data. We present an alternative: lightly structured data stored in Apache Kafka in a way that is amenable to parallel access and streamed processing, including subsequent transformations into more highly structured representations. We contend that this approach could provide a flexible and powerful nexus of bioinformatic data, bridging the gap between low structure on one hand, and high performance and scale on the other. To demonstrate this, we present a proof-of-concept version of NCBI's RefSeq database using this technology. We measure the performance and scalability characteristics of this alternative with respect to flat files. The proof of concept scales almost linearly as more compute nodes are added, outperforming the standard approach using files. Apache Kafka merits consideration as a fast and more scalable but general-purpose way to store and retrieve bioinformatic data, for public, centralized reference datasets such as RefSeq and for private clinical and experimental data.
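To make the storage model concrete, the sketch below (Python, assuming the kafka-python client and a broker at localhost:9092; the topic name refseq-records and the record fields are invented for illustration, not the authors' schema) publishes lightly structured sequence records as individual Kafka messages and streams them back, which is the access pattern the paper's parallel, streamed processing rests on.

```python
# A minimal sketch (not the authors' code) of the "lightly structured" storage
# idea: each sequence record becomes one Kafka message, so any number of
# consumers can replay the topic in parallel and build more highly structured
# representations downstream.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Publish a few toy records; real records would carry full sequence data.
records = [
    {"accession": "NC_000001", "organism": "Homo sapiens", "length": 248956422},
    {"accession": "NC_000913", "organism": "Escherichia coli", "length": 4641652},
]
for rec in records:
    producer.send("refseq-records", rec)
producer.flush()

# One consumer per compute node can stream the topic from the beginning.
consumer = KafkaConsumer(
    "refseq-records",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=2000,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for msg in consumer:
    print(msg.value["accession"], msg.value["organism"])
```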
Jeong, Seol Young; Jo, Hyeong Gon; Kang, Soon Ju
2015-01-01
Indoor location-based services (iLBS) are extremely dynamic and changeable, and include numerous resources and mobile devices. In particular, the network infrastructure requires support for high scalability in the indoor environment, and various resource lookups are requested concurrently and frequently from several locations based on the dynamic network environment. A traditional map-based centralized approach for iLBSs has several disadvantages: it requires global knowledge to maintain a complete geographic indoor map; the central server is a single point of failure; it can also cause low scalability and traffic congestion; and it is hard to adapt to a change of service area in real time. This paper proposes a self-organizing and fully distributed platform for iLBSs. The proposed self-organizing distributed platform provides a dynamic reconfiguration of locality accuracy and service coverage by expanding and contracting dynamically. In order to verify the suggested platform, scalability performance according to the number of inserted or deleted nodes composing the dynamic infrastructure was evaluated through a simulation similar to the real environment. PMID:26016908
Architecture Knowledge for Evaluating Scalable Databases
2015-01-16
problems arising from the proliferation of new data models and distributed technologies for building scalable, available data stores. Architects must... No longer are relational databases the de facto standard for building data repositories. Highly distributed, scalable "NoSQL" databases [11] have emerged... This is especially challenging at the data storage layer. The multitude of competing NoSQL database technologies creates a complex and rapidly...
Joint source-channel coding for motion-compensated DCT-based SNR scalable video.
Kondi, Lisimachos P; Ishtiaq, Faisal; Katsaggelos, Aggelos K
2002-01-01
In this paper, we develop an approach toward joint source-channel coding for motion-compensated DCT-based scalable video coding and transmission. A framework for the optimal selection of the source and channel coding rates over all scalable layers is presented such that the overall distortion is minimized. The algorithm utilizes universal rate distortion characteristics which are obtained experimentally and show the sensitivity of the source encoder and decoder to channel errors. The proposed algorithm allocates the available bit rate between scalable layers and, within each layer, between source and channel coding. We present the results of this rate allocation algorithm for video transmission over a wireless channel using the H.263 Version 2 signal-to-noise ratio (SNR) scalable codec for source coding and rate-compatible punctured convolutional (RCPC) codes for channel coding. We discuss the performance of the algorithm with respect to the channel conditions, coding methodologies, layer rates, and number of layers.
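As a toy illustration of the allocation problem (not the paper's algorithm, which relies on experimentally obtained universal rate-distortion characteristics), the following Python sketch exhaustively splits a per-layer bit budget between source and channel coding to minimize an invented expected-distortion model:

```python
# Toy joint source-channel rate allocation: for one scalable layer, choose
# how many rate units go to source coding vs. channel coding so that the
# expected distortion is minimal. The distortion and loss models below are
# invented solely for illustration.
import itertools

RATES = [0, 1, 2, 3, 4]           # admissible rate units per coder
TOTAL = 6                         # total budget for the layer (units)

def expected_distortion(r_src, r_chan):
    """Source distortion falls with source rate; residual channel errors
    (which fall with channel-code redundancy) inflate it again."""
    source_d = 2.0 ** (-2 * r_src)         # classic D ~ 2^(-2R) shape
    loss_prob = 0.5 ** r_chan              # toy residual-error probability
    return (1 - loss_prob) * source_d + loss_prob * 1.0

best = None
for r_src, r_chan in itertools.product(RATES, repeat=2):
    if r_src + r_chan > TOTAL:
        continue                           # respect the total rate budget
    d = expected_distortion(r_src, r_chan)
    if best is None or d < best[0]:
        best = (d, r_src, r_chan)

d, r_src, r_chan = best
print(f"best split: source={r_src}, channel={r_chan}, distortion={d:.4f}")
```

The real algorithm performs this trade-off jointly across all scalable layers; a noisier channel shifts the optimum toward more channel-coding redundancy, exactly as the abstract describes.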
Federal Register 2010, 2011, 2012, 2013, 2014
2013-08-14
... best demonstrate that they have the managerial and operational capacity, including significant and demonstrable scalability in their management, finances, systems, and infrastructure, to assume the... --Scalability in operations and management to perform timely, accurate, and comprehensive lender claims review...
NASA Astrophysics Data System (ADS)
Sanan, P.; Tackley, P. J.; Gerya, T.; Kaus, B. J. P.; May, D.
2017-12-01
StagBL is an open-source parallel solver and discretization library for geodynamic simulation, encapsulating and optimizing operations essential to staggered-grid finite volume Stokes flow solvers. It provides a parallel staggered-grid abstraction with a high-level interface in C and Fortran. On top of this abstraction, tools are available to define boundary conditions and interact with particle systems. Tools and examples to efficiently solve Stokes systems defined on the grid are provided in small (direct solver), medium (simple preconditioners), and large (block factorization and multigrid) model regimes. By working directly with leading application codes (StagYY, I3ELVIS, and LaMEM) and providing an API and examples to integrate with others, StagBL aims to become a community tool supplying scalable, portable, reproducible performance toward novel science in regional- and planet-scale geodynamics and planetary science. By implementing kernels used by many research groups beneath a uniform abstraction layer, the library will enable optimization for modern hardware, thus reducing community barriers to large- or extreme-scale parallel simulation on modern architectures. In particular, the library will include CPU-, manycore-, and GPU-optimized variants of matrix-free operators and multigrid components. The common layer provides a framework upon which to introduce innovative new tools. StagBL will leverage p4est to provide distributed adaptive meshes, and incorporate a multigrid convergence analysis tool. These options, in addition to a wealth of solver options provided by an interface to PETSc, will make the most modern solution techniques available from a common interface. StagBL in turn provides a PETSc interface, DMStag, to its central staggered-grid abstraction. We present public version 0.5 of StagBL, including preliminary integration with application codes and demonstrations with its own demonstration application, StagBLDemo. Central to StagBL is the notion of an uninterrupted pipeline from toy/teaching codes to high-performance, extreme-scale solves. StagBLDemo replicates the functionality of an advanced MATLAB-style regional geodynamics code, thus providing users with a concrete procedure to exceed the performance and scalability limitations of smaller-scale tools.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shin, J; Coss, D; McMurry, J
Purpose: To evaluate the efficiency of multithreaded Geant4 (Geant4-MT, version 10.0) for proton Monte Carlo dose calculations using a high performance computing facility. Methods: Geant4-MT was used to calculate 3D dose distributions in 1×1×1 mm³ voxels in a water phantom and a patient's head with a 150 MeV proton beam covering approximately 5×5 cm² in the water phantom. Three timestamps were measured on the fly to separately analyze the required time for initialization (which cannot be parallelized), the processing time of individual threads, and the completion time. Scalability of the averaged processing time per thread was calculated as a function of thread number (1, 100, 150, and 200) for both 1 M and 50 M histories. The total memory usage was recorded. Results: Simulations with 50 M histories were fastest with 100 threads, taking approximately 1.3 hours and 6 hours for the water phantom and the CT data, respectively, with better than 1.0% statistical uncertainty. The calculations show 1/N scalability in the event loops for both cases. The gains from parallel calculations started to decrease with 150 threads. Memory usage increases linearly with the number of threads. No critical failures were observed during the simulations. Conclusion: Multithreading in Geant4-MT decreased simulation time in proton dose distribution calculations by a factor of 64 and 54 at a near-optimal 100 threads for the water phantom and the patient data, respectively. Further simulations will be done to determine the efficiency at the optimal thread number. Considering the trend of computer architecture development, utilizing Geant4-MT for radiotherapy simulations is an excellent cost-effective alternative to a distributed batch queuing system. However, because the scalability depends highly on simulation details, i.e., the ratio of the processing time of one event to the waiting time to access the shared event queue, a performance evaluation as described is recommended.
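The scalability bookkeeping described above reduces to simple arithmetic once the timestamps are separated; the Python sketch below (timing numbers invented for illustration, not the study's measurements) computes speedup and parallel efficiency per thread count to expose the 1/N behaviour of the event loop against the fixed initialization cost:

```python
# Given wall-clock times for the serial initialization phase and the
# parallel event loop at each thread count, report speedup and efficiency.
timings = {   # threads -> (init_seconds, event_loop_seconds), made-up values
    1:   (120.0, 96000.0),
    100: (120.0,  1010.0),
    150: (120.0,   730.0),
    200: (120.0,   590.0),
}

t1 = timings[1][1]                      # single-thread event-loop time
for n, (t_init, t_loop) in sorted(timings.items()):
    speedup = t1 / t_loop               # ideal 1/N scaling gives speedup = n
    efficiency = speedup / n
    print(f"{n:4d} threads: speedup {speedup:7.1f}, "
          f"efficiency {efficiency:5.2f}, total {t_init + t_loop:9.1f} s")
```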
Reducing Communication in Algebraic Multigrid Using Additive Variants
Vassilevski, Panayot S.; Yang, Ulrike Meier
2014-02-12
Algebraic multigrid (AMG) has proven to be an effective scalable solver on many high performance computers. However, its increasing communication complexity on coarser levels seriously impacts its performance on computers with high communication cost. Additive AMG variants provide increased parallelism and fewer messages per cycle, but generally exhibit slower convergence. Here we present several new additive variants with convergence rates that are significantly improved compared to the classical additive algebraic multigrid method, and investigate their potential for decreased communication and improved communication-computation overlap, features that are essential for good performance on future exascale architectures.
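For readers unfamiliar with the additive idea, the following Python sketch builds a toy two-level additive preconditioner for a 1D Poisson problem: the smoother term and the coarse-grid correction are summed rather than chained, so both can be computed concurrently, which is the source of the communication advantage. This illustrates the general additive structure only, not the paper's AMG variants.

```python
# Toy two-level *additive* preconditioner: M^{-1} r = w D^{-1} r + P Ac^{-1} P^T r.
# The two terms are independent, so in a parallel setting they can be applied
# simultaneously instead of sequentially as in a multiplicative cycle.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg, splu

nc = 127                       # coarse interior points
n = 2 * nc + 1                 # fine interior points (1D Poisson)
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")

# Linear-interpolation prolongation P (n x nc).
rows, cols, vals = [], [], []
for j in range(nc):
    i = 2 * j + 1              # fine index of coarse point j
    rows += [i, i - 1, i + 1]
    cols += [j, j, j]
    vals += [1.0, 0.5, 0.5]
P = sp.csc_matrix((vals, (rows, cols)), shape=(n, nc))

Ac = (P.T @ A @ P).tocsc()     # Galerkin coarse-grid operator
Ac_lu = splu(Ac)
dinv = 1.0 / A.diagonal()
omega = 0.5                    # damping for the Jacobi (smoother) term

def apply_additive(r):
    # Smoother and coarse correction are summed, not chained.
    return omega * dinv * r + P @ Ac_lu.solve(P.T @ r)

M = LinearOperator((n, n), matvec=apply_additive)
b = np.ones(n)
x, info = cg(A, b, M=M, maxiter=200)
print("CG info:", info, " residual:", np.linalg.norm(b - A @ x))
```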
Building and managing high performance, scalable, commodity mass storage systems
NASA Technical Reports Server (NTRS)
Lekashman, John
1998-01-01
The NAS Systems Division has recently embarked on a significant new way of handling the mass storage problem. The basic goals of this new development are to build systems of very large capacity and high performance that retain the advantages of commodity products. The central design philosophy is to build storage systems the way the Internet was built: competitive, survivable, expandable, and wide open. The thrust of this paper is to describe the motivation for this effort, what we mean by commodity mass storage, what the implications are for a facility that takes such an approach, and where we think it will lead.
Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Jian; Hamidouche, Khaled; Zheng, Jie
2015-08-05
Machine Learning algorithms are benefiting from the continuous improvement of programming models, including MPI, MapReduce and PGAS. The k-Nearest Neighbors (k-NN) algorithm is a widely used machine learning algorithm, applied to supervised learning tasks such as classification. Several parallel implementations of k-NN have been proposed in the literature and practice. However, on high-performance computing systems with high-speed interconnects, it is important to further accelerate existing designs of the k-NN algorithm by taking advantage of scalable programming models. To improve the performance of k-NN on large-scale environments with InfiniBand networks, this paper proposes several alternative hybrid MPI+OpenSHMEM designs and performs a systemic evaluation and analysis on typical workloads. The hybrid designs leverage one-sided memory access to better overlap communication with computation than the existing pure MPI design, and propose better schemes for efficient buffer management. The implementation based on the k-NN program from MaTEx with MVAPICH2-X (Unified MPI+PGAS Communication Runtime over InfiniBand) shows up to 9.0% time reduction for training the KDD Cup 2010 workload over 512 cores, and 27.6% time reduction for a small workload with balanced communication and computation. Experiments with varied numbers of cores show that our design can maintain good scalability.
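A minimal distributed k-NN sketch in Python with mpi4py (pure two-sided MPI rather than the paper's hybrid MPI+OpenSHMEM design, and with synthetic data) shows the basic decomposition: each rank holds a partition of the training set, computes its local top-k candidates, and the root merges them into a global prediction.

```python
# Run with e.g. `mpirun -np 4 python knn_mpi.py`. Data are synthetic; the
# real designs additionally overlap communication with computation via
# one-sided (PGAS-style) memory access.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
k = 3
rng = np.random.default_rng(seed=rank)        # each rank owns a partition

local_X = rng.normal(size=(1000, 8))          # local training features
local_y = rng.integers(0, 2, size=1000)       # local training labels
query = np.full(8, 0.1)                       # same query on every rank

# Local top-k by Euclidean distance.
d = np.linalg.norm(local_X - query, axis=1)
idx = np.argpartition(d, k)[:k]
local_top = sorted(zip(d[idx], local_y[idx]))

# Gather every rank's candidates; the global top-k is among size*k entries.
all_top = comm.gather(local_top, root=0)
if rank == 0:
    merged = sorted(c for part in all_top for c in part)[:k]
    votes = [label for _, label in merged]
    print("predicted label:", max(set(votes), key=votes.count))
```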
Scalability improvements to NRLMOL for DFT calculations of large molecules
NASA Astrophysics Data System (ADS)
Diaz, Carlos Manuel
Advances in high performance computing (HPC) have provided a way to treat large, computationally demanding tasks using thousands of processors. With the development of more powerful HPC architectures, the need to create efficient and scalable code has grown more important. Electronic structure calculations are valuable in understanding experimental observations and are routinely used for new materials predictions. The memory and computation time of these calculations grow rapidly with system size; memory requirements scale as N², where N is the number of atoms. While recent advances in HPC offer platforms with large numbers of cores, the limited amount of memory available on a given node and the poor scalability of the electronic structure code hinder efficient usage of these platforms. This thesis presents developments to overcome these bottlenecks in order to study large systems. These developments, which are implemented in the NRLMOL electronic structure code, involve the use of sparse matrix storage formats and linear algebra on sparse and distributed matrices. These developments, along with other related work, now allow ground state density functional calculations using up to 25,000 basis functions and excited state calculations using up to 17,000 basis functions while utilizing all cores on a node. An example on a light-harvesting triad molecule is described. Finally, future plans to further improve the scalability are presented.
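The memory argument is easy to quantify; the sketch below (Python with scipy.sparse, using illustrative sizes rather than NRLMOL's actual matrices) compares dense N² storage against compressed sparse row (CSR) storage for a matrix with a fixed number of nonzeros per row:

```python
# Compare dense vs. CSR storage for a matrix that is large but sparse,
# the situation exploited by the sparse-format developments described above.
import numpy as np
import scipy.sparse as sp

n = 20000                                   # basis functions (illustrative)
rng = np.random.default_rng(0)
dense_bytes = n * n * 8                     # double-precision dense storage

# Build a sparse matrix with ~50 nonzeros per row.
nnz_per_row = 50
rows = np.repeat(np.arange(n), nnz_per_row)
cols = rng.integers(0, n, size=n * nnz_per_row)
vals = rng.normal(size=n * nnz_per_row)
H = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

csr_bytes = H.data.nbytes + H.indices.nbytes + H.indptr.nbytes
print(f"dense: {dense_bytes / 1e9:.1f} GB, CSR: {csr_bytes / 1e6:.1f} MB, "
      f"ratio: {dense_bytes / csr_bytes:.0f}x")
```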
AHPCRC (Army High Performance Computing Research Center) Bulletin. Volume 2, Issue 1
2010-01-01
Researchers in AHPCRC Technical Area 4 focus on improving processes for developing scalable, accurate parallel programs that are easily ported from one... Virtual levels in Sequoia represent an abstract memory hierarchy without specifying data transfer mechanisms, giving the...
2014-01-01
Taniguchi, Advanced Materials Laboratory, National Institute for Materials Science, 1-1 Namiki, Tsukuba 305-0044, Japan; Prof. J. Hone, Department of Mechanical Engineering, Columbia University, New York, NY 10027, USA. DOI: 10.1002/adma.201304973. The growth of high-quality organic... vdW heterostructures, combined with recent progress on large-area growth of layered materials [6,7], provides new opportunities for the scalable...
Massively parallel first-principles simulation of electron dynamics in materials
Draeger, Erik W.; Andrade, Xavier; Gunnels, John A.; ...
2017-08-01
Here we present a highly scalable, parallel implementation of first-principles electron dynamics coupled with molecular dynamics (MD). By using optimized kernels, network topology aware communication, and by fully distributing all terms in the time-dependent Kohn–Sham equation, we demonstrate unprecedented time to solution for disordered aluminum systems of 2000 atoms (22,000 electrons) and 5400 atoms (59,400 electrons), with wall clock time as low as 7.5 s per MD time step. Despite a significant amount of non-local communication required in every iteration, we achieved excellent strong scaling and sustained performance on the Sequoia Blue Gene/Q supercomputer at LLNL. We obtained up to 59% of the theoretical sustained peak performance on 16,384 nodes and performance of 8.75 Petaflop/s (43% of theoretical peak) on the full 98,304 node machine (1,572,864 cores). Lastly, scalable explicit electron dynamics allows for the study of phenomena beyond the reach of standard first-principles MD, in particular, materials subject to strong or rapid perturbations, such as pulsed electromagnetic radiation, particle irradiation, or strong electric currents.
Thermal engineering of FAPbI3 perovskite material via radiative thermal annealing and in situ XRD
Pool, Vanessa L.; Dou, Benjia; Van Campen, Douglas G.; Klein-Stockert, Talysa R.; Barnes, Frank S.; Shaheen, Sean E.; Ahmad, Md I.; van Hest, Maikel F. A. M.; Toney, Michael F.
2017-01-01
Lead halide perovskites have emerged as successful optoelectronic materials with high photovoltaic power conversion efficiencies and low material cost. However, substantial challenges remain in the scalability, stability and fundamental understanding of the materials. Here we present the application of radiative thermal annealing, an easily scalable processing method for synthesizing formamidinium lead iodide (FAPbI3) perovskite solar absorbers. Devices fabricated from films formed via radiative thermal annealing have equivalent efficiencies to those annealed using a conventional hotplate. By coupling results from in situ X-ray diffraction using a radiative thermal annealing system with device performances, we mapped the processing phase space of FAPbI3 and corresponding device efficiencies. Our map of processing-structure-performance space suggests the commonly used FAPbI3 annealing time, 10 min at 170 °C, can be significantly reduced to 40 s at 170 °C without affecting the photovoltaic performance. The Johnson-Mehl-Avrami model was used to determine the activation energy for decomposition of FAPbI3 into PbI2. PMID:28094249
Air-stable ink for scalable, high-throughput layer deposition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weil, Benjamin D; Connor, Stephen T; Cui, Yi
A method for producing and depositing air-stable, easily decomposable, vulcanized ink on any of a wide range of substrates is disclosed. The ink enables high-volume production of optoelectronic and/or electronic devices using scalable production methods, such as roll-to-roll transfer, fast rolling processes, and the like.
Zhu, Xiang; Zhang, Dianwen
2013-01-01
We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, implemented on a graphics processing unit (GPU) for high-performance, scalable parallel model fitting. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for applications in super-resolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
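The workload GPU-LMFit accelerates is an embarrassingly parallel collection of small, independent Levenberg-Marquardt fits, one per pixel. The CPU sketch below (Python, with SciPy's least_squares standing in for the GPU kernel and a synthetic mono-exponential decay model, as in fluorescence lifetime imaging) shows the per-pixel fitting loop that the GPU version parallelizes:

```python
# Each "pixel" gets its own small Levenberg-Marquardt fit of a decay curve;
# data and lifetimes are synthetic, for illustration only.
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0, 10.0, 64)                      # time bins (ns)

def residuals(p, t, y):
    amp, tau = p
    return amp * np.exp(-t / tau) - y

rng = np.random.default_rng(1)
true_tau = rng.uniform(1.0, 3.0, size=100)        # one lifetime per "pixel"
fitted = []
for tau in true_tau:                              # embarrassingly parallel loop
    y = 100 * np.exp(-t / tau) + rng.normal(0, 2, t.size)
    sol = least_squares(residuals, x0=[50.0, 1.0], args=(t, y), method="lm")
    fitted.append(sol.x[1])

err = np.abs(np.array(fitted) - true_tau)
print(f"median lifetime error: {np.median(err):.3f} ns over {len(fitted)} pixels")
```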
NASA Astrophysics Data System (ADS)
Jing, Changfeng; Liang, Song; Ruan, Yong; Huang, Jie
2008-10-01
During the urbanization process, when facing the complex requirements of city development, ever-growing urban data, the rapid development of planning business and increasing planning complexity, a scalable, extensible urban planning management information system is urgently needed. PM2006 is such a system. In response to the status and problems in urban planning, the scalability and extensibility of PM2006 are introduced, including business-oriented workflow extensibility, a scalable DLL-based architecture, flexibility across GIS and database platforms, and scalable data updating and maintenance. It is verified that the PM2006 system has good extensibility and scalability, can meet the requirements of all levels of administrative divisions, and can adapt to ever-growing changes in urban planning business. At the end of this paper, the application of PM2006 in the Urban Planning Bureau of Suzhou city is described.
paraGSEA: a scalable approach for large-scale gene expression profiling
Peng, Shaoliang; Yang, Shunyun
2017-01-01
A growing number of studies use gene expression similarity to identify functional connections among genes, diseases and drugs. Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. However, due to the enormous computational overhead of its significance estimation and multiple hypothesis testing steps, its scalability and efficiency are poor on large-scale datasets. We propose paraGSEA for efficient large-scale transcriptome data analysis. By optimization, the overall time complexity of paraGSEA is reduced from O(mn) to O(m+n), where m is the length of the gene sets and n is the length of the gene expression profiles, yielding a more than 100-fold performance increase compared with other popular GSEA implementations such as GSEA-P, SAM-GS and GSEA2. By further parallelization, a near-linear speed-up is gained on both workstations and clusters, with high scalability and performance on large-scale datasets. The analysis time for the whole LINCS phase I dataset (GSE92742) was reduced to about half an hour on a 1000-node cluster on Tianhe-2, or under 120 hours on a 96-core workstation. The source code of paraGSEA is licensed under the GPLv3 and available at http://github.com/ysycloud/paraGSEA. PMID:28973463
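The O(m+n) claim is easiest to see on the classic unweighted running-sum enrichment score: with the gene set held in a hash set (O(m) to build), a single O(n) pass over the ranked profile suffices. The Python sketch below illustrates the complexity argument only; it is not paraGSEA's optimized kernel.

```python
# Classic Kolmogorov-Smirnov-style running-sum enrichment score (unweighted):
# step up by 1/m at each gene-set hit, down by 1/(n-m) at each miss, and
# report the extreme deviation. Set membership tests are O(1), so the whole
# computation is O(m + n) rather than O(mn).
def enrichment_score(ranked_genes, gene_set):
    gene_set = set(gene_set)                  # O(m) to build
    n, m = len(ranked_genes), len(gene_set)
    hit_step, miss_step = 1.0 / m, 1.0 / (n - m)
    running, best = 0.0, 0.0
    for g in ranked_genes:                    # single O(n) pass
        running += hit_step if g in gene_set else -miss_step
        if abs(running) > abs(best):
            best = running
    return best

ranked = [f"g{i}" for i in range(10000)]      # toy ranked expression profile
print(enrichment_score(ranked, ["g3", "g10", "g42", "g99", "g500"]))
```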
Marathe, Aniruddha P.; Harris, Rachel A.; Lowenthal, David K.; ...
2015-12-17
The use of clouds to execute high-performance computing (HPC) applications has greatly increased recently. Clouds provide several potential advantages over traditional supercomputers and in-house clusters. The most popular cloud is currently Amazon EC2, which provides fixed-cost and variable-cost, auction-based options. The auction market trades lower cost for potential interruptions that necessitate checkpointing; if the market price exceeds the bid price, a node is taken away from the user without warning. We explore techniques to maximize performance per dollar given a time constraint within which an application must complete. Specifically, we design and implement multiple techniques to reduce expected cost by exploiting redundancy in the EC2 auction market. We then design an adaptive algorithm that selects a scheduling algorithm and determines the bid price. We show that our adaptive algorithm executes programs up to seven times cheaper than using the on-demand market and up to 44 percent cheaper than the best non-redundant, auction-market algorithm. We extend our adaptive algorithm to incorporate application scalability characteristics for further cost savings. In conclusion, we show that the adaptive algorithm informed with scalability characteristics of applications achieves up to 56 percent cost savings compared to the expected cost for the base adaptive algorithm run at a fixed, user-defined scale.
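A toy expected-cost model (invented prices and eviction probabilities, not the paper's model) conveys why redundancy in the auction market can pay off: extra spot copies are cheap insurance against eviction-induced rework.

```python
# Expected cost of running redundant copies on the spot/auction market versus
# a single on-demand node. A spot node is lost ("evicted") when the market
# price exceeds the bid; if every copy is lost, the job is redone on demand.
ON_DEMAND = 1.00      # $/hour (made-up)
SPOT = 0.25           # $/hour at our bid (made-up)
P_EVICT = 0.30        # chance a given spot node is evicted during the job
HOURS = 10            # job length if it runs uninterrupted

def expected_spot_cost(copies):
    """Pay for all copies; with probability P_EVICT**copies, redo on demand."""
    p_all_fail = P_EVICT ** copies
    return copies * SPOT * HOURS + p_all_fail * ON_DEMAND * HOURS

print(f"on-demand: ${ON_DEMAND * HOURS:.2f}")
for c in (1, 2, 3):
    print(f"{c} spot cop{'y' if c == 1 else 'ies'}: ${expected_spot_cost(c):.2f}")
```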
NASA Technical Reports Server (NTRS)
Luke, Edward Allen
1993-01-01
Two algorithms capable of computing a transonic 3-D inviscid flow field about rotating machines are considered for parallel implementation. During the study of these algorithms, a significant new method of measuring the performance of parallel algorithms is developed. The theory that supports this new method creates an empirical definition of scalable parallel algorithms that is used to produce quantifiable evidence that a scalable parallel application was developed. The implementation of the parallel application and an automated domain decomposition tool are also discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dong, Yu Cheng; Center of Super-Diamond and Advanced Films; Ma, Ru Guang
2013-05-01
We report a scalable strategy to synthesize Fe₃O₄/graphene nanocomposites as a high-performance anode material for lithium ion batteries. In this study, ferric citrate is used as the precursor to prepare Fe₃O₄ nanoparticles without introducing an additional reducing agent, and we show that such Fe₃O₄ nanoparticles can be anchored on graphene sheets, which is attributed to the multifunctional-group effect of citrate. Electrochemical characterization of the Fe₃O₄/graphene nanocomposites exhibits large reversible capacity (~1347 mA h g⁻¹ at a current density of 0.2 C up to 100 cycles, and a subsequent capacity of ~619 mA h g⁻¹ at a current density of 2 C up to 200 cycles), as well as high coulombic efficiency (~97%), excellent rate capability, and good cyclic stability. High resolution transmission electron microscopy confirms that Fe₃O₄ nanoparticles with a size of ~4–16 nm are densely anchored on thin graphene sheets, resulting in large synergetic effects between Fe₃O₄ nanoparticles and graphene sheets and high electrochemical performance. - Graphical abstract: The reduction of Fe³⁺ to Fe²⁺ and the deposition of Fe₃O₄ on graphene sheets occur simultaneously, with citrate functioning as reductant and anchoring agent in this reaction process. Highlights: • Fe₃O₄/graphene composites are synthesized directly from graphene and C₆H₅FeO₇. • The citrate functions as reductant and anchoring agent in this reaction process. • The resulting Fe₃O₄ particles (~4–16 nm) are densely anchored on graphene sheets. • The prepared Fe₃O₄/graphene composites exhibit excellent electrochemical performance.
Phonon-based scalable platform for chip-scale quantum computing
Reinke, Charles M.; El-Kady, Ihab
2016-12-19
Here, we present a scalable phonon-based quantum computer on a phononic crystal platform. Practical schemes involve selective placement of a single acceptor atom in the peak of the strain field in a high-Q phononic crystal cavity that enables coupling of the phonon modes to the energy levels of the atom. We show theoretical optimization of the cavity design and coupling waveguide, along with estimated performance figures of the coupled system. A qubit can be created by entangling a phonon at the resonance frequency of the cavity with the atom states. Qubits based on this half-sound, half-matter quasi-particle, called a phoniton, may outcompete other quantum architectures in terms of combined emission rate, coherence lifetime, and fabrication demands.
NASA Astrophysics Data System (ADS)
Baynes, K.; Gilman, J.; Pilone, D.; Mitchell, A. E.
2015-12-01
The NASA EOSDIS (Earth Observing System Data and Information System) Common Metadata Repository (CMR) is a continuously evolving metadata system that merges all existing capabilities and metadata from the EOS ClearingHOuse (ECHO) and the Global Change Master Directory (GCMD) systems. This flagship catalog has been developed with several key requirements: fast search and ingest performance; the ability to integrate heterogeneous external inputs and outputs; high availability and resiliency; scalability; and evolvability and expandability. This talk will focus on the advantages and potential challenges of tackling these requirements using a microservices architecture, which decomposes system functionality into smaller, loosely-coupled, individually-scalable elements that communicate via well-defined APIs. In addition, time will be spent examining specific elements of the CMR architecture and identifying opportunities for future integrations.
Fractionation of sheep cheese whey by a scalable method to sequentially isolate bioactive proteins.
Pilbrow, Jodi; Bekhit, Alaa El-Din A; Carne, Alan
2016-07-15
This study reports a procedure for the simultaneous purification of glyco(caseino)macropeptide, immunoglobulin, lactoperoxidase, lactoferrin, α-lactalbumin and β-lactoglobulin from sheep cheese sweet whey, an under-utilized by-product of cheese manufacture generated by an emerging sheep dairy industry in New Zealand. These proteins have recognized value in the nutrition, biomedical and health-promoting supplements industries. A sequential fractionation procedure using economical anion and cation exchange chromatography on HiTrap resins was evaluated. The whey protein fractionation is performed under mild conditions, requires only the adjustment of pH between ion exchange chromatography steps, does not require buffer exchange and uses minimal amounts of chemicals. The purity of the whey protein fractions generated was analyzed by reversed-phase high-performance liquid chromatography and the identity of the proteins was confirmed by mass spectrometry. This scalable procedure demonstrates that several proteins of recognized value can be fractionated in reasonable yield and purity from sheep cheese whey in one streamlined process.
A lightweight network anomaly detection technique
Kim, Jinoh; Yoo, Wucherl; Sim, Alex; ...
2017-03-13
While network anomaly detection is essential in network operations and management, it becomes ever more challenging to perform the first line of detection against the exponentially increasing volume of network traffic. In this paper, we develop a technique for the first line of online anomaly detection with two important considerations: (i) availability of traffic attributes during the monitoring time, and (ii) computational scalability for streaming data. The presented learning technique is lightweight and highly scalable, approximating traffic density over a grid partitioning of the given dimensional space. With the public traffic traces of KDD Cup 1999 and NSL-KDD, we show that our technique yields 98.5% and 83% detection accuracy, respectively, using only a couple of readily available traffic attributes that can be obtained without post-processing. The results are at least comparable with classical learning methods including decision tree and random forest, with approximately two orders of magnitude faster learning performance.
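A minimal sketch of the grid-partitioning idea (Python; the grid resolution and sparsity threshold are illustrative choices, not the paper's parameters): learn per-cell counts from training traffic, then flag test points that fall in empty or sparsely populated cells.

```python
# Grid-based density approximation for streaming anomaly detection: training
# is a single counting pass, and scoring a point is one dictionary lookup,
# which is what makes the approach lightweight and scalable.
import numpy as np
from collections import Counter

GRID = 10                                     # cells per dimension
rng = np.random.default_rng(0)

def cell(x, lo, hi):
    """Map a point to its integer grid cell (tuple usable as a dict key)."""
    scaled = (x - lo) / (hi - lo + 1e-12)
    return tuple(np.clip((scaled * GRID).astype(int), 0, GRID - 1))

train = rng.normal(0, 1, size=(5000, 2))      # "normal" traffic features
lo, hi = train.min(axis=0), train.max(axis=0)
counts = Counter(cell(x, lo, hi) for x in train)

def is_anomaly(x, min_count=3):
    return counts.get(cell(x, lo, hi), 0) < min_count

print(is_anomaly(np.array([0.0, 0.0])))       # dense region -> False
print(is_anomaly(np.array([4.0, -4.0])))      # far tail     -> True
```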
Jiang, Guoqian; Solbrig, Harold R; Chute, Christopher G
2011-01-01
A source of semantically coded Adverse Drug Event (ADE) data can be useful for identifying common phenotypes related to ADEs. We propose a comprehensive framework for building a standardized ADE knowledge base (called ADEpedia) by combining an ontology-based approach with semantic web technology. The framework comprises four primary modules: 1) an XML2RDF transformation module; 2) a data normalization module based on the NCBO Open Biomedical Annotator; 3) an RDF-store-based persistence module; and 4) a front-end module based on a Semantic Wiki for review and curation. A prototype is successfully implemented to demonstrate the capability of the system to integrate multiple drug data and ontology resources and open web services for ADE data standardization. A preliminary evaluation demonstrates the usefulness of the system, including the performance of the NCBO annotator. In conclusion, semantic web technology provides a highly scalable framework for ADE data source integration and standard query services.
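The XML2RDF step can be sketched in a few lines; the example below (Python with rdflib; the element names and the example namespace are invented for illustration, not ADEpedia's actual schema) turns a toy ADE record into RDF triples ready for an RDF store.

```python
# Toy XML-to-RDF transformation: parse an ADE report and emit triples.
import xml.etree.ElementTree as ET
from rdflib import Graph, Literal, Namespace, RDF, URIRef

ADE = Namespace("http://example.org/adepedia#")   # hypothetical namespace
xml_doc = """<report id="r1">
  <drug>warfarin</drug>
  <event>gastrointestinal haemorrhage</event>
</report>"""

root = ET.fromstring(xml_doc)
g = Graph()
report = URIRef(ADE[root.get("id")])
g.add((report, RDF.type, ADE.AdverseDrugEvent))
g.add((report, ADE.drug, Literal(root.findtext("drug"))))
g.add((report, ADE.event, Literal(root.findtext("event"))))
print(g.serialize(format="turtle"))
```

In the full framework, the drug and event literals would subsequently be normalized against ontology terms (e.g., via the NCBO Annotator) before persistence.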
High-Performance Monitoring Architecture for Large-Scale Distributed Systems Using Event Filtering
NASA Technical Reports Server (NTRS)
Maly, K.
1998-01-01
Monitoring is an essential process for observing and improving the reliability and performance of large-scale distributed (LSD) systems. In an LSD environment, a large number of events are generated by system components during execution or interaction with external objects (e.g., users or processes). Monitoring such events is necessary for observing the run-time behavior of LSD systems and providing the status information required for debugging, tuning and managing such applications. However, correlated events are generated concurrently and can be distributed across various locations in the application environment, which complicates the management decision process and thereby makes monitoring LSD systems an intricate task. We propose a scalable high-performance monitoring architecture for LSD systems to detect and classify interesting local and global events and disseminate the monitoring information to the corresponding endpoint management applications, such as debugging and reactive control tools, to improve application performance and reliability. A large volume of events may be generated due to the extensive demands of the monitoring applications and the high interaction of LSD systems. The monitoring architecture employs a high-performance event filtering mechanism to efficiently process the large volume of event traffic generated by LSD systems and to minimize the intrusiveness of the monitoring process by reducing the event traffic flow in the system and distributing the monitoring computation. Our architecture also supports dynamic and flexible reconfiguration of the monitoring mechanism via its instrumentation and subscription components. As a case study, we show how our monitoring architecture can be utilized to improve the reliability and performance of the Interactive Remote Instruction (IRI) system, a large-scale distributed system for collaborative distance learning. The filtering mechanism is an intrinsic component of the monitoring architecture that reduces the volume of event traffic flow in the system and thereby the intrusiveness of the monitoring process. We are developing an event filtering architecture to efficiently process the large volume of event traffic generated by LSD systems (such as distributed interactive applications); this architecture is used to monitor a collaborative distance learning application to obtain debugging and feedback information, and it supports the dynamic (re)configuration and optimization of event filters in large-scale distributed systems. Our work makes a major contribution by (1) surveying and evaluating existing event filtering mechanisms for monitoring LSD systems and (2) devising an integrated, scalable, high-performance event filtering architecture that spans several key application domains, presenting techniques to improve functionality, performance and scalability. This paper describes the primary characteristics and challenges of developing high-performance event filtering for monitoring LSD systems. We survey existing event filtering mechanisms and explain the key characteristics of each technique. In addition, we discuss limitations of existing event filtering mechanisms and outline how our architecture improves key aspects of event filtering.
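The subscription-based filtering idea can be illustrated with a minimal sketch (Python; the event fields and predicates are invented for illustration): subscribers register predicates, and only matching events are forwarded, which is how filtering near the source reduces monitoring traffic.

```python
# Minimal publish/subscribe event filter: handlers only receive events whose
# predicate matches, so uninteresting events never leave the filtering layer.
from dataclasses import dataclass

@dataclass
class Event:
    source: str
    kind: str
    latency_ms: float

class EventFilter:
    def __init__(self):
        self.subscriptions = []          # list of (predicate, handler) pairs

    def subscribe(self, predicate, handler):
        self.subscriptions.append((predicate, handler))

    def publish(self, event):
        # Filtering at the source: each handler sees only what it asked for.
        for predicate, handler in self.subscriptions:
            if predicate(event):
                handler(event)

f = EventFilter()
f.subscribe(lambda e: e.kind == "rpc" and e.latency_ms > 100,
            lambda e: print("slow RPC on", e.source))
f.publish(Event("node-7", "rpc", 240.0))     # forwarded to the handler
f.publish(Event("node-2", "rpc", 12.0))      # filtered out
```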
High-performance and scalable metal-chalcogenide semiconductors and devices via chalco-gel routes
Jo, Jeong-Wan; Kim, Hee-Joong; Kwon, Hyuck-In; Kim, Jaekyun; Ahn, Sangdoo; Kim, Yong-Hoon; Lee, Hyung-ik
2018-01-01
We report a general strategy for obtaining high-quality, large-area metal-chalcogenide semiconductor films from precursors combining chelated metal salts with chalcoureas or chalcoamides. Using conventional organic solvents, such precursors enable the expeditious formation of chalco-gels, which are easily transformed into the corresponding high-performance metal-chalcogenide thin films with large, uniform areas. Diverse metal chalcogenides and their alloys (MQx: M = Zn, Cd, In, Sb, Pb; Q = S, Se, Te) are successfully synthesized at relatively low processing temperatures (<400°C). The versatility of this scalable route is demonstrated by the fabrication of large-area thin-film transistors (TFTs), optoelectronic devices, and integrated circuits on a 4-inch Si wafer and 2.5-inch borosilicate glass substrates in ambient air using CdS, CdSe, and In2Se3 active layers. The CdSe TFTs exhibit a maximum field-effect mobility greater than 300 cm2 V−1 s−1 with an on/off current ratio of >107 and good operational stability (threshold voltage shift < 0.5 V at a positive gate bias stress of 10 ks). In addition, metal chalcogenide–based phototransistors with a photodetectivity of >1013 Jones and seven-stage ring oscillators operating at a speed of ~2.6 MHz (propagation delay of < 27 ns per stage) are demonstrated. PMID:29662951
Long-range interactions and parallel scalability in molecular simulations
NASA Astrophysics Data System (ADS)
Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested scalability on four different networks, namely InfiniBand, Gigabit Ethernet, Fast Ethernet, and a nearly uniform memory architecture, i.e., one in which communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
HPC in a HEP lab: lessons learned from setting up cost-effective HPC clusters
NASA Astrophysics Data System (ADS)
Husejko, Michal; Agtzidis, Ioannis; Baehler, Pierre; Dul, Tadeusz; Evans, John; Himyr, Nils; Meinhard, Helge
2015-12-01
In this paper we present our findings gathered during the evaluation and testing of Windows Server High-Performance Computing (Windows HPC) in view of potentially using it as a production HPC system for engineering applications. The Windows HPC package, an extension of Microsoft's Windows Server product, provides all essential interfaces, utilities and management functionality for creating, operating and monitoring a Windows-based HPC cluster infrastructure. The evaluation and test phase focused on verifying the functionality of Windows HPC, its performance, support for commercial tools and integration with users' work environments. We describe constraints imposed by the way the CERN Data Centre is operated, by licensing of engineering tools, and by the scalability and behaviour of the HPC engineering applications used at CERN. We present an initial set of requirements, which were created based on the above constraints and requests from the CERN engineering user community. We explain how we have configured Windows HPC clusters to provide the job scheduling functionality required to support the CERN engineering user community, quality of service, user- and project-based priorities, and fair access to limited resources. Finally, we present several performance tests we carried out to verify Windows HPC performance and scalability.
Kim, Hyun-Kyung; Bak, Seong-Min; Lee, Suk Woo; ...
2016-01-27
Graphene nanomeshes (GNMs) with nanoscale periodic or quasi-periodic nanoholes have attracted considerable interest because of unique features such as their open energy band gap, enlarged specific surface area, and high optical transmittance. These features are useful for applications in semiconducting devices, photocatalysis, sensors, and energy-related systems. We report on the facile and scalable preparation of multifunctional micron-scale GNMs with a high density of nanoperforations by catalytic carbon gasification. The catalytic carbon gasification process induces selective decomposition on the graphene adjacent to the metal catalyst, thus forming nanoperforations. Furthermore, the pore size, pore density distribution, and neck size of the GNMs can be controlled by adjusting the size and fraction of the metal oxide on graphene. The fabricated GNM electrodes exhibit superior electrochemical properties for supercapacitor (ultracapacitor) applications, including exceptionally high capacitance (253 F g⁻¹ at 1 A g⁻¹) and high rate capability (212 F g⁻¹ at 100 A g⁻¹) with excellent cycle stability (91% of the initial capacitance after 50 000 charge/discharge cycles). Moreover, the edge-enriched structure of GNMs plays an important role in achieving edge-selected and high-level nitrogen doping.
Hao, Xiaoming; Zhu, Jian; Jiang, Xiong; Wu, Haitao; Qiao, Jinshuo; Sun, Wang; Wang, Zhenhua; Sun, Kening
2016-05-11
Polymeric nanomaterials are emerging as key building blocks for engineering materials in a variety of applications. In particular, high-modulus polymeric nanofibers are suitable for preparing flexible yet strong membrane separators that prevent the growth and penetration of lithium dendrites, enabling safe and reliable high-energy lithium-metal-based batteries. High ionic conductance, scalability, and low cost are other required attributes of a separator intended for practical implementation. Available materials have so far struggled to satisfy such stringent criteria. Here, we demonstrate a high-yield exfoliation of ultrastrong poly(p-phenylene benzobisoxazole) nanofibers from Zylon microfibers. A highly scalable blade-casting process is used to assemble these nanofibers into nanoporous membranes. These membranes possess ultimate strengths of 525 MPa, Young's moduli of 20 GPa, thermal stability up to 600 °C, and impressively low ionic resistance, enabling their use as dendrite-suppressing membrane separators in electrochemical cells. With such high-performance separators, reliable lithium-metal-based batteries operated at 150 °C are also demonstrated. These polybenzoxazole nanofibers would enrich the existing library of strong nanomaterials and serve as a promising material for large-scale, cost-effective, safe energy storage.
Scalable fabrication of self-aligned graphene transistors and circuits on glass.
Liao, Lei; Bai, Jingwei; Cheng, Rui; Zhou, Hailong; Liu, Lixin; Liu, Yuan; Huang, Yu; Duan, Xiangfeng
2012-06-13
Graphene transistors are of considerable interest for radio frequency (rf) applications. High-frequency graphene transistors with intrinsic cutoff frequencies up to 300 GHz have been demonstrated. However, the graphene transistors reported to date exhibit extrinsic cutoff frequencies only up to about 10 GHz, and the functional graphene circuits demonstrated so far operate merely in the tens-of-megahertz regime, far from the potential graphene transistors could offer. Here we report a scalable approach to fabricating self-aligned graphene transistors with extrinsic cutoff frequencies exceeding 50 GHz and graphene circuits that operate in the 1-10 GHz regime. The devices are fabricated on a glass substrate through a self-aligned process using chemical vapor deposition (CVD) grown graphene and a dielectrophoretically assembled nanowire gate array. The self-aligned process achieves unprecedented performance in CVD graphene transistors, with a highest transconductance of 0.36 mS/μm. The use of an insulating substrate minimizes parasitic capacitance and has therefore enabled graphene transistors with the highest extrinsic cutoff frequency (>50 GHz) reported to date. This extrinsic cutoff frequency readily allows configuring the graphene transistors into frequency-doubling or mixing circuits functioning in the 1-10 GHz regime, a significant advancement over previous reports (∼20 MHz). These studies open a pathway to scalable fabrication of high-speed graphene transistors and functional circuits and represent a significant step toward graphene-based radio frequency devices.
Strong, light, multifunctional fibers of carbon nanotubes with ultrahigh conductivity.
Behabtu, Natnael; Young, Colin C; Tsentalovich, Dmitri E; Kleinerman, Olga; Wang, Xuan; Ma, Anson W K; Bengio, E Amram; ter Waarbeek, Ron F; de Jong, Jorrit J; Hoogerwerf, Ron E; Fairchild, Steven B; Ferguson, John B; Maruyama, Benji; Kono, Junichiro; Talmon, Yeshayahu; Cohen, Yachin; Otto, Marcin J; Pasquali, Matteo
2013-01-11
Broader applications of carbon nanotubes to real-world problems have largely gone unfulfilled because of difficult material synthesis and laborious processing. We report high-performance multifunctional carbon nanotube (CNT) fibers that combine the specific strength, stiffness, and thermal conductivity of carbon fibers with the specific electrical conductivity of metals. These fibers consist of bulk-grown CNTs and are produced by high-throughput wet spinning, the same process used to produce high-performance industrial fibers. These scalable CNT fibers are positioned for high-value applications, such as aerospace electronics and field emission, and can evolve into engineered materials with broad long-term impact, from consumer electronics to long-range power transmission.
Final Report for DOE Award ER25756
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kesselman, Carl
2014-11-17
The SciDAC-funded Center for Enabling Distributed Petascale Science (CEDPS) was established to address technical challenges that arise from the frequent geographic distribution of data producers (in particular, supercomputers and scientific instruments) and data consumers (people and computers) within the DOE laboratory system. Its goal is to produce technical innovations that meet DOE end-user needs for (a) rapid and dependable placement of large quantities of data within a distributed high-performance environment, and (b) the convenient construction of scalable science services that provide for the reliable and high-performance processing of computation and data analysis requests from many remote clients. The Center is also addressing (c) the important problem of troubleshooting these and other related ultra-high-performance distributed activities from the perspective of both performance and functionality.
Power-Scalable Blue-Green Bessel Beams
2016-02-23
Final technical report (January 2011 - December 2013): Power-Scalable Blue-Green Bessel Beams. Siddharth Ramachandran, Photonics Center, Boston University, 8 Saint Mary's Street, Boston, MA 02215; phone: (617) 353-9811. Subject terms: fiber lasers, non-traditional emission wavelengths, high-power blue-green tunable lasers.
Wang, Sibo; Ren, Zheng; Guo, Yanbing; ...
2016-03-21
The scalable three-dimensional (3-D) integration of functional nanostructures into applicable platforms represents a promising technology for meeting the ever-increasing demands of fabricating high-performance devices featuring cost-effectiveness, structural sophistication and multifunctionality. Such an integration process generally involves a diverse array of nanostructural entities (nano-entities) consisting of dissimilar nanoscale building blocks such as nanoparticles, nanowires, and nanofilms made of metals, ceramics, or polymers. Various synthetic strategies and integration methods have enabled the successful assembly of both structurally and functionally tailored nano-arrays into a unique class of monolithic devices. The performance of nano-array-based monolithic devices is dictated by a few important factors such as substrate material selection, nanostructure composition and nano-architecture geometry. Therefore, rational material selection and nano-entity manipulation during the nano-array integration process, aiming to exploit the advantageous characteristics of nanostructures and their ensembles, are critical steps toward bridging the design of nanostructure-integrated monolithic devices with various practical applications. In this article, we highlight the latest research progress in two-dimensional (2-D) and 3-D metal and metal oxide nanostructural integration into prototype devices with ultrahigh efficiency, good robustness and improved functionality. Lastly, selected examples of nano-array integration, scalable nanomanufacturing and representative monolithic devices such as catalytic converters, sensors and batteries are used as connecting dots to display a roadmap from hierarchical nanostructural assembly to practical nanotechnology implications ranging from energy and environmental to chemical and biotechnology areas.
HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.
O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D
2015-04-01
The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. These large datasets and complex computations typically require cost-prohibitive High Performance Computing (HPC) resources. Parallelised solutions have been proposed, but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The "divide and conquer" parallelisation strategy for alignment algorithms can be applied to both databases and input query sequences. However, scalability remains an issue due to memory constraints with large databases, and very large database segmentation leads to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.
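A hypothetical sketch of the "virtual partitioning" idea described above: both the database and the query set are split logically, and each map task receives a (query chunk, virtual database slice) pair, so load is balanced without physically re-segmenting the database on disk. The function names and granularity are illustrative, not the published HBlast implementation.

    def virtual_partitions(n_records, n_parts):
        """Yield (start, end) index ranges that slice a database of
        n_records into n_parts without physically rewriting it."""
        step, rem = divmod(n_records, n_parts)
        start = 0
        for i in range(n_parts):
            end = start + step + (1 if i < rem else 0)
            yield (start, end)
            start = end

    def map_tasks(queries, n_db_records, n_db_parts, chunk_size):
        """Enumerate map tasks: every query chunk is paired with every
        virtual database slice."""
        chunks = [queries[i:i + chunk_size]
                  for i in range(0, len(queries), chunk_size)]
        for part in virtual_partitions(n_db_records, n_db_parts):
            for chunk in chunks:
                yield (chunk, part)

    if __name__ == "__main__":
        tasks = list(map_tasks(["q%d" % i for i in range(10)], 1_000_000, 4, 5))
        print(len(tasks), "map tasks")   # 4 slices x 2 query chunks = 8 tasks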
Medusa: A Scalable MR Console Using USB
Stang, Pascal P.; Conolly, Steven M.; Santos, Juan M.; Pauly, John M.; Scott, Greig C.
2012-01-01
MRI pulse sequence consoles typically employ closed proprietary hardware, software, and interfaces, making any adaptation for innovative experimental technology difficult. Yet MRI systems research is trending toward higher channel-count receivers, transmitters, gradients/shims, and unique interfaces for interventional applications. Customized console designs are now feasible for researchers with modern electronic components, but high data rates, synchronization, scalability, and cost present important challenges. Implementing large multi-channel MR systems with efficiency and flexibility requires a scalable modular architecture. With Medusa, we propose an open system architecture using the Universal Serial Bus (USB) for scalability, combined with distributed processing and buffering to address the high data rates and strict synchronization required by multi-channel MRI. Medusa uses a modular design concept based on digital synthesizer, receiver, and gradient blocks, in conjunction with fast programmable logic for sampling and synchronization. Medusa is a form of synthetic instrument, being reconfigurable for a variety of medical/scientific instrumentation needs. The Medusa distributed architecture, scalability, and data bandwidth limits are presented, and its flexibility is demonstrated in a variety of novel MRI applications. PMID:21954200
2017-02-01
Final report on an interconnect technology (MRFI) to enable high scalability and reconfigurability for inter-CPU/memory communications with an increased number of communication channels in frequency. Testing was carried out at the University of California, Los Angeles (UCLA) Center for High Frequency Electronics, with contributions from Dr. Afshin Momtaz at Broadcom Corporation.
NASA Astrophysics Data System (ADS)
Jubran, Mohammad K.; Bansal, Manu; Kondi, Lisimachos P.
2006-01-01
In this paper, we consider the problem of optimal bit allocation for wireless video transmission over fading channels. We use a newly developed hybrid scalable/multiple-description codec that combines the functionality of both scalable and multiple-description codecs. It produces a base layer and multiple-description enhancement layers. Any of the enhancement layers can be decoded (in a non-hierarchical manner) with the base layer to improve the reconstructed video quality. Two different channel coding schemes (Rate-Compatible Punctured Convolutional (RCPC)/Cyclic Redundancy Check (CRC) coding, and product-code Reed-Solomon (RS)+RCPC/CRC coding) are used for unequal error protection of the layered bitstream. Optimal allocation of the bitrate between source and channel coding is performed over discrete sets of source coding rates and channel coding rates. Experimental results are presented for a wide range of channel conditions. Comparisons with classical scalable coding also show the effectiveness of hybrid scalable/multiple-description coding for wireless transmission.
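Because the allocation is over discrete rate sets, the search itself can be done exhaustively. A minimal sketch of that idea, with placeholder quality and loss models (the paper's distortion and channel models are not reproduced here):

    import itertools

    def expected_quality(alloc, gain, loss_prob):
        """Expected decoded quality: each layer contributes its quality
        gain only if it survives the channel (placeholder model)."""
        return sum(g * (1.0 - loss_prob[r]) for r, g in zip(alloc, gain))

    def optimal_allocation(code_rates, gain, loss_prob, budget, layer_bits):
        """Try every per-layer channel-code rate; keep the best feasible one."""
        best, best_q = None, float("-inf")
        for alloc in itertools.product(range(len(code_rates)), repeat=len(gain)):
            # A code rate k/n < 1 inflates the transmitted bits per layer.
            total = sum(layer_bits[i] / code_rates[r] for i, r in enumerate(alloc))
            if total > budget:
                continue
            q = expected_quality(alloc, gain, loss_prob)
            if q > best_q:
                best, best_q = alloc, q
        return best, best_q

    if __name__ == "__main__":
        rates = [1/3, 1/2, 2/3]         # RCPC code rates (stronger -> weaker)
        ploss = [0.01, 0.05, 0.20]      # assumed residual loss per code rate
        gain  = [40.0, 5.0, 5.0]        # base layer + two enhancement layers
        bits  = [1000, 400, 400]        # source bits per layer
        print(optimal_allocation(rates, gain, ploss, 4000, bits))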
Kiranyaz, Serkan; Mäkinen, Toni; Gabbouj, Moncef
2012-10-01
In this paper, we propose a novel framework based on a collective network of evolutionary binary classifiers (CNBC) to address the problems of feature and class scalability. The main goal of the proposed framework is to achieve high classification performance over dynamic audio and video repositories. The framework adopts a "divide and conquer" approach in which an individual network of binary classifiers (NBC) is allocated to discriminate each audio class. An evolutionary search is applied to find the best binary classifier in each NBC with respect to a given criterion. Through incremental evolution sessions, the CNBC framework can dynamically adapt to each new incoming class or feature set without resorting to full-scale re-training or re-configuration. The CNBC framework is therefore particularly suited to dynamically varying databases, where no conventional static classifier can adapt to such changes. In short, it is an entirely novel topology and an unprecedented approach for dynamic, content/data-adaptive and scalable audio classification. A large set of audio features can be used effectively in the framework, where the CNBCs make appropriate selections and combinations so as to achieve the highest discrimination among individual audio classes. Experiments demonstrate a high classification accuracy (above 90%) and the efficiency of the proposed framework over large and dynamic audio databases.
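An illustrative skeleton of the divide-and-conquer topology just described: one NBC per class, each keeping the best of several candidate binary classifiers under some fitness criterion, so that adding a class trains only its own NBC. Classifier internals are stubbed out; this sketches only the structure, not the paper's evolutionary operators.

    class NBC:
        def __init__(self, label, candidates):
            self.label = label
            self.candidates = candidates          # competing binary classifiers

        def evolve(self, X, y, fitness):
            # Keep the candidate that best separates `label` vs. the rest.
            binary = [1 if t == self.label else 0 for t in y]
            self.best = max(self.candidates, key=lambda c: fitness(c, X, binary))

    class CNBC:
        def __init__(self):
            self.nbcs = {}

        def add_class(self, label, candidates, X, y, fitness):
            # A new class only trains its own NBC; no global retraining.
            nbc = NBC(label, candidates)
            nbc.evolve(X, y, fitness)
            self.nbcs[label] = nbc

        def predict(self, x):
            # Highest per-class confidence wins (candidates expose .score).
            return max(self.nbcs, key=lambda lbl: self.nbcs[lbl].best.score(x))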
High-efficiency and air-stable P3HT-based polymer solar cells with a new non-fullerene acceptor
Holliday, Sarah; Ashraf, Raja Shahid; Wadsworth, Andrew; Baran, Derya; Yousaf, Syeda Amber; Nielsen, Christian B.; Tan, Ching-Hong; Dimitrov, Stoichko D.; Shang, Zhengrong; Gasparini, Nicola; Alamoudi, Maha; Laquai, Frédéric; Brabec, Christoph J.; Salleo, Alberto; Durrant, James R.; McCulloch, Iain
2016-01-01
Solution-processed organic photovoltaics (OPV) offer the attractive prospect of low-cost, light-weight and environmentally benign solar energy production. The highest-efficiency OPVs at present use low-bandgap donor polymers, many of which suffer from problems with stability and synthetic scalability. They also rely on fullerene-based acceptors, which themselves have issues with cost, stability and limited spectral absorption. Here we present a new non-fullerene acceptor that has been specifically designed to give improved performance alongside the wide-bandgap donor poly(3-hexylthiophene) (P3HT), a polymer with significantly better prospects for commercial OPV due to its relative scalability and stability. Thanks to the well-matched optoelectronic and morphological properties of these materials, efficiencies of 6.4% are achieved, the highest reported for fullerene-free P3HT devices. In addition, dramatically improved air stability is demonstrated relative to other high-efficiency OPV, showing the excellent potential of this new material combination for future technological applications. PMID:27279376
Highly sensitive MoS2 photodetectors with graphene contacts
NASA Astrophysics Data System (ADS)
Han, Peize; St. Marie, Luke; Wang, Qing X.; Quirk, Nicholas; El Fatimy, Abdel; Ishigami, Masahiro; Barbara, Paola
2018-05-01
Two-dimensional materials such as graphene and transition metal dichalcogenides (TMDs) are ideal candidates for ultra-thin electronics suitable for flexible substrates. Although optoelectronic devices based on TMDs have demonstrated remarkable performance, scalability is still a significant issue: most devices are created using techniques unsuitable for mass production, such as mechanical exfoliation of monolayer flakes and patterning by electron-beam lithography. Here we show that large-area MoS2 grown by chemical vapor deposition and patterned by photolithography yields highly sensitive photodetectors, with record shot-noise-limited detectivities of 8.7 × 10¹⁴ Jones in ambient conditions and even higher when sealed with a protective layer. These detectivity values exceed the highest values reported for photodetectors based on exfoliated MoS2. We study MoS2 devices with gold electrodes and graphene electrodes. The devices with graphene electrodes have a tunable band alignment and are especially attractive for scalable ultra-thin flexible optoelectronics.
Method and system for benchmarking computers
Gustafson, John L.
1993-09-14
A testing system and method for benchmarking computer systems. The system includes a store containing a scalable set of tasks to be performed, producing a solution in ever-increasing degrees of resolution as a larger number of the tasks are completed. A timing and control module allots each computer a fixed benchmarking interval in which to perform the stored tasks. Means are provided for determining, after completion of the benchmarking interval, the degree of progress through the scalable set of tasks and for producing a benchmarking rating reflecting that degree of progress for each computer.
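A toy version of this fixed-time benchmarking idea: every machine gets the same time budget and works through a scalable task set whose answer improves with each completed subtask; the rating is how far it got. The workload below (refining a numerical estimate of pi) is only illustrative, not the patent's task set.

    import time

    def benchmark(interval_s=1.0):
        deadline = time.perf_counter() + interval_s
        n = 1                       # resolution of the current subtask
        progress = 0
        while time.perf_counter() < deadline:   # checked between subtasks
            # Subtask at resolution n: midpoint-rule estimate of
            # the integral of 4/(1+x^2) over [0, 1], which equals pi.
            h = 1.0 / n
            estimate = sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2)
                           for i in range(n)) * h
            progress, n = n, n * 2  # each subtask doubles the resolution
        return progress, estimate   # rating: finest resolution completed

    if __name__ == "__main__":
        rating, pi_est = benchmark()
        print("completed resolution:", rating, "estimate:", pi_est)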
A Solar Dynamic Power Option for Space Solar Power
NASA Technical Reports Server (NTRS)
Mason, Lee S.
1999-01-01
A study was performed to determine the potential performance and related technology requirements of Solar Dynamic (SD) power systems for a Space Solar Power (SSP) satellite. Space Solar Power is a concept in which solar energy is collected in orbit and beamed to Earth receiving stations to supplement terrestrial electric power service. Solar Dynamic systems offer the benefits of high solar-to-electric efficiency, long life with minimal performance degradation, and high power scalability. System analyses indicate that, with moderate component development, SD systems can exhibit excellent mass and deployed-area characteristics. Using the analyses as a guide, a technology roadmap was generated which identifies the component advances necessary to make SD power generation a competitive option for the SSP mission.
A Solution Processable High-Performance Thermoelectric Copper Selenide Thin Film.
Lin, Zhaoyang; Hollar, Courtney; Kang, Joon Sang; Yin, Anxiang; Wang, Yiliu; Shiu, Hui-Ying; Huang, Yu; Hu, Yongjie; Zhang, Yanliang; Duan, Xiangfeng
2017-06-01
A solid-state thermoelectric device is attractive for diverse technological areas such as cooling, power generation and waste heat recovery, with the unique advantages of quiet operation, zero hazardous emissions, and long lifetime. With the rapid growth of flexible electronics and miniature sensors, low-cost flexible thermoelectric energy harvesters are highly desired as potential power supplies. Herein, a flexible thermoelectric copper selenide (Cu2Se) thin film, consisting of earth-abundant elements, is reported. The thin film is fabricated by a low-cost and scalable spin-coating process using an ink solution with a truly soluble precursor. The Cu2Se thin film exhibits a power factor of 0.62 mW/(m K²) at 684 K on a rigid Al2O3 substrate and 0.46 mW/(m K²) at 664 K on a flexible polyimide substrate, much higher than the values obtained from other solution-processed Cu2Se thin films (<0.1 mW/(m K²)) and among the highest values reported for any flexible thermoelectric film to date (≈0.5 mW/(m K²)). Additionally, the fabricated thin film shows great promise for integration with flexible electronic devices, with negligible performance change after 1000 bending cycles. Together, the study demonstrates a low-cost and scalable pathway to high-performance flexible thin-film thermoelectric devices from relatively earth-abundant elements.
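For reference, the power factor quoted above is the standard thermoelectric figure of merit component (a textbook definition, not taken from this paper):

    \mathrm{PF} = S^{2}\,\sigma

where S is the Seebeck coefficient (V K⁻¹) and σ the electrical conductivity (S m⁻¹), giving PF in W m⁻¹ K⁻², consistent with the mW/(m K²) values reported.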
Field of genes: using Apache Kafka as a bioinformatic data repository
Lynch, Richard; Walsh, Paul
2018-01-01
Background: Bioinformatic research is increasingly dependent on large-scale datasets, accessed either from private or public repositories. An example of a public repository is the National Center for Biotechnology Information's (NCBI's) Reference Sequence (RefSeq). These repositories must decide in what form to make their data available. Unstructured data can be put to almost any use but are limited in how access to them can be scaled. Highly structured data offer improved performance for specific algorithms but limit the wider usefulness of the data. We present an alternative: lightly structured data stored in Apache Kafka in a way that is amenable to parallel access and streamed processing, including subsequent transformations into more highly structured representations. We contend that this approach could provide a flexible and powerful nexus of bioinformatic data, bridging the gap between low structure on one hand, and high performance and scale on the other. To demonstrate this, we present a proof-of-concept version of NCBI's RefSeq database using this technology. We measure the performance and scalability characteristics of this alternative with respect to flat files. Results: The proof of concept scales almost linearly as more compute nodes are added, outperforming the standard approach using files. Conclusions: Apache Kafka merits consideration as a fast and more scalable but general-purpose way to store and retrieve bioinformatic data, for public, centralized reference datasets such as RefSeq and for private clinical and experimental data. PMID:29635394
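A sketch of the "lightly structured" record idea using the kafka-python client (an assumption; the paper does not prescribe a client library). Each message is one sequence record, keyed by accession, so consumers can stream a RefSeq-like topic in parallel by partition. Broker address, topic name, and the record shown are illustrative.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",      # assumed broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        key_serializer=lambda k: k.encode("utf-8"),
    )

    record = {"accession": "NC_000913.3",        # illustrative record
              "description": "Escherichia coli str. K-12 MG1655",
              "sequence": "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTC"}
    producer.send("refseq", key=record["accession"], value=record)
    producer.flush()

    consumer = KafkaConsumer(
        "refseq",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        consumer_timeout_ms=5000,
    )
    for msg in consumer:                         # streamed; parallel by partition
        print(msg.value["accession"], len(msg.value["sequence"]))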
NASA Astrophysics Data System (ADS)
Clay, M. P.; Buaria, D.; Yeung, P. K.; Gotoh, T.
2018-07-01
This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm for the direct numerical simulation of turbulent mixing at high Schmidt number. The work stems from a recent development (Comput. Phys. Commun., vol. 219, 2017, 313-328) in which a low-communication algorithm was shown to attain high degrees of scalability on the Cray XE6 architecture when overlapping communication and computation via dedicated communication threads. An even higher level of performance has now been achieved using OpenMP 4.5 on the Cray XK7 architecture, where on each node the 16 integer cores of an AMD Interlagos processor share a single NVIDIA K20X GPU accelerator. In the new algorithm, data movement is minimized by performing virtually all of the intensive scalar-field computations in the form of combined compact finite difference (CCD) operations on the GPUs. A memory layout departing from usual practice is found to provide much better performance for a specific kernel required to apply the CCD scheme. Asynchronous execution, enabled by adding the OpenMP 4.5 NOWAIT clause to TARGET constructs, improves scalability when used to overlap computation on the GPUs with computation and communication on the CPUs. On the 27-petaflops supercomputer Titan at Oak Ridge National Laboratory, USA, a GPU-to-CPU speedup factor of approximately 5 is consistently observed at the largest problem size of 8192³ grid points for the scalar field, computed with 8192 XK7 nodes.
Kelly, Benjamin J; Fitch, James R; Hu, Yangqiu; Corsmeier, Donald J; Zhong, Huachun; Wetzel, Amy N; Nordquist, Russell D; Newsom, David L; White, Peter
2015-01-20
While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of these data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.
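A hedged sketch of the "balanced regional parallelization" idea the abstract names: the genome is cut into regions of roughly equal base count so every worker receives the same load regardless of chromosome boundaries. Chromosome sizes and the partitioning rule are illustrative, not Churchill's actual implementation.

    def balanced_regions(chrom_sizes, n_workers):
        """Split {chrom: length} into n_workers lists of (chrom, start, end)
        regions with near-equal total bases per worker."""
        total = sum(chrom_sizes.values())
        target = -(-total // n_workers)          # ceil: bases per worker
        workers = [[] for _ in range(n_workers)]
        offset = 0                               # position in concatenated genome
        for chrom, size in chrom_sizes.items():
            pos = 0
            while pos < size:
                w = min(offset // target, n_workers - 1)
                # Stop either at the chromosome end or the worker boundary.
                end = min(size, pos + (target - offset % target))
                workers[w].append((chrom, pos, end))
                offset += end - pos
                pos = end
        return workers

    if __name__ == "__main__":
        for w in balanced_regions({"chr1": 10_000_000, "chr2": 6_000_000}, 4):
            print(w)   # four region lists of ~4 Mb each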
Volumetric Medical Image Coding: An Object-based, Lossy-to-lossless and Fully Scalable Approach
Danyali, Habibiollah; Mertins, Alfred
2011-01-01
In this article, an object-based, highly scalable, lossy-to-lossless 3D wavelet coding approach for volumetric medical image data (e.g., magnetic resonance (MR) and computed tomography (CT)) is proposed. The new method, called 3DOBHS-SPIHT, is based on the well-known set partitioning in hierarchical trees (SPIHT) algorithm and supports both quality and resolution scalability. The 3D input data are grouped into groups of slices (GOS), and each GOS is encoded and decoded as a separate unit. The symmetric tree definition of the original 3DSPIHT is improved by introducing a new asymmetric tree structure. While preserving compression efficiency, the new tree structure allows a small size for each GOS, which not only reduces memory consumption during encoding and decoding but also facilitates more efficient random access to particular segments of slices. For further compression efficiency, the algorithm encodes only the main object of interest in each 3D data set, which can have any arbitrary shape, and ignores the unnecessary background. Experimental results on several MR data sets show the good performance of the 3DOBHS-SPIHT algorithm for multi-resolution lossy-to-lossless coding. The compression efficiency, full scalability, and object-based features of the proposed approach, besides its lossy-to-lossless coding support, make it a very attractive candidate for volumetric medical image archiving and transmission applications. PMID:22606653
Blueprint for a microwave trapped ion quantum computer.
Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G; Mølmer, Klaus; Devitt, Simon J; Wunderlich, Christof; Hensinger, Winfried K
2017-02-01
The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion-based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation-based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error-threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seal, Sudip K; Perumalla, Kalyan S; Hirshman, Steven Paul
2013-01-01
Simulations that require solutions of block tridiagonal systems of equations rely on fast parallel solvers for runtime efficiency. Leading parallel solvers that are highly effective for general systems of equations, dense or sparse, are limited in scalability when applied to block tridiagonal systems. This paper presents scalability results as well as detailed analyses of two parallel solvers that exploit the special structure of block tridiagonal matrices to deliver superior performance, often by orders of magnitude. A rigorous analysis of their relative parallel runtimes reveals the existence of a critical block size that separates the parameter space spanned by the number of block rows, the block size and the processor count into distinct regions that favor one or the other of the two solvers. The dependence of this critical block size on these parameters, as well as on machine-specific constants, is established. These formal insights are supported by empirical results on up to 2,048 cores of a Cray XT4 system. To the best of our knowledge, this is the highest reported scalability for parallel block tridiagonal solvers to date.
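To make the special structure explicit, here is the serial block-Thomas baseline that parallel block tridiagonal solvers improve upon; it is shown only for orientation and is not either of the solvers analyzed in the paper.

    import numpy as np

    def block_thomas(A, B, C, d):
        """Solve a block tridiagonal system by block forward elimination
        and back substitution. A, B, C: (n, b, b) sub-, main and
        super-diagonal blocks (A[0], C[-1] unused); d: (n, b) RHS."""
        n = len(B)
        Bp, dp = [B[0]], [d[0]]
        for i in range(1, n):
            # m = A[i] @ inv(Bp[i-1]) without forming the inverse explicitly
            m = np.linalg.solve(Bp[i - 1].T, A[i].T).T
            Bp.append(B[i] - m @ C[i - 1])
            dp.append(d[i] - m @ dp[i - 1])
        x = [None] * n
        x[n - 1] = np.linalg.solve(Bp[n - 1], dp[n - 1])
        for i in range(n - 2, -1, -1):
            x[i] = np.linalg.solve(Bp[i], dp[i] - C[i] @ x[i + 1])
        return np.array(x)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n, b = 6, 3
        A = rng.normal(size=(n, b, b)); C = rng.normal(size=(n, b, b))
        B = rng.normal(size=(n, b, b)) + 10 * np.eye(b)  # diagonally dominant
        d = rng.normal(size=(n, b))
        x = block_thomas(A, B, C, d)
        # residual of the first block row: B[0] x[0] + C[0] x[1] - d[0]
        print(np.max(np.abs(B[0] @ x[0] + C[0] @ x[1] - d[0])))  # ~1e-14

The forward sweep is inherently sequential across block rows, which is exactly why specialized parallel formulations (and the critical block size separating their regimes) matter.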
Continuous Variable Cluster State Generation over the Optical Spatial Mode Comb
Pooser, Raphael C.; Jing, Jietai
2014-10-20
One-way quantum computing uses single-qubit projective measurements performed on a cluster state (a highly entangled state of multiple qubits) to enact quantum gates. The model is promising due to its potential scalability; the cluster state may be produced at the beginning of the computation and operated on over time. Continuous variables (CV) offer another potential benefit in the form of deterministic entanglement generation. This determinism can lead to robust cluster states and scalable quantum computation. Recent demonstrations of CV cluster states have made great strides on the path to scalability, utilizing either time or frequency multiplexing in optical parametric oscillators (OPO) both above and below threshold. These techniques relied on a combination of entangling operators and beam-splitter transformations. Here we show that an analogous transformation exists for amplifiers with Gaussian input states operating on multiple spatial modes. By judicious selection of local oscillators (LOs), the spatial mode distribution is analogous to the optical frequency comb of axial modes in an OPO cavity. We outline an experimental system that generates cluster states across the spatial frequency comb and can also scale the amount of quantum noise reduction, potentially beyond that of other systems.
Electrospun core-shell fibers for robust silicon nanoparticle-based lithium ion battery anodes.
Hwang, Tae Hoon; Lee, Yong Min; Kong, Byung-Seon; Seo, Jin-Seok; Choi, Jang Wook
2012-02-08
Because of its unprecedented theoretical capacity near 4000 mAh/g, approximately 10-fold larger than those of current commercial graphite anodes, silicon has been the most promising anode for lithium ion batteries, particularly targeting large-scale energy storage applications including electric vehicles and utility grids. Nevertheless, Si suffers from a short cycle life as well as limitations in scalable electrode fabrication. Herein, we develop an electrospinning process to produce core-shell fiber electrodes using a dual nozzle in a scalable manner. In the core-shell fibers, commercially available nanoparticles in the core are wrapped by a carbon shell. The unique core-shell structure resolves various issues of Si anode operation, such as pulverization, vulnerable contacts between Si and carbon conductors, and an unstable solid-electrolyte interphase, thereby exhibiting outstanding cell performance: a gravimetric capacity as high as 1384 mAh/g, a 5 min discharging rate capability while retaining 721 mAh/g, and a cycle life of 300 cycles with almost no capacity loss. The electrospun core-shell one-dimensional fibers suggest a new design principle for robust and scalable lithium battery electrodes that suffer from volume expansion.
Jeong, Seol Young; Jo, Hyeong Gon; Kang, Soon Ju
2014-03-21
A tracking service like asset management is essential in a dynamic hospital environment consisting of numerous mobile assets (e.g., wheelchairs or infusion pumps) that are continuously relocated throughout a hospital. The tracking service builds on the key technologies of an indoor location-based service (LBS), such as locating and monitoring multiple mobile targets inside a building in real time. An indoor LBS such as a tracking service entails numerous resource lookups being requested concurrently and frequently from several locations, and requires a network infrastructure that supports high scalability in indoor environments. A traditional centralized architecture needs to maintain a geographic map of the entire building or complex in its central server, which can cause low scalability and traffic congestion. This paper presents a self-organizing and fully distributed indoor mobile asset management (MAM) platform, and proposes a real-time architecture for multiple trackees (such as mobile assets) and trackers based on the proposed distributed platform. To verify the suggested platform, scalability performance under increasing numbers of concurrent lookups was evaluated in a real test bed. Tracking latency and traffic load ratio in the proposed tracking architecture were also evaluated.
Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids
Yoginath, Srikanth B.; Perumalla, Kalyan S.
2018-01-31
Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed-memory computing platforms, due to the dynamic nature of the computational load across clones and the complex dependencies spanning the clone tree. In this paper, we present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning. It efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a large parallel system without full physical duplication of computation and memory. The performance of a prototype implementation executed on up to 1,024 graphical processing units of a supercomputing system has been evaluated with three benchmarks (heat diffusion, forest fire, and disease propagation models), delivering a speedup of over two orders of magnitude compared to replicated runs. The results demonstrate a significantly faster and scalable way to execute many what-if scenario ensembles of large simulations via cloning using the CloneX interface.
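A conceptual sketch of cloning without full physical duplication: a clone records only its own divergent state and falls back to its parent for everything unchanged (copy-on-write along the clone tree). This mirrors the idea described for CloneX, not its actual GPU implementation.

    class SimClone:
        def __init__(self, parent=None, **initial_state):
            self.parent = parent
            self.local = dict(initial_state)   # only this clone's divergent state

        def read(self, key):
            node = self
            while node is not None:            # walk up the clone tree
                if key in node.local:
                    return node.local[key]
                node = node.parent
            raise KeyError(key)

        def write(self, key, value):
            self.local[key] = value            # copy-on-write: parent untouched

        def clone(self):
            return SimClone(parent=self)       # O(1): no state is copied

    if __name__ == "__main__":
        base = SimClone(temperature=300.0, fuel=1.0)
        what_if = base.clone()
        what_if.write("fuel", 0.5)             # diverge in one variable only
        print(base.read("fuel"), what_if.read("fuel"), what_if.read("temperature"))
        # -> 1.0 0.5 300.0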
Advanced Metalworking Solutions For Naval Systems That Go In Harm’s Way
2015-01-01
Applies a variety of innovative welding technologies to address the challenges associated with joining weapon system components, including a friction stir welding process used to manufacture modular, high-performance, and scalable edge-cooled naval electronic cold plate assemblies. (Cover photographs: the destroyers USS Momsen (DDG 92) and USS Preble (DDG 88) underway in formation, U.S. Navy; Ingalls Shipbuilding welding.)
High Productivity Computing Systems Analysis and Performance
2005-07-01
One of the HPCchallenge codes, RandomAccess, is derived from the HPCS discrete math benchmarks that we released; it measures Global Updates per second (GUP/S), and further details can be found at the web site. (Benchmark-table excerpt: Discrete Math; RandomAccess; Paper & Pencil; Multiple Precision; contact Bob Lucas, ISI.)
Department of Defense High Performance Computing Modernization Program. 2006 Annual Report
2007-03-01
We successfully completed several software development projects that introduced parallel, scalable production software now in use across the ... They are developing and deploying weather and ocean models that allow our soldiers, sailors, marines and airmen to plan missions more effectively and to navigate adverse environments safely. They are modeling molecular interactions leading to the development of higher-energy fuels and munitions ...
Roll-to-roll production of spray coated N-doped carbon nanotube electrodes for supercapacitors
NASA Astrophysics Data System (ADS)
Karakaya, Mehmet; Zhu, Jingyi; Raghavendra, Achyut J.; Podila, Ramakrishna; Parler, Samuel G.; Kaplan, James P.; Rao, Apparao M.
2014-12-01
Although carbon nanomaterials are increasingly used in energy storage, inexpensive, continuous, and scalable synthesis methods have been lacking. Here, we present a scalable roll-to-roll (R2R) spray-coating process for synthesizing randomly oriented multi-walled carbon nanotube electrodes on Al foils. Coin- and jellyroll-type supercapacitors comprising such electrodes yield high power densities (~700 mW/cm³) and energy densities (1 mWh/cm³) on par with Li-ion thin-film batteries. These devices exhibit excellent cycle stability, with no loss in performance over more than a thousand cycles. Our cost analysis shows that the R2R spray-coating process can produce supercapacitors with 10 times the energy density of conventional activated-carbon devices at ~17% lower cost.
Cloud computing applications for biomedical science: A perspective.
Navale, Vivek; Bourne, Philip E
2018-06-01
Biomedical research has become a digital data-intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research.
Linear static structural and vibration analysis on high-performance computers
NASA Technical Reports Server (NTRS)
Baddourah, M. A.; Storaasli, O. O.; Bostic, S. W.
1993-01-01
Parallel computers offer the opportunity to significantly reduce the computation time necessary to analyze large-scale aerospace structures. This paper presents algorithms developed for and implemented on massively parallel computers, hereafter referred to as Scalable High-Performance Computers (SHPC), for the most computationally intensive tasks involved in structural analysis, namely, generation and assembly of system matrices, solution of systems of equations, and calculation of eigenvalues and eigenvectors. Results on SHPC are presented for large-scale structural problems (i.e., models for the High-Speed Civil Transport). The goal of this research is to develop a new, efficient technique which extends structural analysis to SHPC and makes large-scale structural analyses tractable.
2012-01-01
Background: High-throughput methods are widely used for strain screening, effectively resulting in binary information regarding high or low productivity. Nevertheless, achieving quantitative and scalable parameters for fast bioprocess development is much more challenging, especially for heterologous protein production. The nature of the foreign protein makes it impossible to predict, for example, the best expression construct, secretion signal peptide, inducer concentration, induction time, temperature and substrate feed rate in fed-batch operation, to name only a few. Therefore, a high number of systematic experiments are necessary to elucidate the best conditions for heterologous expression of each new protein of interest. Results: To increase the throughput in bioprocess development, we used a microtiter-plate-based cultivation system (BioLector) fully integrated into a liquid-handling platform enclosed in laminar airflow housing. This automated cultivation platform was used to optimize the secretory production of a cutinase from Fusarium solani pisi with Corynebacterium glutamicum. Online monitoring of biomass, dissolved oxygen and pH in each of the microtiter plate wells enables sampling or dosing events to be triggered with the pipetting robot, allowing reliable selection of the best-performing cutinase producers. In addition, further automated methods such as media optimization and induction profiling were developed and validated. All biological and bioprocess parameters were optimized exclusively at microtiter plate scale and scaled perfectly to 1 L and 20 L stirred-tank bioreactor scale. Conclusions: The optimization of heterologous protein expression in microbial systems currently requires extensive testing of biological and bioprocess engineering parameters. This can be efficiently boosted by using a microtiter plate cultivation setup embedded into a liquid-handling system, providing more throughput by parallelization and automation. Due to improved statistics from replicate cultivations, automated downstream analysis, and scalable process information, this setup has superior performance compared to standard microtiter plate cultivation. PMID:23113930
NASA Astrophysics Data System (ADS)
Robbins, Hannah; Hu, Sijung; Liu, Changqing
2015-03-01
The demand for rapid screening technologies to be used outside of a traditional healthcare setting has been expanding rapidly. This requires a new engineering platform through which faster and more cost-effective techniques can be adopted via forward-thinking manufacturing procedures, i.e., advanced miniaturisation and heterogeneous integration of high-performance microfluidics-based point-of-care testing (POCT) systems. Although there has been a considerable amount of research into POCT systems, tremendous challenges and bottlenecks remain in design and manufacturing before clinically acceptable sensitivity and selectivity, as well as smart microsystems for healthcare, can be achieved. The project aims to enable scalable production of such complex systems through 1) advanced miniaturisation of the physical layout and opto-electronic component allocation through an optimal design; and 2) heterogeneous integration of multiplexed fluorescence detection (MFD) for in vitro POCT. Verification is arranged through experimental testing with a series of dilutions of a commonly used fluorescent dye, Cy5. Iterative procedures will be engaged until the detection limit for Cy5, 1.209×10⁻¹⁰ M, is satisfied. The research opens a new avenue for rapid-screening POCT manufacturing solutions, with a particular view to high-performance and multifunctional detection systems not only for POCT but also for life sciences and environmental applications.
NASA Astrophysics Data System (ADS)
Tolba, Khaled Ibrahim; Morgenthal, Guido
2018-01-01
This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied to the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using the OpenCL framework. The focus of this paper is the analysis of the parallel efficiency and scalability of the method applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The aim is to assess the optimal configuration and the required computer architecture, such that it becomes feasible to efficiently utilise the method within the computational resources available to a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming-type computer.
ExSTraCS 2.0: Description and Evaluation of a Scalable Learning Classifier System.
Urbanowicz, Ryan J; Moore, Jason H
2015-09-01
Algorithmic scalability is a major concern for any machine learning strategy in this age of 'big data'. A large number of potentially predictive attributes is emblematic of problems in bioinformatics, genetic epidemiology, and many other fields. Previously, ExSTraCS was introduced as an extended Michigan-style supervised learning classifier system that combined a set of powerful heuristics to successfully tackle the challenges of classification, prediction, and knowledge discovery in complex, noisy, and heterogeneous problem domains. While Michigan-style learning classifier systems are powerful and flexible learners, they are not considered particularly scalable. For the first time, this paper presents a complete description of the ExSTraCS algorithm and introduces an effective strategy to dramatically improve learning classifier system scalability. ExSTraCS 2.0 addresses scalability with (1) a rule specificity limit, (2) new approaches to expert-knowledge-guided covering and mutation mechanisms, and (3) the implementation and utilization of the TuRF algorithm for improving the quality of expert knowledge discovery in larger datasets. Performance over a complex spectrum of simulated genetic datasets demonstrated that these new mechanisms dramatically improve nearly every performance metric on datasets with 20 attributes and made it possible for ExSTraCS to reliably scale up to related 200- and 2000-attribute datasets. ExSTraCS 2.0 was also able to reliably solve the 6-, 11-, 20-, 37-, 70-, and 135-bit multiplexer problems, and did so in similar or fewer learning iterations than previously reported, with smaller finite training sets, and without using building blocks discovered from simpler multiplexer problems. Furthermore, ExSTraCS usability was made simpler through the elimination of previously critical run parameters.
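An illustrative sketch of a rule specificity limit in Michigan-style covering: when a new rule is generated to match an instance, only a bounded number of attributes may be specified and the rest stay as wildcards ('#'). The cap heuristic and attribute weighting below are placeholders, not ExSTraCS's published mechanisms.

    import math, random

    def specificity_limit(n_attributes):
        # Assumed heuristic: allow roughly log2(n)+1 specified attributes
        # so rule size grows slowly with dimensionality.
        return int(math.log2(n_attributes)) + 1

    def cover(instance, limit=None, weights=None):
        """Create a rule matching `instance` with at most `limit` specified
        attributes; `weights` (e.g. expert-knowledge scores) bias which
        attributes get specified."""
        n = len(instance)
        limit = limit or specificity_limit(n)
        k = random.randint(1, limit)
        idx = random.choices(range(n), weights=weights, k=k)
        rule = ["#"] * n
        for i in set(idx):          # duplicates collapse, so count <= limit
            rule[i] = instance[i]
        return rule

    if __name__ == "__main__":
        random.seed(1)
        inst = [0, 1, 1, 0, 1, 0, 0, 1] * 25      # a 200-attribute instance
        print("".join(map(str, cover(inst)))[:40]) # mostly '#': bounded specificity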
Parallel peak pruning for scalable SMP contour tree computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.
As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in the architecture of high performance computing systems necessitate analysis algorithms that make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared-memory SMP algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10x parallel speedup in OpenMP and up to 50x speedup in NVIDIA Thrust.
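For contrast, the classic serial "sweep" metaphor the authors depart from can be sketched compactly: visit vertices from high to low scalar value, merging components with a union-find; each vertex that joins existing components becomes a node of the join tree (one half of the contour tree). This sketch builds the augmented join tree (every vertex appears) and is not the paper's parallel peak-pruning algorithm.

    from collections import defaultdict

    def join_tree(values, edges):
        """Serial sweep for the augmented join tree of a scalar field on a
        graph. Returns superarcs as (upper_node, lower_node) pairs."""
        adj = defaultdict(list)
        for a, b in edges:
            adj[a].append(b)
            adj[b].append(a)
        parent = list(range(len(values)))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        head = {}                      # component root -> lowest node so far
        seen, arcs = set(), []
        for v in sorted(range(len(values)), key=lambda i: -values[i]):
            seen.add(v)
            for r in {find(u) for u in adj[v] if u in seen}:
                arcs.append((head[r], v))   # a superarc ends at v
                parent[r] = v               # merge that component into v's
            head[find(v)] = v
        return arcs

    if __name__ == "__main__":
        print(join_tree([3.0, 1.0, 2.0, 0.0], [(0, 1), (1, 2), (2, 3)]))
        # the two maxima (vertices 0 and 2) merge at vertex 1, then descend to 3

The data dependence of the sweep (each step consults the union-find built by all higher vertices) is exactly what makes this formulation hard to parallelize.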
Optimization of atmospheric transport models on HPC platforms
NASA Astrophysics Data System (ADS)
de la Cruz, Raúl; Folch, Arnau; Farré, Pau; Cabezas, Javier; Navarro, Nacho; Cela, José María
2016-12-01
The performance and scalability of atmospheric transport models on high performance computing environments are often far from optimal for multiple reasons, including, for example, sequential input and output, synchronous communications, work imbalance, memory access latency, and lack of task overlapping. We investigate how different software optimizations and porting to non-general-purpose hardware architectures improve code scalability and execution times, considering, as an example, the FALL3D volcanic ash transport model. To this purpose, we implement the FALL3D model equations in the WARIS framework, a software designed from scratch to solve different geoscience problems in a parallel and efficient way on a wide variety of architectures. In addition, we consider further improvements in WARIS such as hybrid MPI-OpenMP parallelization, spatial blocking, auto-tuning and thread affinity. Considering all these aspects together, the FALL3D execution times for a realistic test case running on general-purpose cluster architectures (Intel Sandy Bridge) decrease by a factor of between 7 and 40 depending on the grid resolution. Finally, we port the application to Intel Xeon Phi (MIC) and NVIDIA GPU (CUDA) accelerator-based architectures and compare performance, cost and power consumption across all the architectures. Implications for time-constrained operational model configurations are discussed.
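A toy illustration of spatial blocking, one of the optimizations listed above: a stencil sweep is tiled so that each tile's working set stays cache-resident. The tile size and the 5-point Jacobi stencil are illustrative, not the WARIS/FALL3D kernels.

    import numpy as np

    def sweep_blocked(u, tile=64):
        """One Jacobi sweep of a 5-point stencil over the interior of u,
        processed tile by tile (boundary values are left unchanged)."""
        out = u.copy()
        n, m = u.shape
        for i0 in range(1, n - 1, tile):
            for j0 in range(1, m - 1, tile):
                i1, j1 = min(i0 + tile, n - 1), min(j0 + tile, m - 1)
                out[i0:i1, j0:j1] = 0.25 * (
                    u[i0-1:i1-1, j0:j1] + u[i0+1:i1+1, j0:j1]
                    + u[i0:i1, j0-1:j1-1] + u[i0:i1, j0+1:j1+1])
        return out

    if __name__ == "__main__":
        u = np.random.default_rng(0).random((512, 512))
        v = sweep_blocked(u)
        print(v.shape, float(v[1:-1, 1:-1].mean()))

In compiled code the same tiling is what keeps a stencil's working set inside the last-level cache; combined with thread affinity, it also keeps each tile on the core whose cache holds it.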
Scalable Light Module for Low-Cost, High-Efficiency Light- Emitting Diode Luminaires
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tarsa, Eric
2015-08-31
During this two-year program, Cree developed a scalable, modular optical architecture for low-cost, high-efficacy light-emitting diode (LED) luminaires. Stated simply, the goal of this architecture was to efficiently and cost-effectively convey light from LEDs (point sources) to broad luminaire surfaces (area sources). By simultaneously developing warm-white LED components and low-cost, scalable optical elements, a high system optical efficiency resulted. To meet program goals, Cree evaluated novel approaches to improve LED component efficacy at high color quality without sacrificing LED optical efficiency relative to conventional packages. Meanwhile, efficiently coupling light from LEDs into modular optical elements, followed by optimally distributing and extracting this light, were challenges addressed via novel optical design coupled with frequent experimental evaluations. Minimizing luminaire bill-of-materials and assembly costs were two guiding principles for all design work, in the effort to achieve luminaires with a significantly lower normalized cost ($/klm) than existing LED fixtures. Chief project accomplishments included the achievement of >150 lm/W warm-white LEDs having primary optics compatible with low-cost modular optical elements. In addition, a prototype Light Module optical efficiency of over 90% was measured, demonstrating the potential of this scalable architecture for ultra-high-efficacy LED luminaires. Since the project ended, Cree has continued to evaluate optical element fabrication and assembly methods in an effort to rapidly transfer this scalable, cost-effective technology to Cree production development groups. The Light Module concept is likely to make a strong contribution to the development of new cost-effective, high-efficacy luminaires, thereby accelerating widespread adoption of energy-saving SSL in the U.S.
Extraordinary Corrosion Protection from Polymer-Clay Nanobrick Wall Thin Films.
Schindelholz, Eric J; Spoerke, Erik D; Nguyen, Hai-Duy; Grunlan, Jaime C; Qin, Shuang; Bufford, Daniel C
2018-06-20
Metals across all industries demand anticorrosion surface treatments and drive a continual need for high-performing and low-cost coatings. Here we demonstrate polymer-clay nanocomposite thin films as a new class of transparent conformal barrier coatings for protection in corrosive atmospheres. Films assembled via layer-by-layer deposition, as thin as 90 nm, are shown to reduce copper corrosion rates by >1000× in an aggressive H2S atmosphere. These multilayer nanobrick wall coatings hold promise as high-performing anticorrosion treatment alternatives to costlier, more toxic, and less scalable thin films, such as graphene, hexavalent chromium, or atomic-layer-deposited metal oxides.
Importance of balanced architectures in the design of high-performance imaging systems
NASA Astrophysics Data System (ADS)
Sgro, Joseph A.; Stanton, Paul C.
1999-03-01
Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high-performance computing systems have traditionally relied on proprietary architectures and custom components, recent advances in general-purpose microprocessor technology have produced an abundance of low-cost components suitable for use in high-performance computing systems. A common pitfall in the design of high-performance imaging systems, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The characteristic symptom of this problem is the failure of system performance to scale as more processors are added. The problem is exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high-performance external memory interfaces makes it practical to design high-performance imaging systems with balanced computational and memory bandwidth. Real-world examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures
NASA Astrophysics Data System (ADS)
Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.
2016-12-01
The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the-art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which offers several new features, among them an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
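To make the SIMD-friendly formulation concrete, the following is a minimal sketch of the k-means assignment and update steps, assuming the standard Lloyd's-iteration algorithm; expressing the distance computation as one large matrix product is a simple stand-in for the wide-SIMD-lane utilization discussed above, and all sizes are illustrative.

import numpy as np

def assign_and_update(points, centroids):
    # squared distances via |x - c|^2 = |x|^2 - 2 x.c + |c|^2, so the inner
    # loop becomes one matrix multiply (BLAS/SIMD friendly)
    x2 = (points ** 2).sum(axis=1)[:, None]
    c2 = (centroids ** 2).sum(axis=1)[None, :]
    d2 = x2 - 2.0 * points @ centroids.T + c2
    labels = d2.argmin(axis=1)
    # recompute each centroid as the mean of its points; keep the old
    # centroid if a cluster happens to be empty
    new_centroids = np.array([points[labels == k].mean(axis=0)
                              if np.any(labels == k) else centroids[k]
                              for k in range(centroids.shape[0])])
    return labels, new_centroids

pts = np.random.rand(100000, 8).astype(np.float32)
ctr = pts[np.random.choice(len(pts), 16, replace=False)]
for _ in range(10):
    labels, ctr = assign_and_update(pts, ctr)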
Scalable and Power Efficient Data Analytics for Hybrid Exascale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choudhary, Alok; Samatova, Nagiza; Wu, Kesheng
This project developed a generic and optimized set of core data analytics functions. These functions organically consolidate a broad constellation of high-performance analytical pipelines. As the architectures of emerging HPC systems become inherently heterogeneous, there is a need to design algorithms for data analysis kernels accelerated on hybrid multi-node, multi-core HPC architectures comprised of a mix of CPUs, GPUs, and SSDs. Furthermore, the power-aware trend drives advances in our performance-energy tradeoff analysis framework, which enables our data analysis kernel algorithms and software to be parameterized so that users can choose the right power-performance optimizations.
High Temperature Carbonized Grass as a High Performance Sodium Ion Battery Anode.
Zhang, Fang; Yao, Yonggang; Wan, Jiayu; Henderson, Doug; Zhang, Xiaogang; Hu, Liangbing
2017-01-11
Hard carbon is currently considered the most promising anode candidate for room-temperature sodium-ion batteries because of its relatively high capacity, low cost, and good scalability. In this work, switchgrass, as an example biomass, was carbonized at an ultrahigh temperature of 2050 °C, induced by Joule heating, to create hard carbon anodes for sodium-ion batteries. The switchgrass-derived carbon materials intrinsically inherit the grass's three-dimensional porous hierarchical architecture, with an average interlayer spacing of 0.376 nm. This interlayer spacing, larger than that of graphite, allows for significant Na-ion storage performance. Compared to a sample carbonized at 1000 °C, the switchgrass-derived carbon prepared at 2050 °C showed an improved initial Coulombic efficiency. Additionally, excellent rate capability and superior cycling performance are demonstrated for the switchgrass-derived carbon, owing to the unique high-temperature treatment.
Platform for efficient switching between multiple devices in the intensive care unit.
De Backere, F; Vanhove, T; Dejonghe, E; Feys, M; Herinckx, T; Vankelecom, J; Decruyenaere, J; De Turck, F
2015-01-01
This article is part of the Focus Theme of METHODS of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". Handheld computers, such as tablets and smartphones, are becoming more and more accessible in the clinical care setting and in Intensive Care Units (ICUs). By making the most useful and appropriate data available on multiple devices and facilitating the switching between those devices, staff members can efficiently integrate them into their workflow, allowing for faster and more accurate decisions. This paper addresses the design of a platform for the efficient switching between multiple devices in the ICU. The key functionalities of the platform are the integration of the platform into the workflow of the medical staff and the provision of tailored and dynamic information at the point of care. The platform is designed on the basis of a 3-tier architecture with a focus on extensibility, scalability and an optimal user experience. After identification to a device using Near Field Communication (NFC), the appropriate medical information is shown on the selected device. The visualization of the data is adapted to the type of the device. A web-centric approach was used to enable extensibility and portability. A prototype of the platform was thoroughly evaluated: the scalability, performance and user experience were assessed. Performance tests show that the response time of the system scales linearly with the amount of data. Measurements with up to 20 devices have shown no performance loss due to the concurrent use of multiple devices. The platform provides a scalable and responsive solution to enable the efficient switching between multiple devices. Due to the web-centric approach, new devices can easily be integrated. The performance and scalability of the platform were evaluated, and it was shown that the response time and scalability of the platform were within an acceptable range.
Improved inter-layer prediction for light field content coding with display scalability
NASA Astrophysics Data System (ADS)
Conti, Caroline; Ducla Soares, Luís.; Nunes, Paulo
2016-09-01
Light field imaging based on microlens arrays - also known as plenoptic, holoscopic and integral imaging - has recently emerged as a feasible and promising technology due to its ability to support functionalities not straightforwardly available in conventional imaging systems, such as post-production refocusing and depth-of-field changes. However, to gradually reach the consumer market and to provide interoperability with current 2D and 3D representations, a display scalable coding solution is essential. In this context, this paper proposes an improved display scalable light field codec comprising a three-layer hierarchical coding architecture (previously proposed by the authors) that provides interoperability with 2D (Base Layer) and 3D stereo and multiview (First Layer) representations, while the Second Layer supports the complete light field content. To further improve the compression performance, novel exemplar-based inter-layer coding tools are proposed here for the Second Layer, namely: (i) an inter-layer reference picture construction relying on an exemplar-based optimization algorithm for texture synthesis, and (ii) a direct prediction mode based on exemplar texture samples from lower layers. Experimental results show that the proposed solution performs better than the tested benchmark solutions, including the authors' previous scalable codec.
Unequal error control scheme for dimmable visible light communication systems
NASA Astrophysics Data System (ADS)
Deng, Keyan; Yuan, Lei; Wan, Yi; Li, Huaan
2017-01-01
Visible light communication (VLC), which has the advantages of a very large bandwidth, high security, and freedom from license-related restrictions and electromagnetic interference, has attracted much interest. Because a VLC system simultaneously performs illumination and communication functions, dimming control, efficiency, and reliable transmission are significant and challenging issues for such systems. In this paper, we propose a novel unequal error control (UEC) scheme in which expanding window fountain (EWF) codes in an on-off keying (OOK)-based VLC system are used to support different dimming target values. To evaluate the performance of the scheme for various dimming target values, we apply it to H.264 scalable video coding bitstreams in a VLC system. The results of simulations performed using additive white Gaussian noise (AWGN) at different signal-to-noise ratios (SNRs) are used to compare the performance of the proposed scheme for various dimming target values. It is found that the proposed UEC scheme enables earlier base layer recovery than the equal error control (EEC) scheme for different dimming target values, and therefore affords robust transmission for scalable video multicast over optical wireless channels. This is because of the unequal error protection (UEP) and unequal recovery time (URT) of the EWF code in the proposed scheme.
Validation of a Scalable Solar Sailcraft
NASA Technical Reports Server (NTRS)
Murphy, D. M.
2006-01-01
The NASA In-Space Propulsion (ISP) program sponsored intensive solar sail technology and systems design, development, and hardware demonstration activities over the past 3 years. Efforts to validate a scalable solar sail system by functional demonstration in relevant environments, together with test-analysis correlation activities, have recently been successfully completed. A review of the program is presented, with descriptions of the design, results of testing, and analytical model validations of component and assembly functional, strength, stiffness, shape, and dynamic behavior. The scaled performance of the validated system is projected to demonstrate applicability to flight demonstration and important NASA roadmap missions.
Innovative ceramic slab lasers for high power laser applications
NASA Astrophysics Data System (ADS)
Lapucci, Antonio; Ciofini, Marco
2005-09-01
Diode-pumped solid-state lasers (DPSSL) are gaining increasing interest for high-power industrial applications, given the continuous improvement in the reliability and affordability of high-power diode laser technology. These sources open new windows in the parameter space for traditional applications such as cutting, welding, marking, and engraving of high-reflectance metallic materials. Other interesting applications for this kind of source include high-speed thermal printing, precision drilling, selective soldering, and thin-film etching. In this paper we examine the most important DPSS laser source types for industrial applications and describe in detail the performance of some slab laser configurations investigated at our facilities. The advantages and drawbacks of the different architectures are briefly compared in terms of performance, system complexity, and ease of scalability to the multi-kW level.
Highly Efficient Perovskite Solar Modules by Scalable Fabrication and Interconnection Optimization
Yang, Mengjin; Kim, Dong Hoe; Klein, Talysa R.; ...
2018-01-02
To push perovskite solar cell (PSC) technology toward practical applications, large-area perovskite solar modules with multiple subcells need to be developed by fully scalable deposition approaches. Here, we demonstrate a deposition scheme for perovskite module fabrication with spray coating of a TiO2 electron transport layer (ETL) and blade coating of both a perovskite absorber layer and a spiro-OMeTAD-based hole transport layer (HTL). The TiO2 ETL remaining in the interconnection between subcells significantly affects the module performance. Reducing the TiO2 thickness changes the interconnection contact from a Schottky diode to ohmic behavior. Owing to interconnection resistance reduction, the perovskite modules with a 10 nm TiO2 layer show enhanced performance mainly associated with an improved fill factor. Finally, we demonstrate a four-cell MA0.7FA0.3PbI3 perovskite module with a stabilized power conversion efficiency (PCE) of 15.6% measured from an aperture area of ~10.36 cm2, corresponding to an active-area module PCE of 17.9% with a geometric fill factor of ~87.3%.
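The quoted module numbers are internally consistent, as this quick check of the aperture-to-active-area conversion shows (active-area PCE = aperture-area PCE / geometric fill factor):

pce_aperture = 15.6   # %, stabilized, measured over the ~10.36 cm2 aperture
gff = 0.873           # geometric fill factor (~87.3%)
print(f"active-area PCE ~ {pce_aperture / gff:.1f} %")   # ~17.9 %, as stated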
Stretched Lens Array Squarerigger (SLASR) Technology Maturation
NASA Technical Reports Server (NTRS)
O'Neill, Mark; McDanal, A.J.; Howell, Joe; Lollar, Louis; Carrington, Connie; Hoppe, David; Piszczor, Michael; Suszuki, Nantel; Eskenazi, Michael; Aiken, Dan;
2007-01-01
Since April 2005, our team has been underway on a competitively awarded program sponsored by NASA's Exploration Systems Mission Directorate to develop, refine, and mature the unique solar array technology known as Stretched Lens Array SquareRigger (SLASR). SLASR offers an unprecedented portfolio of performance metrics, including the following:
Areal Power Density = 300 W/m2 (2005) - 400 W/m2 (2008 target)
Specific Power = 300 W/kg (2005) - 500 W/kg (2008 target) for a full 100 kW solar array
Stowed Power = 80 kW/m3 (2005) - 120 kW/m3 (2008 target) for a full 100 kW solar array
Scalable Array Capacity = 100s of W to 100s of kW
Super-Insulated Small Cell Circuit = high-voltage (300-600 V) operation at low mass penalty
Super-Shielded Small Cell Circuit = excellent radiation hardness at low mass penalty
85% Cell Area Savings = 75% lower array cost per watt than a one-sun array
Modular, scalable, and mass-producible at MWs per year using existing processes and capacities
NASA Astrophysics Data System (ADS)
Boott, Charlotte E.; Gwyther, Jessica; Harniman, Robert L.; Hayward, Dominic W.; Manners, Ian
2017-08-01
The preparation of well-defined nanoparticles based on soft matter, using solution-processing techniques on a commercially viable scale, is a major challenge of widespread importance. Self-assembly of block copolymers in solvents that selectively solvate one of the segments provides a promising route to core-corona nanoparticles (micelles) with a wide range of potential uses. Nevertheless, significant limitations to this approach also exist. For example, the solution processing of block copolymers generally follows a separate synthesis step and is normally performed at high dilution. Moreover, non-spherical micelles—which are promising for many applications—are generally difficult to access, samples are polydisperse and precise dimensional control is not possible. Here we demonstrate the formation of platelet and cylindrical micelles at concentrations up to 25% solids via a one-pot approach—starting from monomers—that combines polymerization-induced and crystallization-driven self-assembly. We also show that performing the procedure in the presence of small seed micelles allows the scalable formation of low dispersity samples of cylindrical micelles of controlled length up to three micrometres.
Liu, Bin; Liu, Boyang; Wang, Qiufan; Wang, Xianfu; Xiang, Qingyi; Chen, Di; Shen, Guozhen
2013-10-23
Hierarchical ZnCo2O4/nickel foam architectures were fabricated for the first time via a simple, scalable solution approach, exhibiting outstanding electrochemical performance in supercapacitors, with high specific capacitance (∼1400 F g(-1) at 1 A g(-1)), excellent rate capability (72.5% capacity retention at 20 A g(-1)), and good cycling stability (only 3% loss after 1000 cycles at 6 A g(-1)). All-solid-state supercapacitors were also fabricated by assembling two pieces of the ZnCo2O4-based electrodes, showing superior performance in terms of high specific capacitance and long cycling stability. Our work confirms that the as-prepared architectures can not only be applied in high-energy-density fields, but can also be used in high-power-density applications, such as electric vehicles, flexible electronics, and energy storage devices.
An integrated analog O/E/O link for multi-channel laser neurons
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nahmias, Mitchell A., E-mail: mnahmias@princeton.edu; Tait, Alexander N.; Tolias, Leonidas
2016-04-11
We demonstrate an analog O/E/O electronic link to allow integrated laser neurons to accept many distinguishable, high bandwidth input signals simultaneously. This device utilizes wavelength division multiplexing to achieve multi-channel fan-in, a photodetector to sum signals together, and a laser cavity to perform a nonlinear operation. Its speed outpaces accelerated-time neuromorphic electronics, and it represents a viable direction towards scalable networking approaches.
Implementation of Virtualization Oriented Architecture: A Healthcare Industry Case Study
NASA Astrophysics Data System (ADS)
Rao, G. Subrahmanya Vrk; Parthasarathi, Jinka; Karthik, Sundararaman; Rao, Gvn Appa; Ganesan, Suresh
This paper presents a Virtualization Oriented Architecture (VOA) and an implementation of VOA for Hridaya, a telemedicine initiative. A Hadoop compute cloud was established at our labs, jobs that require massive computing capability, such as ECG signal analysis, were submitted, and the resulting study is presented in this paper. VOA takes advantage of inexpensive community PCs and provides added advantages such as fault tolerance, scalability, performance, and high availability.
Compact Buried Ducts in a Hot-Humid Climate House
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mallay, D.
2016-01-01
A system of compact, buried ducts provides a high-performance and cost-effective solution for delivering conditioned air throughout the building. This report outlines research activities that are expected to facilitate adoption of compact buried duct systems by builders. The results of this research would be scalable to many new house designs in most climates and markets, leading to wider industry acceptance and building code and energy program approval.
A high performance parallel algorithm for 1-D FFT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, R.C.; Gustavson, F.G.; Zubair, M.
1994-12-31
In this paper the authors propose a parallel high performance FFT algorithm based on a multi-dimensional formulation. They use this to solve a commonly encountered FFT based kernel on a distributed memory parallel machine, the IBM scalable parallel system, SP1. The kernel requires a forward FFT computation of an input sequence, multiplication of the transformed data by a coefficient array, and finally an inverse FFT computation of the resultant data. They show that the multi-dimensional formulation helps in reducing the communication costs and also improves the single node performance by effectively utilizing the memory system of the node. They implemented this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine.
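A minimal single-node sketch of the kernel described above (forward FFT, pointwise multiplication by a coefficient array, inverse FFT), written with NumPy; the paper's contribution is the multi-dimensional decomposition of this kernel across distributed memory, which is not reproduced here.

import numpy as np

n = 1 << 20
x = np.random.rand(n) + 1j * np.random.rand(n)   # input sequence
coeff = np.random.rand(n)                        # coefficient array

X = np.fft.fft(x)      # forward FFT
Y = X * coeff          # multiply transformed data by the coefficients
y = np.fft.ifft(Y)     # inverse FFT of the resultant data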
Feasibility of optically interconnected parallel processors using wavelength division multiplexing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deri, R.J.; De Groot, A.J.; Haigh, R.E.
1996-03-01
New national security demands require enhanced computing systems for nearly ab initio simulations of extremely complex systems and analyzing unprecedented quantities of remote sensing data. This computational performance is being sought using parallel processing systems, in which many less powerful processors are ganged together to achieve high aggregate performance. Such systems require increased capability to communicate information between individual processor and memory elements. As it is likely that the limited performance of today's electronic interconnects will prevent the system from achieving its ultimate performance, there is great interest in using fiber optic technology to improve interconnect communication. However, little information is available to quantify the requirements on fiber optical hardware technology for this application. Furthermore, we have sought to explore interconnect architectures that use the complete communication richness of the optical domain rather than using optics as a simple replacement for electronic interconnects. These considerations have led us to study the performance of a moderate size parallel processor with optical interconnects using multiple optical wavelengths. We quantify the bandwidth, latency, and concurrency requirements which allow a bus-type interconnect to achieve scalable computing performance using up to 256 nodes, each operating at GFLOP performance. Our key conclusion is that scalable performance, to ~150 GFLOPS, is achievable for several scientific codes using an optical bus with a small number of WDM channels (8 to 32), only one WDM channel received per node, and achievable optoelectronic bandwidth and latency requirements. 21 refs., 10 figs.
Pamela M. Kinsey
2015-09-30
The work evaluates, develops, and demonstrates flexible, scalable mineral extraction technology for geothermal brines based upon solid-phase sorbent materials, with a specific focus upon rare earth elements (REEs). The selected organic and inorganic sorbent materials demonstrated high performance for collection of trace REEs and precious and valuable metals. The nanostructured materials typically performed better than commercially available sorbents. Data contain organic and inorganic sorbent removal efficiencies, Sharkey Hot Springs (Idaho) water chemistry analysis, and rare earth removal efficiencies for select sorbents.
Ding, Yuxiao; Klyushin, Alexander; Huang, Xing; Jones, Travis; Teschner, Detre; Girgsdies, Frank; Rodenas, Tania; Schlögl, Robert; Heumann, Saskia
2018-03-19
By taking inspiration from the catalytic properties of single-site catalysts and the enhancement of performance through ionic liquids on metal catalysts, we exploited a scalable way to place single cobalt ions on a carbon-nanotube surface bridged by polymerized ionic liquid. Single dispersed cobalt ions coordinated by ionic liquid are used as heterogeneous catalysts for the oxygen evolution reaction (OER). Performance data reveals high activity and stable operation without chemical instability. © 2017 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, Adam
2007-05-22
MpiGraph consists of an MPI application called mpiGraph, written in C to measure message bandwidth, and an associated crunch_mpiGraph script, written in Perl to process the application output into an HTML report. The mpiGraph application is designed to inspect the health and scalability of a high-performance interconnect while under heavy load. This is useful for detecting hardware and software problems in a system, such as slow nodes, links, switches, or contention in switch routing. It is also useful for characterizing how interconnect performance changes with different settings or how one interconnect type compares to another.
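The measurement idea is easy to sketch. The following is a minimal pairwise bandwidth probe in the spirit of mpiGraph, written with mpi4py for brevity (mpiGraph itself is C); the message size, iteration count, and ring-shaped communication pattern are illustrative assumptions, not the tool's actual sweep.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
nbytes = 1 << 20
buf = np.zeros(nbytes, dtype=np.uint8)
recv = np.empty_like(buf)

# each rank sends to its right neighbor while receiving from its left, so
# every link is timed under load, as in mpiGraph's heavy-load sweep
dst, src = (rank + 1) % size, (rank - 1) % size
comm.Barrier()
t0 = MPI.Wtime()
for _ in range(50):
    comm.Sendrecv(sendbuf=buf, dest=dst, recvbuf=recv, source=src)
t1 = MPI.Wtime()
print(f"rank {rank}: ~{50 * nbytes / (t1 - t0) / 1e6:.1f} MB/s to rank {dst}")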
Climate Data Assimilation on a Massively Parallel Supercomputer
NASA Technical Reports Server (NTRS)
Ding, Hong Q.; Ferraro, Robert D.
1996-01-01
We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to 512 nodes of an Intel Paragon. The preconditioned conjugate gradient solver achieves a sustained 18 Gflops performance. Consequently, we achieve an unprecedented 100-fold reduction in time to solution on the Intel Paragon over a single head of a Cray C90. This not only exceeds the daily performance requirement of the Data Assimilation Office at NASA's Goddard Space Flight Center, but also makes it possible to explore much larger and more challenging data assimilation problems which are unthinkable on a traditional computer platform such as the Cray C90.
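For reference, a serial sketch of the preconditioned conjugate gradient iteration at the heart of the solver described above, using a simple Jacobi (diagonal) preconditioner; the production solver is parallel and operates on far larger systems, so this NumPy version is illustrative only.

import numpy as np

def pcg(A, b, tol=1e-8, maxiter=1000):
    M_inv = 1.0 / np.diag(A)        # Jacobi preconditioner
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv * r
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

n = 200
Q = np.random.rand(n, n)
A = Q @ Q.T + n * np.eye(n)         # symmetric positive definite test matrix
b = np.random.rand(n)
x = pcg(A, b)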
Peterson, Kevin J.; Pathak, Jyotishman
2014-01-01
Automated execution of electronic Clinical Quality Measures (eCQMs) from electronic health records (EHRs) on large patient populations remains a significant challenge, and the testability, interoperability, and scalability of measure execution are critical. The High Throughput Phenotyping (HTP; http://phenotypeportal.org) project aligns with these goals by using the standards-based HL7 Health Quality Measures Format (HQMF) and Quality Data Model (QDM) for measure specification, as well as Common Terminology Services 2 (CTS2) for semantic interpretation. The HQMF/QDM representation is automatically transformed into a JBoss® Drools workflow, enabling horizontal scalability via clustering and MapReduce algorithms. Using Project Cypress, automated verification metrics can then be produced. Our results show linear scalability for nine executed 2014 Center for Medicare and Medicaid Services (CMS) eCQMs for eligible professionals and hospitals for >1,000,000 patients, and verified execution correctness of 96.4% based on Project Cypress test data of 58 eCQMs. PMID:25954459
Hyperspectral Cubesat Constellation for Rapid Natural Hazard Response
NASA Astrophysics Data System (ADS)
Mandl, D.; Huemmrich, K. F.; Ly, V. T.; Handy, M.; Ong, L.; Crum, G.
2015-12-01
With the advent of high-performance space networks that provide total coverage for cubesats, the paradigm of low-cost, high-temporal-coverage observation with hyperspectral instruments becomes more feasible. The combination of ground cloud computing resources, high-performance low-power onboard processing, total coverage for the cubesats, and social media provides an opportunity for an architecture that delivers cost-effective hyperspectral data products for natural hazard response and decision support. This paper describes a series of pathfinder efforts to create a scalable Intelligent Payload Module (IPM) that has flown on a variety of airborne vehicles, including Cessna airplanes, Citation jets, and a helicopter, and will fly on an Unmanned Aerial System (UAS) hexacopter to monitor natural phenomena. The IPMs developed thus far were built on platforms that emulate a satellite environment, using real satellite flight software and real ground software. In addition, science processing software has been developed that performs hyperspectral processing onboard using various parallel processing techniques, enabling the creation of onboard hyperspectral data products while consuming low power. A cubesat design was developed that is low cost and scalable to larger constellations and thus can provide daily hyperspectral observations for any spot on Earth. The design was based on the existing IPM prototypes and metrics developed over the past few years and a shrunken IPM that can sustain up to 800 Mbps throughput. A constellation of such hyperspectral cubesats could constantly monitor spectra with spectral angle mappers after Level 0, Level 1 radiometric correction, and atmospheric correction processing. This provides the opportunity for daily monitoring of any spot on Earth at 30 meter resolution, which is not available today.
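The spectral angle mapper mentioned above reduces to a short computation; the following sketch classifies pixel spectra against a reference spectrum by the angle between them, with band count, scene size, and threshold chosen purely for illustration.

import numpy as np

def spectral_angle(pixels, reference):
    # theta = arccos(<p, r> / (|p| |r|)); smaller angles mean closer matches
    dot = pixels @ reference
    norms = np.linalg.norm(pixels, axis=1) * np.linalg.norm(reference)
    return np.arccos(np.clip(dot / norms, -1.0, 1.0))

bands = 220
pixels = np.random.rand(10000, bands)   # flattened, corrected hyperspectral scene
ref = np.random.rand(bands)             # reference spectrum of a target material
theta = spectral_angle(pixels, ref)
matches = theta < 0.1                   # radians; threshold is illustrative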
PyMCT: A Very High Level Language Coupling Tool For Climate System Models
NASA Astrophysics Data System (ADS)
Tobis, M.; Pierrehumbert, R. T.; Steder, M.; Jacob, R. L.
2006-12-01
At the Climate Systems Center of the University of Chicago, we have been examining strategies for applying agile programming techniques to complex high-performance modeling experiments. While the "agile" development methodology differs from a conventional requirements process and its associated milestones, the process remains a formal one. It is distinguished by continuous improvement in functionality, large numbers of small releases, extensive and ongoing testing strategies, and a strong reliance on very high level languages (VHLL). Here we report on PyMCT, which we intend as a core element in a model ensemble control superstructure. PyMCT is a set of Python bindings for MCT, the Fortran-90 based Model Coupling Toolkit, which forms the infrastructure for the inter-component communication in the Community Climate System Model (CCSM). MCT provides a scalable model communication infrastructure. In order to take maximum advantage of agile software development methodologies, we exposed MCT functionality to Python, a prominent VHLL. We describe how the scalable architecture of MCT allows us to overcome the relatively weak runtime performance of Python, so that the performance of the combined system is not severely impacted. To demonstrate these advantages, we reimplemented the CCSM coupler in Python. While this alone offers no new functionality, it does provide a rigorous test of PyMCT functionality and performance. We reimplemented the CPL6 library, presenting an interesting case study of the comparison between conventional Fortran-90 programming and the higher abstraction level provided by a VHLL. The powerful abstractions provided by Python will allow much more complex experimental paradigms. In particular, we hope to build on the scriptability of our coupling strategy to enable systematic sensitivity tests. Our most ambitious objective is to combine our efforts with Bayesian inverse modeling techniques toward objective tuning at the highest level, across model architectures.
Model-Based Self-Tuning Multiscale Method for Combustion Control
NASA Technical Reports Server (NTRS)
Le, Dzu, K.; DeLaat, John C.; Chang, Clarence T.; Vrnak, Daniel R.
2006-01-01
A multi-scale representation of the combustor dynamics was used to create a self-tuning, scalable controller to suppress multiple instability modes in a liquid-fueled aero engine-derived combustor operating at engine-like conditions. Its self-tuning features, designed to handle the uncertainties in the combustor dynamics and time delays, are essential for control performance and robustness. The controller was implemented to modulate a high-frequency fuel valve with feedback from dynamic pressure sensors. This scalable algorithm suppressed pressure oscillations of different instability modes by as much as 90 percent without the peak-splitting effect. The self-tuning logic guided the adjustment of controller parameters and converged quickly toward phase-lock for optimal suppression of the instabilities. The forced-response characteristics of the control model compare well with those of the test rig in both the frequency domain and the time domain.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Zhen; Klein, Talysa R.; Kim, Dong Hoe
Perovskite materials use earth-abundant elements, have low formation energies for deposition and are compatible with roll-to-roll and other high-volume manufacturing techniques. These features make perovskite solar cells (PSCs) suitable for terawatt-scale energy production with low production costs and low capital expenditure. Demonstrations of performance comparable to that of other thin-film photovoltaics (PVs) and improvements in laboratory-scale cell stability have recently made scale up of this PV technology an intense area of research focus. Here, we review recent progress and challenges in scaling up PSCs and related efforts to enable the terawatt-scale manufacturing and deployment of this PV technology. We discuss common device and module architectures, scalable deposition methods and progress in the scalable deposition of perovskite and charge-transport layers. We also provide an overview of device and module stability, module-level characterization techniques and techno-economic analyses of perovskite PV modules.
Shen, Yiwen; Hattink, Maarten H N; Samadi, Payman; Cheng, Qixiang; Hu, Ziyiz; Gazman, Alexander; Bergman, Keren
2018-04-16
Silicon photonics based switches offer an effective option for the delivery of dynamic bandwidth for future large-scale Datacom systems while maintaining scalable energy efficiency. The integration of a silicon photonics-based optical switching fabric within electronic Datacom architectures requires novel network topologies and arbitration strategies to effectively manage the active elements in the network. We present a scalable software-defined networking control plane to integrate silicon photonic based switches with conventional Ethernet or InfiniBand networks. Our software-defined control plane manages both electronic packet switches and multiple silicon photonic switches for simultaneous packet and circuit switching. We built an experimental Dragonfly network testbed with 16 electronic packet switches and 2 silicon photonic switches to evaluate our control plane. The latencies observed for each step of the switching procedure sum to a total control-plane latency of 344 µs for data-center and high-performance computing platforms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Katti, Amogh; Di Fatta, Giuseppe; Naughton III, Thomas J
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.
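A toy simulation of the gossip-style dissemination underlying such algorithms is given below; the node count, the single simulated failure, and the push-pull exchange rule are illustrative assumptions, not the paper's actual protocols.

import random

N = 64
failed = {7}                              # ground truth: node 7 has failed
suspect = {i: set() for i in range(N)}    # each node's set of suspected failures
suspect[0] = set(failed)                  # node 0 detects the failure locally

cycles = 0
while not all(suspect[i] == failed for i in range(N) if i not in failed):
    cycles += 1
    for i in range(N):
        if i in failed:
            continue
        j = random.choice([k for k in range(N) if k != i and k not in failed])
        # push-pull exchange: both partners merge their suspicion sets
        merged = suspect[i] | suspect[j]
        suspect[i], suspect[j] = merged, set(merged)
print(f"global consensus after {cycles} gossip cycles")  # grows ~ log(N)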
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lorenz, Daniel; Wolf, Felix
2016-02-17
The PRIMA-X (Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing) project is the successor of the DOE PRIMA (Performance Refactoring of Instrumentation, Measurement, and Analysis Technologies for Petascale Computing) project, which addressed the challenge of creating a core measurement infrastructure that would serve as a common platform for both integrating leading parallel performance systems (notably TAU and Scalasca) and developing next-generation scalable performance tools. The PRIMA-X project shifts the focus away from refactorization of robust performance tools towards a re-targeting of the parallel performance measurement and analysis architecture for extreme scales. The massive concurrency, asynchronous execution dynamics, hardware heterogeneity, and multi-objective prerequisites (performance, power, resilience) that characterize exascale systems introduce fundamental constraints on the ability to carry forward existing performance methodologies. In particular, there must be a deemphasis of per-thread observation techniques to significantly reduce the otherwise unsustainable flood of redundant performance data. Instead, it will be necessary to assimilate multi-level resource observations into macroscopic performance views, from which resilient performance metrics can be attributed to the computational features of the application. This requires a scalable framework for node-level and system-wide monitoring and runtime analyses of dynamic performance information. Also, the interest in optimizing parallelism parameters with respect to performance and energy drives the integration of tool capabilities in the exascale environment further. Initially, PRIMA-X was a collaborative project between the University of Oregon (lead institution) and the German Research School for Simulation Sciences (GRS). Because Prof. Wolf, the PI at GRS, accepted a position as full professor at Technische Universität Darmstadt (TU Darmstadt) starting February 1st, 2015, the project ended at GRS on January 31st, 2015. This report reflects the work accomplished at GRS until then. The work of GRS is expected to be continued at TU Darmstadt. The first main accomplishment of GRS is the design of different thread-level aggregation techniques. We created a prototype capable of aggregating the thread-level information in performance profiles using these techniques. The next step will be the integration of the most promising techniques into the Score-P measurement system and their evaluation. The second main accomplishment is a substantial increase of Score-P’s scalability, achieved by improving the design of the system-tree representation in Score-P’s profile format. We developed a new representation and a distributed algorithm to create the scalable system tree representation. Finally, we developed a lightweight approach to MPI wait-state profiling. Former algorithms either needed piggy-backing, which can cause significant runtime overhead, or tracing, which comes with its own set of scaling challenges. Our approach works with local data only and, thus, is scalable and has very little overhead.
Parallel processing architecture for H.264 deblocking filter on multi-core platforms
NASA Astrophysics Data System (ADS)
Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao
2012-03-01
Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low-latency, low-power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in the H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder can be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color subsampling patterns such as YUV 4:2:2 or 4:4:4. Low-power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programming model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264 compliant deblocking filter for multi-core platforms such as HyperX technology. Parallel techniques such as parallel processing at the level of independent macroblocks, sub-blocks, and pixel rows are examined in this work. The deblocking architecture consists of a basic cell called the deblocking filter unit (DFU) and a dependent data buffer manager (DFM). The DFU can be instantiated multiple times, catering to different performance needs; the DFM serves the data required by the different numbers of DFUs, and also manages all the neighboring data required for future data processing by the DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
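One of the independence patterns such an architecture exploits can be sketched directly: a macroblock can be filtered once its left and top neighbors are done, so all macroblocks on an anti-diagonal are mutually independent and can be dispatched to parallel workers. The grid size and the no-op filter below are placeholders, and this wavefront schedule is only one of the parallelization strategies named above.

from concurrent.futures import ThreadPoolExecutor

MB_COLS, MB_ROWS = 120, 68     # e.g. a 1920x1088 frame in 16x16 macroblocks

def deblock_macroblock(x, y):
    pass                       # stand-in for the actual edge filtering

with ThreadPoolExecutor() as pool:
    # anti-diagonal d holds all macroblocks with x + y == d; their left and
    # top dependencies lie on earlier diagonals, so the wave is independent
    for d in range(MB_COLS + MB_ROWS - 1):
        wave = [(x, d - x) for x in range(max(0, d - MB_ROWS + 1),
                                          min(d + 1, MB_COLS))]
        list(pool.map(lambda xy: deblock_macroblock(*xy), wave))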
An MPI-based MoSST core dynamics model
NASA Astrophysics Data System (ADS)
Jiang, Weiyuan; Kuang, Weijia
2008-09-01
Distributed systems are among the main cost-effective and expandable platforms for high-end scientific computing. Therefore scalable numerical models are important for effective use of such systems. In this paper, we present an MPI-based numerical core dynamics model for simulation of geodynamo and planetary dynamos, and for simulation of core-mantle interactions. The model is developed based on MPI libraries. Two algorithms are used for node-node communication: a "master-slave" architecture and a "divide-and-conquer" architecture. The former is easy to implement but not scalable in communication. The latter is scalable in both computation and communication. The model scalability is tested on Linux PC clusters with up to 128 nodes. This model is also benchmarked with a published numerical dynamo model solution.
Sun, Xiaobo; Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng; Qin, Zhaohui S
2018-06-01
Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)-based high-performance computing (HPC) implementation, and the popular VCFTools. Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems.
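The core operation being distributed is a k-way merge of coordinate-sorted files. A minimal single-machine sketch using Python's heapq.merge is shown below as the kind of baseline the schemas are benchmarked against; real VCF merging must also reconcile headers, multi-allelic records, and sample columns, all omitted here, and the helper names are hypothetical.

import heapq

def vcf_key(line):
    chrom, pos = line.split('\t', 2)[:2]
    return (chrom, int(pos))          # genomic coordinate as the sort key

def merge_sorted_vcf_bodies(paths, out_path):
    # inputs are assumed coordinate-sorted with header lines already stripped;
    # heapq.merge streams the k-way merge without loading whole files
    files = [open(p) for p in paths]
    try:
        with open(out_path, 'w') as out:
            for line in heapq.merge(*files, key=vcf_key):
                out.write(line)
    finally:
        for f in files:
            f.close()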
High Performance Parallel Computational Nanotechnology
NASA Technical Reports Server (NTRS)
Saini, Subhash; Craw, James M. (Technical Monitor)
1995-01-01
At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics, and simulation methods for diamondoid structures. Inasmuch as it seems clear that the application of such methods in nanotechnology will require powerful, highly parallel systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided design (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to control mini robotic manipulators for positional control; and scalable numerical algorithms for reliability, verification and testability. There appears no fundamental obstacle to simulating molecular compilers and molecular computers on high performance parallel computers, just as the Boeing 777 was simulated on a computer before manufacturing it.
The TeraShake Computational Platform for Large-Scale Earthquake Simulations
NASA Astrophysics Data System (ADS)
Cui, Yifeng; Olsen, Kim; Chourasia, Amit; Moore, Reagan; Maechling, Philip; Jordan, Thomas
Geoscientific and computer science researchers with the Southern California Earthquake Center (SCEC) are conducting a large-scale, physics-based, computationally demanding earthquake system science research program with the goal of developing predictive models of earthquake processes. The computational demands of this program continue to increase rapidly as these researchers seek to perform physics-based numerical simulations of earthquake processes at ever larger scales. To meet the needs of this research program, a multiple-institution team coordinated by SCEC has integrated several scientific codes into a numerical modeling-based research tool we call the TeraShake computational platform (TSCP). A central component in the TSCP is a highly scalable earthquake wave propagation simulation program called the TeraShake anelastic wave propagation (TS-AWP) code. In this chapter, we describe how we extended an existing, stand-alone, well-validated, finite-difference, anelastic wave propagation modeling code into the highly scalable and widely used TS-AWP and then integrated this code into the TeraShake computational platform that provides end-to-end (initialization to analysis) research capabilities. We also describe the techniques used to enhance the TS-AWP parallel performance on TeraGrid supercomputers, as well as the TeraShake simulation phases, including input preparation, run time, data archive management, and visualization. As a result of our efforts to improve its parallel efficiency, the TS-AWP has now shown highly efficient strong scaling on over 40K processors on IBM’s BlueGene/L Watson computer. In addition, the TSCP has developed into a computational system that is useful to many members of the SCEC community for performing large-scale earthquake simulations.
Towards Scalable Graph Computation on Mobile Devices.
Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng
2014-10-01
Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach.
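The memory-mapping idea is simple to sketch: keep the edge list in a binary file, map it into virtual memory, and let the operating system page pieces in on demand, so graphs larger than RAM can be scanned. The file name, layout, and degree-counting pass below are illustrative assumptions, not the authors' app.

import numpy as np

# edges.bin: contiguous (src, dst) pairs of 32-bit ints, prepared offline
edges = np.memmap('edges.bin', dtype=np.uint32, mode='r').reshape(-1, 2)

# one streaming pass computes out-degrees without ever holding the whole
# edge list in memory -- the sequential access pattern mmap rewards
num_nodes = int(edges.max()) + 1
deg = np.zeros(num_nodes, dtype=np.int64)
chunk = 1 << 22
for i in range(0, len(edges), chunk):
    np.add.at(deg, edges[i:i + chunk, 0], 1)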
Zhang, Hao; Zhang, Mengru; Zhang, Meiling; Zhang, Lin; Zhang, Anping; Zhou, Yiming; Wu, Ping; Tang, Yawen
2017-09-01
Nanoporous networks of tin-based alloys immobilized within carbon matrices possess unique structural and compositional advantages for lithium storage, and are expected to manifest improved strain-accommodation and charge-transport capabilities and thus desirable anodic performance for advanced lithium-ion batteries (LIBs). Herein, a facile and scalable hybrid aerogel-derived thermal-autoreduction route has been developed for the construction of a nanoporous network of SnNi alloy immobilized within carbon/graphene dual matrices (SnNi@C/G network). When applied as an anode material for LIBs, the SnNi@C/G network manifests desirable lithium-storage performance in terms of specific capacity, cycle life, and rate capability. The facile aerogel-derived route and desirable Li-storage performance of the SnNi@C/G network facilitate its practical application as a high-capacity, long-life, and high-rate anode material for advanced LIBs. Copyright © 2017 Elsevier Inc. All rights reserved.
Characterization of MoS2-Graphene Composites for High-Performance Coin Cell Supercapacitors.
Bissett, Mark A; Kinloch, Ian A; Dryfe, Robert A W
2015-08-12
Two-dimensional materials, such as graphene and molybdenum disulfide (MoS2), can greatly increase the performance of electrochemical energy storage devices because of the combination of high surface area and electrical conductivity. Here, we have investigated the performance of solution exfoliated MoS2 thin flexible membranes as supercapacitor electrodes in a symmetrical coin cell arrangement using an aqueous electrolyte (Na2SO4). By adding highly conductive graphene to form nanocomposite membranes, it was possible to increase the specific capacitance by reducing the resistivity of the electrode and altering the morphology of the membrane. With continued charge/discharge cycles the performance of the membranes was found to increase significantly (up to 800%), because of partial re-exfoliation of the layered material with continued ion intercalation, as well as increasing the specific capacitance through intercalation pseudocapacitance. These results demonstrate a simple and scalable application of layered 2D materials toward electrochemical energy storage.
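For readers unfamiliar with how such coin-cell numbers are extracted, the specific capacitance is commonly derived from galvanostatic discharge; a standard set of relations for a symmetric two-electrode cell (a common convention, not necessarily the exact normalization used in the paper) is

\[ C_{cell} = \frac{I\,\Delta t}{\Delta V}, \qquad C_{sp} = \frac{4\,C_{cell}}{m_{total}}, \qquad E = \frac{1}{2}\, C_{cell}\, (\Delta V)^2, \]

where I is the discharge current, Δt the discharge time, ΔV the voltage window (less any IR drop), and m_total the combined active mass of both electrodes.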
Programming with BIG data in R: Scaling analytics from one to thousands of nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmidt, Drew; Chen, Wei-Chen; Matheson, Michael A.
2016-11-09
Here, we present a tutorial overview showing how one can achieve scalable performance with R. We do so by utilizing several package extensions, including those from the pbdR project. These packages consist of high performance, high-level interfaces to and extensions of MPI, PBLAS, ScaLAPACK, I/O libraries, profiling libraries, and more. While these libraries shine brightest on large distributed platforms, they also work rather well on small clusters and often, surprisingly, even on a laptop with only two cores. Our tutorial begins with recommendations on how to get more performance out of your R code before considering parallel implementations. Because R is a high-level language, a function can have a deep hierarchy of operations. For big data, this can easily lead to inefficiency. Profiling is an important tool to understand the performance of an R code for both serial and parallel improvements.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Das, Sanjib; Yang, Bin; Gu, Gong
Realizing the commercialization of high-performance and robust perovskite solar cells urgently requires the development of economically scalable processing techniques. Here we report a high-throughput ultrasonic spray-coating (USC) process capable of fabricating perovskite film-based solar cells on glass substrates with power conversion efficiency (PCE) as high as 13.04%. Perovskite films with high uniformity, crystallinity, and surface coverage are obtained in a single step. Moreover, we report USC processing on TiOx/ITO-coated polyethylene terephthalate (PET) substrates to realize flexible perovskite solar cells with PCE as high as 8.02% that are robust under mechanical stress. In this case, an optical curing technique was used to achieve a highly-conductive TiOx layer on flexible PET substrates for the first time. The high device performance and reliability obtained by this combination of USC processing with optical curing appears very promising for roll-to-roll manufacturing of high-efficiency, flexible perovskite solar cells.
Input-independent, Scalable and Fast String Matching on the Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Chavarría-Miranda, Daniel; Maschhoff, Kristyn J
2009-05-25
String searching is at the core of many security and network applications like search engines, intrusion detection systems, virus scanners and spam filters. The growing size of on-line content and the increasing wire speeds push the need for fast, and often real-time, string searching solutions. For these conditions, many software implementations (if not all) targeting conventional cache-based microprocessors do not perform well. They either exhibit overall low performance or exhibit highly variable performance depending on the types of inputs. For this reason, real-time state of the art solutions rely on the use of either custom hardware or Field-Programmable Gate Arrays (FPGAs) at the expense of overall system flexibility and programmability. This paper presents a software based implementation of the Aho-Corasick string searching algorithm on the Cray XMT multithreaded shared memory machine. Our solution relies on the particular features of the XMT architecture and on several algorithmic strategies: it is fast, scalable and its performance is virtually content-independent. On a 128-processor Cray XMT, it reaches a scanning speed of ≈ 28 Gbps with a performance variability below 10%. In the 10 Gbps performance range, variability is below 2.5%. By comparison, an Intel dual-socket, 8-core system running at 2.66 GHz achieves a peak performance which varies from 500 Mbps to 10 Gbps depending on the type of input and dictionary size.
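As background for the record above, the Aho-Corasick automaton that the paper parallelizes can be sketched in a few dozen lines. This is a plain, serial Python illustration of the classic algorithm, not the Cray XMT implementation:

from collections import deque

def build_automaton(patterns):
    # Node 0 is the root; each node stores goto transitions, a failure
    # link, and the patterns that end at this node.
    trie = [{"next": {}, "fail": 0, "out": []}]
    for pat in patterns:
        node = 0
        for ch in pat:
            nxt = trie[node]["next"].get(ch)
            if nxt is None:
                nxt = len(trie)
                trie[node]["next"][ch] = nxt
                trie.append({"next": {}, "fail": 0, "out": []})
            node = nxt
        trie[node]["out"].append(pat)
    # Breadth-first pass sets failure links: the longest proper suffix
    # of the current path that is also a path from the root.
    queue = deque(trie[0]["next"].values())
    while queue:
        node = queue.popleft()
        for ch, child in trie[node]["next"].items():
            queue.append(child)
            f = trie[node]["fail"]
            while f and ch not in trie[f]["next"]:
                f = trie[f]["fail"]
            fallback = trie[f]["next"].get(ch, 0)
            trie[child]["fail"] = fallback if fallback != child else 0
            trie[child]["out"] += trie[trie[child]["fail"]]["out"]
    return trie

def search(trie, text):
    # Single pass over the text; cost is independent of dictionary size,
    # which is the content-independence property the paper exploits.
    node, hits = 0, []
    for i, ch in enumerate(text):
        while node and ch not in trie[node]["next"]:
            node = trie[node]["fail"]
        node = trie[node]["next"].get(ch, 0)
        for pat in trie[node]["out"]:
            hits.append((i - len(pat) + 1, pat))
    return hits

# e.g. search(build_automaton(["he", "she", "his"]), "ushers")
# returns [(1, 'she'), (2, 'he')]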
Sun, Gongchen; Senapati, Satyajyoti; Chang, Hsueh-Chia
2016-04-07
A microfluidic ion exchange membrane hybrid chip is fabricated using polymer-based, lithography-free methods to achieve ionic diode, transistor and amplifier functionalities with the same four-terminal design. The high ionic flux (>100 μA) feature of the chip can enable a scalable integrated ionic circuit platform for micro-total-analytical systems. PMID:26960551
NASA Astrophysics Data System (ADS)
Mapakshi, N. K.; Chang, J.; Nakshatrala, K. B.
2018-04-01
Mathematical models for flow through porous media typically enjoy the so-called maximum principles, which place bounds on the pressure field. It is highly desirable to preserve these bounds on the pressure field in predictive numerical simulations, that is, one needs to satisfy discrete maximum principles (DMP). Unfortunately, many of the existing formulations for flow through porous media models do not satisfy DMP. This paper presents a robust, scalable numerical formulation based on variational inequalities (VI), to model non-linear flows through heterogeneous, anisotropic porous media without violating DMP. VI is an optimization technique that places bounds on the numerical solutions of partial differential equations. To crystallize the ideas, a modification to Darcy equations by taking into account pressure-dependent viscosity will be discretized using the lowest-order Raviart-Thomas (RT0) and Variational Multi-scale (VMS) finite element formulations. It will be shown that these formulations violate DMP, and, in fact, these violations increase with an increase in anisotropy. It will be shown that the proposed VI-based formulation provides a viable route to enforce DMP. Moreover, it will be shown that the proposed formulation is scalable, and can work with any numerical discretization and weak form. A series of numerical benchmark problems are solved to demonstrate the effects of heterogeneity, anisotropy and non-linearity on DMP violations under the two chosen formulations (RT0 and VMS), and that of non-linearity on solver convergence for the proposed VI-based formulation. Parallel scalability on modern computational platforms will be illustrated through strong-scaling studies, which will prove the efficiency of the proposed formulation in a parallel setting. Algorithmic scalability as the problem size is scaled up will be demonstrated through novel static-scaling studies. The performed static-scaling studies can serve as a guide for users to be able to select an appropriate discretization for a given problem size.
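As a pointer for the reader, the bound-constrained variational inequality behind such formulations can be written generically as follows (a sketch; the paper's exact discrete form may differ):

\[ K := \{ v \in V \,:\, u_{\min} \le v \le u_{\max} \}, \qquad \text{find } u \in K \text{ such that } \langle F(u),\, v - u \rangle \ge 0 \quad \forall\, v \in K, \]

where F(u) is the residual of the discretized flow problem and the bounds u_min, u_max come from the maximum principle. For a symmetric linear problem this is equivalent to minimizing the associated quadratic energy over K, which is how such bound-constrained solves are commonly implemented in practice.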
Providing scalable system software for high-end simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Greenberg, D.
1997-12-31
Detailed, full-system, complex physics simulations have been shown to be feasible on systems containing thousands of processors. In order to manage these computer systems it has been necessary to create scalable system services. In this talk Sandia's research on scalable systems will be described. The key concepts of low overhead data movement through portals and of flexible services through multi-partition architectures will be illustrated in detail. The talk will conclude with a discussion of how these techniques can be applied outside of the standard monolithic MPP system.
Chai, Zhimin; Abbasi, Salman A; Busnaina, Ahmed A
2018-05-30
Assembly of organic semiconductors with ordered crystal structure has been actively pursued for electronics applications such as organic field-effect transistors (OFETs). Among various film deposition methods, solution-based film growth from small molecule semiconductors is preferable because of its low material and energy consumption, low cost, and scalability. Here, we show scalable and controllable directed assembly of highly crystalline 2,7-dioctyl[1]benzothieno[3,2-b][1]benzothiophene (C8-BTBT) films via a dip-coating process. Self-aligned stripe patterns with tunable thickness and morphology over a centimeter scale are obtained by adjusting two governing parameters: the pulling speed of a substrate and the solution concentration. OFETs are fabricated using the C8-BTBT films assembled at various conditions. A field-effect hole mobility up to 3.99 cm2 V-1 s-1 is obtained. Owing to the highly scalable crystalline film formation, the dip-coating directed assembly process could be a great candidate for manufacturing next-generation electronics. Meanwhile, the film formation mechanism discussed in this paper could provide a general guideline to prepare other organic semiconducting films from small molecule solutions.
Jeong, Seol Young; Jo, Hyeong Gon; Kang, Soon Ju
2014-01-01
A tracking service like asset management is essential in a dynamic hospital environment consisting of numerous mobile assets (e.g., wheelchairs or infusion pumps) that are continuously relocated throughout a hospital. The tracking service is accomplished based on the key technologies of an indoor location-based service (LBS), such as locating and monitoring multiple mobile targets inside a building in real time. An indoor LBS such as a tracking service entails numerous resource lookups being requested concurrently and frequently from several locations, as well as a network infrastructure requiring support for high scalability in indoor environments. A traditional centralized architecture needs to maintain a geographic map of the entire building or complex in its central server, which can cause low scalability and traffic congestion. This paper presents a self-organizing and fully distributed indoor mobile asset management (MAM) platform, and proposes an architecture for multiple trackees (such as mobile assets) and trackers based on the proposed distributed platform in real time. In order to verify the suggested platform, scalability performance according to increases in the number of concurrent lookups was evaluated in a real test bed. Tracking latency and traffic load ratio in the proposed tracking architecture was also evaluated. PMID:24662407
Electro-chemical arsenic remediation: field trials in West Bengal.
Amrose, Susan E; Bandaru, Siva R S; Delaire, Caroline; van Genuchten, Case M; Dutta, Amit; DebSarkar, Anupam; Orr, Christopher; Roy, Joyashree; Das, Abhijit; Gadgil, Ashok J
2014-08-01
Millions of people in rural South Asia are exposed to high levels of arsenic through groundwater used for drinking. Many deployed arsenic remediation technologies quickly fail because they are not maintained, repaired, accepted, or affordable. It is therefore imperative that arsenic remediation technologies be evaluated for their ability to perform within a sustainable and scalable business model that addresses these challenges. We present field trial results of a 600 L Electro-Chemical Arsenic Remediation (ECAR) reactor operating over 3.5 months in West Bengal. These results are evaluated through the lens of a community scale micro-utility business model as a potential sustainable and scalable safe water solution for rural communities in South Asia. We demonstrate ECAR's ability to consistently reduce arsenic concentrations of ~266 μg/L to <5 μg/L in real groundwater, simultaneously meeting the international standards for iron and aluminum in drinking water. ECAR operating costs (amortized capital plus consumables) are estimated as $0.83-$1.04/m3 under realistic conditions. We discuss the implications of these results against the constraints of a sustainable and scalable business model to argue that ECAR is a promising technology to help provide a clean water solution in arsenic-affected areas of South Asia. Copyright © 2013 Elsevier B.V. All rights reserved.
Blueprint for a microwave trapped ion quantum computer
Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G.; Mølmer, Klaus; Devitt, Simon J.; Wunderlich, Christof; Hensinger, Winfried K.
2017-01-01
The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion–based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation–based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error–threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects. PMID:28164154
Luque, Joaquín; Larios, Diego F; Personal, Enrique; Barbancho, Julio; León, Carlos
2016-05-18
Environmental audio monitoring is a huge area of interest for biologists all over the world. To this end, several audio monitoring systems have been proposed in the literature, which can be classified into two different approaches: acquisition and compression of all audio patterns in order to send them as raw data to a main server; or specific recognition systems based on audio patterns. The first approach presents the drawback of a high amount of information to be stored in a main server. Moreover, this information requires a considerable amount of effort to be analyzed. The second approach has the drawback of its lack of scalability when new patterns need to be detected. To overcome these limitations, this paper proposes an environmental Wireless Acoustic Sensor Network architecture focused on the use of generic descriptors based on the MPEG-7 standard. These descriptors are shown to be suitable for recognizing different patterns, allowing high scalability. The proposed parameters have been tested to recognize different behaviors of two anuran species that live in Spanish natural parks, the Epidalea calamita and the Alytes obstetricans toads, demonstrating high classification performance. PMID:27213375
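To give a feel for the kind of low-level descriptor involved, the sketch below computes a spectral-centroid feature per frame in Python; the frame size, hop, and normalization are our assumptions and do not reproduce the exact MPEG-7 definitions used by the authors.

import numpy as np

def spectral_centroid(signal, rate, frame=1024, hop=512):
    # One descriptor value per frame: the power-weighted mean frequency.
    centroids = []
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / rate)
    for start in range(0, len(signal) - frame, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        power = spectrum ** 2
        if power.sum() > 0:
            centroids.append((freqs * power).sum() / power.sum())
    return np.array(centroids)

Such compact per-frame features, rather than raw audio, are what the sensor nodes would transmit, which is the source of the bandwidth savings and scalability claimed above.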
Jumping-Droplet-Enhanced Condensation on Scalable Superhydrophobic Nanostructured Surfaces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miljkovic, N; Enright, R; Nam, Y
When droplets coalesce on a superhydrophobic nanostructured surface, the resulting droplet can jump from the surface due to the release of excess surface energy. If designed properly, these superhydrophobic nanostructured surfaces can not only allow for easy droplet removal at micrometric length scales during condensation but also promise to enhance heat transfer performance. However, the rationale for the design of an ideal nanostructured surface as well as heat transfer experiments demonstrating the advantage of this jumping behavior are lacking. Here, we show that silanized copper oxide surfaces created via a simple fabrication method can achieve highly efficient jumping-droplet condensation heat transfer. We experimentally demonstrated a 25% higher overall heat flux and 30% higher condensation heat transfer coefficient compared to state-of-the-art hydrophobic condensing surfaces at low supersaturations (<1.12). This work not only shows significant condensation heat transfer enhancement but also promises a low cost and scalable approach to increase efficiency for applications such as atmospheric water harvesting and dehumidification. Furthermore, the results offer insights and an avenue to achieve high flux superhydrophobic condensation.
Perovskite ink with wide processing window for scalable high-efficiency solar cells
Yang, Mengjin; Li, Zhen; Reese, Matthew O.; ...
2017-03-20
Perovskite solar cells have made tremendous progress using laboratory-scale spin-coating methods in the past few years owing to advances in controls of perovskite film deposition. However, devices made via scalable methods are still lagging behind state-of-the-art spin-coated devices because of the complicated nature of perovskite crystallization from a precursor state. Here we demonstrate a chlorine-containing methylammonium lead iodide precursor formulation along with solvent tuning to enable a wide precursor-processing window (up to ~8 min) and a rapid grain growth rate (as short as ~1 min). Coupled with antisolvent extraction, this precursor ink delivers high-quality perovskite films with large-scale uniformity. The ink can be used by both spin-coating and blade-coating methods with indistinguishable film morphology and device performance. Using a blade-coated absorber, devices with 0.12-cm 2 and 1.2-cm 2 areas yield average efficiencies of 18.55% and 17.33%, respectively. As a result, we further demonstrate a 12.6-cm 2 four-cell module (88% geometric fill factor) with 13.3% stabilized active-area efficiency output.
Open release of the DCA++ project
NASA Astrophysics Data System (ADS)
Haehner, Urs; Solca, Raffaele; Staar, Peter; Alvarez, Gonzalo; Maier, Thomas; Summers, Michael; Schulthess, Thomas
We present the first open release of the DCA++ project, a highly scalable and efficient research code to solve quantum many-body problems with cutting edge quantum cluster algorithms. The implemented dynamical cluster approximation (DCA) and its DCA+ extension with a continuous self-energy capture nonlocal correlations in strongly correlated electron systems thereby allowing insight into high-Tc superconductivity. With the increasing heterogeneity of modern machines, DCA++ provides portable performance on conventional and emerging new architectures, such as hybrid CPU-GPU and Xeon Phi, sustaining multiple petaflops on ORNL's Titan and CSCS' Piz Daint. Moreover, we will describe how best practices in software engineering can be applied to make software development sustainable and scalable in a research group. Software testing and documentation not only prevent productivity collapse, but more importantly, they are necessary for correctness, credibility and reproducibility of scientific results. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF) awarded by the INCITE program, and of the Swiss National Supercomputing Center. OLCF is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
Numerical Simulations of Reacting Flows Using Asynchrony-Tolerant Schemes for Exascale Computing
NASA Astrophysics Data System (ADS)
Cleary, Emmet; Konduri, Aditya; Chen, Jacqueline
2017-11-01
Communication and data synchronization between processing elements (PEs) are likely to pose a major challenge in scalability of solvers at the exascale. Recently developed asynchrony-tolerant (AT) finite difference schemes address this issue by relaxing communication and synchronization between PEs at a mathematical level while preserving accuracy, resulting in improved scalability. The performance of these schemes has been validated for simple linear and nonlinear homogeneous PDEs. However, many problems of practical interest are governed by highly nonlinear PDEs with source terms, whose solution may be sensitive to perturbations caused by communication asynchrony. The current work applies the AT schemes to combustion problems with chemical source terms, yielding a stiff system of PDEs with nonlinear source terms highly sensitive to temperature. Examples shown will use single-step and multi-step CH4 mechanisms for 1D premixed and nonpremixed flames. Error analysis will be discussed both in physical and spectral space. Results show that additional errors introduced by the AT schemes are negligible and the schemes preserve their accuracy. We acknowledge funding from the DOE Computational Science Graduate Fellowship administered by the Krell Institute.
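To illustrate the kind of asynchrony these schemes are designed to tolerate, the toy Python sketch below advances 1D explicit diffusion on two "processing elements" whose exchanged boundary values may be several steps stale. It models only the relaxed synchronization, not the AT stencil corrections themselves, and all sizes are illustrative:

import random

def step(u, lam, left_ghost, right_ghost):
    # One explicit diffusion update; lam = dt/dx^2 must stay <= 0.5.
    ext = [left_ghost] + u + [right_ghost]
    return [ext[i] + lam * (ext[i - 1] - 2 * ext[i] + ext[i + 1])
            for i in range(1, len(ext) - 1)]

def simulate(nsteps=200, n=32, lam=0.25, max_delay=3):
    left, right = [1.0] * n, [0.0] * n
    hist_l, hist_r = [left[-1]], [right[0]]  # exchanged boundary histories
    for _ in range(nsteps):
        # Each side reads the other's boundary with a random staleness,
        # mimicking a delayed message rather than a synchronized halo.
        dl = random.randint(0, min(max_delay, len(hist_r) - 1))
        dr = random.randint(0, min(max_delay, len(hist_l) - 1))
        left = step(left, lam, left[0], hist_r[-1 - dl])
        right = step(right, lam, hist_l[-1 - dr], right[-1])
        hist_l.append(left[-1])
        hist_r.append(right[0])
    return left + right

An AT scheme would modify the stencil so that the error introduced by the random delays dl, dr is absorbed to the scheme's formal order of accuracy instead of polluting the solution.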
Low-cost scalable quartz crystal microbalance array for environmental sensing
NASA Astrophysics Data System (ADS)
Muckley, Eric S.; Anazagasty, Cristain; Jacobs, Christopher B.; Hianik, Tibor; Ivanov, Ilia N.
2016-09-01
Proliferation of environmental sensors for internet of things (IoT) applications has increased the need for low-cost platforms capable of accommodating multiple sensors. Quartz crystal microbalance (QCM) crystals coated with nanometer-thin sensor films are suitable for use in high-resolution (1 ng) selective gas sensor applications. We demonstrate a scalable array for measuring the frequency response of six QCM sensors controlled by low-cost Arduino microcontrollers and a USB multiplexer. Gas pulses and data acquisition were controlled by a LabVIEW user interface. We test the sensor array by measuring the frequency shift of crystals coated with different compositions of polymer composites based on poly(3,4-ethylenedioxythiophene):polystyrene sulfonate (PEDOT:PSS) while films are exposed to water vapor and oxygen inside a controlled environmental chamber. Our sensor array exhibits comparable performance to that of a commercial QCM system, while enabling high-throughput testing of six QCMs for under $1,000. We use deep neural network structures to process sensor response and demonstrate that the QCM array is suitable for gas sensing, environmental monitoring, and electronic-nose applications.
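The nanogram-level resolution quoted above follows from the standard Sauerbrey relation for a QCM, which links the frequency shift to the added mass (textbook form; the authors' calibration may differ):

\[ \Delta f = -\frac{2 f_0^{2}}{A \sqrt{\rho_q \mu_q}}\, \Delta m, \]

where f_0 is the fundamental resonance frequency, A the electrode area, and ρ_q and μ_q the density and shear modulus of quartz.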
NASA Astrophysics Data System (ADS)
Cai, Hongyan; Han, Kai; Jiang, Heng; Wang, Jingwen; Liu, Hui
2017-10-01
Silicon/carbon (Si/C) composites show great potential to replace graphite as lithium-ion battery (LIB) anodes owing to their high theoretical capacity. Exploring a low-cost, scalable approach for synthesizing Si/C composites with excellent electrochemical performance is critical for the practical application of Si/C anodes. In this study, we applied a scalable in situ approach to produce a Si-carbon nanotube (Si-CNT) composite via acid etching of a mixture of commercial, inexpensive micro-sized Al-Si alloy powder and CNTs. In the Si-CNT composite, ∼10 nm Si particles were uniformly deposited on the CNT surface. After combining with graphene sheets, a flexible self-standing Si-CNT/graphene paper was fabricated with a three-dimensional (3D) sandwich-like structure. The in situ presence of CNTs during the acid-etching process offers two remarkable advantages: providing deposition sites for Si atoms to restrain agglomeration of Si nanoparticles after Al removal from the Al-Si alloy powder, and increasing the cross-layer conductivity of the paper anode to provide excellent conductive contact sites for each Si nanoparticle. When used as a binder-free anode for LIBs without any further treatment, the in situ addition of CNTs plays an especially important role in improving the initial electrochemical activity of Si nanoparticles synthesized from low-cost Al-Si alloy powder, resulting in about twice the capacity of the Si/G paper anode. The self-standing Si-CNT/graphene paper anode exhibited a high specific capacity of 1100 mAh g-1 even after 100 cycles at 200 mA g-1 current density with a Coulombic efficiency of >99%. It also showed remarkable rate capability improvement compared to Si/G paper without CNTs. The present work demonstrates a low-cost, scalable in situ approach from commercial micro-sized Al-Si alloy powder to Si-based composites with a specific nanostructure. The Si-CNT/graphene paper is a promising anode candidate with high capacity and cycling stability for LIBs, especially for flexible battery applications.
Execution of parallel algorithms on a heterogeneous multicomputer
NASA Astrophysics Data System (ADS)
Isenstein, Barry S.; Greene, Jonathon
1995-04-01
Many aerospace/defense sensing and dual-use applications require high-performance computing, extensive high-bandwidth interconnect and realtime deterministic operation. This paper will describe the architecture of a scalable multicomputer that includes DSP and RISC processors. A single chassis implementation is capable of delivering in excess of 10 GFLOPS of DSP processing power with 2 Gbytes/s of realtime sensor I/O. A software approach to implementing parallel algorithms called the Parallel Application System (PAS) is also presented. An example of applying PAS to a DSP application is shown.
Tunable Nitride Josephson Junctions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Missert, Nancy A.; Henry, Michael David; Lewis, Rupert M.
We have developed an ambient temperature, SiO2/Si wafer-scale process for Josephson junctions based on Nb electrodes and TaxN barriers with tunable electronic properties. The films are fabricated by magnetron sputtering. The electronic properties of the TaxN barriers are controlled by adjusting the nitrogen flow during sputtering. This technology offers a scalable alternative to the more traditional junctions based on AlOx barriers for low-power, high-performance computing.
2004-02-01
UNCLASSIFIED − Conducted experiments to determine the usability of general-purpose anomaly detection algorithms to monitor a large, complex military... reaction and detection modules to perform tailored analysis sequences to monitor environmental conditions, health hazards and physiological states... scalability of lab-proven anomaly detection techniques for intrusion detection in real-world, high-volume environments.
History of Significant Vehicle and Fuel Introductions in the United States
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shirk, Matthew; Alleman, Teresa; Melendez, Margo
This is one of a series of reports produced as a result of the Co-Optimization of Fuels & Engines (Co-Optima) project, a Department of Energy (DOE)-sponsored multi-agency project initiated to accelerate the introduction of affordable, scalable, and sustainable biofuels and high-efficiency, low-emission vehicle engines. The simultaneous fuels and vehicles research and development is designed to deliver maximum energy savings, emissions reduction, and on-road performance.
Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds
NASA Astrophysics Data System (ADS)
Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.
In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly widespread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures currently are undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because the urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.
Parallel Unsteady Overset Mesh Methodology for Adaptive and Moving Grids with Multiple Solvers
2010-01-01
Sitaraman, Jayanarayanan; National Institute of Aerospace, Hampton, Virginia (with the Army Research Laboratory, Hampton, Virginia). This paper describes a new... Good linear scalability was observed for all three cases up to 12 processors (results sections 3.5 and 3.6); beyond that, the scalability drops off.
On the Suitability of MPI as a PGAS Runtime
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daily, Jeffrey A.; Vishnu, Abhinav; Palmer, Bruce J.
2014-12-18
Partitioned Global Address Space (PGAS) models are emerging as a popular alternative to MPI models for designing scalable applications. At the same time, MPI remains a ubiquitous communication subsystem due to its standardization, high performance, and availability on leading platforms. In this paper, we explore the suitability of using MPI as a scalable PGAS communication subsystem. We focus on the Remote Memory Access (RMA) communication in PGAS models, which typically includes get, put, and atomic memory operations. We perform an in-depth exploration of design alternatives based on MPI. These alternatives include using a semantically-matching interface such as MPI-RMA, as well as not-so-intuitive interfaces such as MPI two-sided with a combination of multi-threading and dynamic process management. With an in-depth exploration of these alternatives and their shortcomings, we propose a novel design which is facilitated by the data-centric view in PGAS models. This design leverages a combination of highly tuned MPI two-sided semantics and an automatic, user-transparent split of MPI communicators to provide asynchronous progress. We implement the asynchronous progress ranks approach and other approaches within the Communication Runtime for Exascale, which is a communication subsystem for Global Arrays. Our performance evaluation spans pure communication benchmarks, graph community detection and sparse matrix-vector multiplication kernels, and a computational chemistry application. The utility of our proposed PR-based approach is demonstrated by a 2.17x speed-up on 1008 processors over the other MPI-based designs.
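For orientation, the sketch below expresses PGAS-style one-sided access with MPI-RMA via mpi4py; this is the "semantically-matching" MPI-RMA route the paper discusses, not the authors' asynchronous-progress-ranks design, and all names are ours:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank exposes 10 doubles -- a tiny slice of a "global array".
local = np.zeros(10, dtype="d")
win = MPI.Win.Create(local, comm=comm)

target = (rank + 1) % size
data = np.full(10, float(rank), dtype="d")

# One-sided put: write into the neighbor's window without its involvement.
win.Lock(target, MPI.LOCK_EXCLUSIVE)
win.Put(data, target_rank=target)
win.Unlock(target)

comm.Barrier()  # ensure every put has landed before reading back

# One-sided get: read the neighbor's window contents.
out = np.empty(10, dtype="d")
win.Lock(target, MPI.LOCK_SHARED)
win.Get(out, target_rank=target)
win.Unlock(target)

win.Free()

The progress problem the paper's PR-based design addresses arises when the target rank is busy computing and cannot service such one-sided operations promptly.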
Designing and application of SAN extension interface based on CWDM
NASA Astrophysics Data System (ADS)
Qin, Leihua; Yu, Shengsheng; Zhou, Jingli
2005-11-01
As Fibre Channel (FC) becomes the protocol of choice within corporate data centers, enterprises are increasingly deploying SANs in their data centers. In order to mitigate the risk of losing data and improve its availability, more and more enterprises are adopting storage extension technologies to replicate their business-critical data to a secondary site. Transmitting this information over distance requires a carrier-grade environment with zero data loss, scalable throughput, low jitter, high security and the ability to travel long distances. To address these business requirements, there are three basic architectures for storage extension: Storage over Internet Protocol, Storage over Synchronous Optical Network/Synchronous Digital Hierarchy (SONET/SDH) and Storage over Dense Wavelength Division Multiplexing (DWDM). Each approach varies in functionality, complexity, cost, scalability, security, availability, predictable behavior (bandwidth, jitter, latency) and multiple-carrier limitations. Compared with these connectivity technologies, Coarse Wavelength Division Multiplexing (CWDM) is a simplified, low-cost and high-performance connectivity solution for enterprises to deploy their storage extension. In this paper, we design a storage extension connectivity over CWDM and test its electrical characteristics and the random read and write performance of a disk array through the CWDM connectivity; the test results show that the performance of the connectivity over CWDM is acceptable. Furthermore, we propose three kinds of network architecture for SAN extension based on the CWDM interface. Finally, the credit-based flow control mechanism of FC and the relationship between credits and extension distance are analyzed.
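As a first-order illustration of the credit-distance relationship mentioned at the end of the abstract (a sizing rule of thumb, not the paper's exact analysis), a Fibre Channel link stays fully utilized only if the outstanding buffer-to-buffer credits cover the frames in flight over the round trip:

\[ N_{credits} \;\ge\; \frac{(2 d / v_g)\, R}{8\, L_{frame}}, \]

where d is the link length, v_g ≈ 2×10^8 m/s is the group velocity in fiber, R the line rate in bit/s, and L_frame the frame payload in bytes. At R = 2 Gb/s with 2 kB frames this works out to roughly one credit per kilometer of extension distance, which is why credit budgets, not optics, often bound how far a SAN can be stretched.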
Heo, Jae Sang; Kim, Taehoon; Ban, Seok-Gyu; Kim, Daesik; Lee, Jun Ho; Jur, Jesse S; Kim, Myung-Gil; Kim, Yong-Hoon; Hong, Yongtaek; Park, Sung Kyu
2017-08-01
The realization of large-area electronics with full integration of 1D thread-like devices may open up a new era for ultraflexible and human-adaptable electronic systems because of their potential advantages in demonstrating scalable complex circuitry by a simply integrated weaving technology. More importantly, the thread-like fiber electronic devices can be achieved using a simple reel-to-reel process, which is strongly required for low-cost and scalable manufacturing technology. Here, high-performance reel-processed complementary metal-oxide-semiconductor (CMOS) integrated circuits are reported on 1D fiber substrates by using selectively chemical-doped single-walled carbon nanotube (SWCNT) transistors. With the introduction of selective n-type doping and a nonrelief photochemical patterning process, p- and n-type SWCNT transistors are successfully implemented on cylindrical fiber substrates under ambient air, enabling high-performance and reliable thread-like CMOS inverter circuits. In addition, it is noteworthy that the optimized reel-coating process can improve the arrangement of SWCNTs, building uniformly well-aligned SWCNT channels, and enhance the electrical performance of the devices. The p- and n-type SWCNT transistors exhibit field-effect mobilities of 4.03 and 2.15 cm2 V-1 s-1, respectively, with relatively narrow distributions. Moreover, the SWCNT CMOS inverter circuits demonstrate a gain of 6.76 and relatively good dynamic operation at a supply voltage of 5.0 V. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Zhang, Hongwei; Sun, Xiaoran; Huang, Xiaodan; Zhou, Liang
2015-02-01
A novel "spray drying-carbonization-oxidation" strategy has been developed for the fabrication of α-Fe2O3-graphitic carbon (α-Fe2O3@GC) composite microspheres, in which α-Fe2O3 nanoparticles with sizes of 30-50 nm are well-encapsulated by onion-like graphitic carbon shells with a thickness of 5-10 nm. In the constructed composite, the α-Fe2O3 nanoparticles act as the primary active material, providing a high capacity. Meanwhile, the graphitic carbon shells serve as the secondary active component, structural stabilizer, interfacial stabilizer, and electron-highway. As a result, the synthesized α-Fe2O3@GC nanocomposite exhibits a superior lithium-ion battery performance with a high reversible capacity (898 mA h g-1 at 400 mA g-1), outstanding rate capability, and excellent cycling stability. Our product, in terms of the facile and scalable preparation process and excellent electrochemical performance, demonstrates its great potential as a high-performance anode material for lithium-ion batteries. Electronic supplementary information (ESI) available: XRD pattern, XPS spectrum, CV curves, TEM and SEM images, and table. See DOI: 10.1039/c4nr06771a
The Scalable Checkpoint/Restart Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A.
The Scalable Checkpoint/Restart (SCR) library provides an interface that codes may use to write out and read in application-level checkpoints in a scalable fashion. In the current implementation, checkpoint files are cached in local storage (hard disk or RAM disk) on the compute nodes. This technique provides scalable aggregate bandwidth and uses storage resources that are fully dedicated to the job. This approach addresses the two common drawbacks of checkpointing a large-scale application to a shared parallel file system, namely, limited bandwidth and file system contention. In fact, on current platforms, SCR scales linearly with the number of compute nodes. It has been benchmarked as high as 720 GB/s on 1094 nodes of Atlas, which is nearly two orders of magnitude faster than the parallel file system.
On the energy footprint of I/O management in Exascale HPC systems
Dorier, Matthieu; Yildiz, Orcun; Ibrahim, Shadi; ...
2016-03-21
The advent of unprecedentedly scalable yet energy hungry Exascale supercomputers poses a major challenge in sustaining a high performance-per-watt ratio. With I/O management acquiring a crucial role in supporting scientific simulations, various I/O management approaches have been proposed to achieve high performance and scalability. However, the details of how these approaches affect energy consumption have not been studied yet. Therefore, this paper aims to explore how much energy a supercomputer consumes while running scientific simulations when adopting various I/O management approaches. In particular, we closely examine three radically different I/O schemes including time partitioning, dedicated cores, and dedicated nodes. To accomplish this, we implement the three approaches within the Damaris I/O middleware and perform extensive experiments with one of the target HPC applications of the Blue Waters sustained-petaflop supercomputer project: the CM1 atmospheric model. Our experimental results obtained on the French Grid'5000 platform highlight the differences among these three approaches and illustrate in which way various configurations of the application and of the system can impact performance and energy consumption. Moreover, we propose and validate a mathematical model that estimates the energy consumption of a HPC simulation under different I/O approaches. This proposed model gives hints to pre-select the most energy-efficient I/O approach for a particular simulation on a particular HPC system and therefore provides a step towards energy-efficient HPC simulations in Exascale systems. To the best of our knowledge, our work provides the first in-depth look into the energy-performance tradeoffs of I/O management approaches.
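The flavor of such an energy model can be sketched with a generic decomposition (an illustrative form under our own assumptions, not the paper's actual model):

\[ E \;\approx\; \sum_{g \in \{\mathrm{compute},\,\mathrm{I/O}\}} N_g \Big[ P_g^{\mathrm{idle}}\, T + \big( P_g^{\mathrm{peak}} - P_g^{\mathrm{idle}} \big)\, u_g\, T \Big], \]

where N_g is the number of nodes in each role, u_g their average utilization, and T the wall-clock time; the three I/O schemes differ precisely in how they split N_g and u_g between computation and I/O.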
Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations
NASA Astrophysics Data System (ADS)
Bernaschi, M.; Bisson, M.; Salvadore, F.
2014-10-01
We present and compare the performance of two many-core architectures, the Nvidia Kepler and the Intel MIC, both in a single system and in cluster configuration, for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the over-relaxation algorithm. We present data also for a traditional high-end multi-core architecture: the Intel Sandy Bridge. The results show that although on the two Intel architectures it is possible to use basically the same code, the performance of an Intel MIC changes dramatically depending on (apparently) minor details. Another issue is that to obtain a reasonable scalability with the Intel Phi coprocessor (Phi is the coprocessor that implements the MIC architecture) in a cluster configuration it is necessary to use the so-called offload mode, which reduces the performance of the single system. As for the GPU, the Kepler architecture offers a clear advantage with respect to the previous Fermi architecture while maintaining exactly the same source code. Scalability of the multi-GPU implementation remains very good by using the CPU as a communication co-processor of the GPU. All source codes are provided for inspection and for double-checking the results.
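For reference, the over-relaxation move being benchmarked reflects a spin about its local field, which leaves the energy unchanged; a plain serial NumPy sketch for one site (uniform couplings rather than the spin-glass disorder of the paper) is:

import numpy as np

def overrelax_site(spins, site, neighbor_ids):
    # Local molecular field: sum of the nearest-neighbor spins
    # (in a spin glass this sum would carry the random couplings J_ij).
    h = spins[neighbor_ids].sum(axis=0)
    s = spins[site]
    # Reflect s about h; this preserves s.h (hence the energy) and |s|.
    spins[site] = 2.0 * np.dot(s, h) / np.dot(h, h) * h - s

# spins: (N, 3) array of unit vectors; neighbor_ids: the six neighbor
# indices of the site on a 3D lattice. A sweep visits every site once.

Because the move is deterministic and embarrassingly data-parallel within a sublattice, it is a natural single-spin-update benchmark for GPUs and MICs.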
Motivation and Design of the Sirocco Storage System Version 1.0.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curry, Matthew Leon; Ward, H. Lee; Danielson, Geoffrey Charles
Sirocco is a massively parallel, high performance storage system for the exascale era. It emphasizes client-to-client coordination, low server-side coupling, and free data movement to improve resilience and performance. Its architecture is inspired by peer-to-peer and victim-cache architectures. By leveraging these ideas, Sirocco natively supports several media types, including RAM, flash, disk, and archival storage, with automatic migration between levels. Sirocco also includes storage interfaces and support that are more advanced than typical block storage. Sirocco enables clients to efficiently use key-value storage or block-based storage with the same interface. It also provides several levels of transactional data updates within a single storage command, including full ACID-compliant updates. This transaction support extends to updating several objects within a single transaction. Further support is provided for concurrency control, enabling greater performance for workloads while providing safe concurrent modification. By pioneering these and other technologies and techniques in the storage system, Sirocco is poised to fulfill a need for a massively scalable, write-optimized storage system for exascale systems. This is version 1.0 of a document reflecting the current and planned state of Sirocco. Further versions of this document will be accessible at http://www.cs.sandia.gov/Scalable_IO/sirocco
SP-100 - The national space reactor power system program in response to future needs
NASA Astrophysics Data System (ADS)
Armijo, J. S.; Josloff, A. T.; Bailey, H. S.; Matteo, D. N.
The SP-100 system has been designed to meet comprehensive and demanding NASA/DOD/DOE requirements. The key requirements include: nuclear safety for all mission phases, scalability from 10's to 100's of kWe, reliable performance at full power for seven years or partial power for ten years, survivability in civil or military threat environments, capability to operate autonomously for up to six months, capability to protect payloads from excessive radiation, and compatibility with shuttle and expendable launch vehicles. The authors address major progress in terms of design, flexibility/scalability, survivability, and development. These areas, with the exception of survivability, are discussed in detail. There has been significant improvement in the generic flight system design with substantial mass savings and simplification that enhance performance and reliability. Design activity has confirmed the scalability and flexibility of the system and the ability to efficiently meet NASA, AF, and SDIO needs. SP-100 development continues to make significant progress in all key technology areas.
Pham, Viet Hung; Dickerson, James H.
2016-02-21
Graphene hydrogels have been considered as ideal materials for high-performance supercapacitors. However, their low volumetric capacitance significantly limits their real application. In this study, we report an environment-friendly and scalable method to prepare high packing density, electrochemically reduced graphene oxide hydrogels (ERGO) for supercapacitor application by the electrophoretic deposition of graphene oxide onto nickel foam, followed by the electrochemical reduction and hydraulic compression of the deposited materials. The as-prepared ERGO on nickel foam was hydraulically compressed up to 20 tons, resulting in an increase of the packing density of ERGO from 0.0098 to 1.32 g cm–3. Consequently, the volumetric capacitance and volumetric energy density of ERGOs greatly increased from 1.58 F cm–3 and 0.053 Wh cm–3 (as-prepared ERGO) to 176.5 F cm–3 and 6.02 Wh cm–3 (ERGO compressed at 20 tons), respectively. The ERGOs also exhibited long-term electrochemical stability with a capacitance retention in the range of approximately 79–90% after 10 000 cycles. Lastly, we believe that these high packing density ERGOs are promising for real-world energy storage devices for which scalable, cost-effective manufacturing is of significance and for which space constraints are paramount.
NASA Astrophysics Data System (ADS)
Nam, Young Jin; Oh, Dae Yang; Jung, Sung Hoo; Jung, Yoon Seok
2018-01-01
Owing to their potential for greater safety, higher energy density, and scalable fabrication, bulk-type all-solid-state lithium-ion batteries (ASLBs) employing deformable sulfide superionic conductors are considered highly promising for applications in battery electric vehicles. While fabrication of sheet-type electrodes is imperative from the practical point of view, reports on relevant research are scarce. This might be attributable to issues that complicate the slurry-based fabrication process and/or issues with ionic contacts and percolation. In this work, we systematically investigate the electrochemical performance of conventional dry-mixed electrodes and wet-slurry fabricated electrodes for ASLBs, by varying the fractions of solid electrolytes and the mass loading. These results highlight the need to develop well-designed electrodes with better ionic contacts and to improve the ionic conductivity of solid electrolytes. As a scalable proof-of-concept to achieve better ionic contacts, a premixing process for active materials and solid electrolytes is demonstrated to significantly improve electrochemical performance. Pouch-type 80 × 60 mm2 all-solid-state LiNi0.6Co0.2Mn0.2O2/graphite full-cells fabricated by the slurry process show high cell-based energy density (184 W h kg-1 and 432 W h L-1). For the first time, their excellent safety is also demonstrated by simple tests (cutting with scissors and heating at 110 °C).
Hou, Bao-Hua; Wang, Ying-Ying; Guo, Jin-Zhi; Zhang, Yu; Ning, Qiu-Li; Yang, Yang; Li, Wen-Hao; Zhang, Jing-Ping; Wang, Xin-Long; Wu, Xing-Long
2018-01-31
A novel core-shell Fe3O4@FeS composite, comprising an Fe3O4 core and a FeS shell with the morphology of regular octahedra, has been prepared via a facile and scalable strategy employing commercial Fe3O4 as the precursor. When used as an anode material for sodium-ion batteries (SIBs), the prepared Fe3O4@FeS combines the merits of FeS and Fe3O4: high Na-storage capacity and superior cycling stability, respectively. The optimized Fe3O4@FeS electrode shows ultralong cycle life and outstanding rate capability. For instance, it retains 90.8% of its capacity, with a reversible capacity of 169 mAh g-1, after 750 cycles at 0.2 A g-1, and delivers 151 mAh g-1 at a high current density of 2 A g-1, which is about 7.5 times the Na-storage capacity of commercial Fe3O4. More importantly, the prepared Fe3O4@FeS also exhibits excellent full-cell performance. The assembled Fe3O4@FeS//Na3V2(PO4)2O2F sodium-ion full battery gives a reversible capacity of 157 mAh g-1 after 50 cycles at 0.5 A g-1 with a capacity retention of 92.3% and a Coulombic efficiency of around 100%, demonstrating its applicability for sodium-ion full batteries as a promising anode. Furthermore, it is also disclosed that such superior electrochemical properties can be attributed to the pseudocapacitive behavior of the FeS shell, as demonstrated by kinetics studies, as well as to the core-shell structure. In view of the large-scale availability of the commercial precursor and the ease of preparation, this study provides a scalable strategy to develop advanced anode materials for SIBs.
A Systems Approach to Scalable Transportation Network Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S
2006-01-01
Emerging needs in transportation network modeling and simulation are raising new challenges with respect to scalability of network size and vehicular traffic intensity, speed of simulation for simulation-based optimization, and fidelity of vehicular behavior for accurate capture of event phenomena. Parallel execution is warranted to sustain the required detail, size and speed. However, few parallel simulators exist for such applications, partly due to the challenges underlying their development. Moreover, many simulators are based on time-stepped models, which can be computationally inefficient for the purposes of modeling evacuation traffic. Here an approach is presented to designing a simulator with memory and speed efficiency as the goals from the outset, and, specifically, scalability via parallel execution. The design makes use of discrete event modeling techniques as well as parallel simulation methods. Our simulator, called SCATTER, is being developed, incorporating such design considerations. Preliminary performance results are presented on benchmark road networks, showing scalability to one million vehicles simulated on one processor.
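To make the discrete event modeling point concrete, the sketch below simulates a single road link in the event-driven style in Python; the link capacity and traversal time are illustrative and not SCATTER's actual vehicular model:

import heapq

def simulate(arrivals, travel_time=30.0, capacity=2):
    # Events advance the clock only when something happens, unlike a
    # time-stepped model that ticks even when the network is idle.
    events = [(t, "arrive", i) for i, t in enumerate(arrivals)]
    heapq.heapify(events)
    on_link, waiting, done = 0, [], {}
    while events:
        t, kind, veh = heapq.heappop(events)
        if kind == "arrive":
            if on_link < capacity:
                on_link += 1
                heapq.heappush(events, (t + travel_time, "leave", veh))
            else:
                waiting.append(veh)  # queue at the link entrance
        else:  # "leave"
            done[veh] = t
            on_link -= 1
            if waiting:
                on_link += 1
                heapq.heappush(events, (t + travel_time, "leave", waiting.pop(0)))
    return done  # vehicle id -> exit time

# e.g. simulate([0, 1, 2, 50]): vehicles 0 and 1 enter immediately,
# vehicle 2 queues until the first departure at t = 30.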
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lopez, Cecilia C.; Theoretische Physik, Universitaet des Saarlandes, D-66041 Saarbruecken; Departament de Fisica, Universitat Autonoma de Barcelona, E-08193 Bellaterra
2010-06-15
We present in a unified manner the existing methods for scalable partial quantum process tomography. We focus on two main approaches: the one presented in Bendersky et al. [Phys. Rev. Lett. 100, 190403 (2008)] and the ones described, respectively, in Emerson et al. [Science 317, 1893 (2007)] and Lopez et al. [Phys. Rev. A 79, 042328 (2009)], which can be combined together. The methods share an essential feature: They are based on the idea that the tomography of a quantum map can be efficiently performed by studying certain properties of a twirling of such a map. From this perspective, in this paper we present extensions, improvements, and comparative analyses of the scalable methods for partial quantum process tomography. We also clarify the significance of the extracted information, and we introduce interesting and useful properties of the χ-matrix representation of quantum maps that can be used to establish a clearer path toward achieving full tomography of quantum processes in a scalable way.
A Fast and Scalable Algorithm for Calculating the Achievable Capacity of a Wireless Mesh Network
2016-04-10
Kuperman, Greg; Sun, Jun; Narula-Tam, Aradhana (MIT)
This work presents a fast and scalable algorithm for calculating the maximum achievable capacity of a multi-hop wireless mesh network subject to interference constraints, accounting for the set of links subject to interference from a given transmission. The algorithm is then used to perform a network capacity analysis comparing different wireless technologies.
High Resolution Aerospace Applications using the NASA Columbia Supercomputer
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.; Aftosmis, Michael J.; Berger, Marsha
2005-01-01
This paper focuses on the parallel performance of two high-performance aerodynamic simulation packages on the newly installed NASA Columbia supercomputer. These packages include both a high-fidelity, unstructured, Reynolds-averaged Navier-Stokes solver, and a fully-automated inviscid flow package for cut-cell Cartesian grids. The complementary combination of these two simulation codes enables high-fidelity characterization of aerospace vehicle design performance over the entire flight envelope through extensive parametric analysis and detailed simulation of critical regions of the flight envelope. Both packages are industrial-level codes designed for complex geometry and incorporate customized multigrid solution algorithms. The performance of these codes on Columbia is examined using both MPI and OpenMP and using both the NUMAlink and InfiniBand interconnect fabrics. Numerical results demonstrate good scalability on up to 2016 CPUs using the NUMAlink4 interconnect, with measured computational rates in the vicinity of 3 TFLOP/s, while InfiniBand showed some performance degradation at high CPU counts, particularly with multigrid. Nonetheless, the results are encouraging enough to indicate that larger test cases using combined MPI/OpenMP communication should scale well on even more processors.
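As a rough sanity check on scaling figures of this kind, parallel efficiency can be estimated from the aggregate rate. In the sketch below, only the 2016-CPU count and the ~3 TFLOP/s aggregate come from the abstract; the per-CPU baseline rate is an assumed, illustrative figure.

```python
# Back-of-the-envelope parallel efficiency from an aggregate rate.
# base_rate is an assumed per-CPU figure, not a number from the paper.
def parallel_efficiency(aggregate_rate, n_cpus, base_rate):
    speedup = aggregate_rate / base_rate    # achieved speedup vs. one CPU
    return speedup / n_cpus                 # fraction of ideal linear scaling

# ~3 TFLOP/s on 2016 CPUs (abstract); 1.6 GFLOP/s per CPU assumed.
print(f"{parallel_efficiency(3.0e12, 2016, 1.6e9):.0%}")  # ~93%
```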
A Study on Fast Gates for Large-Scale Quantum Simulation with Trapped Ions
Taylor, Richard L.; Bentley, Christopher D. B.; Pedernales, Julen S.; Lamata, Lucas; Solano, Enrique; Carvalho, André R. R.; Hope, Joseph J.
2017-01-01
Large-scale digital quantum simulations require thousands of fundamental entangling gates to construct the simulated dynamics. Despite success in a variety of small-scale simulations, quantum information processing platforms have hitherto failed to demonstrate the combination of precise control and scalability required to systematically outmatch classical simulators. We analyse how fast gates could enable trapped-ion quantum processors to achieve the requisite scalability to outperform classical computers without error correction. We analyze the performance of a large-scale digital simulator, and find that fidelity of around 70% is realizable for π-pulse infidelities below 10−5 in traps subject to realistic rates of heating and dephasing. This scalability relies on fast gates: entangling gates faster than the trap period. PMID:28401945
Scalable cluster administration - Chiba City I approach and lessons learned.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Navarro, J. P.; Evard, R.; Nurmi, D.
2002-07-01
Systems administrators of large clusters often need to perform the same administrative activity hundreds or thousands of times. Often such activities are time-consuming, especially the tasks of installing and maintaining software. By combining network services such as DHCP, TFTP, FTP, HTTP, and NFS with remote hardware control, cluster administrators can automate all administrative tasks. Scalable cluster administration addresses the following challenge: What systems design techniques can cluster builders use to automate cluster administration on very large clusters? We describe the approach used in the Mathematics and Computer Science Division of Argonne National Laboratory on Chiba City I, a 314-node Linux cluster; and we analyze the scalability, flexibility, and reliability benefits and limitations of that approach.
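The core pattern behind this kind of automation is a concurrent fan-out of one administrative action to every node. The sketch below is a generic illustration, not Chiba City tooling; the host naming scheme and the `uptime` command are placeholder assumptions.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Generic admin fan-out over ssh (placeholder hosts/command, not Chiba
# City's actual tools). Chiba City I had 314 nodes, hence the range below.
HOSTS = [f"node{i:03d}" for i in range(314)]

def run(host, command="uptime"):
    try:
        # BatchMode avoids hanging on password prompts for unreachable nodes.
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True, text=True, timeout=30,
        )
        return host, proc.returncode, proc.stdout.strip()
    except subprocess.TimeoutExpired:
        return host, -1, "timed out"

with ThreadPoolExecutor(max_workers=32) as pool:
    for host, rc, out in pool.map(run, HOSTS):
        print(host, "OK" if rc == 0 else f"FAILED ({rc})", out)
```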
Polymer waveguides for electro-optical integration in data centers and high-performance computers.
Dangel, Roger; Hofrichter, Jens; Horst, Folkert; Jubin, Daniel; La Porta, Antonio; Meier, Norbert; Soganci, Ibrahim Murat; Weiss, Jonas; Offrein, Bert Jan
2015-02-23
To satisfy the intra- and inter-system bandwidth requirements of future data centers and high-performance computers, low-cost low-power high-throughput optical interconnects will become a key enabling technology. To tightly integrate optics with the computing hardware, particularly in the context of CMOS-compatible silicon photonics, optical printed circuit boards using polymer waveguides are considered as a formidable platform. IBM Research has already demonstrated the essential silicon photonics and interconnection building blocks. A remaining challenge is electro-optical packaging, i.e., the connection of the silicon photonics chips with the system. In this paper, we present a new single-mode polymer waveguide technology and a scalable method for building the optical interface between silicon photonics chips and single-mode polymer waveguides.
NASA Astrophysics Data System (ADS)
Jin, Sung Hun; Dunham, Simon; Xie, Xu; Rogers, John A.
2015-09-01
Among the remarkable variety of semiconducting nanomaterials that have been discovered over the past two decades, single-walled carbon nanotubes remain uniquely well suited for applications in high-performance electronics, sensors and other technologies. The most advanced opportunities demand the ability to form perfectly aligned, horizontal arrays of purely semiconducting, chemically pristine carbon nanotubes. Here, we present strategies that offer this capability. Nanoscale thermocapillary flows in thin-film organic coatings, followed by reactive ion etching, serve as highly efficient means for selectively removing metallic carbon nanotubes from electronically heterogeneous aligned arrays grown on quartz substrates. The low temperatures and unusual physics associated with this process enable robust, scalable operation, with clear potential for practical use. To achieve selective Joule heating of only the metallic nanotubes, two representative platforms are proposed and confirmed. One relies on selective Joule heating in thin-film transistors with a partial gate structure. The other is based on a simple, scalable, large-area scheme of microwave irradiation using micro-strip dipole antennas made of low-work-function metals. Based on the purified semiconducting SWNTs, we demonstrated field-effect transistors with mobilities above 1,000 cm2/Vs and on/off switching ratios of ~10,000, with current outputs in the milliamp range. Furthermore, as a demonstration of large-area scalability and simplicity, applying the microwave-based purification to large arrays consisting of ~20,000 SWNTs completely removed all of the m-SWNTs (~7,000), yielding an s-SWNT purity of at least 99.9925%, and likely significantly higher.
A distributed parallel storage architecture and its potential application within EOSDIS
NASA Technical Reports Server (NTRS)
Johnston, William E.; Tierney, Brian; Feuquay, Jay; Butzer, Tony
1994-01-01
We describe the architecture, implementation, and use of a scalable, high-performance, distributed-parallel data storage system developed in the ARPA-funded MAGIC gigabit testbed. A collection of wide-area distributed disk servers operates in parallel to provide logical block-level access to large data sets. Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation.
Zhang, Tianchang; Kim, Christine H J; Cheng, Yingwen; Ma, Yanwen; Zhang, Hongbo; Liu, Jie
2015-02-21
A "top-down" and scalable approach for processing carbon fiber cloth (CFC) into flexible, all-carbon electrodes with remarkable areal capacity and cyclic stability was developed. CFC is commercially available in large quantities, but its use as an electrode material in supercapacitors has not been satisfactory. The approach demonstrated in this work is based on the sequential treatment of CFC with KOH activation and high-temperature annealing, which can effectively improve its specific surface area to a remarkable 2780 m(2) g(-1) while at the same time achieving a good electrical conductivity of 320 S m(-1) without sacrificing its intrinsic mechanical strength and flexibility. The processed CFC can be directly used as an electrode for supercapacitors without any binders, conductive additives or current collectors, avoiding elaborate electrode processing steps, and delivers a specific capacitance of ∼0.5 F cm(-2) and ∼197 F g(-1) with remarkable rate performance and excellent cyclic stability. The properties of these processed CFCs are comparable to or better than those of graphene and carbon nanotube based electrodes. We further demonstrate symmetric solid-state supercapacitors based on these processed CFCs with very good flexibility. This "top-down" and scalable approach can be readily applied to other types of commercially available carbon materials and can therefore have substantial significance for high-performance supercapacitor devices.
Thickness-independent capacitance of vertically aligned liquid-crystalline MXenes
Xia, Yu; Mathis, Tyler S.; Zhao, Meng -Qiang; ...
2018-05-16
The scalable and sustainable manufacture of thick electrode films with high energy and power densities is critical for the large-scale storage of electrochemical energy for application in transportation and stationary electric grids. Two-dimensional nanomaterials have become the predominant choice of electrode material in the pursuit of high energy and power densities owing to their large surface-area-to-volume ratios and lack of solid-state diffusion. However, traditional electrode fabrication methods often lead to restacking of two-dimensional nanomaterials, which limits ion transport in thick films and results in systems in which the electrochemical performance is highly dependent on the thickness of the film. Strategies for facilitating ion transport—such as increasing the interlayer spacing by intercalation or introducing film porosity by designing nanoarchitectures—result in materials with low volumetric energy storage as well as complex and lengthy ion transport paths that impede performance at high charge–discharge rates. Vertical alignment of two-dimensional flakes enables directional ion transport that can lead to thickness-independent electrochemical performances in thick films. However, so far only limited success has been reported, and the mitigation of performance losses remains a major challenge when working with films of two-dimensional nanomaterials with thicknesses that are near to or exceed the industrial standard of 100 micrometres. Here we demonstrate electrochemical energy storage that is independent of film thickness for vertically aligned two-dimensional titanium carbide (Ti3C2Tx), a material from the MXene family (two-dimensional carbides and nitrides of transition metals (M), where X stands for carbon or nitrogen). The vertical alignment was achieved by mechanical shearing of a discotic lamellar liquid-crystal phase of Ti3C2Tx. The resulting electrode films show excellent performance that is nearly independent of film thickness up to 200 micrometres, which makes them highly attractive for energy storage applications. In conclusion, the self-assembly approach presented here is scalable and can be extended to other systems that involve directional transport, such as catalysis and filtration.
Performance evaluation of OpenFOAM on many-core architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brzobohatý, Tomáš; Říha, Lubomír; Karásek, Tomáš, E-mail: tomas.karasek@vsb.cz
In this article, the application of Open Source Field Operation and Manipulation (OpenFOAM) C++ libraries to solving engineering problems on many-core architectures is presented. The objective of this article is to present the scalability of OpenFOAM on parallel platforms solving real engineering problems of fluid dynamics. Scalability tests of OpenFOAM are performed using various hardware and different implementations of the standard PCG and PBiCG Krylov iterative methods. Speedups of various implementations of the linear solvers using GPU and MIC accelerators are presented. Numerical experiments of 3D lid-driven cavity flow for several cases with various numbers of cells are presented.
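For readers unfamiliar with the solvers being benchmarked, the sketch below is a textbook preconditioned conjugate gradient (the PCG method named above) with a simple Jacobi preconditioner; it is a didactic stand-in, not OpenFOAM's implementation.

```python
import numpy as np

# Textbook preconditioned conjugate gradient for symmetric positive-definite
# systems, with a Jacobi (diagonal) preconditioner. Illustrative only.
def pcg(A, b, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x
    M_inv = 1.0 / np.diag(A)          # Jacobi preconditioner
    z = M_inv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(pcg(A, b))  # ~ [0.0909, 0.6364]
```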
Nguyen, H Q; Yu, H W; Luc, Q H; Tang, Y Z; Phan, V T H; Hsu, C H; Chang, E Y; Tseng, Y C
2014-12-05
Using a step-graded (SG) buffer structure grown via metal-organic chemical vapor deposition, we demonstrate the high suitability of In0.5Ga0.5As epitaxial layers on a GaAs substrate for electronic device applications. Taking advantage of the technique's precise control, we were able to increase the number of SG layers to achieve a fairly low dislocation density (∼10(6) cm(-2)), while keeping each individual SG layer only slightly above the critical thickness (∼80 nm) for strain relaxation. This satisfied the demanding but contradictory requirements, and even offered excellent scalability by thinning the whole buffer structure down to 2.3 μm, substantially surpassing previous studies. The effects of the SG misfit strain on the crystal quality and surface morphology of the In0.5Ga0.5As epitaxial layers were carefully investigated and correlated with threading dislocation (TD) blocking mechanisms. Microstructural analyses show that TDs can be blocked effectively through self-annihilation reactions, or hindered randomly by misfit dislocation mechanisms. Growth conditions for avoiding phase separation were also explored and identified. The buffer-improved, high-quality In0.5Ga0.5As epitaxial layers enabled a high-performance metal-oxide-semiconductor capacitor on a GaAs substrate. The devices displayed remarkable capacitance-voltage responses with small frequency dispersion. A promising interface trap density of 3 × 10(12) eV(-1) cm(-2) in a conductance test was also obtained. These electrical performances are competitive with those obtained using lattice-coherent but pricey InGaAs/InP systems.
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers
Wang, Bei; Ethier, Stephane; Tang, William; ...
2017-06-29
The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization, have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
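The inner loop that codes like GTC-P parallelize is the gather-push cycle over particles. The sketch below is a deliberately minimal 1D electrostatic version, purely illustrative of the PIC kernel structure rather than the 5D gyrokinetic solver.

```python
import numpy as np

# Minimal 1D particle-in-cell "gather + push" step on a periodic domain.
# Illustrates the kernel structure only; not the gyrokinetic 5D solver.
def pic_step(x, v, E_grid, dx, dt, qm=1.0):
    # Gather: linear interpolation of the grid field to particle positions.
    n = len(E_grid)
    idx = np.floor(x / dx).astype(int) % n
    frac = x / dx - np.floor(x / dx)
    E_part = (1 - frac) * E_grid[idx] + frac * E_grid[(idx + 1) % n]
    # Push: update velocities from the field, then positions (wrap around).
    v_new = v + qm * E_part * dt
    x_new = (x + v_new * dt) % (n * dx)
    return x_new, v_new

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 10_000)
v = np.zeros(10_000)
E = np.sin(2 * np.pi * np.linspace(0, 1, 64, endpoint=False))
x, v = pic_step(x, v, E, dx=1.0 / 64, dt=0.01)
```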
Innovative HPC architectures for the study of planetary plasma environments
NASA Astrophysics Data System (ADS)
Amaya, Jorge; Wolf, Anna; Lembège, Bertrand; Zitz, Anke; Alvarez, Damian; Lapenta, Giovanni
2016-04-01
DEEP-ER is a European Commission funded project that develops a new type of High Performance Computing architecture. The revolutionary system is currently used by KU Leuven to study the effects of the solar wind on the global environments of the Earth and Mercury. The new architecture combines the versatility of Intel Xeon computing nodes with the power of the upcoming Intel Xeon Phi accelerators. Contrary to classical heterogeneous HPC architectures, where it is customary to find CPUs and accelerators in the same computing nodes, in the DEEP-ER system CPU nodes are grouped together (Cluster) independently from the accelerator nodes (Booster). The system is equipped with a state-of-the-art interconnection network, highly scalable and fast I/O, and a fail-recovery resiliency system. The final objective of the project is to introduce a scalable system that can be used to create the next generation of exascale supercomputers. The code iPic3D from KU Leuven is being adapted to this new architecture. This particle-in-cell code can now perform the computation of the electromagnetic fields on the Cluster side while the particles are moved on the Booster side. Using fast and scalable Xeon Phi accelerators in the Booster, we can introduce many more particles per cell in the simulation than is possible in the current generation of HPC systems, allowing fully kinetic plasmas to be calculated with very low interpolation noise. The system will be used to perform fully kinetic, low-noise, 3D simulations of the interaction of the solar wind with the magnetospheres of the Earth and Mercury. Preliminary simulations have been performed in other HPC centers in order to compare the results across different systems. In this presentation we show the complexity of the plasma flow around the planets, including the development of hydrodynamic instabilities at the flanks, the presence of the collisionless shock, the magnetosheath, the magnetopause, reconnection zones, the formation of the plasma sheet and the magnetotail, and the variation of ion/electron plasma flows when crossing these frontiers. The simulations also give access to detailed information about the particle dynamics and their velocity distributions at locations that can be used for comparison with satellite data.
Scalability of grid- and subbasin-based land surface modeling approaches for hydrologic simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tesfa, Teklu K.; Ruby Leung, L.; Huang, Maoyi
2014-03-27
This paper investigates the relative merits of grid- and subbasin-based land surface modeling approaches for hydrologic simulations, with a focus on their scalability (i.e., abilities to perform consistently across a range of spatial resolutions) in simulating runoff generation. Simulations produced by the grid- and subbasin-based configurations of the Community Land Model (CLM) are compared at four spatial resolutions (0.125°, 0.25°, 0.5° and 1°) over the topographically diverse region of the U.S. Pacific Northwest. Using the 0.125° resolution simulation as the "reference", statistical skill metrics are calculated and compared across simulations at 0.25°, 0.5° and 1° spatial resolutions of each modeling approach at basin and topographic region levels. Results suggest a significant scalability advantage for the subbasin-based approach compared to the grid-based approach for runoff generation. Basin-level annual average relative errors of surface runoff at 0.25°, 0.5°, and 1° compared to 0.125° are 3%, 4%, and 6% for the subbasin-based configuration and 4%, 7%, and 11% for the grid-based configuration, respectively. The scalability advantages of the subbasin-based approach are more pronounced during winter/spring and over mountainous regions. The source of runoff scalability is found to be related to the scalability of major meteorological and land surface parameters of runoff generation. More specifically, the subbasin-based approach is more consistent across spatial scales than the grid-based approach in snowfall/rainfall partitioning, which is related to air temperature and surface elevation. Scalability of a topographic parameter used in the runoff parameterization also contributes to improved scalability of the rain-driven saturated surface runoff component, particularly during winter. Hence this study demonstrates the importance of spatial structure for multi-scale modeling of hydrological processes, with implications for surface heat fluxes in coupled land-atmosphere modeling.
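The scalability metric used above is straightforward to reproduce: the annual-average relative error of a coarse-resolution runoff series against the 0.125° reference. The arrays in the sketch below are hypothetical, chosen only to mimic the reported ordering of the two configurations.

```python
import numpy as np

# Relative error of a coarse-resolution runoff mean vs. the reference run.
# The runoff series here are hypothetical basin means, for illustration.
def relative_error(coarse, reference):
    return abs(coarse.mean() - reference.mean()) / reference.mean()

ref = np.array([1.2, 0.9, 1.4, 1.1])            # 0.125 deg reference (mm/day)
subbasin_1deg = np.array([1.25, 0.95, 1.45, 1.15])
grid_1deg = np.array([1.35, 1.00, 1.55, 1.25])
print(f"subbasin: {relative_error(subbasin_1deg, ref):.1%}")
print(f"grid:     {relative_error(grid_1deg, ref):.1%}")
```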
Scalability Test of Multiscale Fluid-Platelet Model for Three Top Supercomputers
Zhang, Peng; Zhang, Na; Gao, Chao; Zhang, Li; Gao, Yuxiang; Deng, Yuefan; Bluestein, Danny
2016-01-01
We have tested the scalability of three supercomputers: the Tianhe-2, Stampede and CS-Storm, with multiscale fluid-platelet simulations, in which a highly resolved and efficient numerical model for the nanoscale biophysics of platelets in microscale viscous biofluids is considered. Three experiments involving varying problem sizes were performed: Exp-S: 680,718-particle single-platelet; Exp-M: 2,722,872-particle 4-platelet; and Exp-L: 10,891,488-particle 16-platelet. Our implementation of the multiple time-stepping (MTS) algorithm improved on the performance of single time-stepping (STS) in all experiments. Using MTS, our model achieved the following simulation rates: 12.5, 25.0, 35.5 μs/day for Exp-S and 9.09, 6.25, 14.29 μs/day for Exp-M on Tianhe-2, CS-Storm 16-K80 and Stampede K20. The best rate for Exp-L was 6.25 μs/day on Stampede. Utilizing current advanced HPC resources, the simulation rates achieved by our algorithms bring within reach complex multiscale simulations for solving vexing problems at the interface of biology and engineering, such as thrombosis in blood flow, which combines millisecond-scale hematology with microscale blood flow at resolutions of micro-to-nanoscale cellular components of platelets. This study of the performance characteristics of supercomputers running advanced computational algorithms that offer an optimal trade-off for enhanced computational performance demonstrates that such simulations are feasible with currently available HPC resources. PMID:27570250
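A multiple time-stepping integrator of the general kind mentioned above splits forces by cost: cheap fast forces advance every inner step, expensive slow forces once per outer step (the RESPA pattern). The sketch below uses placeholder forces and is not the platelet model's physics.

```python
# RESPA-style multiple time-stepping (MTS) sketch: the slow force is
# evaluated once per outer step, the fast force every inner step.
def mts_step(x, v, fast_force, slow_force, dt_outer, n_inner, mass=1.0):
    dt_inner = dt_outer / n_inner
    v += 0.5 * dt_outer * slow_force(x) / mass        # slow half-kick
    for _ in range(n_inner):                          # inner velocity Verlet
        v += 0.5 * dt_inner * fast_force(x) / mass
        x += dt_inner * v
        v += 0.5 * dt_inner * fast_force(x) / mass
    v += 0.5 * dt_outer * slow_force(x) / mass        # slow half-kick
    return x, v

# Harmonic fast force plus a weak constant slow force, for demonstration only.
x, v = 1.0, 0.0
for _ in range(100):
    x, v = mts_step(x, v, lambda y: -y, lambda y: 0.01, dt_outer=0.1, n_inner=4)
print(x, v)
```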
Optical interconnection networks for high-performance computing systems
NASA Astrophysics Data System (ADS)
Biberman, Aleksandr; Bergman, Keren
2012-04-01
Enabled by silicon photonic technology, optical interconnection networks have the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. Chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, offer unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of our work, we demonstrate such feasibility of waveguides, modulators, switches and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. We propose novel silicon photonic devices, subsystems, network topologies and architectures to enable unprecedented performance of these photonic interconnection networks. Furthermore, the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers.
FPGA cluster for high-performance AO real-time control system
NASA Astrophysics Data System (ADS)
Geng, Deli; Goodsell, Stephen J.; Basden, Alastair G.; Dipper, Nigel A.; Myers, Richard M.; Saunter, Chris D.
2006-06-01
Whilst the high-throughput and low-latency requirements of next-generation AO real-time control systems have posed a significant challenge to von Neumann architecture processor systems, the Field Programmable Gate Array (FPGA) has emerged as a long-term solution offering high throughput and excellent predictability in latency. Moreover, FPGA devices have highly capable programmable interfacing, which leads to a more highly integrated system. Nevertheless, a single FPGA is still not enough: multiple FPGA devices need to be clustered to perform the required subaperture processing and the reconstruction computation. In an AO real-time control system, the memory bandwidth is often the bottleneck of the system, simply because a vast amount of supporting data, e.g. pixel calibration maps and the reconstruction matrix, needs to be accessed within a short period. The cluster, as a general computing architecture, has excellent scalability in processing throughput, memory bandwidth, memory capacity, and communication bandwidth. Problems such as task distribution, node communication, and system verification are discussed.
Zhou, Wenbin; Fan, Qingxia; Zhang, Qiang; Cai, Le; Li, Kewei; Gu, Xiaogang; Yang, Feng; Zhang, Nan; Wang, Yanchun; Liu, Huaping; Zhou, Weiya; Xie, Sishen
2017-01-01
It is a great challenge to substantially improve the practical performance of flexible thermoelectric modules due to the absence of air-stable n-type thermoelectric materials with a high power factor. Here an excellent flexible n-type thermoelectric film is developed, which can be conveniently and rapidly prepared from as-grown carbon nanotube continuous networks with high conductivity. The optimum n-type film exhibits an ultrahigh power factor of ∼1,500 μW m−1 K−2 and outstanding stability in air without encapsulation. Inspired by these findings, we design and successfully fabricate compact-configuration flexible TE modules, which offer great advantages over conventional π-type configuration modules and integrate well the superior thermoelectric properties of p-type and n-type carbon nanotube films, resulting in markedly high performance. Moreover, the results are highly scalable and open opportunities for the large-scale production of flexible thermoelectric modules. PMID:28337987
Zhang, Qiaobao; Chen, Huixin; Han, Xiang; Cai, Junjie; Yang, Yong; Liu, Meilin; Zhang, Kaili
2016-01-01
The appropriate combination of hierarchical transition-metal oxide (TMO) micro-/nanostructures constructed from porous nanobuilding blocks with graphene sheets (GNS) in a core/shell geometry is highly desirable for high-performance lithium-ion batteries (LIBs). A facile and scalable process for the fabrication of 3D hierarchical porous zinc-nickel-cobalt oxide (ZNCO) microspheres constructed from porous ultrathin nanosheets encapsulated by GNS to form a core/shell geometry is reported for improved electrochemical performance of the TMOs as an anode in LIBs. By virtue of their intriguing structural features, the produced ZNCO/GNS core/shell hybrids exhibit an outstanding reversible capacity of 1015 mA h g(-1) at 0.1 C after 50 cycles. Even at a high rate of 1 C, a stable capacity as high as 420 mA h g(-1) could be maintained after 900 cycles, which suggested their great potential as efficient electrodes for high-performance LIBs. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
MicROS-drt: supporting real-time and scalable data distribution in distributed robotic systems.
Ding, Bo; Wang, Huaimin; Fan, Zedong; Zhang, Pengfei; Liu, Hui
A primary requirement in distributed robotic software systems is the dissemination of data to all interested collaborative entities in a timely and scalable manner. However, providing such a service in a highly dynamic and resource-limited robotic environment is a challenging task, and existing robot software infrastructure has limitations in this aspect. This paper presents a novel robot software infrastructure, micROS-drt, which supports real-time and scalable data distribution. The solution is based on a loosely coupled data publish-subscribe model with the ability to support various time-related constraints. To realize this model, a mature data distribution standard, the Data Distribution Service for real-time systems (DDS), is adopted as the foundation of the transport layer of this software infrastructure. By elaborately adapting and encapsulating the capability of the underlying DDS middleware, micROS-drt can meet the requirement of real-time and scalable data distribution in distributed robotic systems. Evaluation results in terms of scalability, latency jitter and transport priority, as well as experiments on real robots, validate the effectiveness of this work.
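At its core, the data-distribution model described above is topic-based publish-subscribe. The in-process toy below illustrates only that core; real DDS adds the QoS machinery (deadlines, transport priorities, discovery) that micROS-drt relies on.

```python
import time
from collections import defaultdict

# Toy topic-based publish-subscribe core (in-process only; real DDS adds
# QoS policies such as deadlines and transport priority).
class Bus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, data):
        # Stamp each message so subscribers can reason about latency.
        msg = {"stamp": time.monotonic(), "data": data}
        for callback in self.subscribers[topic]:
            callback(msg)

bus = Bus()
bus.subscribe("/pose", lambda m: print("pose update:", m["data"]))
bus.publish("/pose", (1.0, 2.0, 0.5))
```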
BactoGeNIE: A large-scale comparative genome visualization for big displays
Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; ...
2015-08-13
The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.
NASA Astrophysics Data System (ADS)
Bucay, Igal; Helal, Ahmed; Dunsky, David; Leviyev, Alex; Mallavarapu, Akhila; Sreenivasan, S. V.; Raizen, Mark
2017-04-01
Ionization of atoms and molecules is an important process in many applications, such as mass spectrometry. Ionization is typically accomplished by electron bombardment, which, while scalable to large volumes, is very inefficient due to the small cross section of electron-atom collisions. Photoionization methods can be highly efficient but are not scalable due to the small ionization volume. Electric field ionization is accomplished using ultra-sharp conducting tips biased to a few kilovolts, but suffers from a low ionization volume and tip fabrication limitations. We report on our progress towards an efficient, robust, and scalable method of atomic and molecular ionization using ordered arrays of sharp, gold-doped silicon nanowires. As demonstrated in earlier work, the presence of the gold greatly enhances the ionization probability, which was attributed to an increase in available acceptor surface states. We present here a novel process used to fabricate the nanowire array, results of simulations aimed at optimizing the configuration of the array, and our progress towards demonstrating efficient and scalable ionization.
Bech, P; Carrozzino, D; Austin, S F; Møller, S B; Vassend, O
2016-03-15
Whereas the Eysenck Neuroticism Scale contains only items covering negative mental health to measure dysthymia, the NEO Personality Inventory (NEO-PI) contains neuroticism items covering both negative mental health and positive mental health (or euthymia). The consequence of wording items both positively and negatively within the NEO-PI has never been psychometrically investigated. The aim of this study was to perform a validation analysis of the NEO-PI neuroticism scale. Using a Norwegian general population study, we examined the structure of the negatively and positively formulated items by principal component analysis (PCA). The scalability of the two identified groups of euthymia versus dysthymia items was examined by Mokken analysis. With a response rate of 90%, 1082 individuals with a completed NEO-PI were available. The PCA identified the neuroticism scale as the most distinct, with 14 items having acceptable loadings on the euthymia subscale and another 14 items on the dysthymia subscale. However, the Mokken coefficient of homogeneity indicated acceptable scalability only for the euthymia subscale. A comparison with the Eysenck Neuroticism Scale was not performed. The NEO-PI neuroticism scale contains two subscales consisting of items worded in opposite directions, of which only the positive euthymia items have acceptable scalability. Copyright © 2016 Elsevier B.V. All rights reserved.
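For context, the scalability examined in a Mokken analysis is usually quantified with Loevinger's H coefficients; the conventional forms are shown below, with Cov^max the maximum covariance attainable given the item marginals, and H ≥ 0.3 a common threshold for an acceptable scale.

```latex
% Loevinger's scalability coefficients used in Mokken analysis:
% item-pair coefficient and overall scale coefficient.
\[
  H_{ij} \;=\; \frac{\operatorname{Cov}(X_i, X_j)}{\operatorname{Cov}^{\max}(X_i, X_j)},
  \qquad
  H \;=\; \frac{\sum_{i<j} \operatorname{Cov}(X_i, X_j)}
               {\sum_{i<j} \operatorname{Cov}^{\max}(X_i, X_j)}.
\]
```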
NASA Astrophysics Data System (ADS)
Steiger, Damian S.; Haener, Thomas; Troyer, Matthias
Quantum computers promise to transform our notions of computation by offering a completely new paradigm. A high level quantum programming language and optimizing compilers are essential components to achieve scalable quantum computation. In order to address this, we introduce the ProjectQ software framework - an open source effort to support both theorists and experimentalists by providing intuitive tools to implement and run quantum algorithms. Here, we present our ProjectQ quantum compiler, which compiles a quantum algorithm from our high-level Python-embedded language down to low-level quantum gates available on the target system. We demonstrate how this compiler can be used to control actual hardware and to run high-performance simulations.
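The flavor of the Python-embedded language is captured by the canonical example from ProjectQ's public documentation: gates are applied with the `|` operator, and `eng.flush()` pushes the circuit through the compiler to the backend.

```python
# Canonical ProjectQ example (after the project's public documentation):
# allocate a qubit, apply a Hadamard, measure, and read out the result.
from projectq import MainEngine
from projectq.ops import H, Measure

eng = MainEngine()            # compiler engine with the default simulator
qubit = eng.allocate_qubit()
H | qubit                     # gate application uses the '|' syntax
Measure | qubit
eng.flush()                   # compile and execute the queued operations
print("Measured:", int(qubit))
```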
Parallel performance investigations of an unstructured mesh Navier-Stokes solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
2000-01-01
A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
Ethoscopes: An open platform for high-throughput ethomics.
Geissmann, Quentin; Garcia Rodriguez, Luis; Beckwith, Esteban J; French, Alice S; Jamasb, Arian R; Gilestro, Giorgio F
2017-10-01
Here, we present the use of ethoscopes, which are machines for high-throughput analysis of behavior in Drosophila and other animals. Ethoscopes provide a software and hardware solution that is reproducible and easily scalable. They perform, in real-time, tracking and profiling of behavior by using a supervised machine learning algorithm, are able to deliver behaviorally triggered stimuli to flies in a feedback-loop mode, and are highly customizable and open source. Ethoscopes can be built easily by using 3D printing technology and rely on Raspberry Pi microcomputers and Arduino boards to provide affordable and flexible hardware. All software and construction specifications are available at http://lab.gilest.ro/ethoscope.
Enabling parallel simulation of large-scale HPC network systems
Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; ...
2016-04-07
Here, with the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems—in particular, networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations.
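One ingredient such torus models must capture is minimal routing over wraparound links. The sketch below shows generic dimension-ordered routing on a 3D torus; it illustrates hop counting only and is not CODES code.

```python
# Generic dimension-ordered (minimal) routing on a torus; dimensions and
# endpoints are illustrative, not a CODES model.
def torus_route(src, dst, dims):
    """Yield the node sequence of a minimal dimension-ordered route."""
    cur = list(src)
    yield tuple(cur)
    for d, size in enumerate(dims):
        while cur[d] != dst[d]:
            fwd = (dst[d] - cur[d]) % size   # hops going "up" with wraparound
            step = 1 if fwd <= size - fwd else -1
            cur[d] = (cur[d] + step) % size
            yield tuple(cur)

path = list(torus_route((0, 0, 0), (1, 3, 2), dims=(4, 4, 4)))
print(len(path) - 1, "hops:", path)
```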
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness
NASA Astrophysics Data System (ADS)
Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.
2014-09-01
Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
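The mapping step described above, work atoms distributed across threads or thread blocks, reduces in its simplest form to balanced chunking. The sketch below is a generic illustration under that reading; the term "work atom" is the abstract's, the chunking policy is ours.

```python
# Balanced contiguous chunking of "work atoms" across workers; a generic
# stand-in for the paradigm's mapping primitives, not the actual software.
def partition(work_atoms, n_workers):
    base, extra = divmod(len(work_atoms), n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)   # spread the remainder evenly
        chunks.append(work_atoms[start:start + size])
        start += size
    return chunks

print([len(c) for c in partition(list(range(10)), 4)])  # [3, 3, 2, 2]
```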
Development of performance measurement for freight transportation.
DOT National Transportation Integrated Search
2014-09-01
In this project, the researchers built a set of performance measures that are unified, user-oriented, scalable, systematic, effective, and calculable for intermodal freight management, and developed methodologies to calculate and use the measures. ...
A Highly Flexible and Efficient Passive Optical Network Employing Dynamic Wavelength Allocation
NASA Astrophysics Data System (ADS)
Hsueh, Yu-Li; Rogge, Matthew S.; Yamamoto, Shu; Kazovsky, Leonid G.
2005-01-01
A novel and high-performance passive optical network (PON), the SUCCESS-DWA PON, employs dynamic wavelength allocation to provide bandwidth sharing across multiple physical PONs. In the downstream, tunable lasers, an arrayed waveguide grating, and coarse/fine filtering combine to create a flexible new optical access solution. In the upstream, several distributed and centralized schemes are proposed and investigated. The network performance is compared to conventional TDM-PONs under different traffic models, including the self-similar traffic model and the transaction-oriented model. Broadcast support and deployment issues are addressed. The network's excellent scalability can bridge the gap between conventional TDM-PONs and WDM-PONs. The powerful architecture is a promising candidate for next generation optical access networks.
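A minimal way to see the wavelength-sharing idea is a first-fit allocator over the wavelengths shared across the physical PONs. The sketch below is a generic illustration, far simpler than the SUCCESS-DWA scheduling algorithms themselves.

```python
# Generic first-fit dynamic wavelength allocation (illustrative only).
def first_fit(active, n_wavelengths):
    """active: set of wavelength indices currently in use across the PONs."""
    for w in range(n_wavelengths):
        if w not in active:
            return w
    return None  # block the request: all wavelengths busy

active = {0, 1, 3}
print("assigned wavelength", first_fit(active, n_wavelengths=8))  # 2
```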
Piezoresistive Sensor with High Elasticity Based on 3D Hybrid Network of Sponge@CNTs@Ag NPs.
Zhang, Hui; Liu, Nishuang; Shi, Yuling; Liu, Weijie; Yue, Yang; Wang, Siliang; Ma, Yanan; Wen, Li; Li, Luying; Long, Fei; Zou, Zhengguang; Gao, Yihua
2016-08-31
Pressure sensors with high elasticity are in great demand for the realization of intelligent sensing, but a simple, inexpensive, and scalable method for manufacturing such sensors is still needed. Here, we report an efficient, simple, facile, and repeatable "dipping and coating" process to manufacture a piezoresistive sensor with high elasticity, based on a homogeneous 3D hybrid network of carbon nanotubes@silver nanoparticles (CNTs@Ag NPs) anchored on a skeleton sponge. Highly elastic, sensitive, and wearable sensors are obtained using the porous structure of the sponge and the synergistic effect of the CNTs/Ag NPs. Our sensor was also tested over 2000 compression-release cycles, exhibiting excellent elasticity and cycling stability. Sensors with high performance and a simple fabrication process are promising for commercial production in various electronic devices, for example, for sport performance monitoring and man-machine interfaces.
Scalable Robust Principal Component Analysis Using Grassmann Averages.
Hauberg, Søren; Feragen, Aasa; Enficiaud, Raffi; Black, Michael J
2016-11-01
In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie; a task beyond any current method. Source code is available online.
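A compact reading of the GA algorithm is an iterative spherical average with sign alignment: each observation's direction is flipped to agree with the current estimate, then a weighted average is renormalized. The sketch below follows that reading and simplifies the paper's details (weighting, convergence handling).

```python
import numpy as np

# Grassmann Average sketch for the leading 1D subspace: iterate a
# sign-aligned, norm-weighted spherical average. Simplified vs. the paper.
def grassmann_average(X, n_iter=100, seed=0):
    norms = np.linalg.norm(X, axis=1)
    U = X / norms[:, None]                 # each row spans a 1D subspace
    q = np.random.default_rng(seed).normal(size=X.shape[1])
    q /= np.linalg.norm(q)
    for _ in range(n_iter):
        signs = np.sign(U @ q)             # align antipodal representatives
        signs[signs == 0] = 1.0
        q_new = (signs * norms) @ U        # weighted average of directions
        q_new /= np.linalg.norm(q_new)
        if np.allclose(q_new, q):
            break
        q = q_new
    return q

X = np.random.default_rng(1).normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])
print(grassmann_average(X))               # ~ +/- [1, 0, 0] for this data
```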
NASA Astrophysics Data System (ADS)
Snyder, P. L.; Brown, V. W.
2017-12-01
IBM has created a general-purpose, data-agnostic solution that provides high performance, low data latency, high availability, scalability, and persistent access to the captured data, regardless of source or type. This capability is hosted on commercially available cloud environments and uses much faster, more efficient, reliable, and secure data transfer protocols than the more typically used FTP. The design incorporates completely redundant data paths at every level, including at the cloud data center level, in order to provide the highest assurance of data availability to the data consumers. IBM has been successful in building and testing a Proof of Concept instance on our IBM Cloud platform to receive and disseminate actual GOES-16 data as it is being downlinked. This solution leverages the inherent benefits of a cloud infrastructure configured and tuned for continuous, stable, high-speed data dissemination to data consumers worldwide at the downlink rate. It is also designed to ingest data from multiple simultaneous sources and disseminate data to multiple consumers. Nearly linear scalability is achieved by adding servers and storage. The IBM Proof of Concept system has been tested with our partners to achieve in excess of 5 Gigabits/second over public internet infrastructure. In tests with live GOES-16 data, the system routinely achieved 2.5 Gigabits/second pass-through to The Weather Company from the University of Wisconsin-Madison SSEC. Simulated data was also transferred from the Cooperative Institute for Climate and Satellites — North Carolina to The Weather Company as well. The storage node allocated to our Proof of Concept system as tested was sized at 480 Terabytes of RAID-protected disk, a worst-case sizing to accommodate the data from four GOES-16-class satellites for 30 days in a circular buffer. This shows that an abundance of performance and capacity headroom exists in the IBM design that can be applied to additional missions.
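The buffer sizing quoted above implies a sustained ingest rate that is easy to back out. The arithmetic below uses only the abstract's numbers (480 TB, four satellites, 30 days) and ignores RAID overhead.

```python
# Implied per-satellite downlink rate for the quoted 480 TB / 30-day buffer.
BUFFER_BYTES = 480e12   # 480 TB circular buffer (abstract)
SATELLITES = 4
SECONDS = 30 * 86_400   # 30 days

rate_bps = (BUFFER_BYTES / SATELLITES) * 8 / SECONDS
print(f"~{rate_bps / 1e6:.0f} Mbit/s per satellite")  # ~370 Mbit/s
```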
Scalable Synthesis of Defect Abundant Si Nanorods for High-Performance Li-Ion Battery Anodes.
Wang, Jing; Meng, Xiangcai; Fan, Xiulin; Zhang, Wenbo; Zhang, Hongyong; Wang, Chunsheng
2015-06-23
Microsized nanostructured silicon-carbon composite is a promising anode material for high-energy Li-ion batteries. However, large-scale synthesis of high-performance nano-Si materials at a low cost still remains a significant challenge. We report a scalable, low-cost method to synthesize Al/Na-doped and defect-abundant Si nanorods that have excellent electrochemical performance with high first-cycle Coulombic efficiency (90%). The unique Si nanorods are synthesized by acid etching the refined and rapidly solidified eutectic Al-Si ingot. To maintain high electronic conductivity, a thin layer of carbon is then coated on the Si nanorods by carbonization of self-polymerized polydopamine (PDA) at 800 °C. The carbon-coated Si nanorod (Si@C) electrode at 0.9 mg cm(-2) loading (corresponding to an area-specific capacity of ∼2.0 mAh cm(-2)) exhibits a reversible capacity of ∼2200 mAh g(-1) at 100 mA g(-1) current, and maintains ∼700 mAh g(-1) over 1000 cycles at 1000 mA g(-1) with a capacity decay rate of 0.02% per cycle. High Coulombic efficiencies of 87% in the first cycle and ∼99.7% after 5 cycles are achieved due to the formation of an artificial Al2O3 solid electrolyte interphase (SEI) on the Si surface and the low surface area (31 m(2) g(-1)), which has never been reported before for nano-Si anodes. The excellent electrochemical performance results from the massive defects (twins, stacking faults, dislocations) and Al/Na doping in the Si nanorods induced by rapid solidification and Na salt modifications; these greatly enhance the robustness of Si against volume changes and alleviate the mechanical stress/strain of the Si nanorods during the lithium insertion/extraction process. Introducing massive defects and Al/Na doping in eutectic Si nanorods for Li-ion battery anodes is unexplored territory, which we venture into here to commercialize this nanostructured Si anode for the next generation of Li-ion batteries.
NASA Astrophysics Data System (ADS)
Zhu, F.; Yu, H.; Rilee, M. L.; Kuo, K. S.; Yu, L.; Pan, Y.; Jiang, H.
2017-12-01
Since the establishment of data archive centers and the standardization of file formats, scientists have been required to search metadata catalogs for the data they need and download the data files to their local machines to carry out analysis. This approach has facilitated data discovery and access for decades, but it inevitably moves data from archive centers to scientists' computers through low-bandwidth Internet connections, and data transfer becomes the major performance bottleneck. Combined with generally constrained local compute and storage resources, this limits the extent of scientists' studies and deprives them of timely outcomes. The conventional approach is thus not scalable with respect to either the volume or the variety of geoscience data. A much more viable solution is to couple analysis and storage systems to minimize data transfer. In our study, we compare loosely coupled approaches (exemplified by Spark and Hadoop) and tightly coupled approaches (exemplified by parallel distributed database management systems, e.g., SciDB). In particular, we investigate the optimization of data placement and movement to tackle the variety challenge, and broader use of parallelization to address the volume challenge. Our goal is to enable high-performance interactive analysis for a large portion of geoscience data analysis exercises. We show that tightly coupled approaches can concentrate data traffic between local storage systems and compute units, thereby optimizing bandwidth utilization and achieving better throughput. Based on these observations, we develop a geoscience data analysis system that tightly couples analysis engines with storage and has direct access to a detailed map of data partition locations. Through an innovative data partitioning and distribution scheme, our system has demonstrated scalable and interactive performance in real-world geoscience data analysis applications.
NASA Astrophysics Data System (ADS)
Wang, Wei; Ruiz, Isaac; Lee, Ilkeun; Zaera, Francisco; Ozkan, Mihrimah; Ozkan, Cengiz S.
2015-04-01
Optimization of the electrode/electrolyte double-layer interface is a key factor for improving electrode performance of aqueous-electrolyte-based supercapacitors (SCs). Here, we report the improved functionality of carbon materials via a non-invasive, high-throughput, and inexpensive UV-generated ozone (UV-ozone) treatment. This process allows precise tuning of the graphene and carbon nanotube hybrid foam (GM) from ultrahydrophobic to hydrophilic within 60 s. The continuous tuning of surface energy can be controlled by simply varying the UV-ozone exposure time, while the ozone-oxidized carbon nanostructure maintains its integrity. Symmetric SCs based on the UV-ozone-treated GM foam demonstrated enhanced rate performance. This technique can be readily applied to other CVD-grown carbonaceous materials by taking advantage of its ease of processing, low cost, scalability, and controllability. Electronic supplementary information (ESI) available. See DOI: 10.1039/c4nr06795a
Wong, Gerard; Leckie, Christopher; Kowalczyk, Adam
2012-01-15
Feature selection is a key concept in machine learning for microarray datasets, where the number of features, represented by probesets, is typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms in handling very high-dimensional datasets beyond a hundred thousand features, such as datasets produced on single nucleotide polymorphism microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher-resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions regarding patient treatment options. We applied our feature set reduction approach to several publicly available cancer single nucleotide polymorphism (SNP) array datasets and evaluated its performance in terms of multiclass predictive classification accuracy over different cancer subtypes, speedup in execution, and scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensionality of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior, predictive classification performance compared with that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer. FSR was implemented in MATLAB R2010b and is available at http://ww2.cs.mu.oz.au/~gwong/FSR.
Parallel k-means++ for Multiple Shared-Memory Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mackey, Patrick S.; Lewis, Robert R.
2016-09-22
In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varying data sizes.
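As a reference point for what is being parallelized, here is the exact (sequential) k-means++ seeding in NumPy; the D²-weighted sampling and the running minimum-distance update are the steps that the paper's implementations distribute across threads. This is an illustrative sketch, not the paper's code.

```python
import numpy as np

def kmeans_pp_init(X, k, seed=None):
    """Exact k-means++ seeding. The per-point distance update (the
    np.minimum reduction below) is the data-parallel step that maps
    naturally onto multicore CPUs, GPUs, and the Cray XMT."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    d2 = np.sum((X - centers[0]) ** 2, axis=1)
    for _ in range(1, k):
        c = X[rng.choice(n, p=d2 / d2.sum())]        # D^2 weighting, exact
        centers.append(c)
        d2 = np.minimum(d2, np.sum((X - c) ** 2, axis=1))  # parallel-friendly
    return np.array(centers)
```

The exactness comes from sampling every new center from the true D² distribution over all points, which is precisely the full-data reduction that benefits from shared-memory parallelism.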
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gentile, Ann C.; Brandt, James M.; Tucker, Thomas
2011-09-01
This report provides documentation for the completion of the Sandia Level II milestone 'Develop feedback system for intelligent dynamic resource allocation to improve application performance'. This milestone demonstrates the use of a scalable data collection, analysis, and feedback system that enables lightweight insight into how an application is utilizing the hardware resources of a high performance computing (HPC) platform. Further, we demonstrate the use of the same mechanisms employed for transporting data for remote analysis and visualization to provide low-latency run-time feedback to applications. The ultimate goal of this body of work is performance optimization in the face of the ever-increasing size and complexity of HPC systems.
Low Cost High Performance Nanostructured Spectrally Selective Coating
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Sungho
2017-04-05
Sunlight-absorbing coating is a key enabling technology for achieving high-temperature, high-efficiency concentrating solar power operation. A high-performance solar absorbing material must simultaneously meet three stringent requirements: high thermal efficiency (usually measured by a figure of merit), high-temperature durability, and oxidation resistance. The objective of this research is to employ a highly scalable process to fabricate and coat black oxide nanoparticles onto the solar absorber surface to achieve ultra-high thermal efficiency. Black oxide nanoparticles have been synthesized using a facile process and coated onto the absorber metal surface. The material composition, size distribution, and morphology of the nanoparticles are guided by numeric modeling. Optical and thermal properties have been both modeled and measured. High-temperature durability has been achieved by using nanocomposites and high-temperature annealing. Mechanical durability under thermal cycling has also been investigated and optimized. This technology is promising for commercial applications in next-generation high-temperature concentrating solar power (CSP) plants.
Execution of a parallel edge-based Navier-Stokes solver on commodity graphics processor units
NASA Astrophysics Data System (ADS)
Corral, Roque; Gisbert, Fernando; Pueblas, Jesus
2017-02-01
The implementation of an edge-based three-dimensional Reynolds-averaged Navier-Stokes solver for unstructured grids able to run on multiple graphics processing units (GPUs) is presented. Loops over edges, which are the most time-consuming part of the solver, have been written to exploit the massively parallel capabilities of GPUs. Non-blocking communications between parallel processes and between the GPU and the central processing unit (CPU) have been used to enhance code scalability. The code is written in a mixture of C++ and OpenCL, to allow execution of the source code on GPUs. The Message Passing Interface (MPI) library is used to allow parallel execution of the solver on multiple GPUs. A comparative study of the solver's parallel performance is carried out using a cluster of CPUs and another of GPUs. It is shown that a single GPU is up to 64 times faster than a single CPU core. The parallel scalability of the solver is mainly degraded by the loss of computing efficiency of the GPU when the size of the case decreases. However, for large enough grid sizes, the scalability is strongly improved. A cluster featuring commodity GPUs and a high-bandwidth network is ten times less costly and consumes 33% less energy than a CPU-based cluster with equivalent computational power.
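To make the "loops over edges" structure concrete, here is a schematic NumPy version of an edge-based residual accumulation; `flux_fn` is a placeholder for whatever numerical flux a solver uses (an assumption, not the authors' scheme), and the gather/scatter pattern is what maps each edge to a GPU thread, with atomics or edge coloring resolving the write conflicts.

```python
import numpy as np

def edge_loop_residual(edges, u, flux_fn):
    """Edge-based residual accumulation: gather the two end-point states of
    every edge, evaluate a numerical flux, and scatter-add the result back
    to the nodes. On a GPU each edge maps to a thread; the scatter-add is
    the conflict-prone step that needs atomics or edge coloring."""
    i, j = edges[:, 0], edges[:, 1]
    f = flux_fn(u[i], u[j])        # one flux per edge, embarrassingly parallel
    res = np.zeros_like(u)
    np.add.at(res, i, f)           # scatter-add into the left node
    np.add.at(res, j, -f)          # equal and opposite into the right node
    return res

# e.g. a simple difference flux turns this into a graph-Laplacian product:
edges = np.array([[0, 1], [1, 2], [2, 0]])
print(edge_loop_residual(edges, np.array([1.0, 2.0, 4.0]), lambda a, b: a - b))
# -> [-4. -1.  5.]
```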
Simultaneous Purification and Perforation of Low-Grade Si Sources for Lithium-Ion Battery Anode.
Jin, Yan; Zhang, Su; Zhu, Bin; Tan, Yingling; Hu, Xiaozhen; Zong, Linqi; Zhu, Jia
2015-11-11
Silicon is regarded as one of the most promising candidates for lithium-ion battery anodes because of its abundance and high theoretical capacity. Various silicon nanostructures have been heavily investigated to improve electrochemical performance by addressing issues related to structure fracture and unstable solid-electrolyte interphase (SEI). However, to further enable widespread applications, scalable and cost-effective processes need to be developed to produce these nanostructures in large quantities with finely controlled structures and morphologies. In this study, we develop a scalable and low-cost process to produce porous silicon directly from low-grade silicon through ball-milling and modified metal-assisted chemical etching. The morphology of the porous silicon can be changed drastically from porous network to nanowire array by adjusting the components in the reaction solutions. Meanwhile, this perforation process can also effectively remove impurities and therefore significantly increase Si purity (up to 99.4%) from low-grade and low-cost ferrosilicon (purity of 83.4%) sources. The electrochemical examinations indicate that these porous silicon structures with carbon treatment can deliver a stable capacity of 1287 mAh g(-1) over 100 cycles at a current density of 2 A g(-1). This type of purified porous silicon with finely controlled morphology, produced by a scalable and cost-effective fabrication process, can also serve as a promising candidate for many other energy applications, such as thermoelectrics and solar energy conversion devices.
Wen, Wei; Wu, Jin-ming; Jiang, Yin-zhu; Yu, Sheng-lan; Bai, Jun-qiang; Cao, Min-hua; Cui, Jie
2015-01-01
Lithium-ion batteries (LIBs) are promising energy storage devices for portable electronics, electric vehicles, and power-grid applications. It is highly desirable yet challenging to develop a simple and scalable method for the construction of sustainable materials for fast and safe LIBs. Herein, we exploit a novel and scalable route to synthesize ultrathin nanobelts of anatase TiO2, which is resource-abundant and suitable for safe anodes in LIBs. The achieved ultrathin nanobelts demonstrate outstanding performance for lithium storage because of their unique nanoarchitecture and appropriate composition. Unlike conventional alkali-hydrothermal approaches to hydrogen titanates, the present room-temperature, alkaline-free wet chemistry strategy guarantees the ultrathin thickness of the resultant titanate nanobelts. The anatase TiO2 ultrathin nanobelts were achieved simply by a subsequent calcination in air. The synthesis route is convenient for metal decoration and also for fabricating thin films of one/three-dimensional arrays on various substrates at low temperatures, in the absence of any seed layers. PMID:26133276
Shen, Yiwen; Hattink, Maarten; Samadi, Payman; ...
2018-04-13
Silicon photonics based switches offer an effective option for the delivery of dynamic bandwidth for future large-scale Datacom systems while maintaining scalable energy efficiency. The integration of a silicon photonics-based optical switching fabric within electronic Datacom architectures requires novel network topologies and arbitration strategies to effectively manage the active elements in the network. Here, we present a scalable software-defined networking control plane to integrate silicon photonic based switches with conventional Ethernet or InfiniBand networks. Our software-defined control plane manages both electronic packet switches and multiple silicon photonic switches for simultaneous packet and circuit switching. We built an experimental Dragonfly network testbed with 16 electronic packet switches and 2 silicon photonic switches to evaluate our control plane. Measured latencies for each step of the switching procedure demonstrate a total control-plane latency of 344 microseconds for data-center and high performance computing platforms.
The Simulation of Real-Time Scalable Coherent Interface
NASA Technical Reports Server (NTRS)
Li, Qiang; Grant, Terry; Grover, Radhika S.
1997-01-01
Scalable Coherent Interface (SCI, IEEE/ANSI Std 1596-1992) (SCI1, SCI2) is a high-performance interconnect for shared-memory multiprocessor systems. In this project we investigate an SCI Real-Time Protocol (RTSCI1) using Directed Flow Control Symbols. We studied the issues of efficient generation of control symbols and created a simulation model of the protocol on a ring-based SCI system. This report presents the results of the study. The project has been implemented using SES/Workbench. The details that follow encompass aspects of both the SCI and Flow Control Protocols, as well as the effect of realistic client/server processing delay. The report is organized as follows. Section 2 provides a description of the simulation model. Section 3 describes the protocol implementation details. The next three sections of the report elaborate on the workload, results, and conclusions. Appended to the report is a description of the tool used in our simulation, SES/Workbench, and internal details of our implementation of the protocol.
Using S3 cloud storage with ROOT and CvmFS
NASA Astrophysics Data System (ADS)
Arsuaga-Ríos, María; Heikkilä, Seppo S.; Duellmann, Dirk; Meusel, René; Blomer, Jakob; Couturier, Ben
2015-12-01
Amazon S3 is a widely adopted web API for scalable cloud storage that could also fulfill the storage requirements of the high-energy physics community. CERN has been evaluating this option using some key HEP applications such as ROOT and the CernVM filesystem (CvmFS) with S3 back-ends. In this contribution we present an evaluation of two versions of the Huawei UDS storage system stressed with a large number of clients executing HEP software applications. The performance of concurrently storing individual objects is presented alongside more complex data access patterns as produced by the ROOT data analysis framework. Both Huawei UDS generations show successful scalability by supporting multiple byte-range requests, in contrast with Amazon S3 or Ceph, which do not support these commonly used HEP operations. We further report on the S3 integration with recent CvmFS versions and summarize the experience with CvmFS/S3 for publishing daily releases of the full LHCb experiment software stack.
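For context, a byte-range request is a plain HTTP GET with a Range header. The sketch below (hypothetical URL, using the `requests` library) shows the single-range form that S3-style stores generally support and the multi-range form that, per the abstract, the Huawei UDS accepts but Amazon S3 and Ceph do not.

```python
import requests

url = "https://s3.example.com/bucket/dataset.root"   # hypothetical object

# Single byte-range GET: the access pattern ROOT generates when it reads
# selected baskets out of a remote file. A compliant server answers
# 206 Partial Content with only the requested bytes.
r = requests.get(url, headers={"Range": "bytes=0-1023"})
print(r.status_code, len(r.content))

# Multi-range GET ("bytes=a-b,c-d"): per the abstract, the Huawei UDS
# honours this, while Amazon S3 and Ceph do not. Servers that ignore the
# header typically reply 200 with the entire object instead of 206.
r = requests.get(url, headers={"Range": "bytes=0-1023,4096-8191"})
print(r.status_code)
```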
Ceph-based storage services for Run2 and beyond
NASA Astrophysics Data System (ADS)
van der Ster, Daniel C.; Lamanna, Massimo; Mascetti, Luca; Peters, Andreas J.; Rousseau, Hervé
2015-12-01
In 2013, CERN IT evaluated and then deployed a petabyte-scale Ceph cluster to support OpenStack use-cases in production. With now more than a year of smooth operations, we present our experience and tuning best practices. Beyond the cloud storage use-cases, we have been exploring Ceph-based services to satisfy the growing storage requirements during and after Run2. First, we have developed a Ceph back-end for CASTOR, allowing this service to deploy thin disk server nodes which act as gateways to Ceph; this feature marries the strong data archival and cataloging features of CASTOR with the resilient and high-performance Ceph subsystem for disk. Second, we have developed RADOSFS, a lightweight storage API which builds a POSIX-like filesystem on top of the Ceph object layer. When combined with Xrootd, RADOSFS can offer a scalable object interface compatible with our HEP data processing applications. Lastly, the same object layer is being used to build a scalable and inexpensive NFS service for several user communities.
MarFS, a Near-POSIX Interface to Cloud Objects
DOE Office of Scientific and Technical Information (OSTI.GOV)
Inman, Jeffrey Thornton; Vining, William Flynn; Ransom, Garrett Wilson
The engineering forces driving development of “cloud” storage have produced resilient, cost-effective storage systems that can scale to 100s of petabytes, with good parallel access and bandwidth. These features would make a good match for the vast storage needs of High-Performance Computing datacenters, but cloud storage gains some of its capability from its use of HTTP-style Representational State Transfer (REST) semantics, whereas most large datacenters have legacy applications that rely on POSIX file-system semantics. MarFS is an open-source project at Los Alamos National Laboratory that allows us to present cloud-style object-storage as a scalable near-POSIX file system. We have also developed a new storage architecture to improve bandwidth and scalability beyond what’s available in commodity object stores, while retaining their resilience and economy. Additionally, we present a scheme for scaling the POSIX interface to allow billions of files in a single directory and trillions of files in total.
Robust resistive memory devices using solution-processable metal-coordinated azo aromatics
NASA Astrophysics Data System (ADS)
Goswami, Sreetosh; Matula, Adam J.; Rath, Santi P.; Hedström, Svante; Saha, Surajit; Annamalai, Meenakshi; Sengupta, Debabrata; Patra, Abhijeet; Ghosh, Siddhartha; Jani, Hariom; Sarkar, Soumya; Motapothula, Mallikarjuna Rao; Nijhuis, Christian A.; Martin, Jens; Goswami, Sreebrata; Batista, Victor S.; Venkatesan, T.
2017-12-01
Non-volatile memories will play a decisive role in the next generation of digital technology. Flash memories are currently the key player in the field, yet they fail to meet the commercial demands of scalability and endurance. Resistive memory devices, and in particular memories based on low-cost, solution-processable and chemically tunable organic materials, are promising alternatives explored by the industry. However, to date, they have been lacking the performance and mechanistic understanding required for commercial translation. Here we report a resistive memory device based on a spin-coated active layer of a transition-metal complex, which shows high reproducibility (~350 devices), fast switching (≤30 ns), excellent endurance (~10¹² cycles), stability (>10⁶ s) and scalability (down to ~60 nm²). In situ Raman and ultraviolet-visible spectroscopy alongside spectroelectrochemistry and quantum chemical calculations demonstrate that the redox state of the ligands determines the switching states of the device whereas the counterions control the hysteresis. This insight may accelerate the technological deployment of organic resistive memories.
Schlecht, Ulrich; Liu, Zhimin; Blundell, Jamie R; St Onge, Robert P; Levy, Sasha F
2017-05-25
Several large-scale efforts have systematically catalogued protein-protein interactions (PPIs) of a cell in a single environment. However, little is known about how the protein interactome changes across environmental perturbations. Current technologies, which assay one PPI at a time, are too low throughput to make it practical to study protein interactome dynamics. Here, we develop a highly parallel protein-protein interaction sequencing (PPiSeq) platform that uses a novel double barcoding system in conjunction with the dihydrofolate reductase protein-fragment complementation assay in Saccharomyces cerevisiae. PPiSeq detects PPIs at a rate that is on par with current assays and, in contrast with current methods, quantitatively scores PPIs with enough accuracy and sensitivity to detect changes across environments. Both PPI scoring and the bulk of strain construction can be performed with cell pools, making the assay scalable and easily reproduced across environments. PPiSeq is therefore a powerful new tool for large-scale investigations of dynamic PPIs.
NASA Astrophysics Data System (ADS)
Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.
2016-05-01
In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
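To illustrate the kind of wide, vectorizable inner loop the abstract describes, here is the standard flat-source MOC segment update, vectorized over energy groups in NumPy. This is a generic sketch of the technique, not the proxy applications' code; the group count and segment parameters are arbitrary.

```python
import numpy as np

def moc_segment_sweep(psi_in, sigma_t, q, length):
    """Flat-source MOC update for one track segment, vectorized over energy
    groups: psi_out = psi_in * exp(-tau) + (q / sigma_t) * (1 - exp(-tau)),
    with optical thickness tau = sigma_t * length. Evaluating all groups at
    once is what produces the wide SIMD-friendly inner loop."""
    tau = sigma_t * length
    att = np.exp(-tau)
    return psi_in * att + (q / sigma_t) * (1.0 - att)

# e.g. a 2-group sweep over one 0.5 cm segment:
print(moc_segment_sweep(np.array([1.0, 0.5]),    # incoming angular flux
                        np.array([0.6, 1.2]),    # total cross sections
                        np.array([0.3, 0.1]),    # isotropic source term
                        0.5))
```

Assigning whole tracks of such segment updates to threads dynamically is the task-based load-balancing half of the design; the per-group vectorization above is the SIMD half.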
Electrochemical performances of graphene nanoribbons interlacing hollow NiCo oxide nanocages
NASA Astrophysics Data System (ADS)
Zhao, Xiyu; Li, Xinlu; Huang, Yanchun; Su, Zelong; Long, Junjun; Zhang, Shilei; Sha, Junwei; Wu, Tianli; Wang, Ronghua
2017-12-01
A hybrid of graphene nanoribbons (GNRs) interlacing hollow NiCoO2 (G-HNCO) nanocages, 300-500 nm in size with rough surfaces, is synthesized by chemically etching Cu2O templates followed by a GNR interlacing process. The G-HNCO showed high electrochemical performance for the oxygen evolution reaction (OER), exhibiting a small onset potential of 1.50 V and achieving a current density of 10 mA cm⁻² at a potential of 1.62 V. The hybrid also delivered a high capacitance of 937.8 F g⁻¹ at 1 A g⁻¹ in supercapacitor (SC) tests, as well as stable cycling performance in both OER and SC measurements. The approach used to synthesize the hybrid is simple and scalable to other graphene nanoribbon-based electrocatalysts.
Scalable quantum memory in the ultrastrong coupling regime.
Kyaw, T H; Felicetti, S; Romero, G; Solano, E; Kwek, L-C
2015-03-02
Circuit quantum electrodynamics, consisting of superconducting artificial atoms coupled to on-chip resonators, represents a prime candidate to implement a scalable quantum computing architecture because of its good tunability and controllability. Furthermore, recent advances have pushed the technology towards the ultrastrong coupling regime of light-matter interaction, where the qubit-resonator coupling strength reaches a considerable fraction of the resonator frequency. Here, we propose a qubit-resonator system operating in that regime as a quantum memory device, and study the storage and retrieval of quantum information in and from the Z2 parity-protected quantum memory within experimentally feasible schemes. Our proposal may also pave the way to realizing a scalable quantum random-access memory, owing to its fast storage and readout performance.
NASA Astrophysics Data System (ADS)
Vincenti, Henri; Vay, Jean-Luc
2018-07-01
The advent of massively parallel supercomputers, with their distributed-memory technology using many processing units, has favored the development of highly scalable local low-order solvers at the expense of harder-to-scale global very high-order spectral methods. Indeed, FFT-based methods, which were very popular on shared-memory computers, have been largely replaced by finite-difference (FD) methods for the solution of many problems, including plasma simulations with electromagnetic Particle-In-Cell methods. For some problems, such as the modeling of so-called "plasma mirrors" for the generation of high-energy particles and ultra-short radiation, we have shown that the inaccuracies of standard FD-based PIC methods prevent modeling at sufficient accuracy on present supercomputers. We demonstrate here that a new method, based on the use of local FFTs, enables ultrahigh-order accuracy with unprecedented scalability, and thus for the first time the accurate modeling of plasma mirrors in 3D.
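The accuracy argument rests on the spectral convergence of FFT-based differentiation. The self-contained NumPy example below (illustrative, unrelated to the authors' solver) shows a periodic derivative computed to near machine precision on a modest grid, where a fixed-order FD stencil would carry an algebraic truncation error.

```python
import numpy as np

# Pseudo-spectral derivative on a periodic grid: the building block of
# FFT-based very high-order solvers. Accuracy is limited only by the
# smoothness of the field, not by a stencil order.
n, L = 128, 2 * np.pi
x = np.linspace(0, L, n, endpoint=False)
f = np.exp(np.sin(x))                         # smooth periodic test field
k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)    # angular wavenumbers
df = np.real(np.fft.ifft(1j * k * np.fft.fft(f)))
print(np.max(np.abs(df - np.cos(x) * f)))     # ~1e-13: spectral accuracy
```

The paper's contribution is to keep this accuracy while replacing one global FFT with many local ones, so the method scales like a local solver on distributed-memory machines.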
Liu, Libin; Yu, You; Yan, Casey; Li, Kan; Zheng, Zijian
2015-06-11
One-dimensional flexible supercapacitor yarns are of considerable interest for future wearable electronics. The bottleneck in this field is how to develop devices of high energy and power density, by using economically viable materials and scalable fabrication technologies. Here we report a hierarchical graphene-metallic textile composite electrode concept to address this challenge. The hierarchical composite electrodes consist of low-cost graphene sheets immobilized on the surface of Ni-coated cotton yarns, which are fabricated by highly scalable electroless deposition of Ni and electrochemical deposition of graphene on commercial cotton yarns. Remarkably, the volumetric energy density and power density of the all solid-state supercapacitor yarn made of one pair of these composite electrodes are 6.1 mWh cm(-3) and 1,400 mW cm(-3), respectively. In addition, this SC yarn is lightweight, highly flexible, strong, durable in life cycle and bending fatigue tests, and integratable into various wearable electronic devices.
A scalable and operationally simple radical trifluoromethylation
Beatty, Joel W.; Douglas, James J.; Cole, Kevin P.; Stephenson, Corey R. J.
2015-01-01
The large number of reagents that have been developed for the synthesis of trifluoromethylated compounds is a testament to the importance of the CF3 group as well as the associated synthetic challenge. Current state-of-the-art reagents for appending the CF3 functionality directly are highly effective; however, their use on a preparative scale has minimal precedent because they require multistep syntheses for their preparation and/or are prohibitively expensive for large-scale application. For a scalable trifluoromethylation methodology, trifluoroacetic acid and its anhydride represent an attractive solution in terms of cost and availability; however, because of the exceedingly high oxidation potential of trifluoroacetate, previous endeavours to use this material as a CF3 source have required highly forcing conditions. Here we report a strategy for the use of trifluoroacetic anhydride in a scalable and operationally simple trifluoromethylation reaction, using pyridine N-oxide and photoredox catalysis to effect a facile decarboxylation to the CF3 radical. PMID:26258541
High-performance nanostructured supercapacitors on a sponge.
Chen, Wei; Rakhi, R B; Hu, Liangbing; Xie, Xing; Cui, Yi; Alshareef, H N
2011-12-14
A simple and scalable method has been developed to fabricate nanostructured MnO2-carbon nanotube (CNT)-sponge hybrid electrodes. A novel supercapacitor, henceforth referred to as "sponge supercapacitor", has been fabricated using these hybrid electrodes with remarkable performance. A specific capacitance of 1,230 F/g (based on the mass of MnO2) can be reached. Capacitors based on CNT-sponge substrates (without MnO2) can be operated even under a high scan rate of 200 V/s, and they exhibit outstanding cycle performance with only 2% degradation after 100,000 cycles under a scan rate of 10 V/s. The MnO2-CNT-sponge supercapacitors show only 4% of degradation after 10,000 cycles at a charge-discharge specific current of 5 A/g. The specific power and energy of the MnO2-CNT-sponge supercapacitors are high with values of 63 kW/kg and 31 Wh/kg, respectively. The attractive performances exhibited by these sponge supercapacitors make them potentially promising candidates for future energy storage systems.
Yi, Fang; Wang, Xiaofeng; Niu, Simiao; Li, Shengming; Yin, Yajiang; Dai, Keren; Zhang, Guangjie; Lin, Long; Wen, Zhen; Guo, Hengyu; Wang, Jie; Yeh, Min-Hsin; Zi, Yunlong; Liao, Qingliang; You, Zheng; Zhang, Yue; Wang, Zhong Lin
2016-01-01
The rapid growth of deformable and stretchable electronics calls for a deformable and stretchable power source. We report a scalable approach for energy harvesters and self-powered sensors that can be highly deformable and stretchable. With conductive liquid contained in a polymer cover, a shape-adaptive triboelectric nanogenerator (saTENG) unit can effectively harvest energy in various working modes. The saTENG can maintain its performance under a strain of as large as 300%. The saTENG is so flexible that it can be conformed to any three-dimensional and curvilinear surface. We demonstrate applications of the saTENG as a wearable power source and self-powered sensor to monitor biomechanical motion. A bracelet-like saTENG worn on the wrist can light up more than 80 light-emitting diodes. Owing to the highly scalable manufacturing process, the saTENG can be easily applied for large-area energy harvesting. In addition, the saTENG can be extended to extract energy from mechanical motion using flowing water as the electrode. This approach provides a new prospect for deformable and stretchable power sources, as well as self-powered sensors, and has potential applications in various areas such as robotics, biomechanics, physiology, kinesiology, and entertainment. PMID:27386560
Scalable clustering algorithms for continuous environmental flow cytometry.
Hyrkas, Jeremy; Clayton, Sophie; Ribalet, Francois; Halperin, Daniel; Armbrust, E Virginia; Howe, Bill
2016-02-01
Recent technological innovations in flow cytometry now allow oceanographers to collect high-frequency flow cytometry data from particles in aquatic environments on a scale far surpassing conventional flow cytometers. The SeaFlow cytometer continuously profiles microbial phytoplankton populations across thousands of kilometers of the surface ocean. The data streams produced by instruments such as SeaFlow challenge the traditional sample-by-sample approach in cytometric analysis and highlight the need for scalable clustering algorithms to extract population information from these large-scale, high-frequency flow cytometers. We explore how available algorithms commonly used for medical applications perform at classifying such large-scale environmental flow cytometry data. We apply large-scale Gaussian mixture models to massive datasets using Hadoop. This approach outperforms current state-of-the-art cytometry classification algorithms in accuracy and can be coupled with manual or automatic partitioning of data into homogeneous sections for further classification gains. We propose the Gaussian mixture model with partitioning approach for classification of large-scale, high-frequency flow cytometry data. Source code is available for download at https://github.com/jhyrkas/seaflow_cluster, implemented in Java for use with Hadoop. Contact: hyrkas@cs.washington.edu. Supplementary data are available at Bioinformatics online.
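A toy single-node version of the partition-then-fit approach, with scikit-learn's GaussianMixture standing in for the Hadoop-scaled fitting and a naive equal split standing in for homogeneous sectioning (both stand-ins are assumptions, not the paper's pipeline):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def classify_stream(features, n_populations, n_sections=10, seed=0):
    """Split the measurement stream into sections and fit one Gaussian
    mixture per section, labeling each particle with its component. The
    sectioning here is a naive equal split; the paper couples the GMM with
    manual or automatic partitioning into homogeneous sections."""
    labels = []
    for part in np.array_split(features, n_sections):
        gmm = GaussianMixture(n_components=n_populations, random_state=seed)
        labels.append(gmm.fit_predict(part))
    return np.concatenate(labels)

# e.g. three simulated populations in a 2D optical-scatter space:
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(m, 0.3, size=(500, 2)) for m in (0, 2, 4)])
print(np.bincount(classify_stream(rng.permutation(data), 3)))
```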
High-Performance Screen-Printed Thermoelectric Films on Fabrics.
Shin, Sunmi; Kumar, Rajan; Roh, Jong Wook; Ko, Dong-Su; Kim, Hyun-Sik; Kim, Sang Il; Yin, Lu; Schlossberg, Sarah M; Cui, Shuang; You, Jung-Min; Kwon, Soonshin; Zheng, Jianlin; Wang, Joseph; Chen, Renkun
2017-08-04
Printing techniques could offer a scalable approach to fabricate thermoelectric (TE) devices on flexible substrates for power generation used in wearable devices and personalized thermo-regulation. However, typical printing processes need a large concentration of binder additives, which often render a detrimental effect on electrical transport of the printed TE layers. Here, we report scalable screen-printing of TE layers on flexible fiber glass fabrics, by rationally optimizing the printing inks consisting of TE particles (p-type Bi0.5Sb1.5Te3 or n-type Bi2Te2.7Se0.3), binders, and organic solvents. We identified a suitable binder additive, methyl cellulose, which offers suitable viscosity for printability at a very small concentration (0.45-0.60 wt.%), thus minimizing its negative impact on electrical transport. Following printing, the binders were subsequently burnt off via sintering and hot pressing. We found that the nanoscale defects left behind after the binder burnt off became effective phonon scattering centers, leading to low lattice thermal conductivity in the printed n-type material. With the high electrical conductivity and low thermal conductivity, the screen-printed TE layers showed high room-temperature ZT values of 0.65 and 0.81 for p-type and n-type, respectively.
NASA Astrophysics Data System (ADS)
Xu, Feng; Ge, Binghui; Chen, Jing; Nathan, Arokia; Xin, Linhuo L.; Ma, Hongyu; Min, Huihua; Zhu, Chongyang; Xia, Weiwei; Li, Zhengrui; Li, Shengli; Yu, Kaihao; Wu, Lijun; Cui, Yiping; Sun, Litao; Zhu, Yimei
2016-06-01
Atomically thin black phosphorus (called phosphorene) holds great promise as an alternative to graphene and other two-dimensional transition-metal dichalcogenides as an anode material for lithium-ion batteries (LIBs). However, bulk black phosphorus (BP) suffers from rapid capacity fading and poor rechargeable performance. This work reports for the first time the use of in situ transmission electron microscopy (TEM) to construct nanoscale phosphorene LIBs. This enables direct visualization of the mechanisms underlying capacity fading in thick multilayer phosphorene through real-time capture of delithiation-induced structural decomposition, which serves to reduce electrical conductivity thus causing irreversibility of the lithiated phases. We further demonstrate that few-layer-thick phosphorene successfully circumvents the structural decomposition and holds superior structural restorability, even when subject to multi-cycle lithiation/delithiation processes and concomitant huge volume expansion. This finding provides breakthrough insights into thickness-dependent lithium diffusion kinetics in phosphorene. More importantly, a scalable liquid-phase shear exfoliation route has been developed to produce high-quality ultrathin phosphorene using simple means such as a high-speed shear mixer or even a household kitchen blender with the shear rate threshold of ~1.25 × 10⁴ s⁻¹. The results reported here will pave the way for industrial-scale applications of rechargeable phosphorene LIBs.
Highly scalable and robust rule learner: performance evaluation and comparison.
Kurgan, Lukasz A; Cios, Krzysztof J; Dick, Scott
2006-02-01
Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or the crafting of real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy rule builder that generates a set of production rules from labeled input data. In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency and resistance to missing data. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.
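To give a feel for this family of learners, here is a generic greedy covering step in NumPy. This is emphatically not the published DataSqueezer algorithm, just the general shape of an inductive greedy rule builder; conditioning only on observed values is one simple way such learners tolerate missing data.

```python
import numpy as np

def greedy_rule(X, y, target=1, max_conds=5):
    """Grow one production rule greedily: repeatedly add the attribute=value
    condition with the best precision on the target class. Conditions test
    only observed values, so rows with np.nan simply never match a
    condition -- a crude stand-in for missing-data resistance."""
    rows = np.arange(len(y))
    rule = []
    for _ in range(max_conds):
        best = None
        for a in range(X.shape[1]):
            vals = X[rows, a]
            for v in np.unique(vals[~np.isnan(vals)]):
                sel = rows[vals == v]
                prec = np.mean(y[sel] == target)
                if best is None or prec > best[0]:
                    best = (prec, a, v, sel)
        if best is None:
            break
        prec, a, v, sel = best
        rule.append((a, v))       # conjunct: attribute a equals value v
        rows = sel                # restrict to the rows the rule still covers
        if prec == 1.0:
            break
    return rule                   # list of (attribute, value) conjuncts
```

A full rule-set learner would iterate this step, removing covered examples and growing new rules until the target class is covered.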
Next generation laser for Inertial Confinement Fusion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marshall, C.D.; Beach, J.; Bibeau, C.
1997-07-18
We are in the process of developing and building the "Mercury" laser system as the first in a series of a new generation of diode-pumped solid-state Inertial Confinement Fusion (ICF) lasers at LLNL. Mercury will be the first integrated demonstration of a scalable laser architecture compatible with advanced high energy density (HED) physics applications. Primary performance goals include 10% efficiency at 10 Hz and a 1-10 ns pulse with 1ω energies of 100 J and with 2ω/3ω frequency conversion.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ali, Amjad Majid; Albert, Don; Andersson, Par
SLURM is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small computer clusters. As a cluster resource manager, SLURM has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.
Shir, Daniel; Ballard, Zachary S.; Ozcan, Aydogan
2016-01-01
Mechanical flexibility and the advent of scalable, low-cost, and high-throughput fabrication techniques have enabled numerous potential applications for plasmonic sensors. Sensitive and sophisticated biochemical measurements can now be performed through the use of flexible plasmonic sensors integrated into existing medical and industrial devices or sample collection units. More robust sensing schemes and practical techniques must be further investigated to fully realize the potential of flexible plasmonics as a framework for designing low-cost, embedded and integrated sensors for medical, environmental, and industrial applications. PMID:27547023
Density-matrix-based algorithm for solving eigenvalue problems
NASA Astrophysics Data System (ADS)
Polizzi, Eric
2009-03-01
A fast and stable numerical algorithm for solving the symmetric eigenvalue problem is presented. The technique deviates fundamentally from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos algorithms) or other Davidson-Jacobi techniques and takes its inspiration from the contour integration and density-matrix representation in quantum mechanics. It will be shown that this algorithm—named FEAST—exhibits high efficiency, robustness, accuracy, and scalability on parallel architectures. Examples from electronic structure calculations of carbon nanotubes are presented, and numerical performances and capabilities are discussed.
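As a concrete illustration of the contour-integration idea, here is a minimal dense NumPy sketch: a quadrature of the resolvent along a circle enclosing the search interval approximates the spectral projector, which is applied to a random block and followed by a Rayleigh-Ritz step. This is an assumption-laden reading of the core mechanism, not the FEAST library itself, which iterates this process and uses sparse system solves.

```python
import numpy as np

def feast_sketch(A, emin, emax, m0=8, nquad=8, seed=0):
    """One pass of the contour-integration filter for a real symmetric A:
    quadrature of (zI - A)^{-1} over a circle enclosing [emin, emax],
    applied to a random block, then Rayleigh-Ritz in the filtered subspace.
    A production solver would iterate and reject spurious pairs by residual."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    c, r = 0.5 * (emin + emax), 0.5 * (emax - emin)
    Y = rng.standard_normal((n, m0))
    Q = np.zeros((n, m0))
    for k in range(nquad):                    # midpoint rule, upper half-circle
        t = np.pi * (k + 0.5) / nquad
        z = c + r * np.exp(1j * t)
        Qz = np.linalg.solve(z * np.eye(n) - A, Y)       # resolvent times block
        Q += (r / nquad) * np.real(np.exp(1j * t) * Qz)  # conjugate half folded in
    Qo, _ = np.linalg.qr(Q)                   # basis of the filtered subspace
    w, V = np.linalg.eigh(Qo.T @ A @ Qo)      # Rayleigh-Ritz
    keep = (w > emin) & (w < emax)
    return w[keep], (Qo @ V)[:, keep]

w, _ = feast_sketch(np.diag(np.arange(1.0, 21.0)), 2.5, 5.5)
print(w)   # ≈ [3. 4. 5.]
```

The dominant cost is the set of independent linear solves at the quadrature nodes, which is exactly the part that parallelizes well and underlies the scalability claim.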
Photon-photon entanglement with a single trapped atom.
Weber, B; Specht, H P; Müller, T; Bochmann, J; Mücke, M; Moehring, D L; Rempe, G
2009-01-23
An experiment is performed in which a single rubidium atom trapped within a high-finesse optical cavity emits two independently triggered entangled photons. The entanglement is mediated by the atom and is characterized both by a Bell inequality violation of S=2.5 and by full quantum-state tomography, resulting in a fidelity exceeding F=90%. The combination of cavity-QED and trapped-atom techniques makes our protocol inherently deterministic, an essential step for the generation of scalable entanglement between the nodes of a distributed quantum network.
NASA Astrophysics Data System (ADS)
Qiao, Mu
2015-03-01
Service-Oriented Architecture (SOA) is widely used in building flexible and scalable web sites and services. In most of the web and mobile photo book and gifting business space, the products ordered are highly variable, without a standard template into which one can substitute text or images, unlike commercial variable data printing. In this paper, the author describes an SOA workflow in a multi-site, multi-product-line fulfillment system that addresses three major challenges: utilization of hardware and equipment, a high degree of automation with fault recovery, and high scalability and flexibility under order-volume fluctuation.
PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.
Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy
2015-05-01
We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate, slightly better than SATé trees, and with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.
Toward Scalable Benchmarks for Mass Storage Systems
NASA Technical Reports Server (NTRS)
Miller, Ethan L.
1996-01-01
This paper presents guidelines for the design of a mass storage system benchmark suite, along with preliminary suggestions for programs to be included. The benchmarks will measure both peak and sustained performance of the system as well as predicting both short- and long-term behavior. These benchmarks should be both portable and scalable so they may be used on storage systems from tens of gigabytes to petabytes or more. By developing a standard set of benchmarks that reflect real user workload, we hope to encourage system designers and users to publish performance figures that can be compared with those of other systems. This will allow users to choose the system that best meets their needs and give designers a tool with which they can measure the performance effects of improvements to their systems.
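In the spirit of the sustained-performance measurements proposed above, here is a deliberately simple sketch of one such probe. The parameters and path are arbitrary; a real suite would also randomize access patterns, vary file counts and sizes, and separate out client caching effects.

```python
import os, time, tempfile

def sustained_write_bw(path, total_mb=256, block_kb=1024):
    """Stream fixed-size blocks to 'path' and report aggregate bandwidth.
    Peak vs. sustained behaviour emerges by pushing total_mb well past the
    cache sizes involved; fsync keeps the flush inside the timed region."""
    block = os.urandom(block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    return total_mb / (time.perf_counter() - t0)   # MB/s

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    print(f"{sustained_write_bw(tmp.name):.1f} MB/s")
```

Scaling this from gigabytes to petabytes is where the portability and scalability requirements in the guidelines above begin to bite.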
SLURM: Simple Linux Utility for Resource Management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jette, M; Dunlap, C; Garlick, J
2002-07-08
Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, scheduling, and stream copy modules. The design also includes a scalable, general-purpose communication infrastructure. This paper presents an overview of the SLURM architecture and functionality.
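As a usage illustration only (the job name, resource requests, and application binary are hypothetical, not drawn from the paper), a SLURM job is typically described by a short script of #SBATCH directives handed to the scheduler; here it is composed and submitted from Python:

```python
import subprocess, textwrap

# Hypothetical batch job: all names and resource sizes are illustrative.
script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=demo
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=00:30:00
    srun ./my_mpi_app
""")

# sbatch accepts the script on stdin and prints the assigned job id.
result = subprocess.run(["sbatch"], input=script, text=True,
                        capture_output=True, check=True)
print(result.stdout.strip())   # e.g. "Submitted batch job 12345"
```

The scheduler then tracks the job through the partition-management and job-management components the abstract lists.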
A high-performance supercapacitor electrode based on N-doped porous graphene
NASA Astrophysics Data System (ADS)
Dai, Shuge; Liu, Zhen; Zhao, Bote; Zeng, Jianhuang; Hu, Hao; Zhang, Qiaobao; Chen, Dongchang; Qu, Chong; Dang, Dai; Liu, Meilin
2018-05-01
The development of high-performance supercapacitors (SCs) often faces some contradictory and competing requirements such as excellent rate capability, long cycling life, and high energy density. One effective strategy is to explore electrode materials of high capacitance, electrode architectures of fast charge and mass transfer, and electrolytes of wide voltage window. Here we report a facile and readily scalable strategy to produce high-performance N-doped graphene with a high specific capacitance (∼390 F g⁻¹). A symmetric SC device with a wide voltage window of 3.5 V is also successfully fabricated based on the N-doped graphene electrode. More importantly, the as-assembled symmetric SC delivers a high energy density of 55 Wh kg⁻¹ at a power density of 1800 W kg⁻¹ while maintaining superior cycling life (retaining 96.6% of the initial capacitance after 20,000 cycles). Even at a power density as high as 8800 W kg⁻¹, it still retains an energy density of 29 Wh kg⁻¹, higher than those of previously reported graphene-based symmetric SCs.
NASA Astrophysics Data System (ADS)
Tsang, Sik-Ho; Chan, Yui-Lam; Siu, Wan-Chi
2017-01-01
Weighted prediction (WP) is an efficient video coding tool introduced in the H.264/AVC video coding standard to compensate for temporal illumination changes in motion estimation and compensation. The WP parameters, a multiplicative weight and an additive offset for each reference frame, must be estimated and transmitted to the decoder in the slice header, which costs extra bits in the coded video bitstream. High efficiency video coding (HEVC) provides WP parameter prediction to reduce this overhead, so WP parameter prediction is crucial to research and applications related to WP. Prior work has improved WP parameter prediction through implicit prediction of image characteristics and derivation of parameters. By exploiting both temporal and interlayer redundancies, we propose three WP parameter prediction algorithms, enhanced implicit WP parameter, enhanced direct WP parameter derivation, and interlayer WP parameter, to further improve the coding efficiency of HEVC. Results show that our proposed algorithms achieve up to 5.83% and 5.23% bitrate reduction compared with conventional scalable HEVC in the base layer for SNR scalability and 2× spatial scalability, respectively.
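For context, here is a minimal sketch of what one WP parameter pair does, assuming the usual explicit weighted-prediction form shared by H.264/AVC and HEVC; the variable names and the example weight/offset values are illustrative, not taken from the paper.

```python
import numpy as np

def weighted_pred(ref_block, w, o, log_wd=6, bit_depth=8):
    """Apply explicit weighted prediction to one reference block.

    pred = ((ref * w + rounding) >> logWD) + o, clipped to the sample
    range; w and o are the per-reference-frame parameters whose
    signaling overhead the paper's prediction algorithms reduce.
    """
    rnd = 1 << (log_wd - 1)
    pred = ((ref_block.astype(np.int32) * w + rnd) >> log_wd) + o
    return np.clip(pred, 0, (1 << bit_depth) - 1).astype(np.uint8)

# A brightening fade: weight 80/64 = 1.25 plus a small offset
block = np.full((8, 8), 100, dtype=np.uint8)
print(weighted_pred(block, w=80, o=2)[0, 0])   # -> 127
```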
Beyond NextGen: AutoMax Overview and Update
NASA Technical Reports Server (NTRS)
Kopardekar, Parimal; Alexandrov, Natalia
2013-01-01
Main message: national and global needs. Develop a scalable airspace operations management system to accommodate increased mobility needs, emerging airspace uses, traffic mix, and future demand, while remaining affordable and economically viable. There is a sense of urgency: saturation (delays), emerging airspace uses, and the case for proactive development. Autonomy is needed for airspace operations to meet future needs, given costs, time-critical decisions, mobility, scalability, and the limits of cognitive workload. AutoMax is proposed to accommodate these national and global needs. Auto: automation, autonomy, and autonomicity for airspace operations. Max: maximizing performance of the National Airspace System. The presentation closes with interesting challenges and the path forward.
Scalable Coding of Plenoptic Images by Using a Sparse Set and Disparities.
Li, Yun; Sjostrom, Marten; Olsson, Roger; Jennehag, Ulf
2016-01-01
Focused plenoptic capturing is one of the light-field capturing techniques. By placing a microlens array in front of the photosensor, focused plenoptic cameras capture both spatial and angular information of a scene, within each microlens image and across microlens images. The capture produces a significant amount of redundant information, and the captured image usually has a large resolution. A coding scheme that removes the redundancy before coding is therefore advantageous for efficient compression, transmission, and rendering. In this paper, we propose a lossy coding scheme to efficiently represent plenoptic images. The format contains a sparse image set and its associated disparities. The reconstruction is performed by disparity-based interpolation and inpainting, and the reconstructed image is then employed as a prediction reference for the coding of the full plenoptic image. As an outcome of the representation, the proposed scheme inherits a scalable structure with three layers. The results show that plenoptic images are compressed efficiently, with over 60 percent bit rate reduction compared with High Efficiency Video Coding (HEVC) intra coding, and over 20 percent compared with an HEVC block copying mode.
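As a crude illustration of the disparity-based reconstruction step (this is not the paper's algorithm; integer disparity and the hole-marking convention are simplifying assumptions), a dropped view can be predicted by shifting a retained neighbor and flagging disoccluded columns for inpainting:

```python
import numpy as np

def predict_from_neighbor(neighbor, disparity):
    """Toy disparity-compensated prediction of a dropped image.

    Shift a retained neighboring microlens image by an integer
    horizontal disparity; columns exposed by the shift are marked as
    holes that a real codec would fill by inpainting before using the
    result as a prediction reference.
    """
    pred = np.roll(neighbor, disparity, axis=1).astype(float)
    hole = np.zeros(pred.shape, dtype=bool)
    if disparity > 0:
        hole[:, :disparity] = True
    elif disparity < 0:
        hole[:, disparity:] = True
    pred[hole] = np.nan   # placeholder for the inpainting stage
    return pred, hole
```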
Machine learning patterns for neuroimaging-genetic studies in the cloud.
Da Mota, Benoit; Tudoran, Radu; Costan, Alexandru; Varoquaux, Gaël; Brasche, Goetz; Conrod, Patricia; Lemaitre, Herve; Paus, Tomas; Rietschel, Marcella; Frouin, Vincent; Poline, Jean-Baptiste; Antoniu, Gabriel; Thirion, Bertrand
2014-01-01
Brain imaging is a natural intermediate phenotype for understanding the link between genetic information and behavior or risk factors for brain pathologies. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, increasing computational power in distributed architectures can be harnessed, if new neuroinformatics infrastructures are designed and training to use these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (the Scikit-learn library), we design a scalable analysis tool that can deal with non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly fitted with genome-wide genotypes. This experiment demonstrates the scalability and reliability of our framework in the cloud, with a two-week deployment on hundreds of virtual machines.
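To show the kind of non-parametric workload involved, here is a generic permutation-test sketch with scikit-learn (not the authors' pipeline; the model choice, scoring, and permutation count are illustrative). Each permutation is an independent model fit, which is exactly the embarrassingly parallel structure that maps well onto cloud workers.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

def permutation_pvalue(G, y, n_perm=1000, seed=0):
    """Non-parametric test of how well genotypes G fit a brain signal y.

    Scores a ridge model by cross-validated R^2, then compares it with
    a null distribution obtained by permuting the phenotype.
    """
    rng = np.random.default_rng(seed)
    model = RidgeCV(alphas=np.logspace(-2, 4, 7))
    true_score = cross_val_score(model, G, y, cv=5, scoring="r2").mean()
    null = np.array([
        cross_val_score(model, G, rng.permutation(y), cv=5, scoring="r2").mean()
        for _ in range(n_perm)      # each iteration could run on its own worker
    ])
    p_value = (1.0 + np.sum(null >= true_score)) / (n_perm + 1.0)
    return true_score, p_value
```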
Scalable Parallel Density-based Clustering and Applications
NASA Astrophysics Data System (ADS)
Patwary, Mostofa Ali
2014-04-01
Recently, density-based clustering algorithms (DBSCAN and OPTICS) have received significant attention from the scientific community due to their unique capability of discovering arbitrarily shaped clusters and eliminating noise. These algorithms have several applications that require high performance computing, including finding halos and subhalos (clusters) in massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelizing these algorithms is extremely challenging, as they exhibit an inherently sequential data access order and unbalanced workloads, resulting in low parallel efficiency. To break the sequential data access and achieve high parallelism, we develop new parallel algorithms for both DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarity between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform existing algorithms, achieving speedups of up to 27.5 on 40 cores on a shared memory architecture and up to 5,765 using 8,192 cores on a distributed memory architecture. In our experiments, we found that while achieving this scalability, our algorithms produce clustering results of comparable quality to the classical algorithms.
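The connected-components view is easy to make concrete. The sequential sketch below (a generic illustration, not the paper's parallel implementation; parameters are arbitrary) phrases DBSCAN as union-find over core points, which is what makes the merging order-independent and hence amenable to parallelization:

```python
import numpy as np
from scipy.spatial import cKDTree

def dbscan_union_find(X, eps, min_pts):
    """DBSCAN phrased as connected components over core points."""
    tree = cKDTree(X)
    neigh = tree.query_ball_point(X, eps)           # eps-neighborhoods
    core = np.array([len(n) >= min_pts for n in neigh])
    parent = np.arange(len(X))                      # disjoint-set forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]           # path halving
            i = parent[i]
        return i

    for i in np.flatnonzero(core):                  # union core-core edges
        for j in neigh[i]:
            if core[j]:
                parent[find(i)] = find(j)

    labels = np.full(len(X), -1)                    # -1 marks noise
    for i in range(len(X)):
        if core[i]:
            labels[i] = find(i)
        else:                                       # border point: attach to a core neighbor
            for j in neigh[i]:
                if core[j]:
                    labels[i] = find(j)
                    break
    return labels
```

Because unions commute, partitions of the data can build local forests independently and merge them afterwards, which is the essence of the parallel strategy the abstract describes.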
An integrated semiconductor device enabling non-optical genome sequencing.
Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James
2011-07-20
The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.
Surface engineering of hierarchical platinum-cobalt nanowires for efficient electrocatalysis
Bu, Lingzheng; Guo, Shaojun; Zhang, Xu; ...
2016-06-29
Despite intense research in past decades, the lack of high-performance catalysts for fuel cell reactions remains a challenge in realizing fuel cell technologies for transportation applications. Here we report a facile strategy for synthesizing hierarchical platinum-cobalt nanowires with high-index, platinum-rich facets and an ordered intermetallic structure. These structural features enable unprecedented performance for the oxygen reduction and alcohol oxidation reactions. The specific and mass activities of the platinum-cobalt nanowires for the oxygen reduction reaction are 39.6 and 33.7 times higher, respectively, than those of a commercial Pt/C catalyst. Density functional theory simulations reveal that the active threefold hollow sites on the platinum-rich high-index facets provide an additional factor in enhancing oxygen reduction reaction activities. The nanowires are stable under electrochemical conditions and are also thermally stable. Furthermore, this work may represent a key step towards scalable production of high-performance platinum-based nanowires for applications in catalysis and energy conversion.
NASA Astrophysics Data System (ADS)
Choi, Shinhyun; Tan, Scott H.; Li, Zefan; Kim, Yunjo; Choi, Chanyeol; Chen, Pai-Yu; Yeon, Hanwool; Yu, Shimeng; Kim, Jeehwan
2018-01-01
Although several types of architecture combining memory cells and transistors have been used to demonstrate artificial synaptic arrays, they usually present limited scalability and high power consumption. Transistor-free analog switching devices may overcome these limitations, yet the typical switching process they rely on—formation of filaments in an amorphous medium—is not easily controlled and hence hampers the spatial and temporal reproducibility of the performance. Here, we demonstrate analog resistive switching devices that possess desired characteristics for neuromorphic computing networks with minimal performance variations using a single-crystalline SiGe layer epitaxially grown on Si as a switching medium. Such epitaxial random access memories utilize threading dislocations in SiGe to confine metal filaments in a defined, one-dimensional channel. This confinement results in drastically enhanced switching uniformity and long retention/high endurance with a high analog on/off ratio. Simulations using the MNIST handwritten recognition data set prove that epitaxial random access memories can operate with an online learning accuracy of 95.1%.
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, M. J.; Brantley, P. S.
2015-01-20
In order to run Monte Carlo particle transport calculations on new supercomputers with hundreds of thousands or millions of processors, care must be taken to implement scalable algorithms. This means that the algorithms must continue to perform well as the processor count increases. In this paper, we examine the scalability of: (1) globally resolving the particle locations on the correct processor, (2) deciding that particle streaming communication has finished, and (3) efficiently coupling neighbor domains together with different replication levels. We have run domain-decomposed Monte Carlo particle transport on up to 2²¹ = 2,097,152 MPI processes on the IBM BG/Q Sequoia supercomputer and observed scalable results that agree with our theoretical predictions. These calculations were carefully constructed to have the same amount of work on every processor, i.e. the calculation is already load balanced. We also examine load-imbalanced calculations where each domain's replication level is proportional to its particle workload. In this case we show how to efficiently couple together adjacent domains to maintain within-workgroup load balance and minimize memory usage.
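Item (2), deciding that streaming has finished, is a distributed termination question. A common textbook approach, sketched below with mpi4py under the assumption of an MPI launch (this is not the authors' implementation), is to check globally that every particle sent between domains has also been received; the allreduce makes the decision cost logarithmic in the process count:

```python
from mpi4py import MPI

def streaming_finished(comm, n_sent, n_received):
    """Global check that particle streaming communication is done.

    Each rank reports how many particles it has sent to and received
    from neighboring domains; streaming is finished only when the
    global send/receive balance is zero.
    """
    local_balance = n_sent - n_received
    return comm.allreduce(local_balance, op=MPI.SUM) == 0
```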
Highly Sensitive Bulk Silicon Chemical Sensors with Sub-5 nm Thin Charge Inversion Layers.
Fahad, Hossain M; Gupta, Niharika; Han, Rui; Desai, Sujay B; Javey, Ali
2018-03-27
There is an increasing demand for mass-producible, low-power gas sensors in a wide variety of industrial and consumer applications. Here, we report chemical-sensitive field-effect transistors (CS-FETs) based on bulk silicon wafers, wherein an electrostatically confined sub-5 nm thin charge inversion layer is modulated by chemical exposure to achieve a high-sensitivity gas-sensing platform. Using hydrogen sensing as a "litmus" test, we demonstrate large sensor responses (>1000%) to 0.5% H2 gas, with fast response (<60 s) and recovery times (<120 s) at room temperature and low power (<50 μW). On the basis of these performance metrics as well as standardized benchmarking, we show that bulk silicon CS-FETs offer similar or better sensing performance compared with emerging nanostructured semiconductors while providing a highly scalable and manufacturable platform.
Gil-Santos, Eduardo; Baker, Christopher; Lemaître, Aristide; Gomez, Carmen; Leo, Giuseppe; Favero, Ivan
2017-01-01
Photonic lattices of mutually interacting indistinguishable cavities represent a cornerstone of collective phenomena in optics and could become important in advanced sensing or communication devices. The disorder induced by fabrication technologies has so far hindered the development of such resonant cavity architectures, while post-fabrication tuning methods have been limited by complexity and poor scalability. Here we present a new simple and scalable tuning method for ensembles of microphotonic and nanophotonic resonators, which enables their permanent collective spectral alignment. The method introduces an approach of cavity-enhanced photoelectrochemical etching in a fluid, a resonant process triggered by sub-bandgap light that allows for high selectivity and precision. The technique is presented on a gallium arsenide nanophotonic platform and illustrated by finely tuning one, two and up to five resonators. It opens the way to applications requiring large networks of identical resonators and their spectral referencing to external etalons.
Fischer, Michael G; Hua, Xiao; Wilts, Bodo D; Castillo-Martínez, Elizabeth; Steiner, Ullrich
2018-01-17
Lithium iron phosphate (LFP) is currently one of the main cathode materials used in lithium-ion batteries due to its safety, relatively low cost, and exceptional cycle life. To overcome its poor ionic and electrical conductivities, LFP is often nanostructured, and its surface is coated with conductive carbon (LFP/C). Here, we demonstrate a sol-gel based synthesis procedure that utilizes a block copolymer (BCP) as a templating agent and a homopolymer as an additional carbon source. The high-molecular-weight BCP produces self-assembled aggregates with the precursor-sol on the 10 nm scale, stabilizing the LFP structure during crystallization at high temperatures. This results in a LFP nanonetwork consisting of interconnected ∼10 nm-sized particles covered by a uniform carbon coating that displays a high rate performance and an excellent cycle life. Our "one-pot" method is facile and scalable for use in established battery production methodologies.
Ultrasensitive plano-concave optical microresonators for ultrasound sensing
NASA Astrophysics Data System (ADS)
Guggenheim, James A.; Li, Jing; Allen, Thomas J.; Colchester, Richard J.; Noimark, Sacha; Ogunlade, Olumide; Parkin, Ivan P.; Papakonstantinou, Ioannis; Desjardins, Adrien E.; Zhang, Edward Z.; Beard, Paul C.
2017-11-01
Highly sensitive broadband ultrasound detectors are needed to expand the capabilities of biomedical ultrasound, photoacoustic imaging and industrial ultrasonic non-destructive testing techniques. Here, a generic optical ultrasound sensing concept based on a novel plano-concave polymer microresonator is described. This achieves strong optical confinement (Q-factors > 10⁵), resulting in very high sensitivity with excellent broadband acoustic frequency response and wide directivity. The concept is highly scalable in terms of bandwidth and sensitivity. To illustrate this, a family of microresonator sensors with broadband acoustic responses up to 40 MHz and noise-equivalent pressures as low as 1.6 mPa/√Hz has been fabricated and comprehensively characterized in terms of acoustic performance. In addition, their practical application to high-resolution photoacoustic and ultrasound imaging is demonstrated. The favourable acoustic performance and design flexibility of the technology offers new opportunities to advance biomedical and industrial ultrasound-based techniques.
Liebe, J D; Hübner, U
2013-01-01
Continuous improvement of IT performance in healthcare organisations requires actionable performance indicators, regularly conducted, independent measurements, and meaningful and scalable reference groups. Existing IT-benchmarking initiatives have focussed on the development of reliable and valid indicators, but less on how to implement an environment for conducting easily repeatable and scalable IT-benchmarks. This study aims at developing and trialling a procedure that meets the afore-mentioned requirements. We chose a well-established, regularly conducted (inter-)national IT-survey of healthcare organisations (IT-Report Healthcare) as the environment and offered the participants of the 2011 survey (CIOs of hospitals) the opportunity to enter a benchmark. The 61 structural and functional performance indicators covered, among others, the implementation status and integration of IT-systems and functions, global user satisfaction and the resources of the IT-department. Healthcare organisations were grouped by size and ownership. The benchmark results were made available electronically and feedback on the use of these results was requested after several months. Fifty-nine hospitals participated in the benchmarking. Reference groups consisted of up to 141 members depending on the number of beds (size) and the ownership (public vs. private). A total of 122 charts showing single-indicator frequency views were sent to each participant. The evaluation showed that 94.1% of the CIOs who participated in the evaluation considered this benchmarking beneficial and reported that they would enter again. Based on the feedback of the participants we developed two additional views that provide a more consolidated picture. The results demonstrate that establishing an independent, easily repeatable and scalable IT-benchmarking procedure is possible and was deemed desirable. Based on these encouraging results, a new benchmarking round which includes process indicators is currently being conducted.
BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data.
Ausmees, Kristiina; John, Aji; Toor, Salman Z; Hellander, Andreas; Nettelblad, Carl
2018-06-26
The advent of next-generation sequencing (NGS) has made whole-genome sequencing of cohorts of individuals a reality. Primary datasets of raw or aligned reads of this sort can get very large. For scientific questions where curated called variants are not sufficient, the sheer size of the datasets makes analysis prohibitively expensive. In order to make re-analysis of such data feasible without access to a large-scale computing facility, we have developed a highly scalable, storage-agnostic framework, an associated API, and an easy-to-use web user interface to execute custom filters on large genomic datasets. We present BAMSI, a Software-as-a-Service (SaaS) solution for filtering the 1000 Genomes phase 3 set of aligned reads, with the possibility of extension and customization to other sets of files. Unique to our solution is the capability of simultaneously utilizing many different mirrors of the data to increase the speed of the analysis. In particular, if the data are available in private or public clouds, an increasingly common scenario for both academic and commercial cloud providers, our framework allows for seamless deployment of filtering workers close to the data. We show results indicating that such a setup improves the horizontal scalability of the system, and present a possible use case of the framework by performing an analysis of structural variation in the 1000 Genomes data set. BAMSI constitutes a framework for efficient filtering of large genomic data sets that is flexible in its use of compute as well as storage resources. The data resulting from the filter is assumed to be greatly reduced in size, and can easily be downloaded or routed into, e.g., a Hadoop cluster for subsequent interactive analysis using Hive, Spark or similar tools. In this respect, our framework also suggests a general model for making very large datasets of high scientific value more accessible, by offering organizations the possibility to share the cost of hosting data on hot storage without compromising the scalability of downstream analysis.
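For a sense of what a per-file custom filter looks like, here is a generic pysam sketch (the paths and filtering criteria are hypothetical, and this is not BAMSI's API): each worker streams aligned reads from its local mirror and writes a much smaller filtered file.

```python
import pysam  # assumes a locally accessible BAM slice

def filter_reads(in_bam, out_bam, min_mapq=30):
    """Keep properly paired reads above a mapping-quality threshold.

    This is the kind of streaming filter a worker would run close to
    the data; the output is greatly reduced in size and ready for
    download or downstream analysis.
    """
    with pysam.AlignmentFile(in_bam, "rb") as src, \
         pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
        for read in src:
            if read.is_proper_pair and read.mapping_quality >= min_mapq:
                dst.write(read)
```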
Lead-free 0.5Ba(Zr0.2Ti0.8)O3-0.5(Ba0.7Ca0.3)TiO3 nanowires for energy harvesting.
Zhou, Zhi; Bowland, Christopher C; Malakooti, Mohammad H; Tang, Haixiong; Sodano, Henry A
2016-03-07
Lead-free piezoelectric nanowires (NWs) show strong potential in sensing and energy harvesting applications due to their flexibility and ability to convert mechanical energy to electric energy. Currently, most lead-free piezoelectric NWs are produced through low-yield synthesis methods and exhibit low electromechanical coupling, which limits their efficiency as energy harvesters. In order to alleviate these issues, a scalable method is developed to synthesize perovskite-type 0.5Ba(Zr0.2Ti0.8)O3-0.5(Ba0.7Ca0.3)TiO3 (BZT-BCT) NWs with a high piezoelectric coupling coefficient. The piezoelectric coupling coefficient of the BZT-BCT NWs is measured by a refined piezoresponse force microscopy (PFM) testing method and, at 90 ± 5 pm V⁻¹, is the highest reported for lead-free piezoelectric nanowires. Flexible nanocomposites utilizing dispersed BZT-BCT NWs are fabricated to demonstrate an energy harvesting application with an open circuit voltage of up to 6.25 V and a power density of up to 2.25 μW cm⁻³. The high electromechanical coupling coefficient and high power density demonstrated with these lead-free NWs, produced via a scalable synthesis method, show the potential for high-performance NW-based devices.
HRLSim: a high performance spiking neural network simulator for GPGPU clusters.
Minkovich, Kirill; Thibeault, Corey M; O'Brien, Michael John; Nogin, Aleksey; Cho, Youngkwan; Srinivasa, Narayan
2014-02-01
Simulation of large-scale spiking neural models is an important tool in the quest to understand brain function and subsequently create real-world applications. This paper describes a spiking neural network simulator environment called the HRL Spiking Simulator (HRLSim). This simulator is suitable for implementation on a cluster of general-purpose graphical processing units (GPGPUs). Novel aspects of HRLSim are described and an analysis of its performance is provided for various configurations of the cluster. With the advent of inexpensive GPGPU cards and compute power, HRLSim offers an affordable and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks.
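For readers unfamiliar with the workload, the inner loop of such a simulator repeatedly evaluates a neuron update rule over large populations. Below is a generic leaky integrate-and-fire step in NumPy (an illustrative textbook model, not HRLSim's neuron model; all parameters are arbitrary), the kind of data-parallel kernel that maps naturally onto GPGPUs:

```python
import numpy as np

def lif_step(v, spikes_in, W, dt=1.0, tau=20.0, v_rest=-65.0,
             v_thresh=-50.0, v_reset=-65.0):
    """One Euler step for a population of leaky integrate-and-fire neurons.

    v          : membrane potentials (mV), one per neuron
    spikes_in  : 0/1 vector of presynaptic spikes from the last step
    W          : synaptic weight matrix (post x pre)
    """
    i_syn = W @ spikes_in                        # summed synaptic input
    v = v + dt * ((v_rest - v) / tau) + i_syn    # leaky integration
    fired = v >= v_thresh                        # threshold crossing
    v[fired] = v_reset                           # reset spiking neurons
    return v, fired.astype(float)
```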
A Full Mesh ATCA-based General Purpose Data Processing Board (Pulsar II)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ajuha, S.
The Pulsar II is a custom ATCA full-mesh-enabled FPGA-based processor board designed with the goal of creating a scalable architecture abundant in flexible, non-blocking, high-bandwidth interconnections. The design has been motivated by silicon-based tracking trigger needs for LHC experiments. In this technical memo we describe the Pulsar II hardware and its performance, including performance test results with full-mesh backplanes from different vendors, how the backplane is used for the development of low-latency time-multiplexed data transfer schemes, and how inter-shelf and intra-shelf synchronization works.
Next Generation Integrated Environment for Collaborative Work Across Internets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harvey B. Newman
2009-02-24
We are now well advanced in our development, prototyping and deployment of a high-performance next-generation Integrated Environment for Collaborative Work. The system, aimed at using the capability of ESnet and Internet2 for rapid data exchange, is based on the Virtual Room Videoconferencing System (VRVS) developed by Caltech. The VRVS system has been chosen by the Internet2 Digital Video (I2-DV) Initiative as a preferred foundation for the development of advanced video, audio and multimedia collaborative applications by the Internet2 community. Today, the system supports high-end, broadcast-quality interactivity while enabling a wide variety of clients (Mbone, H.323) to participate in the same conference by running different standard protocols in different contexts with different bandwidth connection limitations. It has a fully Web-integrated user interface, developer and administrative APIs, a widely scalable video network topology based on both multicast domains and unicast tunnels, and demonstrated multiplatform support. This has led to its rapidly expanding production use for national and international scientific collaborations in more than 60 countries. We are also in the process of creating a 'testbed video network' and developing the necessary middleware to support a set of new and essential requirements for rapid data exchange and a high level of interactivity in large-scale scientific collaborations. These include a set of tunable, scalable differentiated network services adapted to each of the data streams associated with a large number of collaborative sessions; policy-based and network-state-based resource scheduling; authentication; and optional encryption to maintain the confidentiality of inter-personal communications. High-performance testbed video networks will be established in ESnet and Internet2 to test and tune the implementation, using a few target application sets.
NASA Astrophysics Data System (ADS)
Esmaily, M.; Jofre, L.; Mani, A.; Iaccarino, G.
2018-03-01
A geometric multigrid algorithm is introduced for solving nonsymmetric linear systems resulting from the discretization of the variable-density Navier-Stokes equations on nonuniform structured rectilinear grids and high-Reynolds-number flows. The restriction operation is defined such that the resulting system on the coarser grids is symmetric, thereby allowing for the use of efficient smoother algorithms. To achieve an optimal rate of convergence, the sequence of interpolation and restriction operations is determined through a dynamic procedure. A parallel partitioning strategy is introduced to minimize communication while maintaining the load balance between all processors. To test the proposed algorithm, we consider two cases: (1) homogeneous isotropic turbulence discretized on uniform grids and (2) turbulent duct flow discretized on stretched grids. Testing the algorithm on systems with up to a billion unknowns shows that the cost varies linearly with the number of unknowns. This O(N) behavior confirms the robustness of the proposed multigrid method regarding ill-conditioning of large systems characteristic of multiscale high-Reynolds-number turbulent flows. The robustness of our method to density variations is established by considering cases where density varies sharply in space by a factor of up to 10⁴, showing its applicability to two-phase flow problems. Strong and weak scalability studies are carried out, employing up to 30,000 processors, to examine the parallel performance of our implementation. Excellent scalability of our solver is shown for a granularity as low as 10⁴ to 10⁵ unknowns per processor. At its tested peak throughput, it solves approximately 4 billion unknowns per second employing over 16,000 processors with a parallel efficiency higher than 50%.
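To make the multigrid recursion concrete, here is a minimal 1D Poisson V-cycle in NumPy, a generic textbook sketch far simpler than the paper's variable-density, nonsymmetric 3D setting; the smoother damping, grid size, and cycle count are arbitrary choices:

```python
import numpy as np

def v_cycle(u, f, h, n_smooth=3):
    """One V-cycle for -u'' = f on a uniform grid with zero boundaries.

    Smooth, restrict the residual to a coarser grid, solve there
    recursively for a correction, prolong it back, and smooth again.
    """
    def smooth(u, f, h, iters):
        for _ in range(iters):    # damped Jacobi relaxation
            u[1:-1] += 0.6 * (0.5 * (u[2:] + u[:-2] + h * h * f[1:-1]) - u[1:-1])
        return u
    def residual(u, f, h):
        r = np.zeros_like(u)
        r[1:-1] = f[1:-1] + (u[2:] - 2 * u[1:-1] + u[:-2]) / (h * h)
        return r

    u = smooth(u, f, h, n_smooth)
    if len(u) <= 3:               # coarsest grid: smoothing suffices
        return u
    r_c = residual(u, f, h)[::2].copy()            # restrict (injection)
    e_c = v_cycle(np.zeros_like(r_c), r_c, 2 * h, n_smooth)
    u += np.interp(np.arange(len(u)),
                   np.arange(0, len(u), 2), e_c)   # prolong correction
    return smooth(u, f, h, n_smooth)

# Example: n must be 2^k + 1 so each coarsening halves the grid cleanly
n, h = 257, 1.0 / 256
u, f = np.zeros(n), np.ones(n)
for _ in range(10):
    u = v_cycle(u, f, h)
```

The cost of one cycle is dominated by the finest grid, which is why the total work stays O(N), the behavior reported in the abstract.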
DOE Office of Scientific and Technical Information (OSTI.GOV)
Drotar, Alexander P.; Quinn, Erin E.; Sutherland, Landon D.
2012-07-30
The project description is: (1) build a high performance computer; and (2) create a tool to monitor node applications in the Component Based Tool Framework (CBTF) using code from the Lightweight Data Metric Service (LDMS). The importance of this project is that: (1) there is a need for a scalable, parallel tool to monitor nodes on clusters; and (2) new LDMS plugins need to be easy to add to the tool. CBTF stands for Component Based Tool Framework. It is scalable and adjusts to different topologies automatically, and it uses the MRNet (Multicast/Reduction Network) mechanism for information transport. CBTF is flexible and general enough to be used for any tool that needs to do a task on many nodes, and its components are reusable and easily added to a new tool. There are three levels of CBTF: (1) the frontend node, which interacts with users; (2) filter nodes, which filter or concatenate information from backend nodes; and (3) backend nodes, where the actual work of the tool is done. LDMS stands for Lightweight Data Metric Services; it is a tool used for monitoring nodes. Ltool is the name of the tool we derived from LDMS. It is dynamically linked and includes the following components: Vmstat, Meminfo, Procinterrupts and more. It works as follows: the Ltool command is run on the frontend node; Ltool collects information from the backend nodes; the backend nodes send information to the filter nodes; and the filter nodes concatenate the information and send it to a database on the frontend node. Ltool is a useful tool for monitoring nodes on a cluster because the overhead involved in running it is not particularly high and it automatically scales to any size of cluster.
Investigation on scalable high-power lasers with enhanced 'eye-safety' for future weapon systems
NASA Astrophysics Data System (ADS)
Bigotta, S.; Diener, K.; Eichhorn, M.; Galecki, L.; Geiss, L.; Ibach, T.; Scharf, H.; von Salisch, M.; Schöner, J.; Vincent, G.
2016-10-01
The possible use of lasers as weapons is becoming more and more interesting for military forces. Besides the generation of high laser power and good beam quality, safety considerations, e.g. concerning eye hazards, are also of importance. The MELIAS (medium energy laser in the "eye-safe" spectral domain) project of ISL addresses these issues, and ISL has developed the most powerful solid-state laser in the "eye-safe" wavelength region to date. "Eye safety" in this context means that light at a wavelength of > 1.4 μm does not penetrate the eye and thus will not be focused onto the retina. The basic requirement of this technology is that a laser source must be scalable in power to far beyond 100 kW without a significant deterioration in beam quality. ISL has studied a very promising laser technology: the erbium heat-capacity laser. This type of laser is characterised by a compact design, a simple and robust technology and a scaling law which, in principle, allows the generation of laser power far beyond megawatts at small volumes. Previous investigations demonstrated the scalability of the solid-state heat-capacity laser (SSHCL), with up to 4.65 kW and 440 J obtained in less than 800 ms. Optical-to-optical efficiencies of over 41% and slope efficiencies of over 51% are obtained. The residual thermal gradients, due to imperfect pumping homogeneity, negatively affect the performance in terms of laser pulse energy, duration and beam quality. In the course of the next two years, ISL will be designing a 25 to 30 kW erbium heat-capacity laser.
Photoignition Torch Applied to Cryogenic H2/O2 Coaxial Jet
2016-12-06
suitable for certain thrusters and liquid rocket engines. This ignition system is scalable for applications in different combustion chambers such as gas turbines, gas generators, liquid rocket engines, and multi-grain solid rocket motors. Keywords: photoignition, fuel spray ignition, high pressure ignition.