A Scalable Architecture of a Structured LDPC Decoder
NASA Technical Reports Server (NTRS)
Lee, Jason Kwok-San; Lee, Benjamin; Thorpe, Jeremy; Andrews, Kenneth; Dolinar, Sam; Hamkins, Jon
2004-01-01
We present a scalable decoding architecture for a certain class of structured LDPC codes. The codes are designed using a small (n,r) protograph that is replicated Z times to produce a decoding graph for a (Z x n, Z x r) code. Using this architecture, we have implemented a decoder for a (4096,2048) LDPC code on a Xilinx Virtex-II 2000 FPGA, and achieved decoding speeds of 31 Mbps with 10 fixed iterations. The implemented message-passing algorithm uses an optimized 3-bit non-uniform quantizer that operates with 0.2 dB implementation loss relative to a floating point decoder.
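As context for how such a decoding graph is produced, here is a minimal Python sketch of protograph lifting; the base matrix, the use of circulant permutations, and the random shifts are illustrative assumptions, not the construction used in the paper:

    import numpy as np

    def lift_protograph(base, Z, seed=0):
        # Replace each edge of the (r x n) protograph with a Z x Z
        # circulant permutation, yielding a (Z*r) x (Z*n) parity-check
        # matrix, i.e. a decoding graph replicated Z times.
        rng = np.random.default_rng(seed)
        r, n = base.shape
        H = np.zeros((Z * r, Z * n), dtype=np.uint8)
        for i in range(r):
            for j in range(n):
                for _ in range(int(base[i, j])):      # parallel edges allowed
                    shift = int(rng.integers(Z))      # illustrative random shift
                    P = np.roll(np.eye(Z, dtype=np.uint8), shift, axis=1)
                    H[i*Z:(i+1)*Z, j*Z:(j+1)*Z] ^= P
        return H

    # e.g., a hypothetical 2x4 base protograph lifted with Z = 1024 gives
    # the parity-check matrix of a (4096, 2048) code, matching the sizes
    # quoted in the abstract.
    H = lift_protograph(np.array([[1, 1, 1, 1], [1, 2, 0, 1]]), 1024)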
Kashyap, Vipul; Morales, Alfredo; Hongsermeier, Tonya
2006-01-01
We present an approach and architecture for implementing scalable and maintainable clinical decision support at the Partners HealthCare System. The architecture integrates a business rules engine that executes declarative if-then rules stored in a rule-base referencing objects and methods in a business object model. The rules engine executes object methods by invoking services implemented on the clinical data repository. Specialized inferences that support classification of data and instances into classes are identified, and an approach to implementing these inferences using an OWL-based ontology engine is presented. Alternative representations of these specialized inferences as if-then rules or OWL axioms are explored, and their impact on the scalability and maintenance of the system is presented. Architectural alternatives for integrating clinical decision support functionality with the invoking application and the underlying clinical data repository are discussed, along with their associated trade-offs.
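To make the declarative style concrete, here is a minimal, hypothetical sketch of if-then rules referencing a business object model; the class names and rule content are invented for illustration and are not taken from the Partners system:

    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class Patient:                       # stand-in for a business object
        meds: List[str] = field(default_factory=list)
        creatinine: float = 1.0
        alerts: List[str] = field(default_factory=list)

    @dataclass
    class Rule:
        name: str
        condition: Callable[[Patient], bool]   # the declarative "if" part
        action: Callable[[Patient], None]      # the "then" part

    rules = [Rule("renal-dosing",
                  lambda p: "metformin" in p.meds and p.creatinine > 1.5,
                  lambda p: p.alerts.append("review metformin dose"))]

    def run(rules: List[Rule], patient: Patient) -> None:
        # One forward pass: fire every rule whose condition holds.
        for r in rules:
            if r.condition(patient):
                r.action(patient)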
Architectural Considerations for Highly Scalable Computing to Support On-demand Video Analytics
2017-04-19
...research were used to implement a distributed on-demand video analytics system that was prototyped for the use of forensics investigators in law enforcement. The system was tested in the wild using video files as well as a commercial Video Management System supporting more than 100 surveillance cameras as video sources. The architectural considerations of this system are presented. Issues to be reckoned with in implementing a scalable...
A reference architecture for integrated EHR in Colombia.
de la Cruz, Edgar; Lopez, Diego M; Uribe, Gustavo; Gonzalez, Carolina; Blobel, Bernd
2011-01-01
The implementation of national EHR infrastructures must start with a detailed definition of the overall structure and behavior of the EHR system (the system architecture). Architectures have to be open, scalable, flexible, user-accepted and user-friendly, trustworthy, and based on standards, including terminologies and ontologies. The Generic Component Model (GCM) provides an architectural framework created for analyzing any kind of system, including EHR system architectures. The objective of this paper is to propose a reference architecture for the implementation of an integrated EHR in Colombia, based on the current state of systems' architectural models and EHR standards. The proposed EHR architecture defines a set of services (elements) and their interfaces to support the exchange of clinical documents, offering an open, scalable, flexible and semantically interoperable infrastructure. The architecture was tested in a pilot tele-consultation project in Colombia, where dental EHRs are exchanged.
Scalable Architecture for Multihop Wireless ad Hoc Networks
NASA Technical Reports Server (NTRS)
Arabshahi, Payman; Gray, Andrew; Okino, Clayton; Yan, Tsun-Yee
2004-01-01
A scalable architecture for wireless digital data and voice communications via ad hoc networks has been proposed. Although the details of the architecture and of its implementation in hardware and software have yet to be developed, the broad outlines of the architecture are fairly clear: This architecture departs from current commercial wireless communication architectures, which are characterized by low effective bandwidth per user and are not well suited to low-cost, rapid scaling in large metropolitan areas. This architecture is inspired by a vision more akin to that of more than two dozen noncommercial community wireless networking organizations established by volunteers in North America and several European countries.
A scalable healthcare information system based on a service-oriented architecture.
Yang, Tzu-Hsiang; Sun, Yeali S; Lai, Feipei
2011-06-01
Many existing healthcare information systems are composed of a number of heterogeneous systems and face the important issue of system scalability. This paper first describes the comprehensive healthcare information systems used in National Taiwan University Hospital (NTUH) and then presents a service-oriented architecture (SOA)-based healthcare information system (HIS) built on the HL7 service standard. The proposed architecture focuses on system scalability, in terms of both hardware and software. Moreover, we describe how scalability is implemented in rightsizing, service groups, databases, and hardware. Although SOA-based systems sometimes display poor performance, a performance evaluation of our SOA-based HIS shows that the average response times for the outpatient, inpatient, and emergency HL7Central systems are 0.035, 0.04, and 0.036 s, respectively. The outpatient, inpatient, and emergency WebUI average response times are 0.79, 1.25, and 0.82 s. The scalability achieved in the rightsizing project and our evaluation results provide evidence that the proposed SOA-based HIS can deliver system scalability and sustainability in a highly demanding healthcare information system.
Gichoya, Judy; Pearce, Chris; Wickramasinghe, Nilmini
2013-01-01
Kenya ranks among the twenty-two countries that collectively contribute about 80% of the world's tuberculosis cases, with a 50-200 fold increased risk of tuberculosis in HIV-infected persons versus non-HIV hosts. Contemporaneously, there is an increase in mobile penetration and its use to support healthcare throughout Africa. Many are skeptical of such m-health solutions, regarding them as unsustainable and not scalable. We seek to design a scalable, pervasive m-health solution for tuberculosis care to serve as a use case for sustainable and scalable health IT in limited-resource settings. We combine agile design principles and user-centered design to develop the architecture needed for this initiative. Furthermore, the architecture runs on multiple devices integrated to deliver functionality critical for successful health IT implementation in limited-resource settings. It is anticipated that, once fully implemented, the proposed m-health solution will facilitate superior monitoring and management of tuberculosis and thereby reduce the alarming statistics regarding this disease in this region.
An MPI-based MoSST core dynamics model
NASA Astrophysics Data System (ADS)
Jiang, Weiyuan; Kuang, Weijia
2008-09-01
Distributed systems are among the main cost-effective and expandable platforms for high-end scientific computing. Therefore scalable numerical models are important for effective use of such systems. In this paper, we present an MPI-based numerical core dynamics model for simulation of geodynamo and planetary dynamos, and for simulation of core-mantle interactions. The model is developed based on MPI libraries. Two algorithms are used for node-node communication: a "master-slave" architecture and a "divide-and-conquer" architecture. The former is easy to implement but not scalable in communication. The latter is scalable in both computation and communication. The model scalability is tested on Linux PC clusters with up to 128 nodes. This model is also benchmarked with a published numerical dynamo model solution.
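A minimal mpi4py sketch of the two communication patterns contrasted above; the work items and the reduction are toy placeholders, not the dynamo computation:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # "Master-slave": rank 0 farms out work and collects results. Easy
    # to implement, but all traffic funnels through one node, so the
    # communication does not scale with the node count.
    if rank == 0:
        for worker in range(1, size):
            comm.send(list(range(worker, 100, size - 1)), dest=worker)
        results = [comm.recv(source=w) for w in range(1, size)]
    else:
        work = comm.recv(source=0)
        comm.send(sum(x * x for x in work), dest=0)

    # "Divide-and-conquer": a collective tree reduction; both the
    # computation and the communication scale, since the cost grows
    # only logarithmically in the number of nodes.
    total = comm.reduce(rank * rank, op=MPI.SUM, root=0)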
Different micromanipulation applications based on common modular control architecture
NASA Astrophysics Data System (ADS)
Sipola, Risto; Vallius, Tero; Pudas, Marko; Röning, Juha
2010-01-01
This paper validates a previously introduced scalable modular control architecture and shows how it can be used to implement research equipment. The validation is conducted by presenting different kinds of micromanipulation applications that use the architecture. Conditions of the micro-world are very different from those of the macro-world. Adhesive forces are significant compared to gravitational forces when micro-scale objects are manipulated. Manipulation is mainly conducted by automatic control relying on haptic feedback provided by force sensors. The validated architecture is a hierarchical layered hybrid architecture, including a reactive layer and a planner layer. The implementation of the architecture is modular, and the architecture has a lot in common with open architectures. Further, the architecture is extensible, scalable, portable and it enables reuse of modules. These are the qualities that we validate in this paper. To demonstrate the claimed features, we present different applications that require special control in micrometer, millimeter and centimeter scales. These applications include a device that measures cell adhesion, a device that examines properties of thin films, a device that measures adhesion of micro fibers and a device that examines properties of submerged gel produced by bacteria. Finally, we analyze how the architecture is used in these applications.
Performance evaluation of OpenFOAM on many-core architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brzobohatý, Tomáš; Říha, Lubomír; Karásek, Tomáš, E-mail: tomas.karasek@vsb.cz
In this article, the application of Open Source Field Operation and Manipulation (OpenFOAM) C++ libraries to solving engineering problems on many-core architectures is presented. The objective of this article is to present the scalability of OpenFOAM on parallel platforms when solving real engineering problems of fluid dynamics. Scalability tests of OpenFOAM are performed using various hardware and different implementations of the standard PCG and PBiCG Krylov iterative methods. Speedups of various implementations of the linear solvers using GPU and MIC accelerators are presented. Numerical experiments on 3D lid-driven cavity flow for several cases with various numbers of cells are presented.
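For reference, the PCG method benchmarked here is the standard preconditioned conjugate gradient; a minimal NumPy sketch with a dense matrix and a Jacobi preconditioner, purely illustrative (OpenFOAM operates on sparse matrices with its own preconditioners):

    import numpy as np

    def pcg(A, b, tol=1e-8, max_iter=1000):
        # Preconditioned conjugate gradient with a Jacobi (diagonal)
        # preconditioner M = diag(A).
        M_inv = 1.0 / np.diag(A)
        x = np.zeros_like(b)
        r = b - A @ x
        z = M_inv * r
        p = z.copy()
        rz = r @ z
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol:
                break
            z = M_inv * r
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x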
Palomar, Esther; Chen, Xiaohong; Liu, Zhiming; Maharjan, Sabita; Bowen, Jonathan
2016-10-28
Smart city systems embrace major challenges associated with climate change, energy efficiency, mobility and future services by embedding the virtual space into a complex cyber-physical system. Those systems are constantly evolving and scaling up, involving a wide range of integration among users, devices, utilities, public services and also policies. Modelling such complex dynamic systems' architectures has always been essential for the development and application of techniques/tools to support design and deployment of integration of new components, as well as for the analysis, verification, simulation and testing to ensure trustworthiness. This article reports on the definition and implementation of a scalable component-based architecture that supports a cooperative energy demand response (DR) system coordinating energy usage between neighbouring households. The proposed architecture, called refinement of Cyber-Physical Component Systems (rCPCS), which extends the refinement calculus for component and object system (rCOS) modelling method, is implemented using Eclipse Extensible Coordination Tools (ECT), i.e., the Reo coordination language. With the rCPCS implementation in Reo, we specify the communication, synchronisation and co-operation amongst the heterogeneous components of the system, assuring, by design, scalability, interoperability, and correctness of component cooperation.
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.; Kutler, Paul (Technical Monitor)
1997-01-01
The Whitney project is integrating commodity off-the-shelf PC hardware and software technology to build a parallel supercomputer with hundreds to thousands of nodes. To build such a system, one must have a scalable software model, and the installation and maintenance of the system software must be completely automated. We describe the design of an architecture for booting, installing, and configuring nodes in such a system with particular consideration given to scalability and ease of maintenance. This system has been implemented on a 40-node prototype of Whitney and is to be used on the 500 processor Whitney system to be built in 1998.
Medusa: A Scalable MR Console Using USB
Stang, Pascal P.; Conolly, Steven M.; Santos, Juan M.; Pauly, John M.; Scott, Greig C.
2012-01-01
MRI pulse sequence consoles typically employ closed proprietary hardware, software, and interfaces, making difficult any adaptation for innovative experimental technology. Yet MRI systems research is trending to higher channel count receivers, transmitters, gradient/shims, and unique interfaces for interventional applications. Customized console designs are now feasible for researchers with modern electronic components, but high data rates, synchronization, scalability, and cost present important challenges. Implementing large multi-channel MR systems with efficiency and flexibility requires a scalable modular architecture. With Medusa, we propose an open system architecture using the Universal Serial Bus (USB) for scalability, combined with distributed processing and buffering to address the high data rates and strict synchronization required by multi-channel MRI. Medusa uses a modular design concept based on digital synthesizer, receiver, and gradient blocks, in conjunction with fast programmable logic for sampling and synchronization. Medusa is a form of synthetic instrument, being reconfigurable for a variety of medical/scientific instrumentation needs. The Medusa distributed architecture, scalability, and data bandwidth limits are presented, and its flexibility is demonstrated in a variety of novel MRI applications. PMID:21954200
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness
NASA Astrophysics Data System (ADS)
Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.
2014-09-01
Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
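As a toy illustration of the partitioning/scheduling step described above (the cost model, atom names, and device names are invented; the actual manyscale mapping operations are not given in this abstract):

    import heapq

    def map_atoms(atom_costs, devices):
        # Greedy longest-processing-time scheduling: repeatedly assign
        # the largest remaining work atom to the least-loaded device,
        # mimicking the mapping of work atoms to threads/thread blocks.
        heap = [(0.0, d, []) for d in devices]
        heapq.heapify(heap)
        for atom, cost in sorted(atom_costs.items(), key=lambda kv: -kv[1]):
            load, dev, assigned = heapq.heappop(heap)
            assigned.append(atom)
            heapq.heappush(heap, (load + cost, dev, assigned))
        return {dev: assigned for _, dev, assigned in heap}

    # map_atoms({"fft": 5.0, "filter": 3.0, "detect": 2.0}, ["gpu0", "gpu1"])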
Distributed numerical controllers
NASA Astrophysics Data System (ADS)
Orban, Peter E.
2001-12-01
While the basic principles of Numerical Controllers (NCs) have not changed much over the years, the implementation of NCs has changed tremendously. NC equipment has evolved from yesterday's hard-wired specialty control apparatus to today's graphics-intensive, networked, increasingly PC-based open systems, controlling a wide variety of industrial equipment with positioning needs. One of the newest trends in NC technology is the distributed implementation of the controllers. Distributed implementation promises to offer robustness, lower implementation costs, and a scalable architecture. Historically, partitioning has been done along the hierarchical levels, moving individual modules into self-contained units. The paper discusses various NC architectures, the underlying technology for distributed implementation, and relevant design issues. First, the functional requirements of individual NC modules are analyzed. Module functionality, cycle times, and data requirements are examined. Next, the infrastructure for distributed node implementation is reviewed. Various communication protocols and distributed real-time operating system issues are investigated and compared. Finally, a different, vertical system partitioning, offering true scalability and reconfigurability, is presented.
Reconfigurable firmware-defined radios synthesized from standard digital logic cells
NASA Astrophysics Data System (ADS)
Faisal, Muhammad; Park, Youngmin; Wentzloff, David D.
2011-06-01
This paper presents recent work on reconfigurable all-digital radio architectures. We leverage the flexibility and scalability of synthesized digital cells to construct reconfigurable radio architectures that consume significantly less power than a software-defined radio implementing similar architectures. We present two prototypes of such architectures that can receive and demodulate FM and FRS band signals. Moreover, a radio architecture based on a reconfigurable all-digital phase-locked loop for coherent demodulation is presented.
Ultra-Dense Quantum Communication Using Integrated Photonic Architecture: First Annual Report
2011-08-24
The goal of this program is to establish a fundamental information-theoretic understanding of quantum secure communication and to devise a practical, scalable implementation of quantum key distribution protocols in an integrated photonic architecture. We report our progress on experimental and ...
Implementation of Virtualization Oriented Architecture: A Healthcare Industry Case Study
NASA Astrophysics Data System (ADS)
Rao, G. Subrahmanya Vrk; Parthasarathi, Jinka; Karthik, Sundararaman; Rao, Gvn Appa; Ganesan, Suresh
This paper presents a Virtualization Oriented Architecture (VOA) and an implementation of VOA for Hridaya, a telemedicine initiative. A Hadoop compute cloud was established at our labs, jobs requiring massive computing capability, such as ECG signal analysis, were submitted to it, and the resulting study is presented in this paper. VOA takes advantage of inexpensive community PCs and provides added advantages such as fault tolerance, scalability, performance, and high availability.
Schilling, Lisa M.; Kwan, Bethany M.; Drolshagen, Charles T.; Hosokawa, Patrick W.; Brandt, Elias; Pace, Wilson D.; Uhrich, Christopher; Kamerick, Michael; Bunting, Aidan; Payne, Philip R.O.; Stephens, William E.; George, Joseph M.; Vance, Mark; Giacomini, Kelli; Braddy, Jason; Green, Mika K.; Kahn, Michael G.
2013-01-01
Introduction: Distributed Data Networks (DDNs) offer infrastructure solutions for sharing electronic health data from across disparate data sources to support comparative effectiveness research. Data sharing mechanisms must address technical and governance concerns stemming from network security and data disclosure laws and best practices, such as HIPAA. Methods: The Scalable Architecture for Federated Translational Inquiries Network (SAFTINet) deploys TRIAD grid technology, a common data model, detailed technical documentation, and custom software for data harmonization to facilitate data sharing in collaboration with stakeholders in the care of safety net populations. Data sharing partners host TRIAD grid nodes containing harmonized clinical data within their internal or hosted network environments. Authorized users can use a central web-based query system to request analytic data sets. Discussion: SAFTINet DDN infrastructure achieved a number of data sharing objectives, including scalable and sustainable systems for ensuring harmonized data structures and terminologies and secure distributed queries. Initial implementation challenges were resolved through iterative discussions, development and implementation of technical documentation, governance, and technology solutions. PMID:25848567
Digital quantum simulators in a scalable architecture of hybrid spin-photon qubits
Chiesa, Alessandro; Santini, Paolo; Gerace, Dario; Raftery, James; Houck, Andrew A.; Carretta, Stefano
2015-01-01
Resolving quantum many-body problems represents one of the greatest challenges in physics and physical chemistry, due to the prohibitively large computational resources that would be required by using classical computers. A solution has been foreseen by directly simulating the time evolution through sequences of quantum gates applied to arrays of qubits, i.e. by implementing a digital quantum simulator. Superconducting circuits and resonators are emerging as an extremely promising platform for quantum computation architectures, but a digital quantum simulator proposal that is straightforwardly scalable, universal, and realizable with state-of-the-art technology is presently lacking. Here we propose a viable scheme to implement a universal quantum simulator with hybrid spin-photon qubits in an array of superconducting resonators, which is intrinsically scalable and allows for local control. As representative examples we consider the transverse-field Ising model, a spin-1 Hamiltonian, and the two-dimensional Hubbard model and we numerically simulate the scheme by including the main sources of decoherence. PMID:26563516
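For reference, the first target model mentioned, the transverse-field Ising model, has the textbook Hamiltonian below (standard form, not quoted from the paper):

    % Transverse-field Ising model on N spins (open chain): J couples
    % neighbouring sigma-z operators; h is the transverse field along x.
    H = -J \sum_{i=1}^{N-1} \sigma^{z}_{i}\,\sigma^{z}_{i+1}
        - h \sum_{i=1}^{N} \sigma^{x}_{i}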
Fast and Scalable Computation of the Forward and Inverse Discrete Periodic Radon Transform.
Carranza, Cesar; Llamocca, Daniel; Pattichis, Marios
2016-01-01
The discrete periodic Radon transform (DPRT) has been extensively used in applications that involve image reconstruction from projections. Beyond classic applications, the DPRT can also be used to compute fast convolutions that avoid the floating-point arithmetic associated with the fast Fourier transform. Unfortunately, the use of the DPRT has been limited by the need to compute a large number of additions and the need for a large number of memory accesses. This paper introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: a parallel array of fixed-point adder trees; circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees; an image block-based approach to DPRT computation that can fit the proposed architecture to available resources; and fast transpositions that are computed in one or a few clock cycles that do not depend on the size of the input image. As a result, for an N × N image (N prime), the proposed approach can compute up to N² additions per clock cycle. Compared with previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For example, for a 251 × 251 image, with approximately 25% fewer flip-flops than required for a systolic implementation, the scalable DPRT is computed 36 times faster. For the fastest case, we introduce optimized architectures that can compute the DPRT and its inverse in just 2N + ⌈log₂ N⌉ + 1 and 2N + 3⌈log₂ N⌉ + B + 2 clock cycles, respectively, where B is the number of bits used to represent each input pixel. On the other hand, the scalable DPRT approach requires more one-bit additions than the systolic implementation, providing a tradeoff between speed and additional one-bit additions. All of the proposed DPRT architectures were implemented in VHSIC Hardware Description Language (VHDL) and validated using a Field-Programmable Gate Array (FPGA) implementation.
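A minimal NumPy sketch of the forward DPRT under one common convention (the paper's hardware computes the same per-projection sums with adder trees and circular shift registers; the exact index convention may differ from the one assumed here):

    import numpy as np

    def dprt(f):
        # Forward DPRT of an N x N image with N prime: N + 1 projections
        # of length N, each summing N pixels along lines that wrap
        # around the image periodically.
        N = f.shape[0]
        R = np.zeros((N + 1, N), dtype=np.int64)
        for m in range(N):                # wrapped lines of "slope" m
            for d in range(N):
                R[m, d] = sum(f[(d + m * j) % N, j] for j in range(N))
        R[N, :] = f.sum(axis=1)           # the remaining row-sum projection
        return R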
Finding idle machines in a workstation-based distributed system
NASA Technical Reports Server (NTRS)
Theimer, Marvin M.; Lantz, Keith A.
1989-01-01
The authors describe the design and performance of scheduling facilities for finding idle hosts in a workstation-based distributed system. They focus on the tradeoffs between centralized and decentralized architectures with respect to scalability, fault tolerance, and simplicity of design, as well as several implementation issues of interest when multicast communication is used. They conclude that the principal tradeoff between the two approaches is that a centralized architecture can be scaled to a significantly greater degree and can more easily monitor global system statistics, whereas a decentralized architecture is simpler to implement.
NASA Astrophysics Data System (ADS)
Yan, Hui; Wang, K. G.; Jones, Jim E.
2016-06-01
A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, new kinetics of phase coarsening in the region of ultrahigh volume fraction are found. The parallel implementation is capable of harnessing the greater computer power available from high-performance architectures. The parallelized code enables an increase in the three-dimensional simulation system size up to a 512³ grid cube. Through the parallelized code, practical runtimes can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high-resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis of speed-up and scalability is presented, showing good scalability that improves with increasing problem size. In addition, a model for predicting runtime is developed, which shows good agreement with actual run times from numerical tests.
Srinivasa, Narayan; Zhang, Deying; Grigorian, Beayna
2014-03-01
This paper describes a novel architecture for enabling robust and efficient neuromorphic communication. The architecture combines two concepts: 1) synaptic time multiplexing (STM), which trades space for speed of processing to create an intragroup communication approach that is firing-rate independent and offers more flexibility in connectivity than cross-bar architectures, and 2) wired multiple input multiple output (MIMO) communication with orthogonal frequency division multiplexing (OFDM) techniques to enable robust and efficient intergroup communication for neuromorphic systems. The MIMO-OFDM concept for the proposed architecture was analyzed by simulating a large-scale spiking neural network architecture. The analysis shows that the neuromorphic system with MIMO-OFDM exhibits robust and efficient communication while operating in real time with a high bit rate. By combining STM with MIMO-OFDM techniques, the resulting system offers flexible and scalable connectivity as well as a power- and area-efficient solution for the implementation of very large-scale spiking neural architectures in hardware.
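A minimal sketch of the OFDM step used on each inter-group link (the subcarrier count and cyclic-prefix length are illustrative; the MIMO aspect amounts to running one such stream per wire pair):

    import numpy as np

    def ofdm_modulate(symbols, n_sub=64, cp_len=16):
        # Map complex symbols onto n_sub orthogonal subcarriers with an
        # inverse FFT, then prepend a cyclic prefix to each block.
        blocks = symbols.reshape(-1, n_sub)
        time = np.fft.ifft(blocks, axis=1)
        return np.hstack([time[:, -cp_len:], time])

    # Spike events -> bits -> complex symbols would feed in here, e.g.:
    # tx = ofdm_modulate(np.exp(1j * np.pi / 2 * np.arange(128)))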
Scalable service architecture for providing strong service guarantees
NASA Astrophysics Data System (ADS)
Christin, Nicolas; Liebeherr, Joerg
2002-07-01
For the past decade, a lot of Internet research has been devoted to providing different levels of service to applications. Initial proposals for service differentiation provided strong service guarantees, with strict bounds on delays, loss rates, and throughput, but required high overhead in terms of computational complexity and memory, both of which raise scalability concerns. Recently, interest has shifted to service architectures with low overhead. However, these newer service architectures only provide weak service guarantees, which do not always address the needs of applications. In this paper, we describe a service architecture that supports strong service guarantees, can be implemented with low computational complexity, and requires maintaining only a small amount of state information. A key mechanism of the proposed service architecture is that it addresses scheduling and buffer management in a single algorithm. The presented architecture offers no solution for controlling the amount of traffic that enters the network. Instead, we plan to exploit the feedback mechanisms of TCP congestion control algorithms to regulate the traffic entering the network.
Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations
NASA Astrophysics Data System (ADS)
Bernaschi, M.; Bisson, M.; Salvadore, F.
2014-10-01
We present and compare the performance of two many-core architectures, the Nvidia Kepler and the Intel MIC, both in a single system and in a cluster configuration, for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model using the over-relaxation algorithm. We also present data for a traditional high-end multi-core architecture, the Intel Sandy Bridge. The results show that although the two Intel architectures can run basically the same code, the performance of the Intel MIC changes dramatically depending on (apparently) minor details. Another issue is that obtaining reasonable scalability with the Intel Phi coprocessor (the Phi is the coprocessor that implements the MIC architecture) in a cluster configuration requires the so-called offload mode, which reduces the performance of the single system. As for the GPU, the Kepler architecture offers a clear advantage with respect to the previous Fermi architecture while maintaining exactly the same source code. Scalability of the multi-GPU implementation remains very good when using the CPU as a communication co-processor for the GPU. All source codes are provided for inspection and for double-checking the results.
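The benchmarked kernel is the classical over-relaxation move; a minimal NumPy sketch for one spin, assuming uniform couplings (a spin glass would weight each neighbour by its coupling J_ij, omitted here):

    import numpy as np

    def overrelax_spin(spins, nbrs, i):
        # Reflect the 3-component Heisenberg spin i about its local
        # field h = sum of neighbouring spins. The move conserves
        # energy, which is why it is a cheap, highly parallel update.
        h = spins[nbrs[i]].sum(axis=0)
        s = spins[i]
        spins[i] = 2.0 * (s @ h) / (h @ h) * h - s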
Design and evaluation of Nemesis, a scalable, low-latency, message-passing communication subsystem.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buntinas, D.; Mercier, G.; Gropp, W.
2005-12-02
This paper presents a new low-level communication subsystem called Nemesis. Nemesis has been designed and implemented to be scalable and efficient both in the intranode communication context, using shared memory, and in the internode communication case, using high-performance networks, and is natively multimethod-enabled. Nemesis has been integrated into MPICH2 as a CH3 channel and delivers better performance than other dedicated communication channels in MPICH2. Furthermore, the resulting MPICH2 architecture outperforms other MPI implementations in point-to-point benchmarks.
Polymorphous Computing Architectures
2007-12-12
...provide a multiprocessor implementation. In this work, we introduce the Atomos transactional programming language, which is the first to include implicit transactions, strong atomicity, and a scalable multiprocessor implementation [47]. Atomos is derived from Java, but replaces its synchronization and conditional waiting constructs with transactional alternatives. The Atomos conditional waiting proposal is tailored to allow efficient...
The Quantum Socket: Wiring for Superconducting Qubits - Part 3
NASA Astrophysics Data System (ADS)
Mariantoni, M.; Bejianin, J. H.; McConkey, T. G.; Rinehart, J. R.; Bateman, J. D.; Earnest, C. T.; McRae, C. H.; Rohanizadegan, Y.; Shiri, D.; Penava, B.; Breul, P.; Royak, S.; Zapatka, M.; Fowler, A. G.
The implementation of a quantum computer requires quantum error correction codes, which allow errors occurring on physical quantum bits (qubits) to be corrected. Ensembles of physical qubits will be grouped to form logical qubits with a lower error rate. Reaching low error rates will necessitate a large number of physical qubits; thus, a scalable qubit architecture must be developed. Superconducting qubits have been used to realize error correction. However, a truly scalable qubit architecture has yet to be demonstrated. A critical step towards scalability is the realization of a wiring method that allows qubits to be addressed densely and accurately. A quantum socket that serves this purpose has been designed and tested at microwave frequencies. In this talk, we show results where the socket is used at millikelvin temperatures to measure an on-chip superconducting resonator. The control electronics are another fundamental element for scalability. We will present a proposal based on the quantum socket to interconnect classical control hardware to superconducting qubit hardware, where both are operated at millikelvin temperatures.
Blueprint for a microwave trapped ion quantum computer.
Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G; Mølmer, Klaus; Devitt, Simon J; Wunderlich, Christof; Hensinger, Winfried K
2017-02-01
The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion-based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation-based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error-threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects.
Quantum Computing Architectural Design
NASA Astrophysics Data System (ADS)
West, Jacob; Simms, Geoffrey; Gyure, Mark
2006-03-01
Large scale quantum computers will invariably require scalable architectures in addition to high fidelity gate operations. Quantum computing architectural design (QCAD) addresses the problems of actually implementing fault-tolerant algorithms given physical and architectural constraints beyond those of basic gate-level fidelity. Here we introduce a unified framework for QCAD that enables the scientist to study the impact of varying error correction schemes, architectural parameters including layout and scheduling, and physical operations native to a given architecture. Our software package, aptly named QCAD, provides compilation, manipulation/transformation, multi-paradigm simulation, and visualization tools. We demonstrate various features of the QCAD software package through several examples.
Scalable Visual Analytics of Massive Textual Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.
2007-04-01
This paper describes the first scalable implementation of a text processing engine used in visual analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing a parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as PubMed. This approach enables interactive analysis of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.
A distributed infrastructure for publishing VO services: an implementation
NASA Astrophysics Data System (ADS)
Cepparo, Francesco; Scagnetto, Ivan; Molinaro, Marco; Smareglia, Riccardo
2016-07-01
This contribution describes both the design and the implementation details of a new solution for publishing VO services, highlighting its maintainable, distributed, modular and scalable architecture. Indeed, the new publisher is multithreaded and multiprocess. Multiple instances of the modules can run on different machines to ensure high performance and high availability, and this holds both for the interface modules of the services and for the back-end data access ones. The system uses message passing to let its components communicate through an AMQP message broker that can itself be distributed to provide better scalability and availability.
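A minimal sketch of such a message-passing layer using the pika AMQP client; the queue name, payload, and broker location are illustrative and not taken from the implementation described:

    import pika

    # A service-interface module hands a job to a back-end data-access
    # module through the AMQP broker; decoupling the two lets either
    # side be replicated on different machines.
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="vo.dataaccess", durable=True)
    channel.basic_publish(exchange="",
                          routing_key="vo.dataaccess",
                          body='{"service": "cone-search", "ra": 10.5}')
    conn.close()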
An Element-Based Concurrent Partitioner for Unstructured Finite Element Meshes
NASA Technical Reports Server (NTRS)
Ding, Hong Q.; Ferraro, Robert D.
1996-01-01
A concurrent partitioner for partitioning unstructured finite element meshes on distributed memory architectures is developed. The partitioner uses an element-based partitioning strategy. Its main advantage over the more conventional node-based partitioning strategy is its modular programming approach to the development of parallel applications. The partitioner first partitions element centroids using a recursive inertial bisection algorithm. Elements and nodes then migrate according to the partitioned centroids, using a data-request communication template for unpredictable incoming messages. Our scalable implementation is contrasted with a non-scalable implementation that is a straightforward parallelization of a sequential partitioner.
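For concreteness, a serial NumPy sketch of the recursive inertial bisection step on element centroids (the concurrent, message-passing version the paper builds is considerably more involved):

    import numpy as np

    def inertial_bisect(centroids, ids, levels):
        # Project the centroids onto the principal inertia axis (the
        # largest eigenvector of the covariance) and split at the
        # median; recurse to obtain 2**levels balanced parts.
        if levels == 0:
            return [ids]
        c = centroids - centroids.mean(axis=0)
        _, vecs = np.linalg.eigh(c.T @ c)
        order = np.argsort(c @ vecs[:, -1])
        half = len(ids) // 2
        lo, hi = order[:half], order[half:]
        return (inertial_bisect(centroids[lo], ids[lo], levels - 1) +
                inertial_bisect(centroids[hi], ids[hi], levels - 1))

    # parts = inertial_bisect(xyz, np.arange(len(xyz)), 3)  # 8 subdomains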
A distributed parallel storage architecture and its potential application within EOSDIS
NASA Technical Reports Server (NTRS)
Johnston, William E.; Tierney, Brian; Feuquay, Jay; Butzer, Tony
1994-01-01
We describe the architecture, implementation, and use of a scalable, high-performance, distributed-parallel data storage system developed in the ARPA-funded MAGIC gigabit testbed. A collection of wide-area distributed disk servers operate in parallel to provide logical block-level access to large data sets. Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation.
Urrios, Arturo; de Nadal, Eulàlia; Solé, Ricard; Posas, Francesc
2016-01-01
Engineered synthetic biological devices have been designed to perform a variety of functions, from sensing molecules and bioremediation to energy production and biomedicine. Nevertheless, a major limitation of in vivo circuit implementation is the constraint associated with the use of standard methodologies for circuit design. Thus, the future success of these devices depends on obtaining circuits with scalable complexity and reusable parts. Here we show how to build complex computational devices using multicellular consortia and space as key computational elements. This spatial modular design grants scalability, since its general architecture is independent of the circuit's complexity, minimizes wiring requirements and allows component reusability with minimal genetic engineering. The potential use of this approach is demonstrated by the implementation of complex logical functions with up to six inputs, thus demonstrating the scalability and flexibility of this method. The potential implications of our results are outlined. PMID:26829588
NASA Astrophysics Data System (ADS)
Park, Soomyung; Joo, Seong-Soon; Yae, Byung-Ho; Lee, Jong-Hyun
2002-07-01
In this paper, we present an Optical Cross-Connect (OXC) management control system architecture that is scalable, supports robust maintenance, and provides a distributed management environment in the optical transport network. The OXC system we are developing, comprising the hardware and the internal and external software of the OXC system, is made up of the OXC subsystem with the Optical Transport Network (OTN) sub-layer hardware and the optical switch control system; the signaling control protocol subsystem performing User-to-Network Interface (UNI) and Network-to-Network Interface (NNI) signaling control; the Operation, Administration, Maintenance & Provisioning (OAM&P) subsystem; and the network management subsystem. The OXC management control system can support flexible expansion of the optical transport network, provide connectivity to heterogeneous external network elements, allow components to be added or deleted without interrupting OAM&P services, be operated remotely, provide a global view and detailed information for network planners and operators, and offer a Common Object Request Broker Architecture (CORBA)-based open system architecture to which intelligent service networking functions can easily be added or removed in the future. To meet these considerations, we adopt an object-oriented development method throughout the system analysis, design, and implementation steps to build an OXC management control system with scalability, maintainability, and a distributed management environment. Consequently, the componentification of the OXC operation management functions of each subsystem makes maintenance robust and increases code reusability. The component-based OXC management control system architecture will also have inherent flexibility and scalability.
McEwan, Reed; Melton, Genevieve B; Knoll, Benjamin C; Wang, Yan; Hultman, Gretchen; Dale, Justin L; Meyer, Tim; Pakhomov, Serguei V
2016-01-01
Many design considerations must be addressed in order to provide researchers with full-text and semantic search of unstructured healthcare data such as clinical notes and reports. Institutions looking to provide this functionality must also address the big data aspects of their unstructured corpora. Because these systems are complex and demand a non-trivial investment, there is an incentive to make the system capable of servicing future needs as well, further complicating the design. We present architectural best practices as lessons learned in the design and implementation of NLP-PIER (Patient Information Extraction for Research), a scalable, extensible, and secure system for processing, indexing, and searching clinical notes at the University of Minnesota.
A high performance parallel computing architecture for robust image features
NASA Astrophysics Data System (ADS)
Zhou, Renyan; Liu, Leibo; Wei, Shaojun
2014-03-01
A design of a parallel architecture for image feature detection and description is proposed in this article. The major component of this architecture is a 2D cellular network composed of simple reprogrammable processors, enabling the Hessian blob detector and Haar response calculation, which are the most computation-intensive stages of the Speeded Up Robust Features (SURF) algorithm. Combining this 2D cellular network with dedicated hardware for SURF descriptors, the architecture achieves real-time image feature detection with minimal software in the host processor. A prototype FPGA implementation of the proposed architecture achieves 1318.9 GOPS of general pixel processing at a 100 MHz clock and up to 118 fps in VGA (640 × 480) image feature detection. The proposed architecture is stand-alone and scalable, so it is easy to migrate to a VLSI implementation.
Parallel processing architecture for H.264 deblocking filter on multi-core platforms
NASA Astrophysics Data System (ADS)
Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao
2012-03-01
Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high-resolution, high-quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software-based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low-latency, low-power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in H.264 encoders/decoders poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements, such as 10-bit pixel depth or a 4:2:2 chroma format, often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder can be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and richer color subsampling patterns such as YUV 4:2:2 or 4:4:4. Low-power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programming model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264-compliant deblocking filter for multi-core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub-blocks, and pixel rows are examined in this work. The deblocking architecture consists of a basic cell called the deblocking filter unit (DFU) and a dependent data buffer manager (DFM). Several instances of the DFU can be used, catering to different performance needs; the DFM serves the data required by the different numbers of DFUs and also manages all the neighboring data required for future data processing by the DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
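One standard way to expose the macroblock-level parallelism mentioned above is a wavefront schedule; a small sketch follows (H.264 deblocking of a macroblock needs its left and top neighbours finished first; how the paper maps diagonals to DFU instances is not detailed in the abstract, so the mapping here is an assumption):

    def wavefront(mb_cols, mb_rows):
        # Yield anti-diagonals of macroblock coordinates; every block on
        # one diagonal depends only on earlier diagonals, so each
        # diagonal can be filtered by parallel filter units.
        for d in range(mb_cols + mb_rows - 1):
            yield [(x, d - x)
                   for x in range(max(0, d - mb_rows + 1), min(d + 1, mb_cols))]

    # for wave in wavefront(120, 68):   # 1080p in 16x16 macroblocks
    #     filter_in_parallel(wave)      # hypothetical parallel dispatch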
Proton beam therapy control system
Baumann, Michael A [Riverside, CA; Beloussov, Alexandre V [Bernardino, CA; Bakir, Julide [Alta Loma, CA; Armon, Deganit [Redlands, CA; Olsen, Howard B [Colton, CA; Salem, Dana [Riverside, CA
2008-07-08
A tiered communications architecture for managing network traffic in a distributed system. Communication between client or control computers and a plurality of hardware devices is administered by agent and monitor devices whose activities are coordinated to reduce the number of open channels or sockets. The communications architecture also improves the transparency and scalability of the distributed system by reducing network mapping dependence. The architecture is desirably implemented in a proton beam therapy system to provide flexible security policies which improve patient safety and facilitate system maintenance and development.
Space Situational Awareness Data Processing Scalability Utilizing Google Cloud Services
NASA Astrophysics Data System (ADS)
Greenly, D.; Duncan, M.; Wysack, J.; Flores, F.
Space Situational Awareness (SSA) is a fundamental and critical component of current space operations. The term SSA encompasses the awareness, understanding, and predictability of all objects in space. As the population of orbital space objects and debris increases, the number of collision avoidance maneuvers grows and prompts the need for accurate and timely processing. The SSA mission continually evolves toward near real-time assessment and analysis, demanding higher processing capabilities. By conventional methods, meeting these demands requires the integration of new hardware to keep pace with the growing complexity of maneuver planning algorithms. SpaceNav has implemented a highly scalable architecture that tracks satellites and debris by utilizing powerful virtual machines on the Google Cloud Platform. SpaceNav algorithms for processing CDMs outpace conventional means. A robust processing environment for tracking data, collision avoidance maneuvers, and various other aspects of SSA can be created and deleted on demand. We discuss the migration of SpaceNav tools and algorithms into the Google Cloud Platform, including the trials and tribulations involved, and share how and why certain cloud products were used as well as the integration techniques that were implemented. Key items to be presented are:
1. Scientific algorithms and SpaceNav tools integrated into a scalable architecture: maneuver planning; parallel processing; Monte Carlo simulations; optimization algorithms; software application development and integration into the Google Cloud Platform.
2. Compute Engine processing: Application Engine automated processing; performance testing and performance scalability; Cloud MySQL databases and database scalability; cloud data storage; redundancy and availability.
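As an illustration of the kind of embarrassingly parallel Monte Carlo work that fans out well onto cloud VMs (the distributions, threshold, and batch sizes below are invented for the sketch and are not SpaceNav's):

    import numpy as np
    from multiprocessing import Pool

    def batch(args):
        # One batch of trials: sample a miss-distance vector from a
        # Gaussian state uncertainty and count approaches closer than a
        # 20 m combined hard-body radius. All numbers are illustrative.
        n, seed = args
        rng = np.random.default_rng(seed)
        miss = rng.normal([120.0, 40.0, 15.0], [80.0, 60.0, 25.0], (n, 3))
        return int((np.linalg.norm(miss, axis=1) < 20.0).sum())

    if __name__ == "__main__":
        jobs = [(100_000, s) for s in range(16)]   # one job per worker/VM
        with Pool() as pool:
            hits = sum(pool.map(batch, jobs))
        print("collision probability ~", hits / 1_600_000)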
High-performance multiprocessor architecture for a 3-D lattice gas model
NASA Technical Reports Server (NTRS)
Lee, F.; Flynn, M.; Morf, M.
1991-01-01
The lattice gas method has recently emerged as a promising discrete particle simulation method in areas such as fluid dynamics. We present a very high-performance scalable multiprocessor architecture, called ALGE, proposed for the simulation of a realistic 3-D lattice gas model, Henon's 24-bit FCHC isometric model. Each of these VLSI processors is as powerful as a CRAY-2 for this application. ALGE is scalable in the sense that it achieves linear speedup for both fixed and increasing problem sizes with more processors. The core computation of a lattice gas model consists of many repetitions of two alternating phases: particle collision and propagation. Functional decomposition by symmetry group and virtual move are the respective keys to efficient implementation of collision and propagation.
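The two-phase structure is easy to see in code. The sketch below uses a toy 2-D HPP lattice gas rather than the 24-bit FCHC model targeted by ALGE, but it shows the same alternation: a local collision rule followed by propagation of each direction bit to a neighboring site.

```python
import numpy as np

def hpp_step(e, w, n, s):
    """One update of a toy HPP lattice gas: collision, then propagation.

    e, w, n, s are boolean occupancy arrays for the four particle directions
    on a periodic grid (a stand-in for the FCHC model's 24 bits per site).
    """
    # Collision phase: only head-on pairs with the other pair empty scatter.
    ew = e & w & ~n & ~s          # east-west pair -> north-south pair
    ns = n & s & ~e & ~w          # north-south pair -> east-west pair
    e, w = (e & ~ew) | ns, (w & ~ew) | ns
    n, s = (n & ~ns) | ew, (s & ~ns) | ew
    # Propagation phase: each direction bit moves one site.
    e = np.roll(e, 1, axis=1)
    w = np.roll(w, -1, axis=1)
    n = np.roll(n, -1, axis=0)
    s = np.roll(s, 1, axis=0)
    return e, w, n, s

rng = np.random.default_rng(1)
fields = [rng.random((64, 64)) < 0.3 for _ in range(4)]
for _ in range(100):
    fields = hpp_step(*fields)
print("particles conserved:", sum(f.sum() for f in fields))
```

Particle count is conserved by both phases, which is a useful invariant to check in any lattice gas implementation.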
McEwan, Reed; Melton, Genevieve B.; Knoll, Benjamin C.; Wang, Yan; Hultman, Gretchen; Dale, Justin L.; Meyer, Tim; Pakhomov, Serguei V.
2016-01-01
Many design considerations must be addressed in order to provide researchers with full text and semantic search of unstructured healthcare data such as clinical notes and reports. Institutions looking to provide this functionality must also address the big data aspects of their unstructured corpora. Because these systems are complex and demand a non-trivial investment, there is an incentive to make the system capable of servicing future needs as well, further complicating the design. We present architectural best practices as lessons learned in the design and implementation of NLP-PIER (Patient Information Extraction for Research), a scalable, extensible, and secure system for processing, indexing, and searching clinical notes at the University of Minnesota. PMID:27570663
Framework for a clinical information system.
Van De Velde, R; Lansiers, R; Antonissen, G
2002-01-01
The design and implementation of a Clinical Information System architecture is presented. This architecture has been developed and implemented based on components following a strong underlying conceptual and technological model. Common Object Request Broker and n-tier technology featuring centralised and departmental clinical information systems as the back-end store for all clinical data are used. Servers located in the "middle" tier apply the clinical (business) model and application rules. The main characteristics are the focus on modelling and reuse of both data and business logic. Scalability, as well as adaptability to constantly changing requirements via component-driven computing, are the main reasons for this approach.
NASA Astrophysics Data System (ADS)
Bigdeli, Abbas; Biglari-Abhari, Morteza; Salcic, Zoran; Tin Lai, Yat
2006-12-01
A new pipelined systolic array-based (PSA) architecture for matrix inversion is proposed. The PSA architecture is suitable for FPGA implementations, as it efficiently uses the available resources of an FPGA. It is scalable for different matrix sizes, and its parameterisation makes it suitable for customisation to application-specific needs. The new architecture has the advantage of O(n) processing-element complexity, compared with the O(n²) of other systolic array structures, for an n × n input matrix. The use of the PSA architecture for a Kalman filter, which requires different structures for different numbers of states, is illustrated as an implementation example. The resulting precision error is analysed and shown to be negligible.
Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters
Torres-Huitzil, Cesar
2013-01-01
Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a k × k kernel requires k² − 1 comparisons per sample for a direct implementation; thus, performance scales expensively with the kernel size k. Faster computations can be achieved by kernel decomposition and using constant time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture design uses less computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters, on 1024 × 1024 images with up to 255 × 255 kernels, in around 8.4 milliseconds (120 frames per second) at a clock frequency of 250 MHz. The implementation is highly scalable for the kernel size with a good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding. PMID:24288456
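The HGW algorithm the architecture implements computes each output in a constant number of comparisons per sample, independent of the kernel size k. A NumPy sketch of the 1-D building block (a software analogue of the hardware pipeline; boundaries handled here by padding with -inf, whereas hardware designs typically use border replication):

```python
import numpy as np

def hgw_running_max(f, k):
    """1-D van Herk/Gil-Werman running max, window anchored at the left:
    out[x] = max(f[x], ..., f[x + k - 1]).  Cost per sample is O(1)
    comparisons independent of k, which is what makes the hardware cheap.
    """
    n = len(f)
    pad = (-n) % k                      # pad to a multiple of the kernel size
    g = np.concatenate([f, np.full(pad + k, -np.inf)])
    blocks = g.reshape(-1, k)
    s = np.maximum.accumulate(blocks, axis=1).ravel()  # prefix max per block
    r = np.maximum.accumulate(blocks[:, ::-1], axis=1)[:, ::-1].ravel()  # suffix max
    # Each window spans at most two blocks: suffix max of the first block
    # part, prefix max of the second.
    return np.maximum(r[:n], s[k - 1 : n + k - 1])

x = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
print(hgw_running_max(x, 3))
# Reference: naive sliding max over the same windows.
print([max(x[i:i + 3].tolist() + [-np.inf]) for i in range(len(x))])
```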
Qubit Architecture with High Coherence and Fast Tunable Coupling.
Chen, Yu; Neill, C; Roushan, P; Leung, N; Fang, M; Barends, R; Kelly, J; Campbell, B; Chen, Z; Chiaro, B; Dunsworth, A; Jeffrey, E; Megrant, A; Mutus, J Y; O'Malley, P J J; Quintana, C M; Sank, D; Vainsencher, A; Wenner, J; White, T C; Geller, Michael R; Cleland, A N; Martinis, John M
2014-11-28
We introduce a superconducting qubit architecture that combines high-coherence qubits and tunable qubit-qubit coupling. With the ability to set the coupling to zero, we demonstrate that this architecture is protected from the frequency crowding problems that arise from fixed coupling. More importantly, the coupling can be tuned dynamically with nanosecond resolution, making this architecture a versatile platform with applications ranging from quantum logic gates to quantum simulation. We illustrate the advantages of dynamical coupling by implementing a novel adiabatic controlled-z gate, with a speed approaching that of single-qubit gates. Integrating coherence and scalable control, the introduced qubit architecture provides a promising path towards large-scale quantum computation and simulation.
Optimization of atmospheric transport models on HPC platforms
NASA Astrophysics Data System (ADS)
de la Cruz, Raúl; Folch, Arnau; Farré, Pau; Cabezas, Javier; Navarro, Nacho; Cela, José María
2016-12-01
The performance and scalability of atmospheric transport models on high performance computing environments is often far from optimal for multiple reasons including, for example, sequential input and output, synchronous communications, work unbalance, memory access latency or lack of task overlapping. We investigate how different software optimizations and porting to non general-purpose hardware architectures improve code scalability and execution times considering, as an example, the FALL3D volcanic ash transport model. To this purpose, we implement the FALL3D model equations in the WARIS framework, a software designed from scratch to solve in a parallel and efficient way different geoscience problems on a wide variety of architectures. In addition, we consider further improvements in WARIS such as hybrid MPI-OMP parallelization, spatial blocking, auto-tuning and thread affinity. Considering all these aspects together, the FALL3D execution times for a realistic test case running on general-purpose cluster architectures (Intel Sandy Bridge) decrease by a factor between 7 and 40 depending on the grid resolution. Finally, we port the application to Intel Xeon Phi (MIC) and NVIDIA GPUs (CUDA) accelerator-based architectures and compare performance, cost and power consumption on all the architectures. Implications on time-constrained operational model configurations are discussed.
Negative autoregulation matches production and demand in synthetic transcriptional networks.
Franco, Elisa; Giordano, Giulia; Forsberg, Per-Ola; Murray, Richard M
2014-08-15
We propose a negative feedback architecture that regulates activity of artificial genes, or "genelets", to meet their output downstream demand, achieving robustness with respect to uncertain open-loop output production rates. In particular, we consider the case where the outputs of two genelets interact to form a single assembled product. We show with analysis and experiments that negative autoregulation matches the production and demand of the outputs: the magnitude of the regulatory signal is proportional to the "error" between the circuit output concentration and its actual demand. This two-device system is experimentally implemented using in vitro transcriptional networks, where reactions are systematically designed by optimizing nucleic acid sequences with publicly available software packages. We build a predictive ordinary differential equation (ODE) model that captures the dynamics of the system and can be used to numerically assess the scalability of this architecture to larger sets of interconnected genes. Finally, with numerical simulations we contrast our negative autoregulation scheme with a cross-activation architecture, which is less scalable and results in slower response times.
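A minimal ODE sketch of this kind of negative autoregulation, with invented rate constants rather than the paper's fitted genelet parameters, shows the matching effect: despite deliberately mismatched open-loop production rates, self-repression drives each production rate toward the assembly (demand) rate.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy negative-autoregulation model (illustrative constants only): two species
# are produced, each repressing its own production, and are consumed jointly
# to form an assembled product.
beta1, beta2 = 2.0, 5.0   # deliberately mismatched open-loop production rates
K = 0.5                   # repression threshold
gamma = 4.0               # assembly (demand) rate constant
delta = 0.1               # dilution/degradation

def rhs(t, y):
    x1, x2, p = y
    prod1 = beta1 / (1.0 + x1 / K)    # self-repressed production
    prod2 = beta2 / (1.0 + x2 / K)
    assembly = gamma * x1 * x2        # demand consumes both outputs
    return [prod1 - assembly - delta * x1,
            prod2 - assembly - delta * x2,
            assembly]

sol = solve_ivp(rhs, (0.0, 200.0), [0.0, 0.0, 0.0])
x1, x2, _ = sol.y[:, -1]
prod1 = beta1 / (1.0 + x1 / K)
prod2 = beta2 / (1.0 + x2 / K)
print(f"steady production rates: {prod1:.2f}, {prod2:.2f}")
print(f"steady assembly (demand) rate: {gamma * x1 * x2:.2f}")
```

Up to the small degradation leak, both production rates settle to the shared assembly rate, which is the "production matches demand" behaviour the experiments demonstrate.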
Authentication Architecture for Region-Wide e-Health System with Smartcards and a PKI
NASA Astrophysics Data System (ADS)
Zúquete, André; Gomes, Helder; Cunha, João Paulo Silva
This paper describes the design and implementation of an e-Health authentication architecture using smartcards and a PKI. This architecture was developed to authenticate e-Health Professionals accessing the RTS (Rede Telemática da Saúde), a regional platform for sharing clinical data among a set of affiliated health institutions. The architecture had to accommodate specific RTS requirements, namely the security of Professionals' credentials, the mobility of Professionals, and the scalability to accommodate new health institutions. The adopted solution uses short-lived certificates and cross-certification agreements between RTS and e-Health institutions for authenticating Professionals accessing the RTS. These certificates carry as well the Professional's role at their home institution for role-based authorization. Trust agreements between e-Health institutions and RTS are necessary in order to make the certificates recognized by the RTS. As a proof of concept, a prototype was implemented with Windows technology. The presented authentication architecture is intended to be applied to other medical telematic systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Petrini, Fabrizio; Nieplocha, Jarek; Tipparaju, Vinod
2006-04-15
In this paper we will present a new technology that we are currently developing within the SFT: Scalable Fault Tolerance FastOS project, which seeks to implement fault tolerance at the operating system level. Major design goals include dynamic reallocation of resources to allow continuing execution in the presence of hardware failures, very high scalability, high efficiency (low overhead), and transparency, requiring no changes to user applications. Our technology is based on a global coordination mechanism that enforces transparent recovery lines in the system, and TICK, a lightweight, incremental checkpointing software architecture implemented as a Linux kernel module. TICK is completely user-transparent and does not require any changes to user code or system libraries; it is highly responsive: an interrupt, such as a timer interrupt, can trigger a checkpoint in as little as 2.5 μs; and it supports incremental and full checkpoints with minimal overhead, less than 6% with full checkpointing to disk performed as frequently as once per minute.
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework
2012-01-01
Background: For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed.
Results: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed.
Conclusion: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources. PMID:23216909
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework.
Lewis, Steven; Csordas, Attila; Killcoyne, Sarah; Hermjakob, Henning; Hoopmann, Michael R; Moritz, Robert L; Deutsch, Eric W; Boyle, John
2012-12-05
For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
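The map/reduce decomposition behind both records above is straightforward to sketch: spectra are scored against candidate peptides in the map phase, and the best score per spectrum is kept in the reduce phase. The toy Python below shows the pattern only; the records, tolerance and peak-counting score are invented stand-ins for the K-score computed by the real Hadoop jobs.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy records: (spectrum_id, peak_mass_list) and candidate peptides.
spectra = [("s1", [97.0, 147.0, 260.0]), ("s2", [114.0, 200.0])]
peptides = [("PEPTIDEA", [97.0, 147.0]), ("PEPTIDEB", [114.0, 260.0])]

def mapper(spectrum, peptide, tol=0.5):
    """Emit (spectrum_id, (score, peptide)).  Placeholder score: number of
    spectrum peaks matching predicted fragment masses within a tolerance.
    The real engine computes the K-score at this point."""
    sid, peaks = spectrum
    name, frags = peptide
    score = sum(any(abs(p - f) < tol for f in frags) for p in peaks)
    yield sid, (score, name)

def reducer(pairs):
    """Keep the best-scoring peptide per spectrum, as the reduce step would."""
    best = defaultdict(lambda: (-1, None))
    for sid, scored in pairs:
        best[sid] = max(best[sid], scored)
    return dict(best)

pairs = (kv for s, pep in product(spectra, peptides) for kv in mapper(s, pep))
print(reducer(pairs))
```

Because map outputs are independent, the spectra (and peptide database shards) can be spread across as many cluster nodes as are available, which is where the linear throughput scaling comes from.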
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tock, Yoav; Mandler, Benjamin; Moreira, Jose
2013-01-01
As HPC systems and applications get bigger and more complex, we are approaching an era in which resiliency and run-time elasticity concerns become paramount. We offer a building block for an alternative resiliency approach in which computations will be able to make progress while components fail, in addition to enabling a dynamic set of nodes throughout a computation lifetime. The core of our solution is a hierarchical scalable membership service providing eventual consistency semantics. An attribute replication service is used for hierarchy organization, and is exposed to external applications. Our solution is based on P2P technologies and provides resiliency and elastic runtime support at ultra large scales. Resulting middleware is general purpose while exploiting HPC platform unique features and architecture. We have implemented and tested this system on BlueGene/P with Linux, and using worst-case analysis, evaluated the service scalability as effective for up to 1M nodes.
Design and implementation of workflow engine for service-oriented architecture
NASA Astrophysics Data System (ADS)
Peng, Shuqing; Duan, Huining; Chen, Deyun
2009-04-01
As computer networks develop rapidly and enterprise applications become increasingly distributed, traditional workflow engines show deficiencies such as complex structure, poor stability, poor portability, little reusability and difficult maintenance. In this paper, in order to improve the stability, scalability and flexibility of workflow management systems, a four-layer workflow engine architecture based on SOA is put forward according to the XPDL standard of the Workflow Management Coalition; the route control mechanism in the control model is accomplished and the scheduling strategy for cyclic and acyclic routing is designed; and the workflow engine is implemented using technologies such as XML, JSP and EJB.
Integrated Visible Photonics for Trapped-Ion Quantum Computing
2017-06-10
Kharas, Dave; Sorace-Agaskar, Cheryl; Bramhavar, Suraj; Loh, William; Sage, Jeremy M.; Juodawlkis, Paul W.; John [...]
A scalable trapped-ion-based quantum-computing architecture requires [...]. Trapped ions, with their long coherence times, strong Coulomb interactions, and optical addressability, hold great promise for implementation of practical quantum information [...]
Equalizer: a scalable parallel rendering framework.
Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato
2009-01-01
Continuing improvements in CPU and GPU performance, as well as increasing multi-core processor and cluster-based parallelism, demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture, the basic API, discuss its advantages over previous approaches, present example configurations and usage scenarios as well as scalability results.
Scalable quantum memory in the ultrastrong coupling regime.
Kyaw, T H; Felicetti, S; Romero, G; Solano, E; Kwek, L-C
2015-03-02
Circuit quantum electrodynamics, consisting of superconducting artificial atoms coupled to on-chip resonators, represents a prime candidate for implementing a scalable quantum computing architecture because of its good tunability and controllability. Furthermore, recent advances have pushed the technology towards the ultrastrong coupling regime of light-matter interaction, where the qubit-resonator coupling strength reaches a considerable fraction of the resonator frequency. Here, we propose a qubit-resonator system operating in that regime as a quantum memory device, and study the storage and retrieval of quantum information in and from the Z2 parity-protected quantum memory within experimentally feasible schemes. We are also convinced that our proposal might pave the way to a scalable quantum random-access memory due to its fast storage and readout performance.
Scalable quantum memory in the ultrastrong coupling regime
Kyaw, T. H.; Felicetti, S.; Romero, G.; Solano, E.; Kwek, L.-C.
2015-01-01
Circuit quantum electrodynamics, consisting of superconducting artificial atoms coupled to on-chip resonators, represents a prime candidate for implementing a scalable quantum computing architecture because of its good tunability and controllability. Furthermore, recent advances have pushed the technology towards the ultrastrong coupling regime of light-matter interaction, where the qubit-resonator coupling strength reaches a considerable fraction of the resonator frequency. Here, we propose a qubit-resonator system operating in that regime as a quantum memory device, and study the storage and retrieval of quantum information in and from the Z2 parity-protected quantum memory within experimentally feasible schemes. We are also convinced that our proposal might pave the way to a scalable quantum random-access memory due to its fast storage and readout performance. PMID:25727251
Fan-out Estimation in Spin-based Quantum Computer Scale-up.
Nguyen, Thien; Hill, Charles D; Hollenberg, Lloyd C L; James, Matthew R
2017-10-17
Solid-state spin-based qubits offer good prospects for scaling based on their long coherence times and nexus to large-scale electronic scale-up technologies. However, high-threshold quantum error correction requires a two-dimensional qubit array operating in parallel, posing significant challenges in fabrication and control. While architectures incorporating distributed quantum control meet this challenge head-on, most designs rely on individual control and readout of all qubits with high gate densities. We analysed the fan-out routing overhead of a dedicated control line architecture, basing the analysis on a generalised solid-state spin qubit platform parameterised to encompass Coulomb-confined (e.g. donor-based spin qubits) or electrostatically confined (e.g. quantum-dot-based spin qubits) implementations. The spatial scalability under this model is estimated using standard electronic routing methods and present-day fabrication constraints. Based on reasonable assumptions for qubit control and readout, we estimate that 10²-10⁵ physical qubits, depending on the quantum interconnect implementation, can be integrated and fanned out independently. Assuming relatively long control-free interconnects, the scalability can be extended. Ultimately, universal quantum computation may necessitate a much higher number of integrated qubits, indicating that higher-dimensional electronics fabrication and/or multiplexed distributed control and readout schemes may be the preferred strategy for large-scale implementation.
Privacy-Aware Location Database Service for Granular Queries
NASA Astrophysics Data System (ADS)
Kiyomoto, Shinsaku; Martin, Keith M.; Fukushima, Kazuhide
Future mobile markets are expected to increasingly embrace location-based services. This paper presents a new system architecture for location-based services, which consists of a location database and distributed location anonymizers. The service is privacy-aware in the sense that the location database always maintains a degree of anonymity. The location database service permits three different levels of query and can thus be used to implement a wide range of location-based services. Furthermore, the architecture is scalable and employs simple functions that are similar to those found in general database systems.
Phipps, Eric T.; D'Elia, Marta; Edwards, Harold C.; ...
2017-04-18
In this study, quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key component of this is forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data, and can require numerous samples when accurately propagating uncertainties from large numbers of sources. Often simulation processes from sample to sample are similar and much of the data generated from each sample evaluation could be reused. We explore a new method for implementing sampling methods that simultaneously propagates groups of samples together in an embedded fashion, which we call embedded ensemble propagation. We show how this approach takes advantage of properties of modern computer architectures to improve performance by enabling reuse between samples, reducing memory bandwidth requirements, improving memory access patterns, improving opportunities for fine-grained parallelization, and reducing communication costs. We describe a software technique for implementing embedded ensemble propagation based on the use of C++ templates and describe its integration with various scientific computing libraries within Trilinos. We demonstrate improved performance, portability and scalability for the approach applied to the simulation of partial differential equations on a variety of CPU, GPU, and accelerator architectures, including up to 131,072 cores on a Cray XK7 (Titan).
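The core idea of embedded ensemble propagation is to replace each scalar in the simulation with a small fixed-size ensemble so that one solver pass advances a whole group of samples, reusing mesh data across them. A rough NumPy analogue of the C++ template technique (in Trilinos the ensemble is a scalar type introduced via templates; here it is simply a trailing array axis, and the PDE, grid and constants are invented for illustration):

```python
import numpy as np

def heat_step_ensemble(u, kappa, dt=1e-4, dx=5e-2):
    """One explicit finite-difference step of a 1-D periodic heat equation
    for a whole ensemble at once.  u has shape (n_grid, n_ensemble); kappa
    holds one uncertain diffusivity per ensemble member.  Every mesh datum
    is reused across all samples in the group, which is the bandwidth
    saving the embedded approach targets."""
    lap = (np.roll(u, 1, axis=0) - 2 * u + np.roll(u, -1, axis=0)) / dx**2
    return u + dt * kappa * lap      # kappa broadcasts over the grid axis

rng = np.random.default_rng(0)
n_grid, n_ens = 128, 32
u = np.tile(np.sin(np.linspace(0, 2 * np.pi, n_grid))[:, None], (1, n_ens))
kappa = rng.uniform(0.5, 1.5, size=n_ens)    # 32 sampled diffusivities
for _ in range(1000):
    u = heat_step_ensemble(u, kappa)
print("ensemble spread of mid-point temperature:", u[n_grid // 2].std())
```

In the C++ template formulation the same effect is obtained without touching the solver source: the scalar type is swapped for an ensemble type whose arithmetic operators act componentwise, so vectorization over samples falls out of the type system.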
NASA Astrophysics Data System (ADS)
Magnoni, L.; Suthakar, U.; Cordeiro, C.; Georgiou, M.; Andreeva, J.; Khan, A.; Smith, D. R.
2015-12-01
Monitoring the WLCG infrastructure requires the gathering and analysis of a high volume of heterogeneous data (e.g. data transfers, job monitoring, site tests) coming from different services and experiment-specific frameworks to provide a uniform and flexible interface for scientists and sites. The current architecture, where relational database systems are used to store, process and serve monitoring data, has limitations in coping with the foreseen increase in the volume (e.g. higher LHC luminosity) and the variety (e.g. new data-transfer protocols and new resource types, such as cloud computing) of WLCG monitoring events. This paper presents a new scalable data store and analytics platform designed by the Support for Distributed Computing (SDC) group at the CERN IT department, which uses a variety of technologies, each targeting specific aspects of big-scale distributed data processing (commonly referred to as the lambda-architecture approach). Results of data processing on Hadoop for WLCG data activities monitoring are presented, showing how the new architecture can easily analyze hundreds of millions of transfer logs in a few minutes. Moreover, a comparison of data partitioning, compression and file formats (e.g. CSV, Avro) is presented, with particular attention given to how the file structure impacts the overall MapReduce performance. In conclusion, the evolution of the current implementation, which focuses on data storage and batch processing, towards a complete lambda architecture is discussed, with consideration of candidate technology for the serving layer (e.g. Elasticsearch) and a description of a proof-of-concept implementation, based on Apache Spark and Esper, for the real-time part, which compensates for batch-processing latency and automates problem detection and failures.
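A lambda architecture combines a high-latency batch layer over the full archive with a low-latency speed layer over recent events, merging both views at query time. A compact stand-in for the paper's Hadoop/Spark/Esper stack, with invented transfer-log records:

```python
from collections import Counter

# Invented transfer-log records: (site, bytes, ok)
archive = [("CERN", 5_000, True), ("BNL", 2_000, False), ("CERN", 7_000, True)]
fresh = [("BNL", 1_000, True), ("CERN", 500, False)]

def batch_view(logs):
    """Batch layer: recompute per-site failure counts over the full archive
    (the role of the Hadoop MapReduce jobs, at high latency)."""
    return Counter(site for site, _, ok in logs if not ok)

def speed_view(events):
    """Speed layer: incremental counts over events not yet absorbed by the
    last batch run (the Spark/Esper role in the proof of concept)."""
    return Counter(site for site, _, ok in events if not ok)

def serve(site):
    """Serving layer: merge both views to answer a monitoring query."""
    merged = batch_view(archive) + speed_view(fresh)
    return merged.get(site, 0)

print("CERN transfer failures so far:", serve("CERN"))
```

The batch layer is periodically recomputed from scratch (which is what makes file format and partitioning choices matter for MapReduce performance), while the speed layer only ever holds the events produced since the last batch run.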
Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications
NASA Astrophysics Data System (ADS)
Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei
2007-04-01
In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Efficient use of the PE rings and an intelligent memory-interleaving organization increase the efficiency of the architecture. Moreover, efficient on-chip memories and a data management technique effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core for multimedia system-on-chip applications.
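Per candidate displacement, motion estimation reduces to a sum of absolute differences (SAD) between the current block and a shifted reference window; the processor's contribution is parallelizing this loop across PE rings. A scalar reference sketch of full-search block matching, with an invented test frame:

```python
import numpy as np

def full_search(cur_block, ref, top, left, search_range=4):
    """Exhaustive SAD search of ref around (top, left) for the best match
    to cur_block.  Returns (min SAD, (dy, dx))."""
    bh, bw = cur_block.shape
    best = (np.inf, (0, 0))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                continue  # candidate window falls outside the reference frame
            sad = np.abs(cur_block.astype(int)
                         - ref[y:y + bh, x:x + bw].astype(int)).sum()
            best = min(best, (sad, (dy, dx)))
    return best

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
cur = ref[10:18, 22:30]                       # a block that moved by (2, -3)
print(full_search(cur, ref, top=8, left=25))  # expect SAD 0 at (dy, dx) = (2, -3)
```

Each (dy, dx) candidate is independent, so a hardware design can assign candidates (or rows of the SAD accumulation) to different PEs, which is exactly the degree of freedom the PE-ring organization exploits.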
The Solar Umbrella: A Low-cost Demonstration of Scalable Space Based Solar Power
NASA Technical Reports Server (NTRS)
Contreras, Michael T.; Trease, Brian P.; Sherwood, Brent
2013-01-01
Within the past decade, the Space Solar Power (SSP) community has seen an influx of stakeholders willing to entertain the SSP prospect of potentially boundless, base-load solar energy. Interested parties affiliated with the Department of Defense (DoD), the private sector, and various international entities have all agreed that while the benefits of SSP are tremendous and potentially profitable, the risk associated with developing an efficient end-to-end SSP harvesting system is still very high. In an effort to reduce the implementation risk for future SSP architectures, this study proposes a system-level design that is both low-cost and seeks to demonstrate the furthest transmission of wireless power to date. The overall concept is presented and each subsystem is explained in detail with best estimates of current implementable technologies. Basic cost models were constructed based on input from JPL subject matter experts and assume that the technology demonstration would be carried out by a federally funded entity. The main thrust of the architecture is to demonstrate that a usable amount of solar power can be safely and reliably transmitted from space to the Earth's surface; however, maximum power scalability limits and their cost implications are discussed.
SNAVA-A real-time multi-FPGA multi-model spiking neural network simulation architecture.
Sripad, Athul; Sanchez, Giovanny; Zapata, Mireya; Pirrone, Vito; Dorta, Taho; Cambria, Salvatore; Marti, Albert; Krishnamourthy, Karthikeyan; Madrenas, Jordi
2018-01-01
Spiking Neural Networks (SNN) for Versatile Applications (SNAVA) is a scalable and programmable parallel simulation platform that supports real-time, large-scale, multi-model SNN computation. This parallel architecture is implemented in modern Field-Programmable Gate Array (FPGA) devices to provide high-performance execution and the flexibility to support large-scale SNN models. Flexibility is defined in terms of programmability, which allows easy synapse and neuron implementation. This has been achieved by using special-purpose Processing Elements (PEs) for computing SNNs, and by analyzing and customizing the instruction set according to the processing needs to achieve maximum performance with minimum resources. The parallel architecture is interfaced with customized Graphical User Interfaces (GUIs) to configure the SNN's connectivity, to compile the neuron-synapse model and to monitor SNN activity. Our contribution is a tool that allows SNNs to be prototyped faster than on CPU/GPU architectures but significantly more cheaply than fabricating a customized neuromorphic chip, which could be valuable to the computational neuroscience and neuromorphic engineering communities.
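As a concrete example of the kind of per-time-step neuron update such PEs execute, here is a leaky integrate-and-fire population in NumPy; the model and constants are generic textbook choices, not SNAVA's programmable instruction-level models.

```python
import numpy as np

def lif_step(v, spiked, in_spikes, w, tau=20.0, v_rest=-65.0,
             v_thresh=-50.0, v_reset=-70.0, dt=1.0):
    """One time step for a population of leaky integrate-and-fire neurons.
    v: membrane potentials (mV); in_spikes: boolean presynaptic spike vector;
    w: synaptic weight matrix (post x pre)."""
    v = np.where(spiked, v_reset, v)      # reset neurons that fired last step
    i_syn = w @ in_spikes                 # synaptic current from input spikes
    v = v + dt * (v_rest - v) / tau + i_syn
    return v, v >= v_thresh

rng = np.random.default_rng(0)
n_pre, n_post = 100, 10
w = rng.uniform(0.0, 0.8, size=(n_post, n_pre))
v = np.full(n_post, -65.0)
spiked = np.zeros(n_post, dtype=bool)
for t in range(100):
    in_spikes = rng.random(n_pre) < 0.05  # 5% of inputs fire each step
    v, spiked = lif_step(v, spiked, in_spikes, w)
print("final membrane potentials:", np.round(v, 1))
```

In a platform like the one described, the same update is expressed in the PE instruction set, so swapping neuron models means recompiling the per-step program rather than redesigning hardware.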
NASA Astrophysics Data System (ADS)
Hegde, Ganapathi; Vaya, Pukhraj
2013-10-01
This article presents a parallel architecture for the 3-D discrete wavelet transform (3-DDWT). The proposed design is based on the 1-D pipelined lifting scheme. The architecture is fully scalable beyond the present coherent Daubechies (9, 7) filter bank. This 3-DDWT architecture has advantages such as no group-of-pictures restriction and reduced memory referencing. It offers low power consumption, low latency and high throughput. The computing technique is based on the concept that the lifting scheme minimises the storage requirement. The application-specific integrated circuit implementation of the proposed architecture was synthesised using a 65 nm Taiwan Semiconductor Manufacturing Company standard cell library. It offers a speed of 486 MHz with a power consumption of 2.56 mW. This architecture is suitable for real-time video compression even with large frame dimensions.
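The 1-D lifting kernel that such pipelines implement factors the 9/7 analysis filters into two predict and two update passes plus a scaling step. A floating-point sketch using the commonly quoted CDF 9/7 coefficients (normalization and boundary conventions vary between implementations; periodic extension is used here for brevity):

```python
import numpy as np

# Commonly quoted CDF 9/7 lifting coefficients (as used in JPEG 2000's
# irreversible transform); normalization conventions differ across codecs.
ALPHA, BETA  = -1.586134342, -0.05298011854
GAMMA, DELTA =  0.8829110762, 0.4435068522
ZETA = 1.149604398

def dwt97_1d(x):
    """One level of the 9/7 lifting DWT for an even-length signal.
    Returns (lowpass, highpass)."""
    s, d = x[0::2].copy(), x[1::2].copy()
    d += ALPHA * (s + np.roll(s, -1))   # predict 1 (roll = periodic edge)
    s += BETA  * (d + np.roll(d, 1))    # update 1
    d += GAMMA * (s + np.roll(s, -1))   # predict 2
    s += DELTA * (d + np.roll(d, 1))    # update 2
    return s * ZETA, d / ZETA           # one common scaling convention

x = np.sin(np.linspace(0, 4 * np.pi, 64))
low, high = dwt97_1d(x)
print("lowpass energy:", (low**2).sum(), " highpass energy:", (high**2).sum())
```

Because each pass reads only immediate neighbours, the four passes pipeline naturally in hardware and only a few samples of state need buffering, which is the storage saving the abstract refers to.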
NASA Astrophysics Data System (ADS)
Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.
2016-05-01
In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
Globus Nexus: A Platform-as-a-Service Provider of Research Identity, Profile, and Group Management.
Chard, Kyle; Lidman, Mattias; McCollam, Brendan; Bryan, Josh; Ananthakrishnan, Rachana; Tuecke, Steven; Foster, Ian
2016-03-01
Globus Nexus is a professionally hosted Platform-as-a-Service that provides identity, profile and group management functionality for the research community. Many collaborative e-Science applications need to manage large numbers of user identities, profiles, and groups. However, developing and maintaining such capabilities is often challenging given the complexity of modern security protocols and requirements for scalable, robust, and highly available implementations. By outsourcing this functionality to Globus Nexus, developers can leverage best-practice implementations without incurring development and operations overhead. Users benefit from enhanced capabilities such as identity federation, flexible profile management, and user-oriented group management. In this paper we present Globus Nexus, describe its capabilities and architecture, summarize how several e-Science applications leverage these capabilities, and present results that characterize its scalability, reliability, and availability.
Globus Nexus: A Platform-as-a-Service provider of research identity, profile, and group management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chard, Kyle; Lidman, Mattias; McCollam, Brendan
Globus Nexus is a professionally hosted Platform-as-a-Service that provides identity, profile and group management functionality for the research community. Many collaborative e-Science applications need to manage large numbers of user identities, profiles, and groups. However, developing and maintaining such capabilities is often challenging given the complexity of modern security protocols and requirements for scalable, robust, and highly available implementations. By outsourcing this functionality to Globus Nexus, developers can leverage best-practice implementations without incurring development and operations overhead. Users benefit from enhanced capabilities such as identity federation, flexible profile management, and user-oriented group management. In this paper we present Globus Nexus, describe its capabilities and architecture, summarize how several e-Science applications leverage these capabilities, and present results that characterize its scalability, reliability, and availability.
Globus Nexus: A Platform-as-a-Service Provider of Research Identity, Profile, and Group Management
Lidman, Mattias; McCollam, Brendan; Bryan, Josh; Ananthakrishnan, Rachana; Tuecke, Steven; Foster, Ian
2015-01-01
Globus Nexus is a professionally hosted Platform-as-a-Service that provides identity, profile and group management functionality for the research community. Many collaborative e-Science applications need to manage large numbers of user identities, profiles, and groups. However, developing and maintaining such capabilities is often challenging given the complexity of modern security protocols and requirements for scalable, robust, and highly available implementations. By outsourcing this functionality to Globus Nexus, developers can leverage best-practice implementations without incurring development and operations overhead. Users benefit from enhanced capabilities such as identity federation, flexible profile management, and user-oriented group management. In this paper we present Globus Nexus, describe its capabilities and architecture, summarize how several e-Science applications leverage these capabilities, and present results that characterize its scalability, reliability, and availability. PMID:26688598
An open, interoperable, and scalable prehospital information technology network architecture.
Landman, Adam B; Rokos, Ivan C; Burns, Kevin; Van Gelder, Carin M; Fisher, Roger M; Dunford, James V; Cone, David C; Bogucki, Sandy
2011-01-01
Some of the most intractable challenges in prehospital medicine include response time optimization, inefficiencies at the emergency medical services (EMS)-emergency department (ED) interface, and the ability to correlate field interventions with patient outcomes. Information technology (IT) can address these and other concerns by ensuring that system and patient information is received when and where it is needed, is fully integrated with prior and subsequent patient information, and is securely archived. Some EMS agencies have begun adopting information technologies, such as wireless transmission of 12-lead electrocardiograms, but few agencies have developed a comprehensive plan for management of their prehospital information and integration with other electronic medical records. This perspective article highlights the challenges and limitations of integrating IT elements without a strategic plan, and proposes an open, interoperable, and scalable prehospital information technology (PHIT) architecture. The two core components of this PHIT architecture are 1) routers with broadband network connectivity to share data between ambulance devices and EMS system information services and 2) an electronic patient care report to organize and archive all electronic prehospital data. To successfully implement this comprehensive PHIT architecture, data and technology requirements must be based on best available evidence, and the system must adhere to health data standards as well as privacy and security regulations. Recent federal legislation prioritizing health information technology may position federal agencies to help design and fund PHIT architectures.
A Ground Systems Architecture Transition for a Distributed Operations System
NASA Technical Reports Server (NTRS)
Sellers, Donna; Pitts, Lee; Bryant, Barry
2003-01-01
The Marshall Space Flight Center (MSFC) Ground Systems Department (GSD) recently undertook an architecture change in the product line that serves the ISS program. As a result, the architecture tradeoffs between data system product lines that serve remote users versus those that serve control center flight control teams were explored extensively. This paper describes the resulting architecture that will be used in the International Space Station (ISS) payloads program, and the resulting functional breakdown of the products that support this architecture. It also describes the lessons learned from the path that was followed, as the migration of products forced a reevaluation of the allocation of functions across the architecture. The result is a set of innovative ground system solutions scalable enough to support facilities of wide-ranging sizes, from a small site up to large control centers. Effective use of system automation, custom components, design optimization for data management, data storage, data transmission, and advanced local and wide area networking architectures, plus the effective use of Commercial-Off-The-Shelf (COTS) products, provides flexible Remote Ground System options that can be tailored to the needs of each user. This paper offers a description of the efficiency and effectiveness of the Ground Systems architectural options that have been implemented, and includes successful implementation examples and lessons learned.
The P-Mesh: A Commodity-based Scalable Network Architecture for Clusters
NASA Technical Reports Server (NTRS)
Nitzberg, Bill; Kuszmaul, Chris; Stockdale, Ian; Becker, Jeff; Jiang, John; Wong, Parkson; Tweten, David (Technical Monitor)
1998-01-01
We designed a new network architecture, the P-Mesh, which combines the scalability and fault resilience of a torus with the performance of a switch. We compare the scalability, performance, and cost of the hub, switch, torus, tree, and P-Mesh architectures. The latter three are capable of scaling to thousands of nodes; however, the torus has severe performance limitations with that many processors. The tree and P-Mesh have similar latency, bandwidth, and bisection bandwidth, but the P-Mesh outperforms the switch architecture (a lower bound for tree performance) on 16-node NAS Parallel Benchmark tests by up to 23%, and costs 40% less. Further, the P-Mesh has better fault-resilience characteristics. The P-Mesh architecture trades increased management overhead for lower cost, and is a good bridging technology while tree uplinks remain expensive.
A shared synapse architecture for efficient FPGA implementation of autoencoders.
Suzuki, Akihiro; Morie, Takashi; Tamukoh, Hakaru
2018-01-01
This paper proposes a shared synapse architecture for autoencoders (AEs), and implements an AE with the proposed architecture as a digital circuit on a field-programmable gate array (FPGA). In the proposed architecture, the values of the synapse weights are shared between the synapses of the input and hidden layers, and between the synapses of the hidden and output layers. This architecture uses fewer of the limited resources of an FPGA than an architecture which does not share the synapse weights, halving the number of synapse modules used. So that the proposed circuit can be implemented into various types of AEs, it takes three kinds of parameters: one to change the number of units in each layer, one to change the bit width of an internal value, and a learning rate. By altering the network configuration using these parameters, the proposed architecture can be used to construct a stacked AE. The proposed circuits are logically synthesized, and the number of their resources is determined. Our experimental results show that single and stacked AE circuits utilizing the proposed shared synapse architecture operate as regular AEs and as regular stacked AEs. The scalability of the proposed circuit and the relationship between the bit widths and the learning results are also determined. The clock cycles of the proposed circuits are formulated, and this formula is used to estimate the theoretical performance of the circuit when the circuit is used to construct arbitrary networks.
A shared synapse architecture for efficient FPGA implementation of autoencoders
Morie, Takashi; Tamukoh, Hakaru
2018-01-01
This paper proposes a shared synapse architecture for autoencoders (AEs), and implements an AE with the proposed architecture as a digital circuit on a field-programmable gate array (FPGA). In the proposed architecture, the values of the synapse weights are shared between the synapses of the input and hidden layers, and between the synapses of the hidden and output layers. This architecture uses fewer of the limited resources of an FPGA than an architecture which does not share the synapse weights, halving the number of synapse modules used. So that the proposed circuit can be implemented into various types of AEs, it takes three kinds of parameters: one to change the number of units in each layer, one to change the bit width of an internal value, and a learning rate. By altering the network configuration using these parameters, the proposed architecture can be used to construct a stacked AE. The proposed circuits are logically synthesized, and the number of their resources is determined. Our experimental results show that single and stacked AE circuits utilizing the proposed shared synapse architecture operate as regular AEs and as regular stacked AEs. The scalability of the proposed circuit and the relationship between the bit widths and the learning results are also determined. The clock cycles of the proposed circuits are formulated, and this formula is used to estimate the theoretical performance of the circuit when the circuit is used to construct arbitrary networks. PMID:29543909
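In software terms, the shared-synapse scheme of the two records above corresponds to a tied-weights autoencoder: one matrix serves encoding and its transpose serves decoding, halving weight storage just as the circuit halves synapse modules. A NumPy sketch under that interpretation (sigmoid units, plain SGD and toy data assumed; the paper's circuit is fixed-point digital logic):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid, lr = 16, 4, 0.5
W = rng.normal(0.0, 0.5, size=(n_hid, n_in))  # the single shared weight matrix
b, c = np.zeros(n_hid), np.zeros(n_in)

X = (rng.random((200, n_in)) < 0.2).astype(float)  # toy binary training data
for epoch in range(500):
    for x in X[rng.permutation(len(X))[:32]]:
        h = sigmoid(W @ x + b)          # encode with W
        y = sigmoid(W.T @ h + c)        # decode with the transpose of W
        dy = (y - x) * y * (1.0 - y)    # squared-error gradient at the output
        dh = (W @ dy) * h * (1.0 - h)   # backprop through the shared weights
        W -= lr * (np.outer(dh, x) + np.outer(h, dy))  # both paths update W
        b -= lr * dh
        c -= lr * dy

H = sigmoid(W @ X.T + b[:, None])
R = sigmoid(W.T @ H + c[:, None])
print("mean reconstruction error:", np.abs(R.T - X).mean())
```

Note that the shared matrix receives two gradient contributions per example, one from the encoding path and one from the decoding path, which is the cost of the storage saving.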
NASA Astrophysics Data System (ADS)
Antonelli, Charles J.; Honeyman, Peter
2001-02-01
This paper describes the Advanced Packet Vault, a technology for creating a complete record of network traffic by collecting and securely storing all packets observed on a network, with a scalable architecture intended to support network speeds in excess of 100 Mbps. Encryption is used to preserve users' security and privacy, permitting selected traffic to be made available without revealing other traffic. The Vault implementation, based on Linux and OpenBSD, is open-source.
Scalable, efficient ASICS for the square kilometre array: From A/D conversion to central correlation
NASA Astrophysics Data System (ADS)
Schmatz, M. L.; Jongerius, R.; Dittmann, G.; Anghel, A.; Engbersen, T.; van Lunteren, J.; Buchmann, P.
2014-05-01
The Square Kilometre Array (SKA) is a future radio telescope, currently being designed by the worldwide radio-astronomy community. During the first of two construction phases, more than 250,000 antennas will be deployed, clustered in aperture-array stations. The antennas will generate 2.5 Pb/s of data, which needs to be processed in real time. For the processing stages from A/D conversion to central correlation, we propose an ASIC solution using only three chip architectures. The architecture is scalable (additional chips support additional antennas or beams) and versatile (it can relocate its receiver band within a range from a few MHz up to 4 GHz). This flexibility makes it applicable to both SKA phases 1 and 2. The proposed chips implement an antenna and station processor for 289 antennas with a power consumption on the order of 600 W, and a correlator, including corner turn, for 911 stations on the order of 90 kW.
UPM: unified policy-based network management
NASA Astrophysics Data System (ADS)
Law, Eddie; Saxena, Achint
2001-07-01
Besides network management, it has become essential to offer users differentiated Quality of Service (QoS) on the Internet. Policy-based management provides control over network routers to achieve this goal. The Internet Engineering Task Force (IETF) has proposed a two-tier architecture whose implementation is based on the Common Open Policy Service (COPS) protocol and the Lightweight Directory Access Protocol (LDAP). However, this design has several limitations, such as scalability and cross-vendor hardware compatibility. To address these issues, we present a functionally enhanced multi-tier policy management architecture design in this paper. Several extensions are introduced, adding flexibility and scalability. In particular, an intermediate entity between the policy server and the policy rule database, called the Policy Enforcement Agent (PEA), is introduced. By keeping internal data in a common format, using a standard protocol, and interpreting and translating request and decision messages from multi-vendor hardware, this agent allows a dynamic Unified Information Model throughout the architecture. We have tailored this information system to store policy rules in the directory server and to allow execution of policy rules while new equipment is added dynamically at run time.
Capella, Juan V.; Perles, Angel; Bonastre, Alberto; Serrano, Juan J.
2011-01-01
We present a set of novel low power wireless sensor nodes designed for monitoring wooden masterpieces and historical buildings, in order to perform an early detection of pests. Although our previous star-based system configuration has been in operation for more than 13 years, it does not scale well for sensorization of large buildings or when deploying hundreds of nodes. In this paper we demonstrate the feasibility of a cluster-based dynamic-tree hierarchical Wireless Sensor Network (WSN) architecture where realistic assumptions of radio frequency data transmission are applied to cluster construction, and a mix of heterogeneous nodes are used to minimize economic cost of the whole system and maximize power saving of the leaf nodes. Simulation results show that the specialization of a fraction of the nodes by providing better antennas and some energy harvesting techniques can dramatically extend the life of the entire WSN and reduce the cost of the whole system. A demonstration of the proposed architecture with a new routing protocol and applied to termite pest detection has been implemented on a set of new nodes and should last for about 10 years, but it provides better scalability, reliability and deployment properties. PMID:22346630
Capella, Juan V; Perles, Angel; Bonastre, Alberto; Serrano, Juan J
2011-01-01
We present a set of novel low power wireless sensor nodes designed for monitoring wooden masterpieces and historical buildings, in order to perform an early detection of pests. Although our previous star-based system configuration has been in operation for more than 13 years, it does not scale well for sensorization of large buildings or when deploying hundreds of nodes. In this paper we demonstrate the feasibility of a cluster-based dynamic-tree hierarchical Wireless Sensor Network (WSN) architecture where realistic assumptions of radio frequency data transmission are applied to cluster construction, and a mix of heterogeneous nodes are used to minimize economic cost of the whole system and maximize power saving of the leaf nodes. Simulation results show that the specialization of a fraction of the nodes by providing better antennas and some energy harvesting techniques can dramatically extend the life of the entire WSN and reduce the cost of the whole system. A demonstration of the proposed architecture with a new routing protocol and applied to termite pest detection has been implemented on a set of new nodes and should last for about 10 years, but it provides better scalability, reliability and deployment properties.
Inherent polarization entanglement generated from a monolithic semiconductor chip
Horn, Rolf T.; Kolenderski, Piotr; Kang, Dongpeng; Abolghasem, Payam; Scarcella, Carmelo; Frera, Adriano Della; Tosi, Alberto; Helt, Lukas G.; Zhukovsky, Sergei V.; Sipe, J. E.; Weihs, Gregor; Helmy, Amr S.; Jennewein, Thomas
2013-01-01
Creating miniature chip scale implementations of optical quantum information protocols is a dream for many in the quantum optics community. This is largely because of the promise of stability and scalability. Here we present a monolithically integratable chip architecture upon which is built a photonic device primitive called a Bragg reflection waveguide (BRW). Implemented in gallium arsenide, we show that, via the process of spontaneous parametric down conversion, the BRW is capable of directly producing polarization entangled photons without additional path difference compensation, spectral filtering or post-selection. After splitting the twin-photons immediately after they emerge from the chip, we perform a variety of correlation tests on the photon pairs and show non-classical behaviour in their polarization. Combined with the BRW's versatile architecture our results signify the BRW design as a serious contender on which to build large scale implementations of optical quantum processing devices. PMID:23896982
A software methodology for compiling quantum programs
NASA Astrophysics Data System (ADS)
Häner, Thomas; Steiger, Damian S.; Svore, Krysta; Troyer, Matthias
2018-04-01
Quantum computers promise to transform our notions of computation by offering a completely new paradigm. To achieve scalable quantum computation, optimizing compilers and a corresponding software design flow will be essential. We present a software architecture for compiling quantum programs from a high-level language program to hardware-specific instructions. We describe the necessary layers of abstraction and their differences and similarities to classical layers of a computer-aided design flow. For each layer of the stack, we discuss the underlying methods for compilation and optimization. Our software methodology facilitates more rapid innovation among quantum algorithm designers, quantum hardware engineers, and experimentalists. It enables scalable compilation of complex quantum algorithms and can be targeted to any specific quantum hardware implementation.
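The lowest layers of such a stack are rewrite passes that map high-level gates onto a hardware-native set. A toy illustration of one such pass (the native set and decomposition rules are chosen for the example; a real flow performs many passes plus optimization):

```python
import math

# Hypothetical native set: {H, T, CNOT, RZ}.  Each rule lowers a higher-level
# gate one step; lowering repeats until only native gates remain.
# Z = RZ(pi) holds up to a global phase; S = T^2; X = H Z H.
RULES = {
    "S": lambda q: [("T", q), ("T", q)],
    "Z": lambda q: [("RZ", q, math.pi)],
    "X": lambda q: [("H", q), ("RZ", q, math.pi), ("H", q)],
}

def lower(circuit):
    """Repeatedly apply rewrite rules until a fixed point is reached."""
    done = False
    while not done:
        done, out = True, []
        for gate in circuit:
            name, q = gate[0], gate[1]
            if name in RULES:
                out.extend(RULES[name](q))
                done = False
            else:
                out.append(gate)
        circuit = out
    return circuit

program = [("H", 0), ("S", 0), ("X", 1), ("CNOT", (0, 1))]
print(lower(program))
```

Higher layers of the stack work the same way on coarser objects (library calls, arithmetic, oracles), so the whole flow is a cascade of such rewrites, with cost models deciding among alternative decompositions.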
Molecular nanomagnets with switchable coupling for quantum simulation
Chiesa, Alessandro; Whitehead, George F. S.; Carretta, Stefano; ...
2014-12-11
Molecular nanomagnets are attractive candidate qubits because of their wide inter- and intra-molecular tunability. Uniform magnetic pulses could be exploited to implement one- and two-qubit gates in presence of a properly engineered pattern of interactions, but the synthesis of suitable and potentially scalable supramolecular complexes has proven a very hard task. Indeed, no quantum algorithms have ever been implemented, not even a proof-of-principle two-qubit gate. In this paper we show that the magnetic couplings in two supramolecular {Cr7Ni}-Ni-{Cr7Ni} assemblies can be chemically engineered to fit the above requisites for conditional gates with no need of local control. Microscopic parameters are determined by a recently developed many-body ab-initio approach and used to simulate quantum gates. We find that these systems are optimal for proof-of-principle two-qubit experiments and can be exploited as building blocks of scalable architectures for quantum simulation.
Implementation of the semiclassical quantum Fourier transform in a scalable system.
Chiaverini, J; Britton, J; Leibfried, D; Knill, E; Barrett, M D; Blakestad, R B; Itano, W M; Jost, J D; Langer, C; Ozeri, R; Schaetz, T; Wineland, D J
2005-05-13
We report the implementation of the semiclassical quantum Fourier transform in a system of three beryllium ion qubits (two-level quantum systems) confined in a segmented multizone trap. The quantum Fourier transform is the crucial final step in Shor's algorithm, and it acts on a register of qubits to determine the periodicity of the quantum state's amplitudes. Because only probability amplitudes are required for this task, a more efficient semiclassical version can be used, for which only single-qubit operations conditioned on measurement outcomes are required. We apply the transform to several input states of different periodicities; the results enable the location of peaks corresponding to the original periods. This demonstration incorporates the key elements of a scalable ion-trap architecture, suggesting the future capability of applying the quantum Fourier transform to a large number of qubits as required for a useful quantum factoring algorithm.
Execution of parallel algorithms on a heterogeneous multicomputer
NASA Astrophysics Data System (ADS)
Isenstein, Barry S.; Greene, Jonathon
1995-04-01
Many aerospace/defense sensing and dual-use applications require high-performance computing, extensive high-bandwidth interconnect and realtime deterministic operation. This paper will describe the architecture of a scalable multicomputer that includes DSP and RISC processors. A single chassis implementation is capable of delivering in excess of 10 GFLOPS of DSP processing power with 2 Gbytes/s of realtime sensor I/O. A software approach to implementing parallel algorithms called the Parallel Application System (PAS) is also presented. An example of applying PAS to a DSP application is shown.
Large-Scale Networked Virtual Environments: Architecture and Applications
ERIC Educational Resources Information Center
Lamotte, Wim; Quax, Peter; Flerackers, Eddy
2008-01-01
Purpose: Scalability is an important research topic in the context of networked virtual environments (NVEs). This paper aims to describe the ALVIC (Architecture for Large-scale Virtual Interactive Communities) approach to NVE scalability. Design/methodology/approach: The setup and results from two case studies are shown: a 3-D learning environment…
An Approach for On-Board Software Building Blocks Cooperation and Interfaces Definition
NASA Astrophysics Data System (ADS)
Pascucci, Dario; Campolo, Giovanni; Candia, Sante; Lisio, Giovanni
2010-08-01
This paper provides an insight into the avionic SW architecture developed by Thales Alenia Space Italy (TAS-I) to structure the OBSW as a set of self-standing and re-usable building blocks. The underlying framework for building-block cooperation is described first; it is based on ECSS-E-70 packet forwarding (for service requests to a building block) and standard parameter exchange for data communication. The high flexibility and scalability of the resulting architecture are then discussed, with an implementation of the Failure Detection, Isolation and Recovery (FDIR) function reported as an example that exploits the proposed architecture. The presented approach evolves from the avionic SW architecture developed in the scope of the PRIMA project (Multi-Purpose Italian Re-configurable Platform) and has been adopted for the Sentinel-1 Avionic Software (ASW).
Scalable architecture for a room temperature solid-state quantum information processor.
Yao, N Y; Jiang, L; Gorshkov, A V; Maurer, P C; Giedke, G; Cirac, J I; Lukin, M D
2012-04-24
The realization of a scalable quantum information processor has emerged over the past decade as one of the central challenges at the interface of fundamental science and engineering. Here we propose and analyse an architecture for a scalable, solid-state quantum information processor capable of operating at room temperature. Our approach is based on recent experimental advances involving nitrogen-vacancy colour centres in diamond. In particular, we demonstrate that the multiple challenges associated with operation at ambient temperature, individual addressing at the nanoscale, strong qubit coupling, robustness against disorder and low decoherence rates can be simultaneously achieved under realistic, experimentally relevant conditions. The architecture uses a novel approach to quantum information transfer and includes a hierarchy of control at successive length scales. Moreover, it alleviates the stringent constraints currently limiting the realization of scalable quantum processors and will provide fundamental insights into the physics of non-equilibrium many-body quantum systems.
Predefined three tier business intelligence architecture in healthcare enterprise.
Wang, Meimei
2013-04-01
Business Intelligence (BI) has attracted extensive attention and achieved widespread use in gathering, processing and analyzing data, and in providing enterprise users a methodology for decision making. Departing from traditional BI architecture, this paper proposes a new BI architecture, a Top-Down Scalable BI architecture with a defining mechanism for enterprise decision-making solutions, and aims at establishing a rapid, consistent and scalable BI mechanism supporting multiple applications on multiple platforms. The two opposite information flows in our BI architecture offer the merits of retaining a high-level organizational perspective while making full use of existing resources. We also introduce the avg-bed-waiting-time factor to evaluate hospital care capacity.
Framework for a clinical information system.
Van de Velde, R
2000-01-01
The current status of our work towards the design and implementation of a reference architecture for a Clinical Information System is presented. This architecture has been developed and implemented based on components following a strong underlying conceptual and technological model. Common Object Request Broker and n-tier technology featuring centralised and departmental clinical information systems as the back-end store for all clinical data are used. Servers located in the 'middle' tier apply the clinical (business) model and application rules to communicate with so-called 'thin client' workstations. The main characteristics are the focus on modelling and reuse of both data and business logic, as there is a shift away from data and functional modelling towards object modelling. Scalability as well as adaptability to constantly changing requirements via component-driven computing are the main reasons for that approach.
Programming time-multiplexed reconfigurable hardware using a scalable neuromorphic compiler.
Minkovich, Kirill; Srinivasa, Narayan; Cruz-Albrecht, Jose M; Cho, Youngkwan; Nogin, Aleksey
2012-06-01
Scalability and connectivity are two key challenges in designing neuromorphic hardware that can match biological levels of scale and connectivity. In this paper, we describe a neuromorphic system architecture design that addresses an approach to meet these challenges using traditional complementary metal-oxide-semiconductor (CMOS) hardware. A key requirement in realizing such neural architectures in hardware is the ability to automatically configure the hardware to emulate any neural architecture or model. The focus for this paper is to describe the details of such a programmable front-end. This programmable front-end is composed of a neuromorphic compiler and a digital memory, and is designed based on the concept of synaptic time-multiplexing (STM). The neuromorphic compiler automatically translates any given neural architecture to hardware switch states and these states are stored in digital memory to enable desired neural architectures. STM enables our proposed architecture to address scalability and connectivity using traditional CMOS hardware. We describe the details of the proposed design and the programmable front-end, and provide examples to illustrate its capabilities. We also provide perspectives for future extensions and potential applications.
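The core scheduling step of such a compiler can be caricatured in a few lines. Under our own simplifying assumption that two synapses conflict when they share a neuron (a stand-in for contention on shared physical wiring), a greedy first-fit assignment of synapses to time slots illustrates how a connectivity description is turned into per-slot switch states:

```python
from collections import defaultdict

def stm_schedule(synapses):
    """Greedy first-fit assignment of synapses to time slots such that,
    within one slot, no neuron appears twice; each slot then corresponds
    to one stored set of hardware switch states."""
    slots = defaultdict(list)
    for src, dst in synapses:
        t = 0
        while any(src in pair or dst in pair for pair in slots[t]):
            t += 1                    # first slot without a wiring conflict
        slots[t].append((src, dst))
    return dict(slots)

# All-to-all network of 4 neurons: the 12 directed synapses end up
# sharing 6 time slots (two synapses per slot) instead of 12 wires.
net = [(a, b) for a in range(4) for b in range(4) if a != b]
for slot, conns in sorted(stm_schedule(net).items()):
    print(f"slot {slot}: {conns}")
```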
Entangling distant solid-state spins via thermal phonons
NASA Astrophysics Data System (ADS)
Cao, Puhao; Betzholz, Ralf; Zhang, Shaoliang; Cai, Jianming
2017-12-01
The implementation of quantum entangling gates between qubits is essential to achieve scalable quantum computation. Here, we propose a robust scheme to realize an entangling gate for distant solid-state spins via a mechanical oscillator in its thermal equilibrium state. By appropriate Hamiltonian engineering and usage of a protected subspace, we show that the proposed scheme is able to significantly reduce the thermal effect of the mechanical oscillator on the spins. In particular, we demonstrate that a high entangling gate fidelity can be achieved even for a relatively high thermal occupation. Our scheme can thus relax the requirement for ground-state cooling of the mechanical oscillator, and may find applications in scalable quantum information processing in hybrid solid-state architectures.
Chelonia: A self-healing, replicated storage system
NASA Astrophysics Data System (ADS)
Kerr Nilsen, Jon; Toor, Salman; Nagy, Zsombor; Read, Alex
2011-12-01
Chelonia is a novel grid storage system designed to fill the requirements gap between those of large, sophisticated scientific collaborations which have adopted the grid paradigm for their distributed storage needs, and those of corporate business communities gravitating towards the cloud paradigm. Chelonia is an integrated system of heterogeneous, geographically dispersed storage sites which is easily and dynamically expandable and optimized for high availability and scalability. The architecture and implementation in terms of web services running inside the Advanced Resource Connector Hosting Environment Daemon (ARC HED) are described, and results of tests in both local-area and wide-area networks that demonstrate the fault tolerance, stability and scalability of Chelonia are presented. In addition, example setups for production deployments for small and medium-sized VOs are described.
Christoph, J; Griebel, L; Leb, I; Engel, I; Köpcke, F; Toddenroth, D; Prokosch, H-U; Laufer, J; Marquardt, K; Sedlmayr, M
2015-01-01
The secondary use of clinical data provides large opportunities for clinical and translational research as well as quality assurance projects. For such purposes, it is necessary to provide a flexible and scalable infrastructure that is compliant with privacy requirements. The major goals of the cloud4health project are to define such an architecture, to implement a technical prototype that fulfills these requirements and to evaluate it with three use cases. The architecture provides components for multiple data provider sites such as hospitals to extract free text as well as structured data from local sources and de-identify such data for further anonymous or pseudonymous processing. Free-text documentation is analyzed and transformed into structured information by text-mining services, which are provided within a cloud-computing environment. Thus, newly gained annotations can be integrated along with the already available structured data items and the resulting data sets can be uploaded to a central study portal for further analysis. Based on the architecture design, a prototype has been implemented and is under evaluation in three clinical use cases. Data from several hundred patients provided by a University Hospital and a private hospital chain have already been processed. Cloud4health has shown how existing components for secondary use of structured data can be complemented with text-mining in a privacy-compliant manner. The cloud-computing paradigm allows a flexible and dynamically adaptable service provision that facilitates the adoption of services by data providers without their own investments in respective hardware resources and software tools.
The Design of a Multi-Agent NDE Inspection Qualification System
NASA Astrophysics Data System (ADS)
McLean, N.; McKenna, J. P.; Gachagan, A.; McArthur, S.; Hayward, G.
2007-03-01
A novel Multi-Agent system (MAS) for NDE inspection qualification is being developed to facilitate a scalable environment allowing integration and automation of new and existing inspection qualification tools. This paper discusses the advantages of using a MAS approach to integrate the large number of disparate NDE software tools. The design and implementation of the system architecture is described, including the development of an ontology to describe the NDE domain.
2007-09-01
[Fragment of a search snippet; only the following is recoverable] ...behaviour based on past experience of interacting with the operator), and mobile (i.e., can move themselves from one machine to another). Edwards argues that... an agent-based architecture can provide a natural and scalable approach to implementing a multimodal interface to control mobile robots through dynamic... Cited therein: Sofge, D., Bugajska, M., Adams, W., Perzanowski, D., and Schultz, A. (2003). Agent-based Multimodal Interface for Dynamically Autonomous Mobile Robots.
Three-Dimensional Wiring for Extensible Quantum Computing: The Quantum Socket
NASA Astrophysics Data System (ADS)
Béjanin, J. H.; McConkey, T. G.; Rinehart, J. R.; Earnest, C. T.; McRae, C. R. H.; Shiri, D.; Bateman, J. D.; Rohanizadegan, Y.; Penava, B.; Breul, P.; Royak, S.; Zapatka, M.; Fowler, A. G.; Mariantoni, M.
2016-10-01
Quantum computing architectures are on the verge of scalability, a key requirement for the implementation of a universal quantum computer. The next stage in this quest is the realization of quantum error-correction codes, which will mitigate the impact of faulty quantum information on a quantum computer. Architectures with ten or more quantum bits (qubits) have been realized using trapped ions and superconducting circuits. While these implementations are potentially scalable, true scalability will require systems engineering to combine quantum and classical hardware. One technology demanding imminent efforts is the realization of a suitable wiring method for the control and the measurement of a large number of qubits. In this work, we introduce an interconnect solution for solid-state qubits: the quantum socket. The quantum socket fully exploits the third dimension to connect classical electronics to qubits with higher density and better performance than two-dimensional methods based on wire bonding. The quantum socket is based on spring-mounted microwires—the three-dimensional wires—that push directly on a microfabricated chip, making electrical contact. A small wire cross section (approximately 1 mm), nearly nonmagnetic components, and functionality at low temperatures make the quantum socket ideal for operating solid-state qubits. The wires have a coaxial geometry and operate over a frequency range from dc to 8 GHz, with a contact resistance of approximately 150 mΩ, an impedance mismatch of approximately 10 Ω, and minimal cross talk. As a proof of principle, we fabricate and use a quantum socket to measure high-quality superconducting resonators at a temperature of approximately 10 mK. Quantum error-correction codes such as the surface code will largely benefit from the quantum socket, which will make it possible to address qubits located on a two-dimensional lattice. The present implementation of the socket could be readily extended to accommodate a quantum processor with a (10 × 10)-qubit lattice, which would allow for the realization of a simple quantum memory.
Fully programmable and scalable optical switching fabric for petabyte data center.
Zhu, Zhonghua; Zhong, Shan; Chen, Li; Chen, Kai
2015-02-09
We present a converged EPS and OCS switching fabric for data center networks (DCNs) based on a distributed optical switching architecture leveraging both WDM & SDM technologies. The architecture is topology adaptive, well suited to dynamic and diverse *-cast traffic patterns. Compared to a typical folded-Clos network, the new architecture is more readily scalable to future multi-Petabyte data centers with 1000+ racks while providing a higher link bandwidth, reducing transceiver count by 50%, and improving cabling efficiency by more than 90%.
FPGA implementation of a configurable neuromorphic CPG-based locomotion controller.
Barron-Zambrano, Jose Hugo; Torres-Huitzil, Cesar
2013-09-01
Neuromorphic engineering is a discipline devoted to the design and development of computational hardware that mimics the characteristics and capabilities of neuro-biological systems. In recent years, neuromorphic hardware systems have been implemented using a hybrid approach incorporating digital hardware so as to provide flexibility and scalability at the cost of power efficiency and some biological realism. This paper proposes an FPGA-based neuromorphic-like embedded system on a chip to generate locomotion patterns of periodic rhythmic movements inspired by Central Pattern Generators (CPGs). The proposed implementation follows a top-down approach where modularity and hierarchy are two desirable features. The locomotion controller is based on CPG models to produce rhythmic locomotion patterns or gaits for legged robots such as quadrupeds and hexapods. The architecture is configurable and scalable for robots with either different morphologies or different degrees of freedom (DOFs). Experiments performed on a real robot are presented and discussed. The obtained results demonstrate that the CPG-based controller provides the necessary flexibility to generate different rhythmic patterns at run-time suitable for adaptable locomotion.
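The CPG principle is easy to prototype in software before committing to FPGA fabric. The sketch below is not the authors' HDL design but a minimal phase-oscillator model under our own parameter choices: oscillators coupled in a ring lock to prescribed phase offsets, here those of a quadruped trot.

```python
import numpy as np

def cpg_step(phases, omega, offsets, k=2.0, dt=0.01):
    """One Euler step of a ring of coupled phase oscillators: each
    oscillator is pulled toward its ring neighbour's phase shifted by a
    gait-specific offset, so the network locks into the gait pattern."""
    n = len(phases)
    dphi = np.full(n, omega)
    for i in range(n):
        j = (i + 1) % n
        dphi[i] += k * np.sin(phases[j] - phases[i] - offsets[i])
    return phases + dt * dphi

# Trot gait, legs ordered LF, RF, LH, RH: required phase steps around
# the ring are [pi, 0, pi, 0], putting diagonal leg pairs in phase.
offsets = np.array([np.pi, 0.0, np.pi, 0.0])
phases = np.random.default_rng(0).uniform(0, 2 * np.pi, 4)
for _ in range(5000):
    phases = cpg_step(phases, omega=2 * np.pi, offsets=offsets)
rel = (phases - phases[0]) % (2 * np.pi)
print(np.round(rel / np.pi, 2))   # expected near [0, 1, 1, 0]: LF~RH, RF~LH
```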
Distributed controller clustering in software defined networks.
Abdelaziz, Ahmed; Fong, Ang Tan; Gani, Abdullah; Garba, Usman; Khan, Suleman; Akhunzada, Adnan; Talebian, Hamid; Choo, Kim-Kwang Raymond
2017-01-01
Software Defined Networking (SDN) is an emerging and promising paradigm for network management because of its centralized network intelligence. However, the centralized control architecture of software-defined networks (SDNs) brings novel challenges of reliability, scalability, fault tolerance and interoperability. In this paper, we propose a novel clustered distributed controller architecture in a real SDN setting. The distributed cluster implementation comprises multiple popular SDN controllers. The proposed mechanism is evaluated using a real-world network topology running on top of an emulated SDN environment. The results show that the proposed distributed controller clustering mechanism is able to significantly reduce the average latency from 8.1% to 1.6% and the packet loss from 5.22% to 4.15%, compared to a distributed controller without clustering running on HP Virtual Application Network (VAN) SDN and Open Network Operating System (ONOS) controllers respectively. Moreover, the proposed method also shows reasonable CPU utilization results. Furthermore, the proposed mechanism makes it possible to handle unexpected load fluctuations while maintaining continuous network operation, even when there is a controller failure. The paper is a potential contribution stepping towards addressing the issues of reliability, scalability, fault tolerance and interoperability in SDNs.
NASA Astrophysics Data System (ADS)
Sabeur, Z. A.; Wächter, J.; Middleton, S. E.; Zlatev, Z.; Häner, R.; Hammitzsch, M.; Loewe, P.
2012-04-01
The intelligent management of large volumes of environmental monitoring data for early tsunami warning requires the deployment of a robust and scalable service-oriented infrastructure supported by an agile knowledge base for critical decision support. In the TRIDEC project (TRIDEC 2010-2013), a sensor observation service bus of the TRIDEC system is being developed for the advancement of complex tsunami event processing and management. Further, a dedicated TRIDEC system knowledge base is being implemented to enable on-demand access to semantically rich OGC SWE compliant hydrodynamic observations and operationally oriented meta-information for multiple subscribers. TRIDEC decision support requires a scalable and agile real-time processing architecture which enables fast response to evolving subscribers' requirements as a tsunami crisis develops. This is also achieved with the support of intelligent processing services which specialise in multi-level fusion methods with relevance feedback and deep learning. The TRIDEC knowledge base development work, coupled with that of the generic sensor bus platform, will be presented to demonstrate advanced decision support with situation awareness in the context of tsunami early warning and crisis management.
Scalable boson sampling with time-bin encoding using a loop-based architecture.
Motes, Keith R; Gilchrist, Alexei; Dowling, Jonathan P; Rohde, Peter P
2014-09-19
We present an architecture for arbitrarily scalable boson sampling using two nested fiber loops. The architecture has fixed experimental complexity, irrespective of the size of the desired interferometer, whose scale is limited only by fiber and switch loss rates. The architecture employs time-bin encoding, whereby the incident photons form a pulse train, which enters the loops. Dynamically controlled loop coupling ratios allow the construction of the arbitrary linear optics interferometers required for boson sampling. The architecture employs only a single point of interference and may thus be easier to stabilize than other approaches. The scheme has polynomial complexity and could be realized using demonstrated present-day technologies.
ePix: a class of architectures for second generation LCLS cameras
Dragone, A.; Caragiulo, P.; Markovic, B.; ...
2014-03-31
ePix is a novel class of ASIC architectures, based on a common platform, optimized to build modular scalable detectors for LCLS. The platform architecture is composed of a random-access analog matrix of pixels with global shutter, fast parallel column readout, and dedicated sigma-delta analog-to-digital converters per column. It also implements a dedicated control interface and all the required support electronics to perform configuration, calibration and readout of the matrix. Based on this platform, a class of front-end ASICs and several camera modules, meeting different requirements, can be developed by designing specific pixel architectures. This approach reduces development time and expands the possibility of integrating detector modules with different size, shape or functionality in the same camera. The ePix platform is currently under development together with the first two integrating pixel architectures: ePix100, dedicated to ultra-low-noise applications, and ePix10k, for high-dynamic-range applications.
High-performance, scalable optical network-on-chip architectures
NASA Astrophysics Data System (ADS)
Tan, Xianfang
The rapid advance of technology enables a large number of processing cores to be integrated into a single chip, called a Chip Multiprocessor (CMP) or a Multiprocessor System-on-Chip (MPSoC) design. The on-chip interconnection network, which is the communication infrastructure for these processing cores, plays a central role in a many-core system. With the continuously increasing complexity of many-core systems, traditional metallic-wired electronic networks-on-chip (NoC) became a bottleneck because of the unbearable latency in data transmission and extremely high energy consumption on chip. Optical networks-on-chip (ONoC) have been proposed as a promising alternative paradigm to electronic NoC, with the benefits of optical signaling such as extremely high bandwidth, negligible latency, and low power consumption. This dissertation focuses on the design of high-performance and scalable ONoC architectures; the contributions are highlighted as follows:
1. A micro-ring resonator (MRR)-based Generic Wavelength-routed Optical Router (GWOR) is proposed, together with a method for developing a GWOR of any size. GWOR is a scalable non-blocking ONoC architecture with a simple structure, low cost and high power efficiency compared to existing ONoC designs.
2. To expand the bandwidth and improve the fault tolerance of the GWOR, a redundant GWOR architecture is designed by cascading different types of GWORs into one network.
3. A redundant GWOR built with MRR-based comb switches is proposed. Comb switches can expand the bandwidth while keeping the topology of the GWOR unchanged by replacing the general MRRs with comb switches.
4. A butterfly-fat-tree (BFT)-based hybrid optoelectronic NoC (HONoC) architecture is developed, in which GWORs are used for global communication and electronic routers for local communication. The proposed HONoC uses fewer electronic routers and links than its electronic BFT-based NoC counterpart. It takes advantage of GWOR for optical communication and of BFT for non-uniform traffic and three-dimensional (3D) implementation.
5. A cycle-accurate NoC simulator is developed to evaluate the performance of the proposed HONoC architectures. It is a comprehensive platform that can simulate both electronic and optical NoCs. HONoC architectures of different sizes are evaluated in terms of throughput, latency and energy dissipation. Simulation results confirm that HONoC achieves good network performance with lower power consumption.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code
NASA Astrophysics Data System (ADS)
Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.
1995-03-01
PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.
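The same portability argument applies to modern MPI bindings. As a hedged illustration (this is not PHOENICS code), the mpi4py sketch below performs the kind of halo-exchange-and-update step a domain-decomposed solver executes each iteration; run with, e.g., `mpiexec -n 4 python jacobi_mpi.py`.

```python
# jacobi_mpi.py - toy 1-D Jacobi relaxation with halo exchange over MPI.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns a slab of the 1-D domain plus one ghost cell per side.
n_local = 100
u = np.zeros(n_local + 2)
if rank == 0:
    u[0] = 1.0                       # fixed boundary value on the left edge

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for _ in range(500):
    # Exchange ghost cells with neighbours (no-op at PROC_NULL edges).
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    u[1:-1] = 0.5 * (u[:-2] + u[2:])           # Jacobi update on the interior

total = comm.allreduce(float(np.sum(u[1:-1])), op=MPI.SUM)
if rank == 0:
    print("global sum:", total)
```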
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
GOES-R GS Product Generation Infrastructure Operations
NASA Astrophysics Data System (ADS)
Blanton, M.; Gundy, J.
2012-12-01
The GOES-R Ground System (GS) will produce a much larger set of products with higher data density than previous GOES systems. This requires considerably greater compute and memory resources to achieve the necessary latency and availability for these products. Over time, new algorithms could be added and existing ones removed or updated, but the GOES-R GS cannot go down during this time. To meet these GOES-R GS processing needs, the Harris Corporation will implement a Product Generation (PG) infrastructure that is scalable, extensible, modular and reliable. The primary parts of the PG infrastructure are the Service Based Architecture (SBA) and the Distributed Data Fabric (DDF). The SBA is the middleware that encapsulates and manages the science algorithms that generate products. It is divided into three parts: the Executive, which manages and configures an algorithm as a service; the Dispatcher, which provides data to the algorithm; and the Strategy, which determines when the algorithm can execute with the available data. The SBA is a distributed architecture, with services connected to each other over a compute grid, and is highly scalable. This plug-and-play architecture allows algorithms to be added, removed, or updated without affecting any other services or software currently running and producing data. Algorithms require product data from other algorithms, so scalable and reliable messaging is necessary. The SBA uses the DDF to provide this data communication layer between algorithms. The DDF provides an abstract interface over a distributed and persistent multi-layered storage system (memory-based caching above disk-based storage) and an event system that allows algorithm services to know when data is available and to get the data they need to begin processing when they need it. Together, the SBA and the DDF provide a flexible, high-performance architecture that can meet the needs of product processing now and as they grow in the future.
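The division of labor among the three SBA roles can be summarized in a short sketch. The Python below uses our own naming and a toy two-band algorithm; the actual GOES-R GS middleware is, of course, far more elaborate.

```python
class Strategy:
    """Decides when an algorithm has the inputs it needs to run."""
    def __init__(self, required):
        self.required = set(required)
    def ready(self, available):
        return self.required <= set(available)

class Dispatcher:
    """Accumulates incoming data items and feeds them to the algorithm."""
    def __init__(self):
        self.inputs = {}
    def deliver(self, name, value):
        self.inputs[name] = value

class Executive:
    """Wraps a science algorithm as a service: configures it, asks the
    Strategy whether it can run, and publishes its product."""
    def __init__(self, algorithm, strategy, dispatcher, publish):
        self.algorithm, self.strategy = algorithm, strategy
        self.dispatcher, self.publish = dispatcher, publish
    def on_data(self, name, value):
        self.dispatcher.deliver(name, value)
        if self.strategy.ready(self.dispatcher.inputs):
            self.publish(self.algorithm(**self.dispatcher.inputs))

# A toy "cloud mask" algorithm that needs two instrument bands.
svc = Executive(
    algorithm=lambda band_vis, band_ir: [v > i for v, i in zip(band_vis, band_ir)],
    strategy=Strategy({"band_vis", "band_ir"}),
    dispatcher=Dispatcher(),
    publish=lambda product: print("product:", product),
)
svc.on_data("band_vis", [0.7, 0.2, 0.9])   # not ready yet
svc.on_data("band_ir", [0.5, 0.4, 0.1])    # strategy fires, product published
```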
Parallel k-means++ for Multiple Shared-Memory Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mackey, Patrick S.; Lewis, Robert R.
2016-09-22
In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high-performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varying data sizes.
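For reference, exact k-means++ seeding is itself only a few lines; the parallelizable work is the distance update, which is data-parallel across points. A minimal numpy sketch of the exact algorithm (our illustration, not the paper's implementation):

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """Exact k-means++ seeding: each new center is drawn with probability
    proportional to the squared distance to the nearest center chosen so
    far (the D^2 distribution)."""
    centers = [X[rng.integers(len(X))]]
    d2 = np.sum((X - centers[0]) ** 2, axis=1)     # data-parallel step
    for _ in range(k - 1):
        idx = rng.choice(len(X), p=d2 / d2.sum())  # D^2-weighted draw
        centers.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.vstack(centers)

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 0.3, (200, 2)) for m in (0, 5, 10)])
print(kmeans_pp_init(X, 3, rng))   # typically one seed lands near each cluster
```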
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pruitt, Spencer R.; Nakata, Hiroya; Nagata, Takeshi
2016-04-12
The analytic first derivative with respect to nuclear coordinates is formulated and implemented in the framework of the three-body fragment molecular orbital (FMO) method. The gradient has been derived and implemented for restricted Hartree-Fock, second-order Møller-Plesset perturbation, and density functional theories. The importance of the three-body fully analytic gradient is illustrated through the failure of the two-body FMO method during molecular dynamics simulations of a small water cluster. The parallel implementation of the fragment molecular orbital method, its parallel efficiency, and its scalability on the Blue Gene/Q architecture up to 262,144 CPU cores, are also discussed.
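For context, the energy whose nuclear gradient is made fully analytic here is the standard three-body FMO many-body expansion over monomer, dimer, and trimer calculations:

```latex
E_{\mathrm{FMO3}} = \sum_{I} E_{I}
  + \sum_{I>J}\bigl(E_{IJ}-E_{I}-E_{J}\bigr)
  + \sum_{I>J>K}\Bigl[\bigl(E_{IJK}-E_{I}-E_{J}-E_{K}\bigr)
  - \bigl(E_{IJ}-E_{I}-E_{J}\bigr)
  - \bigl(E_{JK}-E_{J}-E_{K}\bigr)
  - \bigl(E_{IK}-E_{I}-E_{K}\bigr)\Bigr]
```

The analytic gradient assembles the derivatives of these monomer, dimer, and trimer terms, consistent with the failure of the two-body truncation reported above for molecular dynamics.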
NASA Astrophysics Data System (ADS)
Lisio, Giovanni; Candia, Sante; Campolo, Giovanni; Pascucci, Dario
2011-08-01
Thales Alenia Space Italy has carried out the definition of a configurable (on a mission basis) PUS ECSS-E-70-41A (see [3]) Centralised Services Layer, characterised by:
- a mission-independent set of 'classes' implementing the services logic;
- a mission-dependent set of configuration data and selection flags.
The software components belonging to this layer implement the PUS standard services of ECSS-E-70-41A and a set of mission-specific services. The design of this layer separates the services mechanisms (mission-independent execution logic) from the services configuration information (mission-dependent data). Once instantiated for a specific mission, the PUS Centralised Services Layer offers a large set of capabilities to the CSCI's Applications Layer. This paper describes the building-block PUS architectural solution developed by Thales Alenia Space Italy, emphasising the mechanisms which allow easy configuration of the Scalable PUS library to fulfil the requirements of different missions. It also focuses on the Thales Alenia Space solution to automatically generate the mission-specific "PUS Services" flight software from mission-specific requirements. Building PUS services mechanisms that are configurable on a mission basis is part of the PRIMA (Multipurpose Spacecraft Bus) 'missionisation' process improvement. The PRIMA Platform Avionics Software (ASW) is continuously evolving to improve modularity and standardisation of interfaces and of SW components (see references in [1]).
Unstructured Adaptive Grid Computations on an Array of SMPs
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.
1996-01-01
Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. This paper presents such a dynamic load balancing framework, called JOVE. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENGEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits an 'array of SMPs' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantages over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of the array-of-SMPs architecture.
A remote instruction system empowered by tightly shared haptic sensation
NASA Astrophysics Data System (ADS)
Nishino, Hiroaki; Yamaguchi, Akira; Kagawa, Tsuneo; Utsumiya, Kouichi
2007-09-01
We present a system to realize an on-line instruction environment among physically separated participants based on a multi-modal communication strategy. In addition to visual and acoustic information, the communication modalities commonly used in network environments, our system provides a haptic channel to intuitively convey partners' sense of touch. The human touch sensation, however, is very sensitive to delays and jitter in networked virtual reality (NVR) systems, so a method to compensate for such negative factors needs to be provided. We show an NVR architecture that implements a basic framework which can be shared by various applications and effectively deals with these problems. We take a hybrid approach, achieving data consistency through a client-server model and scalability through a peer-to-peer model. As an application built on the proposed architecture, a remote instruction system for teaching handwritten characters and line patterns over a Korea-Japan high-speed research network is also described.
Smart-Pixel Array Processors Based on Optimal Cellular Neural Networks for Space Sensor Applications
NASA Technical Reports Server (NTRS)
Fang, Wai-Chi; Sheu, Bing J.; Venus, Holger; Sandau, Rainer
1997-01-01
A smart-pixel cellular neural network (CNN) with hardware annealing capability, digitally programmable synaptic weights, and multisensor parallel interface has been under development for advanced space sensor applications. The smart-pixel CNN architecture is a programmable multi-dimensional array of optoelectronic neurons which are locally connected with their local neurons and associated active-pixel sensors. Integration of the neuroprocessor in each processor node of a scalable multiprocessor system offers orders-of-magnitude computing performance enhancements for on-board real-time intelligent multisensor processing and control tasks of advanced small satellites. The smart-pixel CNN operation theory, architecture, design and implementation, and system applications are investigated in detail. The VLSI (Very Large Scale Integration) implementation feasibility was illustrated by a prototype smart-pixel 5x5 neuroprocessor array chip of active dimensions 1380 micron x 746 micron in a 2-micron CMOS technology.
NASA Technical Reports Server (NTRS)
Younes, Badri A.; Schier, James S.
2010-01-01
The SCaN Program has defined an integrated network architecture that fully meets the Administrator's mandate to the Program, and will result in a NASA infrastructure capable of providing the needed and enabling communications services to future space missions. The integrated network architecture will increase SCaN operational efficiency and interoperability through standardization, commonality and technology infusion. It will enable NASA missions requiring advanced communication and tracking capabilities such as:
a. Optical communication
b. Antenna arraying
c. Lunar and Mars relays
d. Integrated network management (service management and network control) and integrated service execution
e. Enhanced tracking for navigation
f. Space internetworking with DTN and IP
g. End-to-end security
h. Enhanced security services
Moreover, the SCaN Program has created an Integrated Network Roadmap that depicts an orchestrated and coherent evolution path toward the target architecture, encompassing all aspects that concern network assets (i.e., operations and maintenance, sustaining engineering, upgrade efforts, and major development). This roadmap identifies major NASA ADPs, and shows dependencies and drivers among the various planned undertakings and timelines. The roadmap is scalable to accommodate timely adjustments in response to Agency needs, goals, objectives and funding. Future challenges to implementing this architecture include balancing user mission needs, technology development, and the availability of funding within NASA's priorities. Strategies for addressing these challenges are to define a flexible architecture, update the architecture periodically, use ADPs to evaluate options and determine when to make decisions, and engage the stakeholders in these evaluations. In addition, the SCaN Program will evaluate and respond to mission need dates for technical and operational capabilities to be provided by the SCaN integrated network. In that regard, the architecture defined in this ADD is scalable to accommodate programmatic and technical changes.
Multimedia content analysis and indexing: evaluation of a distributed and scalable architecture
NASA Astrophysics Data System (ADS)
Mandviwala, Hasnain; Blackwell, Scott; Weikart, Chris; Van Thong, Jean-Manuel
2003-11-01
Multimedia search engines facilitate the retrieval of documents from large media content archives now available via intranets and the Internet. Over the past several years, many research projects have focused on algorithms for analyzing and indexing media content efficiently. However, special system architectures are required to process large amounts of content from real-time feeds or existing archives. Possible solutions include dedicated distributed architectures for analyzing content rapidly and for making it searchable. The system architecture we propose implements such an approach: a highly distributed and reconfigurable batch media content analyzer that can process media streams and static media repositories. Our distributed media analysis application handles media acquisition, content processing, and document indexing. This collection of modules is orchestrated by a task flow management component, exploiting data and pipeline parallelism in the application. A scheduler manages load balancing and prioritizes the different tasks. Workers implement application-specific modules that can be deployed on an arbitrary number of nodes running different operating systems. Each application module is exposed as a web service, implemented with industry-standard interoperable middleware components such as Microsoft ASP.NET and Sun J2EE. Our system architecture is the next generation system for the multimedia indexing application demonstrated by www.speechbot.com. It can process large volumes of audio recordings with minimal support and maintenance, while running on low-cost commodity hardware. The system has been evaluated on a server farm running concurrent content analysis processes.
Integration of Sensors, Controllers and Instruments Using a Novel OPC Architecture.
González, Isaías; Calderón, Antonio José; Barragán, Antonio Javier; Andújar, José Manuel
2017-06-27
The interconnection between sensors, controllers and instruments through a communication network plays a vital role in the performance and effectiveness of a control system. Since its inception in the 90s, the Object Linking and Embedding for Process Control (OPC) protocol has provided open connectivity for monitoring and automation systems. It has been widely used in several environments such as industrial facilities, building and energy automation, engineering education and many others. This paper presents a novel OPC-based architecture to implement automation systems devoted to R&D and educational activities. The proposal is a novel conceptual framework, structured into four functional layers where the diverse components are categorized aiming to foster the systematic design and implementation of automation systems involving OPC communication. Due to the benefits of OPC, the proposed architecture provides features like open connectivity, reliability, scalability, and flexibility. Furthermore, four successful experimental applications of such an architecture, developed at the University of Extremadura (UEX), are reported. These cases are a proof of concept of the ability of this architecture to support interoperability for different domains. Namely, the automation of energy systems like a smart microgrid and photobioreactor facilities, the implementation of a network-accessible industrial laboratory and the development of an educational hardware-in-the-loop platform are described. All cases include a Programmable Logic Controller (PLC) to automate and control the plant behavior, which exchanges operative data (measurements and signals) with a multiplicity of sensors, instruments and supervisory systems under the structure of the novel OPC architecture. Finally, the main conclusions and open research directions are highlighted.
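To give a flavor of the open connectivity involved, here is a hedged client sketch using the community python-opcua package and OPC UA (the classic COM-based OPC interfaces are Windows-specific). The endpoint URL and node identifier are placeholders, not the UEX systems' actual namespace.

```python
from opcua import Client  # community "python-opcua" package

# Placeholder endpoint and node id; substitute the PLC/server's real ones.
ENDPOINT = "opc.tcp://192.168.0.10:4840"
NODE_ID = "ns=2;s=Microgrid.BatterySetpoint"   # hypothetical tag

client = Client(ENDPOINT)
client.connect()
try:
    node = client.get_node(NODE_ID)
    print("current setpoint:", node.get_value())   # read a measurement/tag
    node.set_value(42.0)                           # write an operative signal
finally:
    client.disconnect()
```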
A Nanotechnology-Ready Computing Scheme based on a Weakly Coupled Oscillator Network
NASA Astrophysics Data System (ADS)
Vodenicarevic, Damir; Locatelli, Nicolas; Abreu Araujo, Flavio; Grollier, Julie; Querlioz, Damien
2017-03-01
With conventional transistor technologies reaching their limits, alternative computing schemes based on novel technologies are currently gaining considerable interest. Notably, promising computing approaches have proposed to leverage the complex dynamics emerging in networks of coupled oscillators based on nanotechnologies. The physical implementation of such architectures remains a true challenge, however, as most proposed ideas are not robust to nanotechnology devices’ non-idealities. In this work, we propose and investigate the implementation of an oscillator-based architecture, which can be used to carry out pattern recognition tasks, and which is tailored to the specificities of nanotechnologies. This scheme relies on a weak coupling between oscillators, and does not require a fine tuning of the coupling values. After evaluating its reliability under the severe constraints associated to nanotechnologies, we explore the scalability of such an architecture, suggesting its potential to realize pattern recognition tasks using limited resources. We show that it is robust to issues like noise, variability and oscillator non-linearity. Defining network optimization design rules, we show that nano-oscillator networks could be used for efficient cognitive processing.
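The recognition principle can be previewed with a generic weakly coupled Kuramoto model (our own toy parameters, not the authors' nano-oscillator model): a probe pattern detunes the oscillators wherever it disagrees with the stored pattern, so a matching probe lets the weak coupling synchronize the network while a mismatch keeps it incoherent.

```python
import numpy as np

def match_score(stored, probe, k=0.4, steps=4000, dt=0.01, seed=0):
    """Synchronization-based matching: mismatch between probe and stored
    pattern detunes the natural frequencies; the time-averaged Kuramoto
    order parameter (in [0, 1]) then reads out the degree of match."""
    rng = np.random.default_rng(seed)
    n = len(stored)
    omega = 1.0 + (probe - stored) * np.linspace(0.4, 0.8, n)  # detuning
    phi = rng.uniform(0, 2 * np.pi, n)
    rs = []
    for t in range(steps):
        z = np.mean(np.exp(1j * phi))                 # mean-field order parameter
        phi = phi + dt * (omega + k * np.sin(np.angle(z) - phi))  # weak coupling
        if t > steps // 2:
            rs.append(abs(z))
    return float(np.mean(rs))

stored = np.array([1, -1, 1, 1, -1, 1, -1, -1], dtype=float)
print("same pattern:    ", round(match_score(stored, stored), 2))   # near 1
print("inverted pattern:", round(match_score(stored, -stored), 2))  # clearly lower
```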
Quantum repeaters based on trapped ions with decoherence-free subspace encoding
NASA Astrophysics Data System (ADS)
Zwerger, M.; Lanyon, B. P.; Northup, T. E.; Muschik, C. A.; Dür, W.; Sangouard, N.
2017-12-01
Quantum repeaters provide an efficient solution to distribute Bell pairs over arbitrarily long distances. While scalable architectures are demanding regarding the number of qubits that need to be controlled, here we present a quantum repeater scheme aiming to extend the range of present day quantum communications that could be implemented in the near future with trapped ions in cavities. We focus on an architecture where ion-photon entangled states are created locally and subsequently processed with linear optics to create elementary links of ion-ion entangled states. These links are then used to distribute entangled pairs over long distances using successive entanglement swapping operations performed using deterministic ion-ion gates. We show how this architecture can be implemented while encoding the qubits in a decoherence-free subspace to protect them against collective dephasing. This results in a protocol that can be used to violate a Bell inequality over distances of about 800 km assuming state-of-the-art parameters. We discuss how this could be improved to several thousand kilometres in future setups.
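The decoherence-free encoding referred to here is the standard two-ion construction: both logical basis states contain exactly one excitation, so collective dephasing imprints the same phase on each, and the encoded qubit acquires only a global phase:

```latex
|0_{L}\rangle \equiv |01\rangle,\qquad |1_{L}\rangle \equiv |10\rangle,\qquad
\alpha|0_{L}\rangle+\beta|1_{L}\rangle
\;\xrightarrow{\ \text{collective dephasing}\ }\;
e^{i\phi}\bigl(\alpha|0_{L}\rangle+\beta|1_{L}\rangle\bigr).
```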
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCaskey, Alex; Billings, Jay Jay; de Almeida, Valmor F
2011-08-01
This report details the progress made in the development of the Reprocessing Plant Toolkit (RPTk) for the DOE Nuclear Energy Advanced Modeling and Simulation (NEAMS) program. RPTk is an ongoing development effort intended to provide users with an extensible, integrated, and scalable software framework for the modeling and simulation of spent nuclear fuel reprocessing plants by enabling the insertion and coupling of user-developed physicochemical modules of variable fidelity. The NEAMS Safeguards and Separations IPSC (SafeSeps) and the Enabling Computational Technologies (ECT) supporting program element have partnered to release an initial version of the RPTk with a focus on software usability and utility. RPTk implements a data flow architecture that is the source of the system's extensibility and scalability. Data flows through physicochemical modules sequentially, with each module importing data, evolving it, and exporting the updated data to the next downstream module. This is accomplished through various architectural abstractions designed to give RPTk true plug-and-play capabilities. A simple application of this architecture, as well as RPTk data flow and evolution, is demonstrated in Section 6 with an application consisting of two coupled physicochemical modules. The remaining sections describe this ongoing work in full, from system vision and design inception to full implementation. Section 3 describes the relevant software development processes used by the RPTk development team. These processes allow the team to manage system complexity and ensure stakeholder satisfaction. This section also details the work done on the RPTk 'black box' and 'white box' models, with a special focus on the separation of concerns between the RPTk user interface and application runtime. Sections 4 and 5 discuss that application runtime component in more detail, and describe the dependencies, behavior, and rigorous testing of its constituent components.
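The import-evolve-export contract is simple to mimic. The sketch below chains two stand-in physicochemical modules in the plug-and-play style the report describes; the module names and the data record are hypothetical, invented for illustration.

```python
from typing import Callable, Dict, List

Data = Dict[str, float]
Module = Callable[[Data], Data]   # import data, evolve it, export it

def dissolver(d: Data) -> Data:
    """Toy 'dissolution' module: moves mass from solid to solution."""
    moved = 0.9 * d["solid_kg"]
    return {**d, "solid_kg": d["solid_kg"] - moved,
            "solution_kg": d["solution_kg"] + moved}

def extractor(d: Data) -> Data:
    """Toy 'solvent extraction' module: pulls a fraction into product."""
    out = 0.6 * d["solution_kg"]
    return {**d, "solution_kg": d["solution_kg"] - out,
            "product_kg": d.get("product_kg", 0.0) + out}

def run_plant(modules: List[Module], feed: Data) -> Data:
    """Data flows sequentially through the registered modules; swapping
    or inserting a module changes the plant without touching the rest."""
    for module in modules:
        feed = module(feed)
    return feed

print(run_plant([dissolver, extractor],
                {"solid_kg": 100.0, "solution_kg": 0.0}))
```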
Mapping of H.264 decoding on a multiprocessor architecture
NASA Astrophysics Data System (ADS)
van der Tol, Erik B.; Jaspers, Egbert G.; Gelderblom, Rob H.
2003-05-01
Due to the increasing significance of development costs in the competitive domain of high-volume consumer electronics, generic solutions are required to enable reuse of the design effort and to increase the potential market volume. As a result, Systems-on-Chip (SoCs) contain a growing number of fully programmable media processing devices, as opposed to application-specific systems, which used to offer the most attractive solutions due to their high performance density. The following motivates this trend. First, SoCs are increasingly dominated by their communication infrastructure and embedded memory, making the cost of the functional units less significant. Moreover, the continuously growing design costs require generic solutions that can be applied over a broad product range. Hence, powerful programmable SoCs are becoming increasingly attractive. However, to enable power-efficient designs that also scale with advancing VLSI technology, parallelism should be fully exploited. Both task-level and instruction-level parallelism can be provided by means of, e.g., a VLIW multiprocessor architecture. To provide the above-mentioned scalability, we propose to partition the data over the processors, instead of the traditional functional partitioning. An advantage of this approach is the inherent locality of data, which is extremely important for communication-efficient software implementations. Consequently, a software implementation is discussed that enables, e.g., SD-resolution H.264 decoding with a two-processor architecture, whereas High-Definition (HD) decoding can be achieved with an eight-processor system executing the same software. Experimental results show that data communication is reduced by up to 65%, directly improving the overall performance. Apart from the considerable improvement in memory bandwidth, this novel concept of partitioning offers a natural approach for optimally balancing the load of all processors, thereby further improving the overall speedup.
Performance prediction: A case study using a multi-ring KSR-1 machine
NASA Technical Reports Server (NTRS)
Sun, Xian-He; Zhu, Jianping
1995-01-01
While computers with tens of thousands of processors have successfully delivered high-performance computing power for solving some of the so-called 'grand-challenge' applications, the notion of scalability is becoming an important metric in the evaluation of parallel machine architectures and algorithms. In this study, the prediction of scalability and its application are carefully investigated. A simple formula is presented to show the relation between scalability, single-processor computing power, and degradation of parallelism. A case study is conducted on a multi-ring KSR-1 shared virtual memory machine. Experimental and theoretical results show that the influence of topology variation of an architecture is predictable. Therefore, the performance of an algorithm on a sophisticated, hierarchical architecture can be predicted, and the best algorithm-machine combination can be selected for a given application.
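The abstract does not reproduce the formula. Given the first author, it is plausibly the isospeed scalability metric; purely as a hedged orientation (an assumption on our part, not a quotation from the paper), that relation reads

```latex
\psi(p, p') = \frac{p'\,W}{p\,W'}
```

where W is the problem size solved on p processors and W' is the (generally larger) problem size needed on p' processors to sustain the same average unit speed; psi = 1 indicates ideal scalability, and psi < 1 quantifies the degradation of parallelism.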
NASA Astrophysics Data System (ADS)
Liu, Lei; Hong, Xiaobin; Wu, Jian; Lin, Jintong
As Grid computing continues to gain popularity in industry and the research community, it also attracts more attention at the consumer level. The large number of users and the high frequency of job requests in the consumer market make this setting challenging. Current Client/Server (C/S)-based architectures will become infeasible for supporting large-scale Grid applications due to their poor scalability and poor fault tolerance. In this paper, based on our previous works [1, 2], a novel self-organized architecture realizing a highly scalable and flexible platform for Grids is proposed. Experimental results show that this architecture is suitable and efficient for consumer-oriented Grids.
Parallel performance investigations of an unstructured mesh Navier-Stokes solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
2000-01-01
A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
Space Flight Middleware: Remote AMS over DTN for Delay-Tolerant Messaging
NASA Technical Reports Server (NTRS)
Burleigh, Scott
2011-01-01
This paper describes a technique for implementing scalable, reliable, multi-source multipoint data distribution in space flight communications -- Delay-Tolerant Reliable Multicast (DTRM) -- that is fully supported by the "Remote AMS" (RAMS) protocol of the Asynchronous Message Service (AMS) proposed for standardization within the Consultative Committee for Space Data Systems (CCSDS). The DTRM architecture enables applications to easily "publish" messages that will be reliably and efficiently delivered to an arbitrary number of "subscribing" applications residing anywhere in the space network, whether in the same subnet or in a subnet on a remote planet or vehicle separated by many light minutes of interplanetary space. The architecture comprises multiple levels of protocol, each included for a specific purpose and allocated specific responsibilities: "application AMS" traffic performs end-system data introduction and delivery subject to access control; underlying "remote AMS" directs this application traffic to populations of recipients at remote locations in a multicast distribution tree, enabling the architecture to scale up to large networks; further underlying Delay-Tolerant Networking (DTN) Bundle Protocol (BP) advances RAMS protocol data units through the distribution tree using delay-tolerant store-and-forward methods; and further underlying reliable "convergence-layer" protocols ensure successful data transfer over each segment of the end-to-end route. The result is scalable, reliable, delay-tolerant multi-source multicast that is largely self-configuring.
The GOES-R Product Generation Architecture
NASA Astrophysics Data System (ADS)
Dittberner, G. J.; Kalluri, S.; Hansen, D.; Weiner, A.; Tarpley, A.; Marley, S.
2011-12-01
The GOES-R system will substantially improve users' ability to succeed in their work by providing data from significantly enhanced instruments, with higher resolution, much shorter relook times, and an increased number and diversity of products. The Product Generation architecture is designed to provide the computer and memory resources necessary to achieve the required latency and availability for these products. Over time, new and updated algorithms are expected to be added and old ones removed as science advances and new products are developed. The GOES-R GS architecture is being planned to maintain functionality so that when such changes are implemented, operational product generation will continue without interruption. The primary parts of the PG infrastructure are the Service Based Architecture (SBA) and the Data Fabric (DF). SBA is the middleware that encapsulates and manages the science algorithms that generate products. It is divided into three parts: the Executive, which manages and configures the algorithm as a service; the Dispatcher, which provides data to the algorithm; and the Strategy, which determines when the algorithm can execute with the available data. SBA is a distributed architecture, with services connected to each other over a compute grid, and is highly scalable. This plug-and-play architecture allows algorithms to be added, removed, or updated without affecting any other services or software currently running and producing data. Algorithms require product data from other algorithms, so a scalable and reliable messaging layer is necessary. The SBA uses the DF to provide this data communication layer between algorithms. The DF provides an abstract interface over a distributed and persistent multi-layered storage system (e.g., memory-based caching above disk-based storage) and an event management system that allows event-driven algorithm services to know when instrument data are available and where they reside. Together, the SBA and the DF provide a flexible, high-performance architecture that can meet the needs of product processing now and as they grow in the future.
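As a minimal sketch of the Executive/Dispatcher/Strategy split described above (class names, dataset names and the readiness rule are assumptions, not GOES-R ground-segment code), an algorithm service might be triggered like this:

```python
# Hedged sketch of the Service Based Architecture roles described above
# (illustrative only): a Strategy-style readiness rule fires an encapsulated
# algorithm once every required input dataset has been announced.

class AlgorithmService:
    def __init__(self, name, required_inputs, algorithm):
        self.name = name
        self.required = set(required_inputs)   # Strategy: readiness rule
        self.algorithm = algorithm             # encapsulated science code
        self.available = {}

    def on_data_event(self, dataset, value):
        """Dispatcher role: collect announced inputs, run when complete."""
        if dataset in self.required:
            self.available[dataset] = value
        if self.required.issubset(self.available):
            product = self.algorithm(self.available)
            self.available.clear()
            print(f"{self.name} produced {product}")

cloud_mask = AlgorithmService(
    "cloud-mask", ["abi-band-2", "abi-band-14"],   # invented dataset names
    lambda inputs: f"mask({sorted(inputs)})")
cloud_mask.on_data_event("abi-band-2", "...")
cloud_mask.on_data_event("abi-band-14", "...")     # readiness met -> executes
```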
A Geo-Distributed System Architecture for Different Domains
NASA Astrophysics Data System (ADS)
Moßgraber, Jürgen; Middleton, Stuart; Tao, Ran
2013-04-01
The presentation will describe work on the system-of-systems (SoS) architecture that is being developed in the EU FP7 project TRIDEC on "Collaborative, Complex and Critical Decision-Support in Evolving Crises". In this project we deal with two use-cases: Natural Crisis Management (e.g. Tsunami Early Warning) and Industrial Subsurface Development (e.g. drilling for oil). These use-cases seem quite different at first sight but share many similarities, such as managing and looking up available sensors, extracting data from them and annotating it semantically, intelligently managing the data (a big-data problem), running mathematical analysis algorithms on the data and, finally, providing decision support on this basis. The main challenge was to create a generic architecture which fits both use-cases. The requirements on the architecture are manifold, and the whole spectrum of a modern, geo-distributed and collaborative system comes into play. Obviously, one cannot expect to tackle these challenges adequately with a monolithic system or with a single technology. Therefore, a system architecture providing the blueprints to implement the system-of-systems approach has to combine multiple technologies and architectural styles. The most important architectural challenges we needed to address are: 1. building a scalable communication layer for a system-of-systems; 2. building a resilient communication layer for a system-of-systems; 3. efficiently publishing large volumes of semantically rich sensor data; 4. scalable and high-performance storage of large distributed datasets; 5. handling federated multi-domain heterogeneous data; 6. discovery of resources in a geo-distributed SoS; 7. coordination of work between geo-distributed systems. The design decisions made for each of them will be presented. The developed concepts are also applicable to the requirements of the Future Internet (FI) and Internet of Things (IoT), which will provide services like smart grids, smart metering, logistics and environmental monitoring.
Medical image archive node simulation and architecture
NASA Astrophysics Data System (ADS)
Chiang, Ted T.; Tang, Yau-Kuo
1996-05-01
It is a well-known fact that managed care and new treatment technologies are revolutionizing the health care provider world. Community Health Information Network and Computer-based Patient Record projects are underway throughout the United States. More and more hospitals are installing digital, 'filmless' radiology (and other imagery) systems, which generate a staggering amount of information around the clock. For example, a typical 500-bed hospital might accumulate more than 5 terabytes of image data over a period of 30 years for conventional x-ray images and digital images such as Magnetic Resonance Imaging and Computer Tomography images. With several hospitals contributing to the archive, the storage required will be in the hundreds of terabytes. Systems for reliable, secure, and inexpensive storage and retrieval of digital medical information do not exist today. In this paper, we present a Medical Image Archive and Distribution Service (MIADS) concept. MIADS is a system shared by individual and community hospitals, laboratories, and doctors' offices that need to store and retrieve medical images. Due to the large volume and complexity of the data, as well as the diversified user access requirements, implementing a MIADS is a complex procedure. One of the key challenges is to select a cost-effective, scalable system architecture that meets the ingest/retrieval performance requirements. We have performed an in-depth system engineering study, and developed a sophisticated simulation model to address this key challenge. This paper describes the overall system architecture based on our system engineering study and simulation results. In particular, we emphasize system scalability and upgradability issues, and discuss our simulation results in detail. The simulations study the ingest/retrieval performance requirements under different system configurations and architectures for variables such as workload, tape access time, number of drives, number of exams per patient, number of Central Processing Units, patient grouping, and priority impacts. The MIADS, which could be a key component of a broader data repository system, will be able to communicate with and obtain data from existing hospital information systems. We discuss the external interfaces enabling MIADS to communicate with and obtain data from existing Radiology Information Systems such as the Picture Archiving and Communication System (PACS). Our system design encompasses the broader aspects of the archive node, which could include multimedia data such as image, audio, video, and free text. The system is designed to be integrated with current hospital PACS through a Digital Imaging and Communications in Medicine interface, but it can also be accessed over the Internet using the Hypertext Transport Protocol or the Simple File Transport Protocol. Our design and simulation work will be key to implementing a successful, scalable medical image archive and distribution system.
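The paper's simulation model is not given in the abstract; a toy discrete-event sketch of the same kind of question it studies, how mean retrieval wait responds to the number of tape drives, might look like the following (all parameters invented):

```python
# Hedged toy sketch of an archive-node capacity question like those the MIADS
# simulation studies: mean wait for image retrievals vs. number of tape drives.
# All rates and times are invented, not the paper's workload model.
import heapq
import random

def mean_wait(n_drives, n_jobs=5000, arrival_rate=0.5, access_time=4.0, seed=1):
    random.seed(seed)
    drives = [0.0] * n_drives            # time at which each drive becomes free
    heapq.heapify(drives)
    t, total_wait = 0.0, 0.0
    for _ in range(n_jobs):
        t += random.expovariate(arrival_rate)      # next retrieval request
        free_at = heapq.heappop(drives)
        start = max(t, free_at)                    # queue if all drives are busy
        total_wait += start - t
        heapq.heappush(drives, start + random.expovariate(1.0 / access_time))
    return total_wait / n_jobs

for d in (2, 3, 4, 6):
    print(f"{d} drives: mean wait {mean_wait(d):6.2f} (arbitrary time units)")
```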
Hierarchical Address Event Routing for Reconfigurable Large-Scale Neuromorphic Systems.
Park, Jongkil; Yu, Theodore; Joshi, Siddharth; Maier, Christoph; Cauwenberghs, Gert
2017-10-01
We present a hierarchical address-event routing (HiAER) architecture for scalable communication of neural and synaptic spike events between neuromorphic processors, implemented with five Xilinx Spartan-6 field-programmable gate arrays and four custom analog neuromorphic integrated circuits serving 262k neurons and 262M synapses. The architecture extends the single-bus address-event representation protocol to a hierarchy of multiple nested buses, routing events across increasing scales of spatial distance. The HiAER protocol provides individually programmable axonal delay in addition to strength for each synapse, lending itself toward biologically plausible neural network architectures, and scales across a range of hierarchies suitable for multichip and multiboard systems in reconfigurable large-scale neuromorphic systems. We show approximately linear scaling of net global synaptic event throughput with the number of routing nodes in the network, at 3.6×10^7 synaptic events per second per 16k-neuron node in the hierarchy.
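A toy sketch of the hierarchical routing idea (the addressing scheme and fan-out are invented, not the HiAER hardware): an event climbs the bus hierarchy only past the levels where the source and destination subtrees differ, which is what keeps most traffic local.

```python
# Hedged toy sketch of hierarchical address-event routing (not the HiAER RTL):
# a spike event traverses only the bus levels where source and destination
# subtrees differ, so nearby destinations cost few hops.

def route(event_dst, node_path, fanout=4, levels=3):
    """Return the number of bus hops from a source leaf to a destination leaf."""
    dst_path, x = [], event_dst
    for _ in range(levels):              # destination leaf's path, root-first
        dst_path.append(x % fanout)
        x //= fanout
    dst_path.reverse()
    common = 0                           # shared root-first prefix length
    while common < levels and node_path[common] == dst_path[common]:
        common += 1
    return (levels - common) * 2         # hops up + hops back down

# neuron 5's leaf path in a fanout-4, 3-level hierarchy is [0, 1, 1]
print(route(event_dst=7, node_path=[0, 1, 1]))   # nearby target: 2 hops
print(route(event_dst=63, node_path=[0, 1, 1]))  # far target: climbs to root, 6 hops
```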
Li, Bo; Wang, Xin; Jung, Hyun Young; Kim, Young Lae; Robinson, Jeremy T.; Zalalutdinov, Maxim; Hong, Sanghyun; Hao, Ji; Ajayan, Pulickel M.; Wan, Kai-Tak; Jung, Yung Joon
2015-01-01
Suspended single-walled carbon nanotubes (SWCNTs) offer unique functionalities for electronic and electromechanical systems. Due to their outstanding flexible nature, suspended SWCNT architectures have great potential for integration into flexible electronic systems. However, current techniques for integrating SWCNT architectures with flexible substrates are largely absent, especially in a manner that is both scalable and well controlled. Here, we present a new nanostructured transfer paradigm to print scalable and well-defined suspended nano/microscale SWCNT networks on 3D patterned flexible substrates with micro- to nanoscale precision. The underlying printing/transfer mechanism, as well as the mechanical, electromechanical, and mechanical resonance properties of the suspended SWCNTs are characterized, including identifying metrics relevant for reliable and sensitive device structures. Our approach represents a fast, scalable and general method for building suspended nano/micro SWCNT architectures suitable for flexible sensing and actuation systems. PMID:26511284
NASA Astrophysics Data System (ADS)
Tolba, Khaled Ibrahim; Morgenthal, Guido
2018-01-01
This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied to the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using the OpenCL framework. The focus of this paper is the analysis of the parallel efficiency and scalability of the method applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The target is to assess the optimal configuration and the required computer architecture, such that it becomes feasible to efficiently utilise the method within the computational resources available to a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming-type computer.
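The OpenCL code itself is not shown; the scalability bookkeeping such a study rests on is simple enough to sketch (the timings below are invented): strong-scaling speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p.

```python
# Hedged sketch of the scalability bookkeeping behind such a study (timings
# invented; the authors' OpenCL vortex-particle code is not reproduced):
# strong-scaling speedup S(p) = T(1)/T(p) and efficiency E(p) = S(p)/p.

timings = {1: 1000.0, 2: 520.0, 4: 275.0, 8: 150.0}   # seconds, illustrative

t1 = timings[1]
for p, tp in sorted(timings.items()):
    speedup = t1 / tp
    efficiency = speedup / p
    print(f"p={p:2d}  T={tp:7.1f}s  S={speedup:5.2f}  E={efficiency:5.1%}")
```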
The Electronic Logbook for the Information Storage of ATLAS Experiment at LHC (ELisA)
NASA Astrophysics Data System (ADS)
Corso Radu, A.; Lehmann Miotto, G.; Magnoni, L.
2012-12-01
A large experiment like ATLAS at the LHC (CERN), with over three thousand members and a shift crew of 15 people running the experiment 24/7, needs an easy and reliable tool to gather all the information concerning the experiment's development, installation, deployment and exploitation over its lifetime. With the increasing number of users and the accumulation of stored information since the experiment's start-up, the electronic logbook currently in use, ATLOG, started to show its limitations in terms of speed and usability. Its monolithic architecture makes maintenance and the implementation of new functionality a hard-to-almost-impossible process. A new tool, ELisA, has been developed to replace the existing ATLOG. It is based on modern web technologies: the Spring framework with a Model-View-Controller architecture was chosen, helping to build flexible and easy-to-maintain applications. The new tool implements all features of the old electronic logbook with increased performance and better graphics; it uses the same database back-end for portability reasons. In addition, several new requirements have been accommodated that could not be implemented in ATLOG. This paper describes the architecture, implementation and performance of ELisA, with particular emphasis on the choices that allowed building a scalable and very fast system and on the aspects that could be re-used in different contexts to build a similar application.
A Hybrid EAV-Relational Model for Consistent and Scalable Capture of Clinical Research Data.
Khan, Omar; Lim Choi Keung, Sarah N; Zhao, Lei; Arvanitis, Theodoros N
2014-01-01
Many clinical research databases are built for specific purposes and their design is often guided by the requirements of their particular setting. Not only does this lead to issues of interoperability and reusability between research groups in the wider community but, within the project itself, changes and additions to the system could be implemented using an ad hoc approach, which may make the system difficult to maintain and even more difficult to share. In this paper, we outline a hybrid Entity-Attribute-Value and relational model approach for modelling data, in light of frequently changing requirements, which enables the back-end database schema to remain static, improving the extensibility and scalability of an application. The model also facilitates data reuse. The methods used build on the modular architecture previously introduced in the CURe project.
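A minimal sketch of the hybrid idea (the schema below is invented, not the CURe schema): stable entities keep conventional relational columns, while frequently changing research attributes land in an EAV table, so new attributes arrive as rows rather than schema migrations.

```python
# Hedged sketch of a hybrid EAV-relational schema (invented, not CURe's):
# stable fields stay relational; volatile research attributes become EAV
# rows, so adding a new attribute requires no ALTER TABLE.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, dob TEXT);
    CREATE TABLE observation (          -- EAV part: entity, attribute, value
        patient_id INTEGER REFERENCES patient(id),
        attribute  TEXT,
        value      TEXT
    );
""")
db.execute("INSERT INTO patient VALUES (1, 'Doe, J', '1970-01-01')")
# New study attributes appear as data, not as schema changes:
db.executemany("INSERT INTO observation VALUES (?, ?, ?)",
               [(1, "smoking_status", "former"),
                (1, "custom_biomarker_x", "12.7")])
rows = db.execute("""
    SELECT p.name, o.attribute, o.value
    FROM patient p JOIN observation o ON o.patient_id = p.id
""").fetchall()
print(rows)
```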
NASA Astrophysics Data System (ADS)
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.
2012-06-01
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
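Octopus itself is not reproduced here; as a hedged NumPy illustration of the "blocks of Kohn-Sham states" point, the sketch below shows how operating on a block of states turns many matrix-vector products into a few matrix-matrix products, the shape of work that GPUs and wide SIMD units favor.

```python
# Hedged NumPy illustration of processing Kohn-Sham states in blocks (not
# Octopus code; sizes invented): one matrix-matrix product over a block of
# states replaces many matrix-vector products, exposing data parallelism.
import numpy as np

n_grid, n_states, block = 2048, 64, 16
H = np.random.rand(n_grid, n_grid)          # stand-in for the Hamiltonian
psi = np.random.rand(n_grid, n_states)      # states stored as columns

# state-by-state: n_states separate matrix-vector products
out_loop = np.stack([H @ psi[:, i] for i in range(n_states)], axis=1)

# blocked: a few large matrix-matrix products (BLAS3 / GPU friendly)
out_block = np.empty_like(psi)
for b in range(0, n_states, block):
    out_block[:, b:b + block] = H @ psi[:, b:b + block]

assert np.allclose(out_loop, out_block)
```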
Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures
NASA Astrophysics Data System (ADS)
Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.
2016-12-01
The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the-art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which introduces several new features, including an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
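The paper's accelerated, distributed implementation is not reproduced here; the plain vectorized k-means iteration below (NumPy, invented sizes) is the kernel whose memory traffic and SIMD utilization such optimizations target.

```python
# Hedged sketch: one vectorized k-means iteration in NumPy, the kernel whose
# arithmetic intensity and SIMD utilization the paper's optimizations target
# (their accelerated, distributed implementation is not reproduced here).
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100_000, 8))                 # observations x features (invented)
k = 16
centroids = X[rng.choice(len(X), k, replace=False)]

for _ in range(10):
    # squared distances via ||x||^2 - 2 x.c + ||c||^2, avoiding a 3-D temporary
    d2 = (np.einsum("ij,ij->i", X, X)[:, None]
          - 2.0 * X @ centroids.T
          + np.einsum("ij,ij->i", centroids, centroids)[None, :])
    labels = d2.argmin(axis=1)               # assignment step
    for j in range(k):                       # centroid update step
        members = X[labels == j]
        if len(members):
            centroids[j] = members.mean(axis=0)
print(np.bincount(labels, minlength=k))      # cluster sizes
```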
Distributed controller clustering in software defined networks
Gani, Abdullah; Akhunzada, Adnan; Talebian, Hamid; Choo, Kim-Kwang Raymond
2017-01-01
Software Defined Networking (SDN) is an emerging and promising paradigm for network management because of its centralized network intelligence. However, the centralized control architecture of software-defined networks (SDNs) brings novel challenges of reliability, scalability, fault tolerance and interoperability. In this paper, we propose a novel clustered distributed controller architecture in a real SDN setting. The distributed cluster implementation comprises multiple popular SDN controllers. The proposed mechanism is evaluated using a real-world network topology running on top of an emulated SDN environment. The results show that the proposed distributed controller clustering mechanism is able to significantly reduce the average latency from 8.1% to 1.6% and the packet loss from 5.22% to 4.15%, compared to distributed controllers without clustering running on HP Virtual Application Network (VAN) SDN and Open Network Operating System (ONOS) controllers, respectively. Moreover, the proposed method shows reasonable CPU utilization results. Furthermore, the proposed mechanism makes it possible to handle unexpected load fluctuations while maintaining continuous network operation, even when there is a controller failure. The paper is a potential contribution stepping towards addressing the issues of reliability, scalability, fault tolerance, and interoperability. PMID:28384312
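A toy sketch of the clustering idea only (not the paper's HP VAN/ONOS deployment; the switch-to-controller hashing rule is an assumption): switches map onto live controllers, and a controller failure simply redistributes its switches across the survivors, so operation continues.

```python
# Hedged toy sketch of distributed controller clustering (not the paper's
# HP VAN / ONOS setup): switches hash onto live controllers, and a failure
# redistributes that controller's switches across the survivors.

class ControllerCluster:
    def __init__(self, controllers):
        self.live = list(controllers)

    def controller_for(self, switch_id):
        # assumed mapping rule: deterministic hash over the live controllers
        return self.live[hash(switch_id) % len(self.live)]

    def fail(self, controller):
        self.live.remove(controller)     # survivors absorb its switches

cluster = ControllerCluster(["ctrl-a", "ctrl-b", "ctrl-c"])
switches = [f"sw{i}" for i in range(6)]
print({s: cluster.controller_for(s) for s in switches})
cluster.fail("ctrl-b")                   # continuous operation after a failure
print({s: cluster.controller_for(s) for s in switches})
```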
Process Management inside ATLAS DAQ
NASA Astrophysics Data System (ADS)
Alexandrov, I.; Amorim, A.; Badescu, E.; Burckhart-Chromek, D.; Caprini, M.; Dobson, M.; Duval, P. Y.; Hart, R.; Jones, R.; Kazarov, A.; Kolos, S.; Kotov, V.; Liko, D.; Lucio, L.; Mapelli, L.; Mineev, M.; Moneta, L.; Nassiakou, M.; Pedro, L.; Ribeiro, A.; Roumiantsev, V.; Ryabov, Y.; Schweiger, D.; Soloviev, I.; Wolters, H.
2002-10-01
The Process Management component of the online software of the future ATLAS experiment data acquisition system is presented. The purpose of the Process Manager is to perform basic job control of the software components of the data acquisition system. It is capable of starting, stopping and monitoring the status of those components on the data acquisition processors, independent of the underlying operating system. Its architecture is designed on the basis of a server-client model using CORBA-based communication. The server part relies on C++ software agent objects acting as an interface between the local operating system and client applications. Among the major design challenges of the software agents were achieving the maximum degree of autonomy possible and creating processes that are aware of dynamic conditions in their environment and able to determine corresponding actions. Issues such as the performance of the agents in terms of the time needed for process creation and destruction, the scalability of the system with respect to the final ATLAS configuration, and minimizing the use of hardware resources were also of critical importance. Besides the details given on the architecture and the implementation, we also present scalability and performance test results for the Process Manager system.
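The ATLAS agents are CORBA-based C++; as a language-neutral stand-in for the same job-control contract (start, stop, status), a minimal agent might look like this:

```python
# Hedged sketch of the process-manager contract described above (the ATLAS
# agents are CORBA-based C++; this minimal stand-in only shows start/stop/
# status job control over local processes).
import subprocess

class ProcessAgent:
    def __init__(self):
        self.jobs = {}

    def start(self, name, argv):
        self.jobs[name] = subprocess.Popen(argv)

    def status(self, name):
        rc = self.jobs[name].poll()      # None while still running
        return "running" if rc is None else f"exited({rc})"

    def stop(self, name):
        self.jobs[name].terminate()
        self.jobs[name].wait(timeout=10)

agent = ProcessAgent()
agent.start("worker", ["sleep", "60"])   # any long-running command works
print(agent.status("worker"))            # -> running
agent.stop("worker")
print(agent.status("worker"))            # -> exited(...)
```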
Piromalis, Dimitrios; Arvanitis, Konstantinos
2016-08-04
Wireless Sensor and Actuator Networks (WSANs) constitute one of the most challenging technologies with tremendous socio-economic impact for the next decade. Functionally and energy-optimized hardware systems and development tools are perhaps the most critical facet of this technology for the achievement of such prospects. Especially in the area of agriculture, where the hostile operating environment adds to the general technological and technical issues, reliable and robust WSAN systems are mandatory. This paper focuses on the hardware design architectures of WSANs for real-world agricultural applications. It presents the available alternatives in hardware design and identifies their difficulties and problems for real-life implementations. The paper introduces SensoTube, a new WSAN hardware architecture, which is proposed as a solution to the various existing design constraints of WSANs. The establishment of the proposed architecture is based, firstly, on an abstraction approach in the context of the functional requirements and, secondly, on the standardization of subsystem connectivity, in order to allow for an open, expandable, flexible, reconfigurable, energy-optimized, reliable and robust hardware system. The SensoTube implementation reference model, together with its encapsulation design and installation, is analyzed and presented in detail. Furthermore, as a proof of concept, certain use cases have been studied in order to demonstrate the benefits of migrating existing designs based on the available open-source hardware platforms to the SensoTube architecture.
Peer-to-peer Cooperative Scheduling Architecture for National Grid Infrastructure
NASA Astrophysics Data System (ADS)
Matyska, Ludek; Ruda, Miroslav; Toth, Simon
For some ten years, the Czech National Grid Infrastructure MetaCentrum has used a single central PBSPro installation to schedule jobs across the country. This centralized approach keeps full track of all the clusters, providing support for jobs spanning several sites, an implementation of the fair-share policy, and better overall control of the grid environment. Despite steady progress in stability and resilience to intermittent, very short network failures, the growing number of sites and processors makes this architecture, with its single point of failure and scalability limits, obsolete. As a result, a new scheduling architecture is proposed, which relies on higher autonomy of clusters. It is based on a peer-to-peer network of semi-independent schedulers for each site or even cluster. Each scheduler accepts jobs for the whole infrastructure, cooperating with other schedulers on the implementation of global policies like central job accounting, fair-share, or submission of jobs across several sites. The scheduling system is integrated with the Magrathea system to support scheduling of virtual clusters, including the setup of their internal network, again eventually spanning several sites. On the other hand, each scheduler is local to one of several clusters and is able to directly control and submit jobs to them even if the connection to other scheduling peers is lost. In parallel to the change of the overall architecture, the scheduling system itself is being replaced. Instead of PBSPro, chosen originally for its declared support of large-scale distributed environments, the new scheduling architecture is based on the open-source Torque system. The implementation and support for the most desired properties in PBSPro and Torque are discussed, and the necessary modifications to Torque to support the MetaCentrum scheduling architecture are presented, too.
Scalable software architecture for on-line multi-camera video processing
NASA Astrophysics Data System (ADS)
Camplani, Massimo; Salgado, Luis
2011-03-01
In this paper we present a scalable software architecture for on-line multi-camera video processing that guarantees a good trade-off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs) and the Central Unit. The Central Unit works as a supervisor of the running PUs, and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application is presented. In this paper, as a case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions, such as the number of cameras and image sizes. The results show that the software architecture scales well with the number of cameras and can easily work with different image formats while respecting the real-time constraints. Moreover, the parallelization approach can be used to speed up the processing tasks with a low level of overhead.
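A toy sketch of the PU/Central Unit split (the component names come from the abstract; the frame source and detector are invented placeholders): each Processing Unit runs its acquisition and processing phases in its own process, while the Central Unit supervises and collects results.

```python
# Hedged toy sketch of the PU / Central Unit split described above (structure
# from the abstract; the frame source and detector are invented placeholders).
import multiprocessing as mp

def processing_unit(camera_id, results):
    for frame_no in range(3):                  # acquisition phase (placeholder)
        frame = f"cam{camera_id}-frame{frame_no}"
        detections = f"objects({frame})"       # processing phase (placeholder)
        results.put((camera_id, detections))

if __name__ == "__main__":
    results = mp.Queue()
    # Central Unit: supervises one PU per camera, scaling with camera count
    pus = [mp.Process(target=processing_unit, args=(cam, results))
           for cam in range(4)]
    for p in pus:
        p.start()
    for _ in range(4 * 3):
        print(results.get())
    for p in pus:
        p.join()
```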
A compact linear accelerator based on a scalable microelectromechanical-system RF-structure
NASA Astrophysics Data System (ADS)
Persaud, A.; Ji, Q.; Feinberg, E.; Seidl, P. A.; Waldron, W. L.; Schenkel, T.; Lal, A.; Vinayakumar, K. B.; Ardanuc, S.; Hammer, D. A.
2017-06-01
A new approach for a compact radio-frequency (RF) accelerator structure is presented. The new accelerator architecture is based on the Multiple Electrostatic Quadrupole Array Linear Accelerator (MEQALAC) structure that was first developed in the 1980s. The MEQALAC utilized RF resonators producing the accelerating fields and providing for higher beam currents through parallel beamlets focused using arrays of electrostatic quadrupoles (ESQs). While the early work obtained ESQs with lateral dimensions on the order of a few centimeters, using a printed circuit board (PCB), we reduce the characteristic dimension to the millimeter regime, while massively scaling up the potential number of parallel beamlets. Using Microelectromechanical systems scalable fabrication approaches, we are working on further reducing the characteristic dimension to the sub-millimeter regime. The technology is based on RF-acceleration components and ESQs implemented in the PCB or silicon wafers where each beamlet passes through beam apertures in the wafer. The complete accelerator is then assembled by stacking these wafers. This approach has the potential for fast and inexpensive batch fabrication of the components and flexibility in system design for application specific beam energies and currents. For prototyping the accelerator architecture, the components have been fabricated using the PCB. In this paper, we present proof of concept results of the principal components using the PCB: RF acceleration and ESQ focusing. Ongoing developments on implementing components in silicon and scaling of the accelerator technology to high currents and beam energies are discussed.
Hatsek, Avner; Shahar, Yuval; Taieb-Maimon, Meirav; Shalom, Erez; Klimov, Denis; Lunenfeld, Eitan
2010-01-01
Clinical guidelines have been shown to improve the quality of medical care and to reduce its costs. However, most guidelines exist in a free-text representation and, without automation, are not sufficiently accessible to clinicians at the point of care. A prerequisite for automated guideline application is a machine-comprehensible representation of the guidelines. In this study, we designed and implemented a scalable architecture to support medical experts and knowledge engineers in specifying and maintaining the procedural and declarative aspects of clinical guideline knowledge, resulting in a machine-comprehensible representation. The new framework significantly extends our previous work on the Digital electronic Guidelines Library (DeGeL). The current study designed and implemented Gesher, a graphical framework for the specification of declarative and procedural clinical knowledge. We performed three different experiments to evaluate the functionality and usability of the major aspects of the new framework: specification of procedural clinical knowledge, specification of declarative clinical knowledge, and exploration of a given clinical guideline. The subjects included clinicians and knowledge engineers (overall, 27 participants). The evaluations indicated high levels of completeness and correctness of the guideline specification process by both the clinicians and the knowledge engineers, although the best results, in the case of declarative-knowledge specification, were achieved by teams including a clinician and a knowledge engineer. The usability scores were high as well, although the clinicians' assessment was significantly lower than that of the knowledge engineers.
Exploring a model-driven architecture (MDA) approach to health care information systems development.
Raghupathi, Wullianallur; Umar, Amjad
2008-05-01
To explore the potential of the model-driven architecture (MDA) in health care information systems development, an MDA is conceptualized and developed for a health clinic system to track patient information. A prototype of the MDA is implemented using an advanced MDA tool. The UML provides the underlying modeling support in the form of the class diagram. The PIM-to-PSM transformation rules are applied to generate the prototype application from the model. The result of the research is a complete MDA methodology for developing health care information systems. Additional insights gained include the development of transformation rules and documentation of the challenges in the application of MDA to health care. Design guidelines for future MDA applications are described. The model has the potential for generalizability. The overall approach supports limited interoperability and portability. The research demonstrates the applicability of the MDA approach to health care information systems development. When properly implemented, it has the potential to overcome the challenges of platform (vendor) dependency, lack of open standards, interoperability, portability, scalability, and the high cost of implementation.
A Scalability Model for ECS's Data Server
NASA Technical Reports Server (NTRS)
Menasce, Daniel A.; Singhal, Mukesh
1998-01-01
This report presents in four chapters a model for the scalability analysis of the Data Server subsystem of the Earth Observing System Data and Information System (EOSDIS) Core System (ECS). The model analyzes whether the planned architecture of the Data Server will support an increase in the workload with the possible upgrade and/or addition of processors, storage subsystems, and networks. The report includes a summary of the architecture of ECS's Data Server as well as a high-level description of the Ingest and Retrieval operations as they relate to ECS's Data Server. This description forms the basis for the development of the scalability model of the Data Server and the methodology used to solve it.
VASP-4096: a very high performance programmable device for digital media processing applications
NASA Astrophysics Data System (ADS)
Krikelis, Argy
2001-03-01
Over the past few years, technology drivers for microprocessors have changed significantly. Media data delivery and processing--such as telecommunications, networking, video processing, speech recognition and 3D graphics--is increasing in importance and will soon dominate the processing cycles consumed in computer-based systems. This paper presents the architecture of the VASP-4096 processor. VASP-4096 provides high media performance with low energy consumption by integrating associative SIMD parallel processing with embedded microprocessor technology. The major innovation in the VASP-4096 is the integration of thousands of processing units in a single chip, capable of supporting software-programmable high-performance mathematical functions as well as abstract data processing. In addition to 4096 processing units, VASP-4096 integrates on a single chip a RISC controller that is an implementation of the SPARC architecture, 128 Kbytes of Data Memory, and I/O interfaces. The SIMD processing in VASP-4096 implements the ASProCore architecture, a proprietary implementation of SIMD processing, and operates at 266 MHz with program instructions issued by the RISC controller. The device also integrates a 64-bit synchronous main memory interface operating at 133 MHz (double-data rate), and a 64-bit 66 MHz PCI interface. VASP-4096, compared with other processor architectures that support media processing, offers true performance scalability, support for deterministic and non-deterministic data processing on a single device, and software programmability that can be reused in future chip generations.
The GOES-R Product Generation Architecture - Post CDR Update
NASA Astrophysics Data System (ADS)
Dittberner, G.; Kalluri, S.; Weiner, A.
2012-12-01
The GOES-R system will substantially improve the accuracy of information available to users by providing data from significantly enhanced instruments, which will generate an increased number and diversity of products with higher resolution and much shorter relook times. Considerably greater compute and memory resources are necessary to achieve the required latency and availability for these products. Over time, new and updated algorithms are expected to be added and old ones removed as science advances and new products are developed. The GOES-R GS architecture is being planned to maintain functionality so that when such changes are implemented, operational product generation will continue without interruption. The primary parts of the PG infrastructure are the Service Based Architecture (SBA) and the Data Fabric (DF). SBA is the middleware that encapsulates and manages the science algorithms that generate products. It is divided into three parts: the Executive, which manages and configures the algorithm as a service; the Dispatcher, which provides data to the algorithm; and the Strategy, which determines when the algorithm can execute with the available data. SBA is a distributed architecture, with services connected to each other over a compute grid, and is highly scalable. This plug-and-play architecture allows algorithms to be added, removed, or updated without affecting any other services or software currently running and producing data. Algorithms require product data from other algorithms, so a scalable and reliable messaging layer is necessary. The SBA uses the DF to provide this data communication layer between algorithms. The DF provides an abstract interface over a distributed and persistent multi-layered storage system (e.g., memory-based caching above disk-based storage) and an event management system that allows event-driven algorithm services to know when instrument data are available and where they reside. Together, the SBA and the DF provide a flexible, high-performance architecture that can meet the needs of product processing now and as they grow in the future.
Architecture Knowledge for Evaluating Scalable Databases
2015-01-16
problems, arising from the proliferation of new data models and distributed technologies for building scalable, available data stores. Architects must...longer are relational databases the de facto standard for building data repositories. Highly distributed, scalable "NoSQL" databases [11] have emerged...This is especially challenging at the data storage layer. The multitude of competing NoSQL database technologies creates a complex and rapidly
Using Computing and Data Grids for Large-Scale Science and Engineering
NASA Technical Reports Server (NTRS)
Johnston, William E.
2001-01-01
We use the term "Grid" to refer to a software system that provides uniform and location independent access to geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. These emerging data and computing Grids promise to provide a highly capable and scalable environment for addressing large-scale science problems. We describe the requirements for science Grids, the resulting services and architecture of NASA's Information Power Grid (IPG) and DOE's Science Grid, and some of the scaling issues that have come up in their implementation.
Transport implementation of the Bernstein-Vazirani algorithm with ion qubits
NASA Astrophysics Data System (ADS)
Fallek, S. D.; Herold, C. D.; McMahon, B. J.; Maller, K. M.; Brown, K. R.; Amini, J. M.
2016-08-01
Using trapped ion quantum bits in a scalable microfabricated surface trap, we perform the Bernstein-Vazirani algorithm. Our architecture takes advantage of the ion transport capabilities of such a trap. The algorithm is demonstrated using two- and three-ion chains. For three ions, an improvement is achieved compared to a classical system using the same number of oracle queries. For two ions and one query, we correctly determine an unknown bit string with probability 97.6(8)%. For three ions, we succeed with probability 80.9(3)%.
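A small statevector simulation (a sketch, not the trapped-ion implementation) shows why a single oracle query suffices: after Hadamards, one application of the phase oracle, and Hadamards again, the measurement outcome is the hidden string itself.

```python
# Hedged statevector sketch of Bernstein-Vazirani (a simulation, not the
# trapped-ion experiment): H^n, one phase-oracle query, H^n again, and the
# measurement outcome is the hidden bit string s with certainty.
import numpy as np

def bernstein_vazirani(s_bits):
    """Return the measured bit string for hidden string s (MSB first)."""
    n = len(s_bits)
    dim = 1 << n
    s_int = int("".join(map(str, s_bits)), 2)
    # After the first Hadamard layer: uniform superposition over all |x>.
    amps = np.ones(dim) / np.sqrt(dim)
    # One oracle query: phase (-1)^(s.x) on each basis state |x>.
    for x in range(dim):
        amps[x] *= (-1) ** bin(x & s_int).count("1")
    # Final Hadamard layer, one qubit at a time (Walsh-Hadamard transform).
    for q in range(n):
        amps = amps.reshape(-1, 2, 1 << q)
        amps = np.concatenate([amps[:, 0] + amps[:, 1],
                               amps[:, 0] - amps[:, 1]], axis=1) / np.sqrt(2)
    probs = np.abs(amps.ravel()) ** 2
    return format(probs.argmax(), f"0{n}b")

print(bernstein_vazirani([1, 0, 1]))   # -> '101', from a single oracle query
```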
Electron beam throughput from raster to imaging
NASA Astrophysics Data System (ADS)
Zywno, Marek
2016-12-01
Two architectures of electron beam tools are presented: the single-beam MEBES Exara, designed and built by Etec Systems for mask writing, and the Reflected E-Beam Lithography tool (REBL), designed and built by KLA-Tencor under DARPA Agreement No. HR0011-07-9-0007. Both tools implemented technologies not used before to achieve their goals. The MEBES X, renamed Exara for marketing purposes, used an air bearing stage running in vacuum to achieve smooth continuous scanning. The REBL used two-dimensional imaging to distribute charge to a 4k-pixel swath to achieve writing times on the order of 1 wafer per hour, scalable to throughput approaching optical projection tools. Three stage architectures were designed for continuous scanning of wafers: linear maglev, rotary maglev, and dual linear maglev.
Flexible medical image management using service-oriented architecture.
Shaham, Oded; Melament, Alex; Barak-Corren, Yuval; Kostirev, Igor; Shmueli, Noam; Peres, Yardena
2012-01-01
Management of medical images increasingly involves the need for integration with a variety of information systems. To address this need, we developed Content Management Offering (CMO), a platform for medical image management supporting interoperability through compliance with standards. CMO is based on the principles of service-oriented architecture, implemented with emphasis on three areas: clarity of business process definition, consolidation of service configuration management, and system scalability. Owing to the flexibility of this platform, a small team is able to accommodate requirements of customers varying in scale and in business needs. We describe two deployments of CMO, highlighting the platform's value to customers. CMO represents a flexible approach to medical image management, which can be applied to a variety of information technology challenges in healthcare and life sciences organizations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Barton
2014-06-30
Peta-scale computing environments pose significant challenges for both system and application developers, and addressing them requires more than simply scaling up existing tera-scale solutions. Performance analysis tools play an important role in gaining this understanding, but previous monolithic tools with fixed feature sets have not sufficed. Instead, this project worked on the design, implementation, and evaluation of a general, flexible tool infrastructure supporting the construction of performance tools as "pipelines" of high-quality tool building blocks. These tool building blocks provide common performance tool functionality, and are designed for scalability, lightweight data acquisition and analysis, and interoperability. For this project, we built on Open|SpeedShop, a modular and extensible open-source performance analysis tool set. The design and implementation of such a general and reusable infrastructure targeted at petascale systems required us to address several challenging research issues. All components needed to be designed for scale, a task made more difficult by the need to provide general modules. The infrastructure needed to support online data aggregation to cope with the large amounts of performance and debugging data. We needed to be able to map any combination of tool components to each target architecture. And we needed to design interoperable tool APIs and workflows that were concrete enough to support the required functionality, yet provide the necessary flexibility to address a wide range of tools. A major result of this project is the ability to use this scalable infrastructure to quickly create tools that match a machine architecture and a performance problem that needs to be understood. Another benefit is the ability for application engineers to use the highly scalable, interoperable version of Open|SpeedShop, reassembled from the tool building blocks into a flexible, multi-user set of tools. This set of tools is targeted at Office of Science Leadership Class computer systems and selected Office of Science application codes. We describe the contributions made by the team at the University of Wisconsin. The project built on the efforts in Open|SpeedShop funded by DOE/NNSA and the DOE/NNSA Tri-Lab community, extended Open|SpeedShop to the Office of Science Leadership Class Computing Facilities, and addressed new challenges found on these cutting-edge systems. Work done under this project at Wisconsin can be divided into two categories: new algorithms and techniques for debugging, and foundation infrastructure work on our Dyninst binary analysis and instrumentation toolkits and MRNet scalability infrastructure.
NASA Astrophysics Data System (ADS)
Aktas, Mehmet; Aydin, Galip; Donnellan, Andrea; Fox, Geoffrey; Granat, Robert; Grant, Lisa; Lyzenga, Greg; McLeod, Dennis; Pallickara, Shrideep; Parker, Jay; Pierce, Marlon; Rundle, John; Sayar, Ahmet; Tullis, Terry
2006-12-01
We describe the goals and initial implementation of the International Solid Earth Virtual Observatory (iSERVO). This system is built using a Web Services approach to Grid computing infrastructure and is accessed via a component-based Web portal user interface. We describe our implementations of services used by this system, including Geographical Information System (GIS)-based data grid services for accessing remote data repositories and job management services for controlling multiple execution steps. iSERVO is an example of a larger trend to build globally scalable scientific computing infrastructures using the Service Oriented Architecture approach. Adoption of this approach raises a number of research challenges in millisecond-latency message systems suitable for internet-enabled scientific applications. We review our research in these areas.
NASA Astrophysics Data System (ADS)
Bay, Hamed Hosseini; Patino, Daisy; Mutlu, Zafer; Romero, Paige; Ozkan, Mihrimah; Ozkan, Cengiz S.
2016-02-01
Water decontamination and oil/water separation are principal motives in the surge to develop novel means for sustainability. In this prospect, supplying clean water for the ecosystems is as important as the recovery of the oil spills since the supplies are scarce. Inspired to design an engineering material which not only serves this purpose, but can also be altered for other applications to preserve natural resources, a facile template-free process is suggested to fabricate a superporous, superhydrophobic ultra-thin graphite sponge. Moreover, the process is designed to be inexpensive and scalable. The fabricated sponge can be used to clean up different types of oil, organic solvents, toxic and corrosive contaminants. This versatile microstructure can retain its functionality even when pulverized. The sponge is applicable for targeted sorption and collection due to its ferromagnetic properties. We hope that such a cost-effective process can be embraced and implemented widely.
Scalable Background-Limited Polarization-Sensitive Detectors for mm-wave Applications
NASA Technical Reports Server (NTRS)
Rostem, Karwan; Ali, Aamir; Appel, John W.; Bennett, Charles L.; Chuss, David T.; Colazo, Felipe A.; Crowe, Erik; Denis, Kevin L.; Essinger-Hileman, Tom; Marriage, Tobias A.;
2014-01-01
We report on the status and development of polarization-sensitive detectors for millimeter-wave applications. The detectors are fabricated on single-crystal silicon, which functions as a low-loss dielectric substrate for the microwave circuitry as well as the supporting membrane for the Transition-Edge Sensor (TES) bolometers. The orthomode transducer (OMT) is realized as a symmetric structure and on-chip filters are employed to define the detection bandwidth. A hybridized integrated enclosure reduces the high-frequency THz mode set that can couple to the TES bolometers. An implementation of the detector architecture at Q-band achieves 90% efficiency in each polarization. The design is scalable in both frequency coverage, 30-300 GHz, and in number of detectors with uniform characteristics. Hence, the detectors are desirable for ground-based or space-borne instruments that require large arrays of efficient background-limited cryogenic detectors.
Evaluation of the Huawei UDS cloud storage system for CERN specific data
NASA Astrophysics Data System (ADS)
Zotes Resines, M.; Heikkila, S. S.; Duellmann, D.; Adde, G.; Toebbicke, R.; Hughes, J.; Wang, L.
2014-06-01
Cloud storage is an emerging architecture aiming to provide increased scalability and access performance compared to more traditional solutions. CERN is evaluating this promise using Huawei UDS and OpenStack SWIFT storage deployments, focusing on the needs of high-energy physics. Both deployed setups implement S3, one of the protocols that are emerging as a standard in the cloud storage market. A set of client machines is used to generate I/O load patterns to evaluate the storage system performance. The presented read and write test results indicate scalability from both metadata and data perspectives. Further, the Huawei UDS cloud storage is shown to be able to recover from a major failure in which 16 disks are lost. Both cloud storage systems are finally demonstrated to function as back-end storage for a filesystem, which is used to deliver high-energy physics software.
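A minimal sketch of the kind of client-side load generation such a test uses (assuming boto3; the endpoint URL, credentials, bucket name and object sizes are placeholders, not the CERN configuration):

```python
# Hedged sketch of S3 read/write load generation against an S3-compatible
# store (assumes boto3; endpoint, credentials and bucket are placeholders,
# not the CERN/Huawei test configuration).
import time
import boto3

s3 = boto3.client("s3",
                  endpoint_url="https://s3.example.invalid",  # placeholder
                  aws_access_key_id="KEY",                    # placeholder
                  aws_secret_access_key="SECRET")             # placeholder

payload = b"x" * (4 * 1024 * 1024)           # 4 MiB objects, illustrative
t0 = time.time()
for i in range(100):
    s3.put_object(Bucket="test-bucket", Key=f"obj-{i:04d}", Body=payload)
write_s = time.time() - t0

t0 = time.time()
for i in range(100):
    s3.get_object(Bucket="test-bucket", Key=f"obj-{i:04d}")["Body"].read()
read_s = time.time() - t0

mib = 100 * 4
print(f"write {mib/write_s:.1f} MiB/s, read {mib/read_s:.1f} MiB/s")
```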
NASA Technical Reports Server (NTRS)
Parish, David W.; Grabbe, Robert D.; Marzwell, Neville I.
1994-01-01
A Modular Autonomous Robotic System (MARS) is being developed, consisting of a modular autonomous vehicle control system that can be retrofitted onto any vehicle to convert it to autonomous control, together with support for a modular payload for multiple applications. The MARS design is scalable, reconfigurable, and cost-effective due to the use of modern open system architecture design methodologies, including serial control bus technology to simplify system wiring and enhance scalability. The design is augmented with modular, object-oriented (C++) software implementing a hierarchy of five levels of control: teleoperated, continuous guidepath following, periodic guidepath following, absolute position autonomous navigation, and relative position autonomous navigation. The present effort is focused on producing a system that is commercially viable for routine autonomous patrolling of known, semistructured environments, like environmental monitoring of chemical and petroleum refineries, exterior physical security and surveillance, perimeter patrolling, and intrafacility transport applications.
Zhang, Xiaoyuan; Cheng, Shaoan; Liang, Peng; Huang, Xia; Logan, Bruce E
2011-01-01
The combined use of brush anodes, glass fiber (GF1) separators, and plastic mesh supporters was used here for the first time to create a scalable microbial fuel cell architecture. Separators prevented short-circuiting of closely spaced electrodes, and cathode supporters were used to avoid water gaps between the separator and cathode that can reduce power production. The maximum power density with a separator, a supporter and a single cathode was 75 ± 1 W/m(3). Removing the separator decreased power by 8%. Adding a second cathode increased power to 154 ± 1 W/m(3). Current was increased by connecting two MFCs in parallel. These results show that brush anodes, combined with a glass fiber separator and a plastic mesh supporter, produce a useful MFC architecture that is inherently scalable due to good insulation between the electrodes and a compact architecture.
Conceptual Architecture for Obtaining Cyber Situational Awareness
2014-06-01
TriG: Next Generation Scalable Spaceborne GNSS Receiver
NASA Technical Reports Server (NTRS)
Tien, Jeffrey Y.; Okihiro, Brian Bachman; Esterhuizen, Stephan X.; Franklin, Garth W.; Meehan, Thomas K.; Munson, Timothy N.; Robison, David E.; Turbiner, Dmitry; Young, Lawrence E.
2012-01-01
TriG is the next-generation NASA scalable space GNSS science receiver. It will track all GNSS and additional signals (i.e., GPS, GLONASS, Galileo, Compass and Doris). Its scalable 3U architecture is fully software- and firmware-reconfigurable, enabling optimization to meet specific mission requirements. The TriG GNSS EM is currently undergoing testing and is expected to complete full performance testing later this year.
Jeong, Seol Young; Jo, Hyeong Gon; Kang, Soon Ju
2014-03-21
A tracking service like asset management is essential in a dynamic hospital environment consisting of numerous mobile assets (e.g., wheelchairs or infusion pumps) that are continuously relocated throughout a hospital. The tracking service builds on the key technologies of an indoor location-based service (LBS), such as locating and monitoring multiple mobile targets inside a building in real time. An indoor LBS such as a tracking service entails numerous resource lookups being requested concurrently and frequently from several locations, as well as a network infrastructure that supports high scalability in indoor environments. A traditional centralized architecture needs to maintain a geographic map of the entire building or complex in its central server, which can cause low scalability and traffic congestion. This paper presents a self-organizing and fully distributed indoor mobile asset management (MAM) platform, and proposes a real-time architecture for multiple trackees (such as mobile assets) and trackers based on the proposed distributed platform. In order to verify the suggested platform, scalability performance with increasing numbers of concurrent lookups was evaluated in a real test bed. Tracking latency and traffic load ratio in the proposed tracking architecture were also evaluated.
LVFS: A Scalable Petabyte/Exabyte Data Storage System
NASA Astrophysics Data System (ADS)
Golpayegani, N.; Halem, M.; Masuoka, E. J.; Ye, G.; Devine, N. K.
2013-12-01
Managing petabytes of data with hundreds of millions of files is the first step necessary towards an effective big data computing and collaboration environment in a distributed system. We describe here the MODAPS LAADS Virtual File System (LVFS), a new storage architecture which replaces the previous MODAPS operational Level 1 Land Atmosphere Archive Distribution System (LAADS) NFS-based approach to storing and distributing datasets from several instruments, such as MODIS, MERIS, and VIIRS. LAADS is responsible for the distribution of over 4 petabytes of data and over 300 million files across more than 500 disks. We present here the first LVFS big data comparative performance results and new capabilities not previously possible with the LAADS system. We consider two aspects in addressing the inefficiencies of massive scales of data. The first is dealing in a reliable and resilient manner with the volume and quantity of files in such a dataset; the second is minimizing the discovery and lookup times for accessing files in such large datasets. There are several popular file systems that successfully deal with the first aspect of the problem. Their solution, in general, is through distribution, replication, and parallelism of the storage architecture. The Hadoop Distributed File System (HDFS), Parallel Virtual File System (PVFS), and Lustre are examples of such file systems that deal with petabyte data volumes. The second aspect deals with data discovery among billions of files, the largest bottleneck in reducing access time. The metadata of a file, generally represented in a directory layout, is stored in ways that are not readily scalable. This is true for HDFS, PVFS, and Lustre as well. Recent experimental file systems, such as Spyglass or Pantheon, have attempted to address this problem through a redesign of the metadata directory architecture. LVFS takes a radically different architectural approach by eliminating the need for a separate directory within the file system. The LVFS system replaces the NFS disk-mounting approach of LAADS and utilizes the already existing, highly optimized metadata database server, an approach applicable to most scientific big-data-intensive compute systems. Thus, LVFS ties the existing storage system to the existing metadata infrastructure, which we believe leads to a scalable exabyte virtual file system. The uniqueness of the implemented design is not limited to LAADS; it can be employed with most scientific data processing systems. By utilizing Filesystem in Userspace (FUSE), a kernel module available in many operating systems, LVFS was able to replace the NFS system while staying POSIX compliant. As a result, the LVFS system becomes scalable to exabyte sizes owing to the use of highly scalable database servers optimized for metadata storage. The flexibility of the LVFS design allows it to organize data on the fly in different ways, such as by region, date, instrument, or product, without the need for duplication, symbolic links, or any other replication methods. We propose here a strategic reference architecture that addresses the inefficiencies of scientific petabyte/exabyte file system access through the dynamic integration of the observing system's large metadata files.
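The core idea, replacing directory walks with database lookups, can be illustrated with a small sketch. The schema, file names, and path layout below are hypothetical stand-ins for the highly optimized metadata server the abstract describes; a real deployment would point at that server rather than SQLite.

import sqlite3

# Hypothetical metadata store: one row per science file, indexed by the
# attributes users actually search on.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE granules (
    instrument TEXT, product TEXT, date TEXT, region TEXT, physical_path TEXT)""")
conn.execute("CREATE INDEX idx ON granules (instrument, product, date)")
conn.execute("INSERT INTO granules VALUES (?, ?, ?, ?, ?)",
             ("MODIS", "MOD021KM", "2013-11-02", "global",
              "/archive/disk042/MOD021KM.A2013306.hdf"))  # made-up path

def resolve(virtual_path):
    # Map /<instrument>/<product>/<date> to a physical location with one
    # indexed lookup; no directory tree is walked.
    _, instrument, product, date = virtual_path.rstrip("/").split("/")[:4]
    row = conn.execute("SELECT physical_path FROM granules "
                       "WHERE instrument = ? AND product = ? AND date = ?",
                       (instrument, product, date)).fetchone()
    return row[0] if row else None

print(resolve("/MODIS/MOD021KM/2013-11-02"))

Presenting the archive "by region" or "by date" then amounts to exposing a different query over the same table, which is why such a design can reorganize data on the fly without symbolic links or duplication.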
Piromalis, Dimitrios; Arvanitis, Konstantinos
2016-01-01
Wireless Sensor and Actuator Networks (WSANs) constitute one of the most challenging technologies with tremendous socio-economic impact for the next decade. Functionally and energy-optimized hardware systems and development tools may be the most critical facet of this technology for the achievement of such prospects. Especially in the area of agriculture, where the hostile operating environment adds to the general technological and technical issues, reliable and robust WSAN systems are mandatory. This paper focuses on the hardware design architectures of WSANs for real-world agricultural applications. It presents the available alternatives in hardware design and identifies their difficulties and problems for real-life implementations. The paper introduces SensoTube, a new WSAN hardware architecture, which is proposed as a solution to the various existing design constraints of WSANs. The establishment of the proposed architecture is based, first, on an abstraction approach in the functional requirements context and, second, on the standardization of subsystem connectivity, in order to allow for an open, expandable, flexible, reconfigurable, energy-optimized, reliable and robust hardware system. The SensoTube implementation reference model, together with its encapsulation design and installation, is analyzed and presented in detail. Furthermore, as a proof of concept, certain use cases have been studied in order to demonstrate the benefits of migrating existing designs based on the available open-source hardware platforms to the SensoTube architecture. PMID:27527180
Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs
NASA Astrophysics Data System (ADS)
Dias, Tiago; Roma, Nuno; Sousa, Leonel
2014-12-01
A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. In contrast to other designs with similar functionality, the presented architecture is supported on a scalable, modular and completely configurable processing structure. This flexible structure not only allows the architecture to be easily reconfigured to support different transform kernels, but also permits its resizing to efficiently support transforms of different orders (e.g. order-4, order-8, order-16 and order-32). Consequently, it is not only highly suitable for realizing high-performance multi-standard transform cores, but it also offers highly efficient implementations of specialized processing structures addressing only the reduced subset of transforms used by a specific video standard. The experimental results obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, such results also demonstrate the ability of this processing structure to realize multi-standard transform cores supporting all the standards mentioned above, capable of processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 × 4,320 at 30 fps) in real time.
Architecture of next-generation information management systems for digital radiology enterprises
NASA Astrophysics Data System (ADS)
Wong, Stephen T. C.; Wang, Huili; Shen, Weimin; Schmidt, Joachim; Chen, George; Dolan, Tom
2000-05-01
Few information systems today offer a clear and flexible means to define and manage the automated part of radiology processes. None of them provides a coherent and scalable architecture that can easily cope with heterogeneity and the inevitable local adaptation of applications. Most importantly, they often lack a model that can integrate clinical and administrative information to aid better decisions in managing resources, optimizing operations, and improving productivity. Digital radiology enterprises require cost-effective solutions to deliver information to the right person in the right place at the right time. We propose a new architecture of image information management systems for digital radiology enterprises. Such a system is based on the emerging technologies in workflow management, distributed object computing, and Java and Web techniques, as well as Philips' domain knowledge in radiology operations. Our design adopts the '4+1' architectural-view approach. In this new architecture, PACS and RIS become one, while user interaction can be automated by customized workflow processes. Clinical service applications are implemented as active components. They can be replaced by locally adapted applications and can be replicated for fault tolerance and load balancing. Furthermore, the system will provide powerful query and statistical functions for managing resources and improving productivity in real time. This work will lead to a new direction of image information management in the next millennium. We illustrate the innovative design with implemented examples of a working prototype.
NASA Astrophysics Data System (ADS)
Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul
2017-03-01
We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI+X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
Functional Basis for Efficient Physical Layer Classical Control in Quantum Processors
NASA Astrophysics Data System (ADS)
Ball, Harrison; Nguyen, Trung; Leong, Philip H. W.; Biercuk, Michael J.
2016-12-01
The rapid progress seen in the development of quantum-coherent devices for information processing has motivated serious consideration of quantum computer architecture and organization. One topic which remains open for investigation and optimization relates to the design of the classical-quantum interface, where control operations on individual qubits are applied according to higher-level algorithms; accommodating competing demands on performance and scalability remains a major outstanding challenge. In this work, we present a resource-efficient, scalable framework for the implementation of embedded physical layer classical controllers for quantum-information systems. Design drivers and key functionalities are introduced, leading to the selection of Walsh functions as an effective functional basis for both programming and controller hardware implementation. This approach leverages the simplicity of real-time Walsh-function generation in classical digital hardware, and the fact that a wide variety of physical layer controls, such as dynamic error suppression, are known to fall within the Walsh family. We experimentally implement a real-time field-programmable-gate-array-based Walsh controller producing Walsh timing signals and Walsh-synthesized analog waveforms appropriate for critical tasks in error-resistant quantum control and noise characterization. These demonstrations represent the first step towards a unified framework for the realization of physical layer controls compatible with large-scale quantum-information processing.
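As a concrete illustration of why Walsh functions are attractive for digital hardware, the entire basis can be generated with shift-and-sign logic. The short sketch below uses the standard Sylvester/sequency construction; it illustrates the mathematics, not the authors' FPGA implementation.

import numpy as np

def walsh_matrix(n):
    # Sylvester construction of the 2**n x 2**n Hadamard matrix.
    h = np.array([[1]])
    for _ in range(n):
        h = np.block([[h, h], [h, -h]])
    # Sequency ordering: sort rows by their number of sign changes.
    sign_changes = (np.diff(h, axis=1) != 0).sum(axis=1)
    return h[np.argsort(sign_changes)]

w = walsh_matrix(3)  # 8 Walsh functions, 8 samples each
for k, row in enumerate(w):
    print(k, row)

A timing signal is then one row streamed out sample by sample, and an analog control waveform can be synthesized as a weighted sum of a few rows, which is what makes the basis cheap to realize in gate-level logic.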
HYDRA: High-speed simulation architecture for precision spacecraft formation simulation
NASA Technical Reports Server (NTRS)
Martin, Bryan J.; Sohl, Garett
2003-01-01
HYDRA - the Hierarchical Distributed Reconfigurable Architecture - is a scalable simulation architecture that provides flexibility and ease of use while taking advantage of modern computation and communication hardware. It also provides the ability to implement distributed (workstation-based) simulations and high-fidelity real-time simulation from a common core. Originally designed as a research platform for examining fundamental challenges in formation-flying simulation for future space missions, it is also finding use in other missions and applications, all of which can take advantage of the underlying object-oriented structure to easily produce distributed simulations. Hydra automates the process of connecting disparate simulation components (Hydra clients) through a client-server architecture that uses high-level descriptions of the data associated with each client to find and forge desirable connections (Hydra services) at run time. Services communicate through Connectors, which abstract messaging to provide single-interface access to any desired communication protocol, from shared-memory message passing to TCP/IP to ACE and CORBA. Hydra shares many features with the HLA, while providing more flexibility in connectivity services and behavior overriding.
NASA Astrophysics Data System (ADS)
Miles, B.; Chepudira, K.; LaBar, W.
2017-12-01
The Open Geospatial Consortium (OGC) SensorThings API (STA) specification, ratified in 2016, is a next-generation open standard for enabling real-time communication of sensor data. Building on over a decade of OGC Sensor Web Enablement (SWE) standards, STA offers a rich data model that can represent a range of sensor and phenomena types (e.g. fixed sensors sensing fixed phenomena, fixed sensors sensing moving phenomena, mobile sensors sensing fixed phenomena, and mobile sensors sensing moving phenomena) and is data agnostic. Additionally, and in contrast to previous SWE standards, STA is developer-friendly, as is evident from its convenient JSON serialization and expressive OData-based query language (with support for geospatial queries); with its Message Queue Telemetry Transport (MQTT) interface, STA is also well suited to efficient real-time data publishing and discovery. All these attributes make STA potentially useful in environmental monitoring sensor networks. Here we present Kinota(TM), an open-source NoSQL implementation of OGC SensorThings for large-scale high-resolution real-time environmental monitoring. Kinota, which roughly stands for Knowledge from Internet of Things Analyses, relies on Cassandra as its underlying data store, a horizontally scalable, fault-tolerant open-source database that is often used to store time-series data for big data applications (though integration with other NoSQL or relational databases is possible). With this foundation, Kinota can scale to store data from an arbitrary number of sensors collecting data every 500 milliseconds. Additionally, the Kinota architecture is very modular, allowing for customization by adopters who can choose to replace parts of the existing implementation when desirable. The architecture is also highly portable, providing the flexibility to choose between cloud providers such as Azure, Amazon, and Google. The scalable, flexible and cloud-friendly architecture of Kinota makes it ideal for use in next-generation large-scale and high-resolution real-time environmental monitoring networks used in domains such as hydrology, geomorphology, and geophysics, as well as management applications such as flood early warning and regulatory enforcement.
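As a flavor of the developer-facing side, a sensor reading enters a SensorThings service as an Observation entity posted to the REST interface. The host URL and Datastream id below are hypothetical; the JSON shape follows the SensorThings API specification.

import requests

STA_ROOT = "https://kinota.example.org/v1.0"  # hypothetical deployment

observation = {
    "phenomenonTime": "2017-09-01T12:00:00Z",  # when the phenomenon occurred
    "result": 2.31,                            # e.g. stage height in meters
    "Datastream": {"@iot.id": 42},             # link to an existing Datastream
}

resp = requests.post(STA_ROOT + "/Observations", json=observation)
resp.raise_for_status()
print("created:", resp.headers.get("Location"))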
A Bandwidth-Optimized Multi-Core Architecture for Irregular Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
This paper presents an architecture template for next-generation high performance computing systems specifically targeted at irregular applications. We start our work by considering that future-generation interconnection and memory bandwidth full-system numbers are expected to grow by a factor of 10. In order to keep up with such a communication capacity, while still resorting to fine-grained multithreading as the main way to tolerate the unpredictable memory access latencies of irregular applications, we show how overall performance scaling can benefit from the multi-core paradigm. At the same time, we also show how such an architecture template must be coupled with specific techniques in order to optimize bandwidth utilization and achieve maximum scalability. We propose a technique based on memory reference aggregation, together with the related hardware implementation, as one such optimization technique. We explore the proposed architecture template by focusing on the Cray XMT architecture and, using a dedicated simulation infrastructure, validate the performance of our template with two typical irregular applications. Our experimental results prove the benefits provided by both the multi-core approach and the bandwidth-optimizing reference aggregation technique.
Electrical control of a solid-state flying qubit.
Yamamoto, Michihisa; Takada, Shintaro; Bäuerle, Christopher; Watanabe, Kenta; Wieck, Andreas D; Tarucha, Seigo
2012-03-18
Solid-state approaches to quantum information technology are attractive because they are scalable. The coherent transport of quantum information over large distances is a requirement for any practical quantum computer and has been demonstrated by coupling superconducting qubits to photons. Single electrons have also been transferred between distant quantum dots in times shorter than their spin coherence time. However, until now, there have been no demonstrations in solid-state systems of scalable 'flying qubit' architectures: systems in which it is possible to perform quantum operations on qubits while they are being coherently transferred. These architectures allow for control over qubit separation and for non-local entanglement, which makes them more amenable to integration and scaling than static qubit approaches. Here, we report the transport and manipulation of qubits over distances of 6 µm within 40 ps, in an Aharonov-Bohm ring connected to two-channel wires that have a tunable tunnel coupling between channels. The flying qubit state is defined by the presence of a travelling electron in either channel of the wire, and can be controlled without a magnetic field. Our device has shorter quantum gates (<1 µm), longer coherence lengths (∼86 µm at 70 mK) and higher operating frequencies (∼100 GHz) than other solid-state implementations of flying qubits.
He, Yugui; Feng, Jiwen; Zhang, Zhi; Wang, Chao; Wang, Dong; Chen, Fang; Liu, Maili; Liu, Chaoyang
2015-08-01
High sensitivity, high data rates, fast pulses, and accurate synchronization all represent challenges for modern nuclear magnetic resonance spectrometers, which make any expansion or adaptation of these devices to new techniques and experiments difficult. Here, we present a Peripheral Component Interconnect Express (PCIe)-based highly integrated distributed digital architecture pulsed spectrometer that is implemented with electron and nucleus double resonances and is scalable specifically for broad dynamic nuclear polarization (DNP) enhancement applications, including DNP-magnetic resonance spectroscopy/imaging (DNP-MRS/MRI). The distributed modularized architecture can implement more transceiver channels flexibly to meet a variety of MRS/MRI instrumentation needs. The proposed PCIe bus with high data rates can significantly improve data transmission efficiency and communication reliability and allow precise control of pulse sequences. An external high speed double data rate memory chip is used to store acquired data and pulse sequence elements, which greatly accelerates the execution of the pulse sequence, reduces the TR (time of repetition) interval, and improves the accuracy of TR in imaging sequences. Using clock phase-shift technology, we can produce digital pulses accurately with high timing resolution of 1 ns and narrow widths of 4 ns to control the microwave pulses required by pulsed DNP and ensure overall system synchronization. The proposed spectrometer is proved to be both feasible and reliable by observation of a maximum signal enhancement factor of approximately -170 for (1)H, and a high quality water image was successfully obtained by DNP-enhanced spin-echo (1)H MRI at 0.35 T.
Miniature EVA Software Defined Radio
NASA Technical Reports Server (NTRS)
Pozhidaev, Aleksey
2012-01-01
As NASA embarks upon developing the next-generation Extra Vehicular Activity (EVA) radio for deep space exploration, the demands on EVA battery life will substantially increase. The number of modes and frequency bands required will continue to grow in order to enable efficient and complex multi-mode operations including communications, navigation, and tracking applications. To serve astronaut excursions, soldier communications, and first responders handling emergency hazards alike, NASA has developed an innovative, affordable, miniaturized software defined radio that offers unprecedented power-efficient flexibility. This lightweight, programmable, S-band, multi-service, frequency-agile EVA software defined radio (SDR) supports data, telemetry, voice, and both standard and high-definition video. Features include a modular design and an easily scalable architecture, and the EVA SDR allows for both stationary and mobile battery-powered handheld operations. Currently, the radio is equipped with an S-band RF section. However, its scalable architecture can accommodate multiple RF sections simultaneously to cover multiple frequency bands. The EVA SDR also supports multiple network protocols. It currently implements a hybrid mesh network based on the 802.11s open standard protocol. The radio targets RF channel data rates up to 20 Mbps and can be equipped with a real-time operating system (RTOS) that can be switched off for power-aware applications. The EVA SDR's modular design permits implementation of the same hardware at all network nodes. This approach assures the portability of the same software into any radio in the system. It also brings several benefits to the entire system, including reduced system maintenance, system complexity, and development cost.
Addressable single-spin control in multiple quantum dots coupled in series
NASA Astrophysics Data System (ADS)
Nakajima, Takashi
2015-03-01
Electron spin in semiconductor quantum dots (QDs) is a promising building block for quantum computers owing to its controllability and potential scalability. Recent experiments on GaAs QDs have demonstrated the necessary ingredients of universal quantum gate operations: single-spin rotations by electron spin resonance (ESR), which is virtually free from the effect of nuclear spin fluctuations, and pulsed control of two-spin entanglement. The scalability of this architecture, however, has remained to be demonstrated in practice. In this talk, we present our recent results on implementing single-spin-based qubits in triple, quadruple, and quintuple QDs based on a series-coupled architecture defined by gate electrodes. Deterministic initialization of individual spin states and spin-state readout were performed by pulsed control of the detuning between two neighboring QDs. The spin state was coherently manipulated by ESR, where each spin in a different QD is addressed by the shift of the resonance frequency due to the inhomogeneous magnetic field induced by the micromagnet deposited on top of the QDs. Control of two-spin entanglement was also demonstrated. We will discuss key issues for implementing quantum algorithms based on three or more qubits, including the effect of a nuclear spin bath, single-shot readout fidelity, and tuning of multiple qubit devices. Our approaches to these issues will also be presented. This research is supported by the Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST) from JSPS, the IARPA project ``Multi-Qubit Coherent Operations'' through Copenhagen University, and a Grant-in-Aid for Scientific Research from JSPS.
Modular Universal Scalable Ion-trap Quantum Computer
2016-06-02
Final report (1 August 2010 - 31 January 2016). The main goal of the original MUSIQC proposal was to construct and demonstrate a modular and universally expandable ion-trap quantum computer. Keywords: ion trap quantum computation, scalable modular architectures.
A quantum annealing architecture with all-to-all connectivity from local interactions.
Lechner, Wolfgang; Hauke, Philipp; Zoller, Peter
2015-10-01
Quantum annealers are physical devices that aim at solving NP-complete optimization problems by exploiting quantum mechanics. The basic principle of quantum annealing is to encode the optimization problem in Ising interactions between quantum bits (qubits). A fundamental challenge in building a fully programmable quantum annealer is the competing requirements of fully controllable all-to-all connectivity and the quasi-locality of the interactions between physical qubits. We present a scalable architecture with full connectivity, which can be implemented with local interactions only. The input of the optimization problem is encoded in local fields acting on an extended set of physical qubits. The output is, in the spirit of topological quantum memories, redundantly encoded in the physical qubits, resulting in an intrinsic fault tolerance. Our model can be understood as a lattice gauge theory, where long-range interactions are mediated by gauge constraints. The architecture can be realized on various platforms with local controllability, including superconducting qubits, NV-centers, quantum dots, and atomic systems.
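For concreteness, the parity encoding underlying this architecture can be summarized (a sketch following the standard LHZ presentation, with the plaquette layout simplified) as a purely local Hamiltonian in LaTeX notation:

H \;=\; \sum_{k=1}^{K} J_k \,\tilde{\sigma}^z_k \;-\; C \sum_{l=1}^{K-N+1} \tilde{\sigma}^z_{l,\mathrm{n}}\,\tilde{\sigma}^z_{l,\mathrm{e}}\,\tilde{\sigma}^z_{l,\mathrm{s}}\,\tilde{\sigma}^z_{l,\mathrm{w}}, \qquad K = \frac{N(N-1)}{2}.

Here each physical qubit \tilde{\sigma}^z_k represents the relative alignment of one pair of logical spins, so the N(N-1)/2 pairwise couplings J_k of the original problem enter only as local fields, while the four-body terms of strength C, acting on the north/east/south/west qubits of each plaquette, enforce the K - N + 1 parity constraints that replace the long-range interactions.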
Oryspayev, Dossay; Aktulga, Hasan Metin; Sosonkina, Masha; ...
2015-07-14
Sparse matrix-vector multiply (SpMVM) is an important kernel that frequently arises in high-performance computing applications. Due to its low arithmetic intensity, several approaches have been proposed in the literature to improve its scalability and efficiency in large-scale computations. In this paper, our target systems are high-end multi-core architectures, and we use a hybrid Message Passing Interface (MPI) + OpenMP programming model for parallelism. We analyze the performance of a recently proposed implementation of distributed symmetric SpMVM, originally developed for large sparse symmetric matrices arising in ab initio nuclear structure calculations. We also study important features of this implementation and compare it with previously reported implementations that do not exploit the underlying symmetry. Our SpMVM implementations leverage the hybrid paradigm to efficiently overlap expensive communication with computation. Our main comparison criterion is the "CPU core hours" metric, which is the main measure of resource usage on supercomputers. We analyze the effects of a topology-aware mapping heuristic using a simplified network load model. Furthermore, we have tested the different SpMVM implementations on two large clusters with 3D torus and Dragonfly topologies. Our results show that the distributed SpMVM implementation that exploits matrix symmetry and hides communication yields the best value for the "CPU core hours" metric and significantly reduces data movement overheads.
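The symmetry exploitation at the heart of the compared implementation can be sketched as follows: each rank applies its block of the upper triangle twice, once as stored and once transposed, so only half the matrix is kept in memory. This toy mpi4py/SciPy version (every rank redundantly builds the test matrix, and a single Allreduce stands in for the paper's overlapped communication) is an illustration, not the authors' code.

from mpi4py import MPI
import numpy as np
import scipy.sparse as sp

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 1 << 12
A = sp.random(n, n, density=1e-3, random_state=1234, format="csr")
S = A + A.T                                      # make it symmetric
rows = np.array_split(np.arange(n), size)[rank]  # this rank's row block

U = sp.triu(S).tocsr()[rows]                     # owned rows, upper incl. diagonal
Lt = sp.triu(S, k=1).tocsr()[rows].T.tocsr()     # mirrored strictly-lower part

x = np.ones(n)
y_part = np.zeros(n)
y_part[rows] += U @ x                            # contribution of owned U rows
y_part += Lt @ x[rows]                           # transpose (lower) contribution
comm.Allreduce(MPI.IN_PLACE, y_part)             # combine partials across ranks
# y_part now equals S @ x on every rank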
Seebregts, Christopher; Dane, Pierre; Parsons, Annie Neo; Fogwill, Thomas; Rogers, Debbie; Bekker, Marcha; Shaw, Vincent; Barron, Peter
2018-01-01
MomConnect is a national initiative coordinated by the South African National Department of Health that sends text-based mobile phone messages free of charge to pregnant women who voluntarily register at any public healthcare facility in South Africa. We describe the system design and architecture of the MomConnect technical platform, planned as a nationally scalable and extensible initiative. It uses a health information exchange that can connect any standards-compliant electronic front-end application to any standards-compliant electronic back-end database. The implementation of the MomConnect technical platform, in turn, is a national reference application for electronic interoperability in line with the South African National Health Normative Standards Framework. The use of open content and messaging standards enables the architecture to include any application adhering to the selected standards. Its national implementation at scale demonstrates both the use of this technology and a key objective of global health information systems, which is to achieve implementation scale. The system’s limited clinical information, initially, allowed the architecture to focus on the base standards and profiles for interoperability in a resource-constrained environment with limited connectivity and infrastructural capacity. Maintenance of the system requires mobilisation of national resources. Future work aims to use the standard interfaces to include data from additional applications as well as to extend and interface the framework with other public health information systems in South Africa. The development of this platform has also shown the benefits of interoperability at both an organisational and technical level in South Africa. PMID:29713506
Adaptive packet switch with an optical core (demonstrator)
NASA Astrophysics Data System (ADS)
Abdo, Ahmad; Bishtein, Vadim; Clark, Stewart A.; Dicorato, Pino; Lu, David T.; Paredes, Sofia A.; Taebi, Sareh; Hall, Trevor J.
2004-11-01
A three-stage opto-electronic packet switch architecture is described, consisting of a reconfigurable optical centre stage surrounded by two electronic buffering stages partitioned into sectors to ease memory contention. A Flexible Bandwidth Provision (FBP) algorithm, implemented on a soft-core processor, is used to change the configuration of the input sectors and optical centre stage to set up internal paths that provide variable bandwidth to serve the traffic. The switch is modeled by a bipartite graph built from a service matrix, which is a function of the arriving traffic. The bipartite graph is decomposed by solving an edge-colouring problem, and the resulting permutations are used to configure the switch. Simulation results show that this architecture exhibits a dramatic reduction in complexity and increased potential for scalability, at the price of only a modest spatial speed-up k.
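The decomposition step can be pictured with a small stand-in: repeatedly extract a permutation (a perfect matching in the bipartite graph) from the integer service matrix and hold it for as many slots as it can sustain. The sketch below uses SciPy's assignment solver in place of the paper's edge-colouring algorithm.

import numpy as np
from scipy.optimize import linear_sum_assignment

def decompose(service):
    # Yield (weight, permutation) pairs whose weighted sum equals `service`.
    S = service.astype(int).copy()
    while S.any():
        rows, cols = linear_sum_assignment(S, maximize=True)
        keep = S[rows, cols] > 0              # drop zero-weight matched pairs
        perm = np.zeros_like(S)
        perm[rows[keep], cols[keep]] = 1
        w = S[rows[keep], cols[keep]].min()   # slots this configuration lasts
        S -= w * perm
        yield w, perm

service = np.array([[2, 1, 0],
                    [0, 2, 1],
                    [1, 0, 2]])
for w, P in decompose(service):
    print(w, P.nonzero()[1])  # configure the optical stage with P for w slots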
Pak, JuGeon; Park, KeeHyun
2012-01-01
We propose a smart medication dispenser having a high degree of scalability and remote manageability. We construct the dispenser with an extensible hardware architecture to achieve scalability, and we install an agent program in it to achieve remote manageability. The dispenser operates as follows: when the real-time clock reaches the predetermined medication time and the user presses the dispense button, the predetermined medication is dispensed from the medication dispensing tray (MDT). In the proposed dispenser, the medication for each patient is stored in an MDT. One smart medication dispenser normally contains one MDT; however, the dispenser can be extended to include more MDTs in order to support multiple users with one dispenser. For remote management, the proposed dispenser transmits the medication status and the system configuration to the monitoring server. In the case of a specific event such as a shortage of medication, memory overload, software error, or non-adherence, the event is transmitted immediately. All these operations are performed automatically, without the intervention of patients, through the agent program installed in the dispenser. Results of implementation and verification show that the proposed dispenser operates normally and suitably performs the management operations issued by the medication monitoring server.
High Intensity Laser Power Beaming Architecture for Space and Terrestrial Missions
NASA Technical Reports Server (NTRS)
Nayfeh, Taysir; Fast, Brian; Raible, Daniel; Dinca, Dragos; Tollis, Nick; Jalics, Andrew
2011-01-01
High Intensity Laser Power Beaming (HILPB) has been developed as a technique to achieve Wireless Power Transmission (WPT) for both space and terrestrial applications. In this paper, the system architecture and hardware results for a terrestrial application of HILPB are presented. These results demonstrate continuous conversion of high intensity optical energy at near-IR wavelengths directly to electrical energy, at output power levels as high as 6.24 W from a single-cell 0.8 cm2 aperture receiver. These results are scalable and may be extended by arraying receivers and utilizing higher power source lasers. This type of system would enable long-range optical refueling of electric platforms, such as MUAVs, airships, and robotic exploration missions, and could provide power to spacecraft platforms, which may utilize it to drive electric means of propulsion.
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Heber, Gerd; Biswas, Rupak
2000-01-01
The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
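The ordering effect the paper measures can be reproduced in miniature: a bandwidth-reducing permutation such as reverse Cuthill-McKee packs nonzeros near the diagonal, improving cache reuse during SpMV. The SciPy sketch below illustrates the idea; it is not the paper's experimental setup.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

A = sp.random(5000, 5000, density=5e-4, format="csr", random_state=7)
A = A + A.T                                  # symmetric structure

perm = reverse_cuthill_mckee(A.tocsr(), symmetric_mode=True)
A_rcm = A.tocsr()[perm][:, perm]             # same permutation on rows and columns

# Bandwidth (max |i - j| over nonzeros) as a crude locality proxy:
for name, M in (("original", A.tocoo()), ("RCM", A_rcm.tocoo())):
    print(name, "bandwidth:", np.abs(M.row - M.col).max())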
Scalability improvements to NRLMOL for DFT calculations of large molecules
NASA Astrophysics Data System (ADS)
Diaz, Carlos Manuel
Advances in high performance computing (HPC) have provided a way to treat large, computationally demanding tasks using thousands of processors. With the development of more powerful HPC architectures, the need to create efficient and scalable code has grown more important. Electronic structure calculations are valuable in understanding experimental observations and are routinely used for new materials predictions. For electronic structure calculations, the memory and computation time grow with system size; memory requirements scale as N², where N is the number of atoms. While recent advances in HPC offer platforms with large numbers of cores, the limited amount of memory available on a given node and the poor scalability of the electronic structure code hinder efficient usage of these platforms. This thesis presents developments to overcome these bottlenecks in order to study large systems. These developments, implemented in the NRLMOL electronic structure code, involve the use of sparse matrix storage formats and linear algebra on sparse and distributed matrices. Together with other related work, they now allow ground state density functional calculations using up to 25,000 basis functions and excited state calculations using up to 17,000 basis functions while utilizing all cores on a node. An example on a light-harvesting triad molecule is described. Finally, future plans to further improve the scalability are presented.
NASA Astrophysics Data System (ADS)
Xu, Boyi; Xu, Li Da; Fei, Xiang; Jiang, Lihong; Cai, Hongming; Wang, Shuai
2017-08-01
Facing rapidly changing business environments, the implementation of flexible business processes is crucial but difficult, especially in data-intensive application areas. This study aims to provide scalable and easily accessible information resources to leverage business process management. In this article, with a resource-oriented approach, enterprise data resources are represented as data-centric Web services, grouped on demand of business requirements and configured dynamically to adapt to changing business processes. First, a configurable architecture, CIRPA, involving an information resource pool is proposed to act as a scalable and dynamic platform for virtualising enterprise information resources as data-centric Web services. By exposing data-centric resources as REST services at larger granularities, tenant-isolated information resources can be accessed during business process execution. Second, a dynamic information resource pool is designed to fulfil configurable and on-demand data access in business process execution. CIRPA also isolates transaction data from business processes while supporting the composition of diverse business processes. Finally, a case study applying our method to a logistics application shows that CIRPA provides enhanced performance in both static service encapsulation and dynamic service execution in a cloud computing environment.
Rezaeibagha, Fatemeh; Win, Khin Than; Susilo, Willy
Even though many safeguards and policies for electronic health record (EHR) security have been implemented, barriers to the privacy and security protection of EHR systems persist. This article presents the results of a systematic literature review regarding frequently adopted security and privacy technical features of EHR systems. Our inclusion criteria were full articles that dealt with the security and privacy of technical implementations of EHR systems published in English in peer-reviewed journals and conference proceedings between 1998 and 2013; 55 selected studies were reviewed in detail. We analysed the review results using two International Organization for Standardization (ISO) standards (29100 and 27002) in order to consolidate the study findings. Using this process, we identified 13 features that are essential to security and privacy in EHRs. These included system and application access control, compliance with security requirements, interoperability, integration and sharing, consent and choice mechanisms, policies and regulation, applicability and scalability, and cryptography techniques. This review highlights the importance of technical features, including mandated access control policies and consent mechanisms to provide patients' consent, scalability through proper architecture and frameworks, and interoperability of health information systems, to EHR security and privacy requirements.
The Design of a Fault-Tolerant COTS-Based Bus Architecture for Space Applications
NASA Technical Reports Server (NTRS)
Chau, Savio N.; Alkalai, Leon; Tai, Ann T.
2000-01-01
The high-performance, scalability and miniaturization requirements together with the power, mass and cost constraints mandate the use of commercial-off-the-shelf (COTS) components and standards in the X2000 avionics system architecture for deep-space missions. In this paper, we report our experiences and findings on the design of an IEEE 1394 compliant fault-tolerant COTS-based bus architecture. While the COTS standard IEEE 1394 adequately supports power management, high performance and scalability, its topological criteria impose restrictions on fault tolerance realization. To circumvent the difficulties, we derive a "stack-tree" topology that not only complies with the IEEE 1394 standard but also facilitates fault tolerance realization in a spaceborne system with limited dedicated resource redundancies. Moreover, by exploiting pertinent standard features of the 1394 interface which are not purposely designed for fault tolerance, we devise a comprehensive set of fault detection mechanisms to support the fault-tolerant bus architecture.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Draeger, Erik W.
This report documents the completion of the work of creating a strategic plan and beginning customer engagements. The milestone description is: The newly formed advanced architecture and portability specialists (AAPS) team will develop a strategic plan to meet the goals of 1) sharing knowledge and experience with code teams to ensure that ASC codes run well on new architectures, and 2) supplying skilled computational scientists to put the strategy into practice. The plan will be delivered to ASC management in the first quarter. By the fourth quarter, the team will identify their first customers within PEM and IC, perform an initial assessment of scalability and performance bottlenecks for next-generation architectures, and embed AAPS team members with customer code teams to assist with initial portability development within standalone kernels or proxy applications.
Slices: A Scalable Partitioner for Finite Element Meshes
NASA Technical Reports Server (NTRS)
Ding, H. Q.; Ferraro, R. D.
1995-01-01
A parallel partitioner for partitioning unstructured finite element meshes on distributed memory architectures is developed. The element based partitioner can handle mixtures of different element types. All algorithms adopted in the partitioner are scalable, including a communication template for unpredictable incoming messages, as shown in actual timing measurements.
NASA Astrophysics Data System (ADS)
Tysowski, Piotr K.; Ling, Xinhua; Lütkenhaus, Norbert; Mosca, Michele
2018-04-01
Quantum key distribution (QKD) is a means of generating keys between a pair of computing hosts that is theoretically secure against cryptanalysis, even by a quantum computer. Although there is much active research into improving the QKD technology itself, there is still significant work to be done to apply engineering methodology and determine how it can be practically built to scale within an enterprise IT environment. Significant challenges exist in building a practical key management service (KMS) for use in a metropolitan network. QKD is generally a point-to-point technique only and is subject to steep performance constraints. The integration of QKD into enterprise-level computing has been researched, to enable quantum-safe communication. A novel method for constructing a KMS is presented that allows arbitrary computing hosts on one site to establish multiple secure communication sessions with the hosts of another site. A key exchange protocol is proposed where symmetric private keys are granted to hosts while satisfying the scalability needs of an enterprise population of users. The KMS operates within a layered architectural style that is able to interoperate with various underlying QKD implementations. Variable levels of security for the host population are enforced through a policy engine. A network layer provides key generation across a network of nodes connected by quantum links. Scheduling and routing functionality allows quantum key material to be relayed across trusted nodes. Optimizations are performed to match the real-time host demand for key material with the capacity afforded by the infrastructure. The result is a flexible and scalable architecture that is suitable for enterprise use and independent of any specific QKD technology.
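A minimal sketch of the key-granting idea, per-link pools of QKD-generated key material consumed once and gated by policy, is shown below. All class and parameter names are hypothetical illustrations, not the proposed system's API.

import secrets
from collections import deque

class KeyManager:
    def __init__(self):
        self.pools = {}          # (site_a, site_b) -> deque of key material

    def replenish(self, link, n_keys, key_bytes=32):
        # In a real system this material would come from the QKD layer;
        # random bytes stand in for it here.
        pool = self.pools.setdefault(link, deque())
        for _ in range(n_keys):
            pool.append(secrets.token_bytes(key_bytes))

    def grant(self, host_a, host_b, link, security_level="high"):
        # Policy-engine stand-in: high-security sessions must use QKD keys.
        pool = self.pools.get(link, deque())
        if not pool:
            if security_level == "high":
                raise RuntimeError("key pool exhausted; demand exceeds QKD rate")
            return secrets.token_bytes(32)   # fall back per policy
        return pool.popleft()                # one-time use, then discarded

kms = KeyManager()
kms.replenish(("siteA", "siteB"), n_keys=100)
k = kms.grant("host1.siteA", "host9.siteB", ("siteA", "siteB"))

The interesting engineering problem the abstract describes sits around this core: scheduling replenishment against real-time demand and relaying material across trusted nodes when no direct quantum link exists.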
A context management system for a cost-efficient smart home platform
NASA Astrophysics Data System (ADS)
Schneider, J.; Klein, A.; Mannweiler, C.; Schotten, H. D.
2012-09-01
This paper presents an overview of state-of-the-art architectures for integrating wireless sensor and actuator networks into the Future Internet, and addresses the advantages and disadvantages of the different architectures. With respect to these criteria, we develop a new architecture that overcomes their weaknesses. Our system, called the Smart Home Context Management System, will be used for intelligent home utilities, appliances, and electronics, and includes physical, logical, and network context sources within one concept. It considers important aspects and requirements of modern context management systems for smart X applications: plug-and-play as well as plug-and-trust capabilities, scalability, extensibility, security, and adaptability. As such, it is able to control roller blinds and heating systems, as well as learn, for example, the user's taste w.r.t. home entertainment (music, videos, etc.). Moreover, Smart Grid applications and Ambient Assisted Living (AAL) functions are applicable. With respect to AAL, we included an emergency handling function, which assures that emergency calls (police, ambulance, or fire department) are processed appropriately. Our concept is based on a centralized Context Broker architecture, enhanced by a distributed Context Broker system. The goal of this concept is to develop a simple, low-priced, multi-functional, and safe architecture affordable for everybody. Individual components of the architecture are well tested. Implementation and testing of the architecture as a whole is in progress.
NASA Astrophysics Data System (ADS)
Christou, Michalis; Christoudias, Theodoros; Morillo, Julián; Alvarez, Damian; Merx, Hendrik
2016-09-01
We examine an alternative approach to heterogeneous cluster-computing in the many-core era for Earth system models, using the European Centre for Medium-Range Weather Forecasts Hamburg (ECHAM)/Modular Earth Submodel System (MESSy) Atmospheric Chemistry (EMAC) model as a pilot application on the Dynamical Exascale Entry Platform (DEEP). A set of autonomous coprocessors interconnected together, called Booster, complements a conventional HPC Cluster and increases its computing performance, offering extra flexibility to expose multiple levels of parallelism and achieve better scalability. The EMAC model atmospheric chemistry code (Module Efficiently Calculating the Chemistry of the Atmosphere (MECCA)) was taskified with an offload mechanism implemented using OmpSs directives. The model was ported to the MareNostrum 3 supercomputer to allow testing with Intel Xeon Phi accelerators on a production-size machine. The changes proposed in this paper are expected to contribute to the eventual adoption of Cluster-Booster division and Many Integrated Core (MIC) accelerated architectures in presently available implementations of Earth system models, towards exploiting the potential of a fully Exascale-capable platform.
NASA Astrophysics Data System (ADS)
de Schryver, C.; Weithoffer, S.; Wasenmüller, U.; Wehn, N.
2012-09-01
Channel coding is a standard technique in all wireless communication systems. In addition to the typically employed methods such as convolutional coding, turbo coding or low density parity check (LDPC) coding, algebraic codes are used in many cases. For example, outer BCH coding is applied in the DVB-S2 standard for satellite TV broadcasting. A key operation for BCH and the related Reed-Solomon codes is multiplication in finite fields (Galois fields), where extension fields of prime fields are used. Many architectures for multiplication in finite fields have been published over the past decades. This paper examines in detail four different multiplier architectures that offer the potential for very high throughputs. We investigate the implementation performance of these multipliers on FPGA technology in the context of channel coding. We study the efficiency of the multipliers with respect to area, frequency and throughput, as well as configurability and scalability. The implementation data of the fully verified circuits are provided for a Xilinx Virtex-4 device after place and route.
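For reference, the operation all four multiplier architectures implement is polynomial multiplication over GF(2) followed by reduction modulo a fixed irreducible polynomial. A bit-serial software model is sketched below for GF(2^8) with the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), a common Reed-Solomon choice; the standards discussed may use other fields and polynomials.

def gf_mul(a, b, poly=0x11D, m=8):
    # Multiply a and b as polynomials over GF(2), reduced modulo `poly`.
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a          # add (XOR) the current shift of a
        b >>= 1
        a <<= 1
        if a >> m:               # degree reached m: reduce
            a ^= poly
    return result

assert gf_mul(0x02, 0x80) == 0x1D   # x * x^7 = x^8 = x^4 + x^3 + x^2 + 1

The hardware trade-offs the paper studies come from unrolling, pipelining, or reformulating exactly this shift-XOR-reduce loop.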
Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F
2002-02-01
This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
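The parallelization scheme described, equal photon shares per processor with uncorrelated random streams, can be sketched in a few lines; here mpi4py and NumPy's SeedSequence spawning stand in for MPI and the SPRNG library, and a trivial attenuation test stands in for SIMIND's photon transport physics.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_photons_total = 10_000_000
n_local = n_photons_total // size            # equal share per processor
streams = np.random.SeedSequence(12345).spawn(size)
rng = np.random.default_rng(streams[rank])   # uncorrelated stream per rank

# Stand-in for photon transport: count photons surviving attenuation.
mu, depth = 0.11, 10.0                       # hypothetical 1/cm and cm
path = rng.exponential(1.0 / mu, size=n_local)
local_hits = np.count_nonzero(path > depth)

total_hits = comm.reduce(local_hits, op=MPI.SUM, root=0)
if rank == 0:
    print("transmission fraction:", total_hits / n_photons_total)

Because photon histories are independent, this structure scales almost linearly until per-rank work becomes too small, which is consistent with the speed-up reported in the abstract.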
Architectural approaches for HL7-based health information systems implementation.
López, D M; Blobel, B
2010-01-01
Information systems integration is hard, especially when semantic and business process interoperability requirements need to be met. To succeed, a unified methodology, approaching different aspects of systems architecture such as business, information, computational, engineering and technology viewpoints, has to be considered. The paper contributes an analysis and demonstration of how the HL7 standard set can support health information systems integration. Based on the Health Information Systems Development Framework (HIS-DF), common architectural models for HIS integration are analyzed. The framework is a standard-based, consistent, comprehensive, customizable, scalable methodology that supports the design of semantically interoperable health information systems and components. Three main architectural models for system integration are analyzed: the point-to-point interface, the message server and the mediator models. The point-to-point interface and message server models are completely supported by traditional HL7 version 2 and version 3 messaging. The HL7 v3 standard specification, combined with the service-oriented, model-driven approaches provided by HIS-DF, makes the mediator model possible. The different integration scenarios are illustrated by describing a proof-of-concept implementation of an integrated public health surveillance system based on Enterprise Java Beans technology. Selecting the appropriate integration architecture is a fundamental issue of any software development project. HIS-DF provides a unique methodological approach guiding the development of healthcare integration projects. The mediator model, offered by the HIS-DF and supported in HL7 v3 artifacts, is the most promising one, promoting the development of open, reusable, flexible, semantically interoperable, platform-independent, service-oriented and standard-based health information systems.
Coupled Physics Environment (CouPE) library - Design, Implementation, and Release
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mahadevan, Vijay S.
Over several years, high fidelity, validated mono-physics solvers with proven scalability on peta-scale architectures have been developed independently. Based on a unified component-based architecture, these existing codes can be coupled with a unified mesh-data backplane and a flexible coupling-strategy-based driver suite to produce a viable tool for analysts. In this report, we present details on the design decisions and developments in CouPE (Coupled Physics Environment), which orchestrates a coupled physics solver through the interfaces exposed by the MOAB array-based unstructured mesh library; both are part of the SIGMA (Scalable Interfaces for Geometry and Mesh-Based Applications) toolkit. The SIGMA toolkit contains libraries that enable scalable geometry and unstructured mesh creation and handling in a memory- and computationally efficient implementation. The CouPE version being prepared for a full open-source release along with updated documentation will contain several useful examples that will enable users to start developing their applications natively on the MOAB mesh and couple their models to existing physics applications to analyze and solve real-world problems of interest. An integrated multi-physics simulation capability for the design and analysis of current and future nuclear reactor models is also being investigated as part of the NEAMS RPL, to tightly couple neutron transport, thermal-hydraulics and structural mechanics physics under the SHARP framework. This report summarizes the efforts that have been invested in CouPE to bring together several existing physics applications, namely PROTEUS (neutron transport code), Nek5000 (computational fluid-dynamics code) and Diablo (structural mechanics code). The goal of the SHARP framework is to perform fully resolved coupled physics analysis of a reactor on heterogeneous geometry, in order to reduce the overall numerical uncertainty while leveraging available computational resources. The design of CouPE, along with the motivations that led to implementation choices, is also discussed. The first release of the library will be different from the current version of the code that integrates the components in SHARP, and an explanation of the need for forking the source base will also be provided. Enhancements in the functionality and improved user guides will be available as part of the release. CouPE v0.1 is scheduled for an open-source release in December 2014 along with SIGMA v1.1 components that provide support for language-agnostic mesh loading, traversal and query interfaces along with scalable solution transfer of fields between different physics codes. The coupling methodology and software interfaces of the library are presented, along with verification studies on two representative fast sodium-cooled reactor demonstration problems to prove the usability of the CouPE library.
Merolla, Paul A; Arthur, John V; Alvarez-Icaza, Rodrigo; Cassidy, Andrew S; Sawada, Jun; Akopyan, Filipp; Jackson, Bryan L; Imam, Nabil; Guo, Chen; Nakamura, Yutaka; Brezzo, Bernard; Vo, Ivan; Esser, Steven K; Appuswamy, Rathinakumar; Taba, Brian; Amir, Arnon; Flickner, Myron D; Risk, William P; Manohar, Rajit; Modha, Dharmendra S
2014-08-08
Inspired by the brain's structure, we have developed an efficient, scalable, and flexible non-von Neumann architecture that leverages contemporary silicon technology. To demonstrate, we built a 5.4-billion-transistor chip with 4096 neurosynaptic cores interconnected via an intrachip network that integrates 1 million programmable spiking neurons and 256 million configurable synapses. Chips can be tiled in two dimensions via an interchip communication interface, seamlessly scaling the architecture to a cortexlike sheet of arbitrary size. The architecture is well suited to many applications that use complex neural networks in real time, for example, multiobject detection and classification. With 400-pixel-by-240-pixel video input at 30 frames per second, the chip consumes 63 milliwatts. Copyright © 2014, American Association for the Advancement of Science.
Scalable Metadata Management for a Large Multi-Source Seismic Data Repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaylord, J. M.; Dodge, D. A.; Magana-Zook, S. A.
In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system, and to position it for anticipated growth in volume and complexity. We began the effort with an assessment of open source data flow tools from the Hadoop ecosystem. We then began the construction of a layered architecture that is specifically designed to address many of the scalability and data quality issues we experience with our current pipeline. This included implementing basic functionality in each of the layers, such as establishing a data lake, designing a unified metadata schema, tracking provenance, and calculating data quality metrics. Our original intent was to test and validate the new ingestion framework with data from a large-scale field deployment in a temporary network. This delivered somewhat unsatisfying results, since the new system immediately identified fatal flaws in the data relatively early in the pipeline. Although this was a correct result, it did not allow us to sufficiently exercise the whole framework. We then widened our scope to process all available metadata from over a dozen online seismic data sources to further test the implementation and validate the design. This experiment also uncovered a higher-than-expected frequency of certain types of metadata issues that challenged us to further tune our data management strategy to handle them. Our result from this project is a greatly improved understanding of real-world data issues, a validated design, and prototype implementations of major components of an eventual production framework. This successfully forms the basis of future development for the Geophysical Monitoring Program data pipeline, which is a critical asset supporting multiple programs. It also positions us very well to deliver valuable metadata management expertise to our sponsors, and has already resulted in an NNSA Office of Defense Nuclear Nonproliferation commitment to a multi-year project for follow-on work.
Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.
Bhandarkar, S M; Chirravuri, S; Arnold, J
1996-01-01
Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048-processor MasPar MP-2 system, which is a SIMD 2-D toroidal mesh architecture, whereas the MIMD algorithms are implemented on an 8-processor Intel iPSC/860, which is a MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
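For readers unfamiliar with the method, the sketch below shows simulated annealing on a linear-arrangement-style objective; the pairwise-swap move, adjacency cost and geometric cooling schedule are illustrative assumptions rather than the authors' exact heuristics. The paper's best-performing MIMD variant, multiple independent searches, corresponds to running this routine with several seeds and keeping the best result.

    import math
    import random

    def cost(order, D):
        # Sum of dissimilarities between adjacent clones in the ordering.
        return sum(D[a][b] for a, b in zip(order, order[1:]))

    def anneal(D, steps=20000, t0=1.0, alpha=0.9995, seed=0):
        rng = random.Random(seed)
        n = len(D)
        order = list(range(n))
        rng.shuffle(order)
        cur = cost(order, D)
        best, best_cost, t = order[:], cur, t0
        for _ in range(steps):
            i, j = rng.sample(range(n), 2)
            order[i], order[j] = order[j], order[i]      # pairwise-swap move
            new = cost(order, D)
            if new < cur or rng.random() < math.exp(-(new - cur) / t):
                cur = new                                # accept the move
                if cur < best_cost:
                    best, best_cost = order[:], cur
            else:
                order[i], order[j] = order[j], order[i]  # undo rejected move
            t *= alpha                                   # geometric cooling
        return best, best_cost

    # Multiple independent searches: different seeds, keep the best result.
    # best = min((anneal(D, seed=s) for s in range(8)), key=lambda r: r[1])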
A High-Speed Design of Montgomery Multiplier
NASA Astrophysics Data System (ADS)
Fan, Yibo; Ikenaga, Takeshi; Goto, Satoshi
With the increase of key lengths used in public-key cryptographic algorithms such as RSA and ECC, the speed of Montgomery multiplication becomes a bottleneck. This paper proposes a high-speed design of a Montgomery multiplier. Firstly, a modified scalable high-radix Montgomery algorithm is proposed to reduce the critical path. Secondly, a high-radix clock-saving dataflow is proposed to support high-radix operation with one clock cycle of delay in the dataflow. Finally, a hardware-reused architecture is proposed to reduce the hardware cost, and a parallel radix-16 design of the data path is proposed to accelerate the speed. Using the HHNEC 0.25 μm standard-cell library, the implementation results show that the total cost of the Montgomery multiplier is 130 kGates, the clock frequency is 180 MHz and the throughput of 1024-bit RSA encryption is 352 kbps. This design is suitable for high-speed RSA or ECC encryption/decryption. As a scalable design, it supports encryption/decryption of any key length up to the size of the on-chip memory.
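The arithmetic being accelerated is Montgomery reduction (REDC). The word-level Python sketch below captures the algorithm, though not the paper's radix-16 datapath or clock-saving dataflow; the toy modulus is an assumption, and pow(n, -1, R) requires Python 3.8+.

    def montgomery_mul(a, b, n, r_bits):
        # Return a*b*R^-1 mod n, with R = 2**r_bits, n odd and n < R.
        R = 1 << r_bits
        n_prime = -pow(n, -1, R) % R        # n * n' == -1 (mod R)
        t = a * b
        m = (t * n_prime) % R
        u = (t + m * n) >> r_bits           # exact division by R
        return u - n if u >= n else u

    # Usage: map into the Montgomery domain, multiply, map back.
    n, r_bits = 101, 8                      # toy sizes; RSA uses 1024+ bits
    R2 = pow(1 << r_bits, 2, n)             # R^2 mod n, for domain conversion
    a_m = montgomery_mul(7, R2, n, r_bits)          # 7*R mod n
    b_m = montgomery_mul(11, R2, n, r_bits)         # 11*R mod n
    p_m = montgomery_mul(a_m, b_m, n, r_bits)       # (7*11)*R mod n
    assert montgomery_mul(p_m, 1, n, r_bits) == (7 * 11) % n

The appeal for hardware is that REDC replaces trial division by shifts and multiplications, and a high-radix design simply consumes several operand bits per clock.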
Bay, Hamed Hosseini; Patino, Daisy; Mutlu, Zafer; Romero, Paige; Ozkan, Mihrimah; Ozkan, Cengiz S.
2016-01-01
Water decontamination and oil/water separation are principal motives in the surge to develop novel means for sustainability. In this prospect, supplying clean water for ecosystems is as important as the recovery of oil spills, since supplies are scarce. Inspired by the goal of designing an engineering material which not only serves this purpose but can also be altered for other applications that preserve natural resources, a facile template-free process is suggested to fabricate a superporous, superhydrophobic ultra-thin graphite sponge. Moreover, the process is designed to be inexpensive and scalable. The fabricated sponge can be used to clean up different types of oil, organic solvents, and toxic and corrosive contaminants. This versatile microstructure retains its functionality even when pulverized. The sponge is applicable for targeted sorption and collection due to its ferromagnetic properties. We hope that such a cost-effective process can be embraced and implemented widely. PMID:26908346
Field-free junctions for surface electrode ion traps
NASA Astrophysics Data System (ADS)
Jordens, Robert; Schmied, R.; Blain, M. G.; Leibfried, D.; Wineland, D.
2015-05-01
Intersections between transport guides in a network of RF ion traps are a key ingredient to many implementations of scalable quantum information processing with trapped ions. Several junction architectures demonstrated so far are limited by varying radial secular frequencies, a reduced trap depth, or a non-vanishing RF field along the transport channel. We report on the design and progress in implementing a configurable microfabricated surface electrode Y-junction that employs switchable RF electrodes. An essentially RF-field-free pseudopotential guide between any two legs of the junction can be established by applying RF potential to a suitable pair of electrodes. The transport channel's height above the electrodes, its depth and radial curvature are constant to within 15%. Supported by IARPA, Sandia, NSA, ONR, and the NIST Quantum Information Program.
FPGA-based coprocessor for matrix algorithms implementation
NASA Astrophysics Data System (ADS)
Amira, Abbes; Bensaali, Faycal
2003-03-01
Matrix algorithms are important in many types of applications, including image and signal processing. These areas require enormous computing power. A close examination of the algorithms used in these, and related, applications reveals that many of the fundamental actions involve matrix operations such as matrix multiplication, which has complexity O(N³) on a sequential computer and O(N³/p) on a parallel system with p processors. This paper presents an investigation into the design and implementation of different matrix algorithms, such as matrix operations, matrix transforms and matrix decompositions, using an FPGA-based environment. Solutions for the problem of processing large matrices have been proposed. The proposed system architectures are scalable and modular, and require less area and lower time complexity, with reduced latency, when compared with existing structures.
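As an illustration of the O(N³) kernel such coprocessors target, here is a hedged sketch of a blocked (tiled) matrix multiplication: each (i, j, k) tile product is an independent unit of work, which is what yields the O(N³/p) behaviour on p processing elements. The block size is an assumption.

    import numpy as np

    def blocked_matmul(A, B, bs=64):
        N = A.shape[0]
        C = np.zeros((N, N))
        for i in range(0, N, bs):
            for j in range(0, N, bs):
                for k in range(0, N, bs):
                    # each tile product maps naturally onto one PE or
                    # one pass through a systolic array
                    C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]
        return C

    A = np.random.rand(256, 256); B = np.random.rand(256, 256)
    assert np.allclose(blocked_matmul(A, B), A @ B)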
CORAL Server and CORAL Server Proxy: Scalable Access to Relational Databases from CORAL Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Valassi, A. (CERN); Bartoldus, R.
The CORAL software is widely used at CERN by the LHC experiments to access the data they store on relational databases, such as Oracle. Two new components have recently been added to implement a model involving a middle tier 'CORAL server' deployed close to the database and a tree of 'CORAL server proxies', providing data caching and multiplexing, deployed close to the client. A first implementation of the two new components, released in the summer of 2009, is now deployed in the ATLAS online system to read the data needed by the High Level Trigger, allowing the configuration of a farm of several thousand processes. This paper reviews the architecture of the software, its development status and its usage in ATLAS.
Matrix multiplication on the Intel Touchstone Delta
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huss-Lederman, S.; Jacobson, E.M.; Tsao, A.
1993-12-31
Matrix multiplication is a key primitive in block matrix algorithms such as those found in LAPACK. We present results from our study of matrix multiplication algorithms on the Intel Touchstone Delta, a distributed memory message-passing architecture with a two-dimensional mesh topology. We obtain an implementation that uses communication primitives highly suited to the Delta and exploits the single node assembly-coded matrix multiplication. Our algorithm is completely general, able to deal with arbitrary mesh aspect ratios and matrix dimensions, and has achieved parallel efficiency of 86% with overall peak performance in excess of 8 Gflops on 256 nodes for an 8800 × 8800 matrix. We describe our algorithm design and implementation, and present performance results that demonstrate scalability and robust behavior over varying mesh topologies.
NASA Astrophysics Data System (ADS)
Zhang, Wenyu; Zhang, Shuai; Cai, Ming; Jian, Wu
2015-04-01
With the development of the virtual enterprise (VE) paradigm, the usage of service-oriented architecture (SOA) is increasingly being considered for facilitating the integration and utilisation of distributed manufacturing resources. However, due to the heterogeneous nature among VEs, the dynamic nature of a VE and the autonomous nature of each VE member, the lack of both a sophisticated coordination mechanism in the popular centralised infrastructure and semantic expressivity in the existing SOA standards makes the current centralised, syntactic service discovery method undesirable. This motivates the proposed agent-based peer-to-peer (P2P) architecture for semantic discovery of manufacturing services across VEs. Multi-agent technology provides autonomous and flexible problem-solving capabilities in dynamic and adaptive VE environments. Peer-to-peer overlay provides highly scalable coupling across decentralised VEs, each of which acts as a peer composed of multiple agents dealing with manufacturing services. The proposed architecture utilises a novel, efficient, two-stage search strategy - semantic peer discovery and semantic service discovery - to handle the complex searches of manufacturing services across VEs through fast peer filtering. The operation and experimental evaluation of the prototype system are presented to validate the implementation of the proposed approach.
Exploring Manycore Multinode Systems for Irregular Applications with FPGA Prototyping
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ceriani, Marco; Palermo, Gianluca; Secchi, Simone
We present a prototype of a multi-core architecture implemented on FPGA, designed to enable efficient execution of irregular applications on distributed shared memory machines, while maintaining high performance on regular workloads. The architecture is composed of off-the-shelf soft cores, a local interconnection and a memory interface, integrated with custom components that optimize it for irregular applications. It relies on three key elements: a global address space, multithreading, and fine-grained synchronization. Global addresses are scrambled to reduce the formation of network hot-spots, while the latency of transactions is covered by integrating a hardware scheduler within the custom load/store buffers to take advantage of the availability of multiple execution threads, increasing efficiency transparently to the application. We evaluated a dual-node system on irregular kernels, showing scalability in the number of cores and threads.
3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies
Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco
2014-01-01
The Nonlocal Means (NLM) algorithm is widely considered a state-of-the-art denoising filter in many research fields. Its high computational complexity has led researchers to develop parallel programming approaches and to use massively parallel architectures such as GPUs. In recent years, GPU devices have made reasonable running times achievable by filtering 3D datasets slice-by-slice with a 2D NLM algorithm. In our approach we design and implement a fully 3D Nonlocal Means parallel filter, adopting different algorithm mapping strategies on a GPU architecture and a multi-GPU framework, in order to demonstrate its high applicability and scalability. The experimental results we obtained support the usability of our approach in a large spectrum of application scenarios such as magnetic resonance imaging (MRI) or video sequence denoising. PMID:25045397
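A compact reference implementation makes the cost of the fully 3D filter concrete. The sketch below is deliberately naive; the window radius, patch radius and filtering parameter h are illustrative assumptions. The GPU mapping described in the paper assigns (roughly) one thread per output voxel, each evaluating the same triple loop.

    import numpy as np

    def nlm3d(vol, search=2, patch=1, h=0.1):
        # Each voxel becomes a weighted average of voxels in its search
        # window, weighted by the similarity of their 3D patches.
        pad = search + patch
        v = np.pad(vol, pad, mode="reflect")
        out = np.zeros(vol.shape)
        for idx in np.ndindex(vol.shape):
            z, y, x = (i + pad for i in idx)
            ref = v[z-patch:z+patch+1, y-patch:y+patch+1, x-patch:x+patch+1]
            wsum = vsum = 0.0
            for dz in range(-search, search + 1):
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        zz, yy, xx = z + dz, y + dy, x + dx
                        cand = v[zz-patch:zz+patch+1,
                                 yy-patch:yy+patch+1,
                                 xx-patch:xx+patch+1]
                        w = np.exp(-np.mean((ref - cand) ** 2) / h ** 2)
                        wsum += w
                        vsum += w * v[zz, yy, xx]
            out[idx] = vsum / wsum
        return out

    denoised = nlm3d(np.random.rand(8, 8, 8))   # tiny volume as a smoke test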
An Immersive VR System for Sports Education
NASA Astrophysics Data System (ADS)
Song, Peng; Xu, Shuhong; Fong, Wee Teck; Chin, Ching Ling; Chua, Gim Guan; Huang, Zhiyong
The development of new technologies has undoubtedly promoted the advance of modern education; among these, Virtual Reality (VR) technologies have made education more visually accessible to students. However, classroom education has been the focus of VR applications, whereas little research has been done on promoting sports education using VR technologies. In this paper, an immersive VR system is designed and implemented to create a more intuitive and visual way of teaching tennis. A scalable system architecture is proposed in addition to the hardware setup layout, which can be used for various immersive interactive applications such as architecture walkthroughs, military training simulations, other sports game simulations, interactive theaters, and telepresent exhibitions. Realistic interaction experience is achieved through accurate and robust hybrid tracking technology, while the virtual human opponent is animated in real time using shader-based skin deformation. Potential future extensions are also discussed to improve the teaching/learning experience.
Chromium: A Stream-Processing Framework for Interactive Rendering on Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humphreys, G.; Houston, M.; Ng, Y.-R.
2002-01-11
We describe Chromium, a system for manipulating streams of graphics API commands on clusters of workstations. Chromium's stream filters can be arranged to create sort-first and sort-last parallel graphics architectures that, in many cases, support the same applications while using only commodity graphics accelerators. In addition, these stream filters can be extended programmatically, allowing the user to customize the stream transformations performed by nodes in a cluster. Because our stream processing mechanism is completely general, any cluster-parallel rendering algorithm can be either implemented on top of or embedded in Chromium. In this paper, we give examples of real-world applications that use Chromium to achieve good scalability on clusters of workstations, and describe other potential uses of this stream processing technology. By completely abstracting the underlying graphics architecture, network topology, and API command processing semantics, we allow a variety of applications to run in different environments.
A generic interface to reduce the efficiency-stability-cost gap of perovskite solar cells
NASA Astrophysics Data System (ADS)
Hou, Yi; Du, Xiaoyan; Scheiner, Simon; McMeekin, David P.; Wang, Zhiping; Li, Ning; Killian, Manuela S.; Chen, Haiwei; Richter, Moses; Levchuk, Ievgen; Schrenker, Nadine; Spiecker, Erdmann; Stubhan, Tobias; Luechinger, Norman A.; Hirsch, Andreas; Schmuki, Patrik; Steinrück, Hans-Peter; Fink, Rainer H.; Halik, Marcus; Snaith, Henry J.; Brabec, Christoph J.
2017-12-01
A major bottleneck delaying the further commercialization of thin-film solar cells based on hybrid organohalide lead perovskites is interface loss in state-of-the-art devices. We present a generic interface architecture that combines solution-processed, reliable, and cost-efficient hole-transporting materials without compromising efficiency, stability, or scalability of perovskite solar cells. Tantalum-doped tungsten oxide (Ta-WOx)/conjugated polymer multilayers offer a surprisingly small interface barrier and form quasi-ohmic contacts universally with various scalable conjugated polymers. In a simple device with regular planar architecture and a self-assembled monolayer, Ta-WOx-doped interface-based perovskite solar cells achieve maximum efficiencies of 21.2% and offer more than 1000 hours of light stability. By eliminating additional ionic dopants, these findings open up the entire class of organics as scalable hole-transporting materials for perovskite solar cells.
Scalable Light Module for Low-Cost, High-Efficiency Light- Emitting Diode Luminaires
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tarsa, Eric
2015-08-31
During this two-year program Cree developed a scalable, modular optical architecture for low-cost, high-efficacy light emitting diode (LED) luminaires. Stated simply, the goal of this architecture was to efficiently and cost-effectively convey light from LEDs (point sources) to broad luminaire surfaces (area sources). By simultaneously developing warm-white LED components and low-cost, scalable optical elements, a high system optical efficiency resulted. To meet program goals, Cree evaluated novel approaches to improve LED component efficacy at high color quality while not sacrificing LED optical efficiency relative to conventional packages. Meanwhile, efficiently coupling light from LEDs into modular optical elements, followed by optimally distributing and extracting this light, were challenges that were addressed via novel optical design coupled with frequent experimental evaluations. Minimizing luminaire bill of materials and assembly costs were two guiding principles for all design work, in the effort to achieve luminaires with significantly lower normalized cost ($/klm) than existing LED fixtures. Chief project accomplishments included the achievement of >150 lm/W warm-white LEDs having primary optics compatible with low-cost modular optical elements. In addition, a prototype Light Module optical efficiency of over 90% was measured, demonstrating the potential of this scalable architecture for ultra-high-efficacy LED luminaires. Since the project ended, Cree has continued to evaluate optical element fabrication and assembly methods in an effort to rapidly transfer this scalable, cost-effective technology to Cree production development groups. The Light Module concept is likely to make a strong contribution to the development of new cost-effective, high-efficacy luminaires, thereby accelerating widespread adoption of energy-saving SSL in the U.S.
AsyncStageOut: Distributed user data management for CMS Analysis
NASA Astrophysics Data System (ADS)
Riahi, H.; Wildish, T.; Ciangottini, D.; Hernández, J. M.; Andreeva, J.; Balcas, J.; Karavakis, E.; Mascheroni, M.; Tanasijczuk, A. J.; Vaandering, E. W.
2015-12-01
AsyncStageOut (ASO) is a new component of the distributed data analysis system of CMS, CRAB, designed for managing users' data. It addresses a major weakness of the previous model, namely that mass storage of output data was part of the job execution resulting in inefficient use of job slots and an unacceptable failure rate at the end of the jobs. ASO foresees the management of up to 400k files per day of various sizes, spread worldwide across more than 60 sites. It must handle up to 1000 individual users per month, and work with minimal delay. This creates challenging requirements for system scalability, performance and monitoring. ASO uses FTS to schedule and execute the transfers between the storage elements of the source and destination sites. It has evolved from a limited prototype to a highly adaptable service, which manages and monitors the user file placement and bookkeeping. To ensure system scalability and data monitoring, it employs new technologies such as a NoSQL database and re-uses existing components of PhEDEx and the FTS Dashboard. We present the asynchronous stage-out strategy and the architecture of the solution we implemented to deal with those issues and challenges. The deployment model for the high availability and scalability of the service is discussed. The performance of the system during the commissioning and the first phase of production are also shown, along with results from simulations designed to explore the limits of scalability.
SME2EM: Smart mobile end-to-end monitoring architecture for life-long diseases.
Serhani, Mohamed Adel; Menshawy, Mohamed El; Benharref, Abdelghani
2016-01-01
Monitoring life-long diseases requires continuous measurements and recording of physical vital signs. Most of these diseases are manifested through unexpected and non-uniform occurrences and behaviors. It is impractical to keep patients in hospitals, health-care institutions, or even at home for long periods of time. Monitoring solutions based on smartphones combined with mobile sensors and wireless communication technologies are a potential candidate to support complete mobility-freedom, not only for patients, but also for physicians. However, existing monitoring architectures based on smartphones and modern communication technologies are not suitable to address some challenging issues, such as intensive and big data, resource constraints, data integration, and context awareness in an integrated framework. This manuscript provides a novel mobile-based end-to-end architecture for live monitoring and visualization of life-long diseases. The proposed architecture provides smartness features to cope with continuous monitoring, data explosion, dynamic adaptation, unlimited mobility, and constrained device resources. The integration of the architecture's components provides information about diseases' recurrences as soon as they occur to expedite taking necessary actions, and thus prevent severe consequences. Our architecture is formally model-checked to automatically verify its correctness against designers' desirable properties at design time. Its components are fully implemented as Web services with respect to the SOA architecture to be easy to deploy and integrate, and are supported by Cloud infrastructure and services to allow high scalability and availability of processes and data being stored and exchanged. The architecture's applicability is evaluated through concrete experimental scenarios on monitoring and visualizing states of epileptic diseases. The obtained theoretical and experimental results are very promising and efficiently satisfy the proposed architecture's objectives, including resource awareness, smart data integration and visualization, cost reduction, and performance guarantees. Copyright © 2015 Elsevier Ltd. All rights reserved.
Cavity-Mediated Coherent Coupling between Distant Quantum Dots
NASA Astrophysics Data System (ADS)
Nicolí, Giorgio; Ferguson, Michael Sven; Rössler, Clemens; Wolfertz, Alexander; Blatter, Gianni; Ihn, Thomas; Ensslin, Klaus; Reichl, Christian; Wegscheider, Werner; Zilberberg, Oded
2018-06-01
Scalable architectures for quantum information technologies require one to selectively couple long-distance qubits while suppressing environmental noise and cross talk. In semiconductor materials, the coherent coupling of a single spin on a quantum dot to a cavity hosting fermionic modes offers a new solution to this technological challenge. Here, we demonstrate coherent coupling between two spatially separated quantum dots using an electronic cavity design that takes advantage of whispering-gallery modes in a two-dimensional electron gas. The cavity-mediated, long-distance coupling effectively minimizes undesirable direct cross talk between the dots and defines a scalable architecture for all-electronic semiconductor-based quantum information processing.
Space-Filling Supercapacitor Carpets: Highly scalable fractal architecture for energy storage
NASA Astrophysics Data System (ADS)
Tiliakos, Athanasios; Trefilov, Alexandra M. I.; Tanasǎ, Eugenia; Balan, Adriana; Stamatin, Ioan
2018-04-01
Revamping ground-breaking ideas from fractal geometry, we propose an alternative micro-supercapacitor configuration realized by laser-induced graphene (LIG) foams produced via laser pyrolysis of inexpensive commercial polymers. The Space-Filling Supercapacitor Carpet (SFSC) architecture introduces the concept of nested electrodes based on the pre-fractal Peano space-filling curve, arranged in a symmetrical equilateral setup that incorporates multiple parallel capacitor cells sharing common electrodes for maximum efficiency and optimal length-to-area distribution. We elucidate the theoretical foundations of the SFSC architecture, and we introduce innovations (high-resolution vector-mode printing) in the LIG method that allow for the realization of flexible and scalable devices based on low iterations of the Peano algorithm. SFSCs exhibit distributed capacitance properties, leading to capacitance, energy, and power ratings proportional to the number of nested electrodes (up to 4.3 mF, 0.4 μWh, and 0.2 mW for the largest tested model of low iteration using aqueous electrolytes), with competitively high energy and power densities. This can pave the way to full scalability in energy storage, reaching beyond the scale of micro-supercapacitors for incorporation into larger and more demanding applications.
Advanced and secure architectural EHR approaches.
Blobel, Bernd
2006-01-01
Electronic Health Records (EHRs) provided as a lifelong patient record are advancing towards core applications of distributed and co-operating health information systems and health networks. For meeting the challenge of scalable, flexible, portable, secure EHR systems, the underlying EHR architecture must be based on the component paradigm and be model driven, separating platform-independent and platform-specific models. To keep models manageable, real systems must be decomposed and simplified. The resulting modelling approach has to follow the ISO Reference Model - Open Distributed Processing (RM-ODP). The ISO RM-ODP describes any system component from different perspectives. Platform-independent perspectives contain the enterprise view (business process, policies, scenarios, use cases), the information view (classes and associations) and the computational view (composition and decomposition), whereas platform-specific perspectives concern the engineering view (physical distribution and realisation) and the technology view (implementation details from protocols up to education and training) on system components. Those views have to be established for components reflecting aspects of all domains involved in healthcare environments, including administrative, legal, medical, technical, etc. Thus, security-related component models reflecting all views mentioned have to be established for enabling both application and communication security services as an integral part of the system's architecture. Besides decomposition and simplification of systems regarding the different viewpoints on their components, different levels of granularity can be defined, hiding internals or focusing on properties of basic components to form a more complex structure. The resulting models describe both structure and behaviour of component-based systems. The described approach has been deployed in different projects defining EHR systems and their underlying architectural principles. In that context, the Australian GEHR project, the openEHR initiative, the revision of CEN ENV 13606 "Electronic Health Record communication", all based on Archetypes, but also the HL7 version 3 activities are discussed in some detail. The latter include the HL7 RIM, the HL7 Development Framework, HL7's Clinical Document Architecture (CDA), as well as the set of models from use cases, activity diagrams and sequence diagrams up to Domain Information Models (DMIMs) and their building blocks, Common Message Element Types (CMETs), constraining models to their underlying concepts. The future-proof EHR architecture, as an open, user-centric, user-friendly, flexible, scalable, portable core application in health information systems and health networks, has to follow advanced architectural paradigms.
Iavindrasana, Jimison; Depeursinge, Adrien; Ruch, Patrick; Spahni, Stéphane; Geissbuhler, Antoine; Müller, Henning
2007-01-01
The diagnostic and therapeutic processes, as well as the development of new treatments, are hindered by the fragmentation of the information which underlies them. In a multi-institutional research study database, the clinical information system (CIS) contains the primary data input. A substantial part of the budget of large-scale clinical studies is often spent on data creation and maintenance. The objective of this work is to design a decentralized, scalable, reusable database architecture with lower maintenance costs for managing and integrating the distributed heterogeneous data required as the basis for a large-scale research project. Technical and legal aspects are taken into account based on various use case scenarios. The architecture contains four layers: data storage and access decentralized at their production source, a connector acting as a proxy between the CIS and the external world, an information mediator serving as a data access point, and the client side. The proposed design will be implemented inside six clinical centers participating in the @neurIST project as part of a larger system on data integration and reuse for aneurysm treatment.
MIDEX Advanced Modular and Distributed Spacecraft Avionics Architecture
NASA Technical Reports Server (NTRS)
Ruffa, John A.; Castell, Karen; Flatley, Thomas; Lin, Michael
1998-01-01
MIDEX (Medium Class Explorer) is the newest line in NASA's Explorer spacecraft development program. As part of the MIDEX charter, the MIDEX spacecraft development team has developed a new modular, distributed, and scalable spacecraft architecture that pioneers new spaceflight technologies and implementation approaches, all designed to reduce overall spacecraft cost while increasing overall functional capability. This resultant "plug and play" system dramatically decreases the complexity and duration of spacecraft integration and test, providing a basic framework that supports spacecraft modularity and scalability for missions of varying size and complexity. Together, these subsystems form a modular, flexible avionics suite that can be modified and expanded to support low-end and very high-end mission requirements with a minimum of redesign, as well as allowing a smooth, continuous infusion of new technologies as they are developed without redesigning the system. This overall approach has the net benefit of allowing a greater portion of the overall mission budget to be allocated to mission science instead of a spacecraft bus. The MIDEX scalable architecture is currently being manufactured and tested for use on the Microwave Anisotropy Probe (MAP), an in-house program at GSFC.
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry
1998-01-01
This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With the increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of the NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. We report measurement-based performance of these parallelized benchmarks from four perspectives: efficacy of the parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized versions of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives, but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
A scalable quantum computer with ions in an array of microtraps
Cirac; Zoller
2000-04-06
Quantum computers require the storage of quantum information in a set of two-level systems (called qubits), the processing of this information using quantum gates and a means of final readout. So far, only a few systems have been identified as potentially viable quantum computer models--accurate quantum control of the coherent evolution is required in order to realize gate operations, while at the same time decoherence must be avoided. Examples include quantum optical systems (such as those utilizing trapped ions or neutral atoms, cavity quantum electrodynamics and nuclear magnetic resonance) and solid state systems (using nuclear spins, quantum dots and Josephson junctions). The most advanced candidates are the quantum optical and nuclear magnetic resonance systems, and we expect that they will allow quantum computing with about ten qubits within the next few years. This is still far from the numbers required for useful applications: for example, the factorization of a 200-digit number requires about 3,500 qubits, rising to 100,000 if error correction is implemented. Scalability of proposed quantum computer architectures to many qubits is thus of central importance. Here we propose a model for an ion trap quantum computer that combines scalability (a feature usually associated with solid state proposals) with the advantages of quantum optical systems (in particular, quantum control and long decoherence times).
Fractional Steps methods for transient problems on commodity computer architectures
NASA Astrophysics Data System (ADS)
Krotkiewski, M.; Dabrowski, M.; Podladchikov, Y. Y.
2008-12-01
Fractional Steps methods are suitable for modeling transient processes that are central to many geological applications. Low memory requirements and modest computational complexity facilitate calculations on high-resolution three-dimensional models. An efficient implementation of Alternating Direction Implicit/Locally One-Dimensional schemes for an Opteron-based shared memory system is presented. Memory bandwidth usage, the main bottleneck on modern computer architectures, is specifically addressed. High efficiency of above 2 GFlops per CPU is sustained for problems with 1 billion degrees of freedom. The optimized sequential implementation of all 1D sweeps is comparable in execution time to copying the used data in memory. Scalability of the parallel implementation on up to 8 CPUs is close to perfect. Performing one timestep of the Locally One-Dimensional scheme on a system of 1000³ unknowns on 8 CPUs takes only 11 s. We validate the LOD scheme using a computational model of an isolated inclusion subject to a constant far-field flux. Next, we study numerically the evolution of a diffusion front and the effective thermal conductivity of composites consisting of multiple inclusions, and compare the results with predictions based on the differential effective medium approach. Finally, application of the developed parabolic solver is suggested for a real-world problem of fluid transport and reactions inside a reservoir.
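To make the splitting concrete, the sketch below advances a 3D diffusion problem by one LOD timestep as three 1D implicit sweeps, each a tridiagonal solve via the Thomas algorithm. The grid spacing, diffusivity and boundary closure are illustrative assumptions, and none of the paper's memory-bandwidth optimizations are reproduced.

    import numpy as np

    def thomas(a, b, c, d):
        # Solve a tridiagonal system: a = sub-, b = main, c = super-diagonal.
        n = len(d)
        cp, dp = np.empty(n), np.empty(n)
        cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
        for i in range(1, n):
            m = b[i] - a[i] * cp[i - 1]
            cp[i] = c[i] / m
            dp[i] = (d[i] - a[i] * dp[i - 1]) / m
        x = np.empty(n)
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):
            x[i] = dp[i] - cp[i] * x[i + 1]
        return x

    def lod_step(u, dt, dx=1.0, kappa=1.0):
        # One timestep of du/dt = kappa * laplacian(u), split per dimension.
        u = u.copy()
        r = kappa * dt / dx ** 2
        for axis in range(3):
            v = np.moveaxis(u, axis, 0)           # view: sweep axis first
            n = v.shape[0]
            a = np.full(n, -r); b = np.full(n, 1 + 2 * r); c = np.full(n, -r)
            a[0] = c[-1] = 0.0                    # simple end closure (assumed)
            for j in np.ndindex(v.shape[1:]):     # every line along this axis
                v[(slice(None),) + j] = thomas(a, b, c, v[(slice(None),) + j])
        return u

    u = np.zeros((32, 32, 32)); u[16, 16, 16] = 1.0   # point source
    u = lod_step(u, dt=0.1)

Each sweep touches memory along one axis only, which is why the per-sweep cost can approach that of a plain memory copy, as the abstract notes.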
A Reusable Framework for Regional Climate Model Evaluation
NASA Astrophysics Data System (ADS)
Hart, A. F.; Goodale, C. E.; Mattmann, C. A.; Lean, P.; Kim, J.; Zimdars, P.; Waliser, D. E.; Crichton, D. J.
2011-12-01
Climate observations are currently obtained through a diverse network of sensors and platforms that include space-based observatories, airborne and seaborne platforms, and distributed, networked, ground-based instruments. These global observational measurements are critical inputs to the efforts of the climate modeling community and can provide a corpus of data for use in analysis and validation of climate models. The Regional Climate Model Evaluation System (RCMES) is an effort currently being undertaken to address the challenges of integrating this vast array of observational climate data into a coherent resource suitable for performing model analysis at the regional level. Developed through a collaboration between the NASA Jet Propulsion Laboratory (JPL) and the UCLA Joint Institute for Regional Earth System Science and Engineering (JIFRESSE), the RCMES uses existing open source technologies (MySQL, Apache Hadoop, and Apache OODT) to construct a scalable, parametric, geospatial data store that incorporates decades of observational data from a variety of NASA Earth science missions, as well as other sources, into a consistently annotated, highly available scientific resource. By eliminating arbitrary partitions in the data (individual file boundaries, differing file formats, etc.), and instead treating each individual observational measurement as a unique, geospatially referenced data point, the RCMES is capable of transforming large, heterogeneous collections of disparate observational data into a unified resource suitable for comparison to climate model output. This facility is further enhanced by the availability of a model evaluation toolkit which consists of a set of Python libraries, a RESTful web service layer, and a browser-based graphical user interface that allows for orchestration of model-to-data comparisons by composing them visually through web forms. This combination of tools and interfaces dramatically simplifies the process of interacting with and utilizing large volumes of observational data for model evaluation research. We feel that the RCMES is particularly appealing in that it represents a principled, reusable architectural approach rather than a one-off technological implementation. In fact, early RCMES prototypes have already utilized a variety of implementation technologies in an effort to address different performance and scalability concerns. This has been greatly facilitated by the fact that, at the architectural level, the RCMES is fundamentally domain agnostic. Strictly separating the data model from the implementation has enabled us to create a reusable architecture that we believe can be modified and configured to suit the demands of researchers in other domains.
myBlackBox: Blackbox Mobile Cloud Systems for Personalized Unusual Event Detection.
Ahn, Junho; Han, Richard
2016-05-23
We demonstrate the feasibility of constructing a novel and practical real-world mobile cloud system, called myBlackBox, that efficiently fuses multimodal smartphone sensor data to identify and log unusual personal events in mobile users' daily lives. The system incorporates a hybrid architectural design that combines unsupervised classification of audio, accelerometer and location data with supervised joint fusion classification to achieve high accuracy, customization, convenience and scalability. We show the feasibility of myBlackBox by implementing and evaluating this end-to-end system that combines Android smartphones with cloud servers, deployed for 15 users over a one-month period.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rizzi, Silvio; Hereld, Mark; Insley, Joseph
In this work we perform in-situ visualization of molecular dynamics simulations, which can help scientists visualize simulation output on-the-fly, without incurring storage overheads. We present a case study coupling LAMMPS, the large-scale molecular dynamics simulation code, with vl3, our parallel framework for large-scale visualization and analysis. Our motivation is to identify effective approaches for co-visualization and exploration of large-scale atomistic simulations at interactive frame rates. We propose a system of coupled libraries and describe its architecture, with an implementation that runs on GPU-based clusters. We present the results of strong and weak scalability experiments, as well as future research avenues based on our results.
Using domain decomposition in the multigrid NAS parallel benchmark on the Fujitsu VPP500
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, J.C.H.; Lung, H.; Katsumata, Y.
1995-12-01
In this paper, we demonstrate how domain decomposition can be applied to the multigrid algorithm to convert the code for MPP architectures. We also discuss the performance and scalability of this implementation on the new product line of Fujitsu's vector parallel computer, the VPP500. This computer has Fujitsu's well-known vector processor as the PE, each rated at 1.6 GFLOPS. The high-speed crossbar network, rated at 800 MB/s, provides the inter-PE communication. The results show that physical domain decomposition is the best way to solve MG problems on the VPP500.
A high performance linear equation solver on the VPP500 parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi
1994-12-31
This paper describes the implementation of two high-performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of the VPP500: (1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LINPACK Highly Parallel Computing benchmark.
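The blocked LU factorization at the heart of the general solver can be sketched as follows. NumPy stands in for the vector pipelines and crossbar transfers, the block size is an assumption, and pivoting is omitted (hence the diagonally dominant test matrix).

    import numpy as np

    def blocked_lu(A, bs=64):
        # In-place LU: L (unit diagonal) below, U on and above the diagonal.
        n = A.shape[0]
        for k in range(0, n, bs):
            e = min(k + bs, n)
            for j in range(k, e):                 # unblocked LU of the block
                A[j+1:e, j] /= A[j, j]
                A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
            if e < n:
                L_kk = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
                A[k:e, e:] = np.linalg.solve(L_kk, A[k:e, e:])    # row panel
                A[e:, k:e] = A[e:, k:e] @ np.linalg.inv(np.triu(A[k:e, k:e]))
                A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]  # trailing update (GEMM)
        return A

    n = 256
    M = np.random.rand(n, n) + n * np.eye(n)      # safe without pivoting
    LU = blocked_lu(M.copy())
    L = np.tril(LU, -1) + np.eye(n); U = np.triu(LU)
    assert np.allclose(L @ U, M)

The trailing update dominates the arithmetic and is exactly the matrix-multiply-rich step that vector units (or any GEMM-optimized hardware) execute at near-peak rates.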
Symplectic multi-particle tracking on GPUs
NASA Astrophysics Data System (ADS)
Liu, Zhicong; Qiang, Ji
2018-05-01
A symplectic multi-particle tracking model is implemented on Graphics Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) language. The symplectic tracking model preserves phase-space structure and reduces non-physical effects in long-term simulation, which is important for beam property evaluation in particle accelerators. Though this model is computationally expensive, it is very suitable for parallelization and can be accelerated significantly by using GPUs. In this paper, we optimized the implementation of the symplectic tracking model on both a single GPU and multiple GPUs. Using a single GPU processor, the code achieves a factor of 2-10 speedup for a range of problem sizes compared with the time on a single state-of-the-art Central Processing Unit (CPU) node with similar power consumption and semiconductor technology. It also shows good scalability on a multi-GPU cluster at the Oak Ridge Leadership Computing Facility. In an application to beam dynamics simulation, the GPU implementation saves more than a factor of two in total computing time in comparison to the CPU implementation.
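A minimal sketch of the kind of map such a tracker evaluates is shown below: a second-order drift-kick-drift step applied to (x, px) coordinates, vectorized over particles the way a GPU assigns one thread per particle. The lattice parameters are toy assumptions; the check at the end exploits the fact that each substep has unit Jacobian, so the RMS phase-space area is preserved, which is the practical payoff of symplectic integration.

    import numpy as np

    def track(x, px, n_turns, k=0.3, L=1.0):
        # One turn = half drift, thin kick, half drift (symplectic, 2nd order).
        for _ in range(n_turns):
            x = x + 0.5 * L * px         # drift
            px = px - k * L * x          # thin quadrupole-like kick
            x = x + 0.5 * L * px         # drift
        return x, px

    def rms_emittance(x, px):
        # Phase-space area measure; invariant under linear symplectic maps.
        return np.sqrt(np.linalg.det(np.cov(x, px)))

    rng = np.random.default_rng(1)
    x, px = rng.normal(size=100_000), rng.normal(size=100_000)
    e0 = rms_emittance(x, px)
    x, px = track(x, px, n_turns=1000)
    assert abs(rms_emittance(x, px) - e0) / e0 < 1e-9   # area preserved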
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Weikuan; Vetter, Jeffrey S
Parallel NFS (pNFS) is touted as an emergent standard protocol for parallel I/O access in various storage environments. Several pNFS prototypes have been implemented for initial validation and protocol examination. Previous efforts have focused on realizing the pNFS protocol to expose the best bandwidth potential from underlying file and storage systems. In this presentation, we provide an initial characterization of two pNFS prototype implementations, lpNFS (a Lustre-based parallel NFS implementation) and spNFS (another reference implementation from Network Appliance, Inc.). We show that both lpNFS and spNFS can faithfully achieve the primary goal of pNFS, i.e., aggregating I/O bandwidth from many storage servers. However, they both face the challenge of scalable metadata management. In particular, the throughput of spNFS metadata operations degrades significantly with an increasing number of data servers. Even for the better-performing lpNFS, we discuss its architecture and propose a direct I/O request flow protocol to improve its performance.
On implementation of DCTCP on three-tier and fat-tree data center network topologies.
Zafar, Saima; Bashir, Abeer; Chaudhry, Shafique Ahmad
2016-01-01
A data center is a facility for housing computational and storage systems interconnected through a communication network called the data center network (DCN). Due to tremendous growth in computational power, storage capacity and the number of interconnected servers, the DCN faces challenges concerning efficiency, reliability and scalability. Although transmission control protocol (TCP) is a time-tested transport protocol in the Internet, DCN challenges such as inadequate buffer space in switches and bandwidth limitations have prompted researchers to propose techniques to improve TCP performance or to design new transport protocols for the DCN. Data center TCP (DCTCP) emerges as one of the most promising solutions in this domain; it employs the explicit congestion notification (ECN) feature of TCP to enhance the TCP congestion control algorithm. While DCTCP has been analyzed for a two-tier tree-based DCN topology with traffic between servers in the same rack, which is common in cloud applications, it remains oblivious to the traffic patterns common in university and private enterprise networks, which traverse the complete network interconnect spanning the upper tier layers. We also recognize that DCTCP performance cannot remain unaffected by the underlying DCN architecture; hence there is a need to test and compare DCTCP performance when implemented over diverse DCN architectures. Some of the most notable DCN architectures are the legacy three-tier, fat-tree, BCube, DCell, VL2, and CamCube. In this research, we simulate the two switch-centric DCN architectures, the widely deployed legacy three-tier architecture and the promising fat-tree architecture, using a network simulator and analyze the performance of DCTCP in terms of throughput and delay for realistic traffic patterns. We also examine how DCTCP prevents incast and outcast congestion when realistic DCN traffic patterns are employed in the above-mentioned topologies. Our results show that the underlying DCN architecture significantly impacts DCTCP performance. We find that DCTCP gives optimal performance in the fat-tree topology and is most suitable for large networks.
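DCTCP's sender-side rule is compact enough to sketch directly: the sender keeps a running estimate alpha of the fraction of ECN-marked packets and cuts the window in proportion to it, instead of TCP's blind halving. The class below is a hedged illustration of that update, not a full TCP implementation; g = 1/16 is the gain commonly recommended for DCTCP, and the other numbers are illustrative.

    class DctcpWindow:
        def __init__(self, cwnd=10.0, g=1.0 / 16):
            self.cwnd, self.alpha, self.g = cwnd, 0.0, g

        def on_window(self, acked, marked):
            # Call once per window of ACKs: `acked` packets total,
            # `marked` of them carried the ECN-CE congestion mark.
            frac = marked / max(acked, 1)
            self.alpha = (1 - self.g) * self.alpha + self.g * frac
            if marked:
                self.cwnd *= 1 - self.alpha / 2   # proportional cut
            else:
                self.cwnd += 1                    # standard additive increase
            return self.cwnd

    w = DctcpWindow()
    for frac in (0.0, 0.0, 0.5, 0.5):   # two clean windows, then congestion
        w.on_window(acked=100, marked=int(100 * frac))

Because the cut scales with the measured extent of congestion, queues stay short without collapsing throughput, which is also why the protocol is sensitive to how much cross-rack traffic the underlying topology produces.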
Porting AMG2013 to Heterogeneous CPU+GPU Nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Samfass, Philipp
LLNL's future advanced technology system SIERRA will feature heterogeneous compute nodes that consist of IBM POWER9 CPUs and NVIDIA Volta GPUs. Conceptually, the motivation for such an architecture is quite straightforward: while GPUs are optimized for throughput on massively parallel workloads, CPUs strive to minimize latency for rather sequential operations. Yet, making optimal use of heterogeneous architectures raises new challenges for the development of scalable parallel software, e.g., with respect to work distribution. Porting LLNL's parallel numerical libraries to upcoming heterogeneous CPU+GPU architectures is therefore a critical factor for ensuring LLNL's future success in fulfilling its national mission. One of these libraries, called HYPRE, provides parallel solvers and preconditioners for large, sparse linear systems of equations. In the context of this internship project, I consider AMG2013, which is a proxy application for major parts of HYPRE that implements a benchmark for setting up and solving different systems of linear equations. In the following, I describe in detail how I ported multiple parts of AMG2013 to the GPU (Section 2) and present results for different experiments that demonstrate a successful parallel implementation on the heterogeneous machines surface and ray (Section 3). In Section 4, I give guidelines on how my code should be used. Finally, I conclude and give an outlook for future work (Section 5).
NASA Astrophysics Data System (ADS)
Litinski, Daniel; Kesselring, Markus S.; Eisert, Jens; von Oppen, Felix
2017-07-01
We present a scalable architecture for fault-tolerant topological quantum computation using networks of voltage-controlled Majorana Cooper pair boxes and topological color codes for error correction. Color codes have a set of transversal gates which coincides with the set of topologically protected gates in Majorana-based systems, namely, the Clifford gates. In this way, we establish color codes as providing a natural setting in which advantages offered by topological hardware can be combined with those arising from topological error-correcting software for full-fledged fault-tolerant quantum computing. We provide a complete description of our architecture, including the underlying physical ingredients. We start by showing that in topological superconductor networks, hexagonal cells can be employed to serve as physical qubits for universal quantum computation, and we present protocols for realizing topologically protected Clifford gates. These hexagonal-cell qubits allow for a direct implementation of open-boundary color codes with ancilla-free syndrome read-out and logical T gates via magic-state distillation. For concreteness, we describe how the necessary operations can be implemented using networks of Majorana Cooper pair boxes, and we give a feasibility estimate for error correction in this architecture. Our approach is motivated by nanowire-based networks of topological superconductors, but it could also be realized in alternative settings such as quantum-Hall-superconductor hybrids.
Providing scalable system software for high-end simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Greenberg, D.
1997-12-31
Detailed, full-system, complex physics simulations have been shown to be feasible on systems containing thousands of processors. In order to manage these computer systems it has been necessary to create scalable system services. In this talk Sandia's research on scalable systems will be described. The key concepts of low-overhead data movement through portals and of flexible services through multi-partition architectures will be illustrated in detail. The talk will conclude with a discussion of how these techniques can be applied outside of the standard monolithic MPP system.
NASA Astrophysics Data System (ADS)
Jing, Changfeng; Liang, Song; Ruan, Yong; Huang, Jie
2008-10-01
During the urbanization process, when facing the complex requirements of city development, ever-growing urban data, the rapid development of the planning business, and increasing planning complexity, a scalable, extensible urban planning management information system is urgently needed. PM2006 is such a system that can deal with these problems. In response to the status and problems in urban planning, the scalability and extensibility of PM2006 are introduced; these include business-oriented workflow extensibility, the scalability of its DLL-based architecture, flexibility with respect to GIS and database platforms, scalability of data updating and maintenance, and so on. It is verified that the PM2006 system has good extensibility and scalability, can meet the requirements of all levels of administrative divisions, and can adapt to ever-growing changes in the urban planning business. At the end of this paper, the application of PM2006 in the Urban Planning Bureau of Suzhou city is described.
SLURM: Simple Linux Utility for Resource Management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jette, M; Dunlap, C; Garlick, J
2002-07-08
Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, scheduling, and stream copy modules. The design also includes a scalable, general-purpose communication infrastructure. This paper presents an overview of the SLURM architecture and functionality.
Multimode entanglement in reconfigurable graph states using optical frequency combs
Cai, Y.; Roslund, J.; Ferrini, G.; Arzani, F.; Xu, X.; Fabre, C.; Treps, N.
2017-01-01
Multimode entanglement is an essential resource for quantum information processing and quantum metrology. However, multimode entangled states are generally constructed by targeting a specific graph configuration. This leads to a fixed experimental setup that therefore exhibits reduced versatility and scalability. Here we demonstrate an optical on-demand, reconfigurable multimode entangled state, using an intrinsically multimode quantum resource and a homodyne detection apparatus. Without altering either the initial squeezing source or the experimental architecture, we realize the construction of thirteen cluster states of various sizes and connectivities as well as the implementation of a secret sharing protocol. In particular, this system enables the interrogation of quantum correlations and fluctuations for any multimode Gaussian state. This opens an avenue for implementing on-demand quantum information processing by adapting only the measurement process and not the experimental layout. PMID:28585530
Sensor network based vehicle classification and license plate identification system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frigo, Janette Rose; Brennan, Sean M; Rosten, Edward J
Typically, for energy efficiency and scalability purposes, sensor networks have been used in the context of environmental and traffic monitoring applications in which operations at the sensor level are not computationally intensive. But increasingly, sensor network applications require data- and compute-intensive sensors such as video cameras and microphones. In this paper, we describe the design and implementation of two such systems: a vehicle classifier based on acoustic signals and a license plate identification system using a camera. The systems are implemented in an energy-efficient manner to the extent possible using commercially available hardware, the Mica motes and the Stargate platform. Our experience in designing these systems leads us to consider an alternate, more flexible, modular, low-power mote architecture that uses a combination of FPGAs, specialized embedded processing units, and sensor data acquisition systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karthik, Rajasekar
2014-01-01
In this paper, an architecture for building a Scalable And Mobile Environment For High-Performance Computing with spatial capabilities, called SAME4HPC, is described using cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, there exist a significant number of low-end and older devices that rely heavily on the server or the cloud infrastructure to do the heavy lifting. Our architecture aims to support both of these types of devices to provide high performance and a rich user experience. A cloud infrastructure consisting of OpenStack with Ubuntu, GeoServer, and high-performance JavaScript frameworks is among the key open-source and industry-standard components that have been adopted in this architecture.
Feasibility of Using Distributed Wireless Mesh Networks for Medical Emergency Response
Braunstein, Brian; Trimble, Troy; Mishra, Rajesh; Manoj, B. S.; Rao, Ramesh; Lenert, Leslie
2006-01-01
Achieving reliable, efficient data communications networks at a disaster site is a difficult task. Network paradigms such as Wireless Mesh Network (WMN) architectures offer one approach to providing high-bandwidth, scalable data communication for medical emergency response activity. WMNs are created by self-organized wireless nodes that use multi-hop wireless relaying for data transfer. In this paper, we describe our experience using a mesh network architecture we developed for homeland security and medical emergency applications. We briefly discuss the architecture and present the traffic behavioral observations made by a client-server medical emergency application tested during a large-scale homeland security drill. We present our traffic measurements, describe lessons learned, and offer functional requirements (based on field testing) for practical 802.11 mesh medical emergency response networks. With certain caveats, the results suggest that 802.11 mesh networks are feasible and scalable systems for field communications in disaster settings. PMID:17238308
A Cloud-based Approach to Medical NLP
Chard, Kyle; Russell, Michael; Lussier, Yves A.; Mendonça, Eneida A; Silverstein, Jonathan C.
2011-01-01
Natural Language Processing (NLP) enables access to deep content embedded in medical texts. To date, NLP has not fulfilled its promise of enabling robust clinical encoding, clinical use, quality improvement, and research. We submit that this is in part due to poor accessibility, scalability, and flexibility of NLP systems. We describe here an approach and system which leverages cloud-based approaches such as virtual machines and Representational State Transfer (REST) to extract, process, synthesize, mine, compare/contrast, explore, and manage medical text data in a flexibly secure and scalable architecture. Available architectures in which our Smntx (pronounced as semantics) system can be deployed include: virtual machines in a HIPAA-protected hospital environment, brought up to run analysis over bulk data and destroyed in a local cloud; a commercial cloud for a large complex multi-institutional trial; and within other architectures such as caGrid, i2b2, or NHIN. PMID:22195072
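As a concrete illustration of the REST-based deployment style described above, here is a minimal hypothetical client; the endpoint URL and the request/response fields are invented for illustration and are not the published Smntx interface.

```python
# Hypothetical client for a REST-style medical NLP service; the URL and the
# JSON fields below are assumptions, not the real Smntx API.
import requests

NLP_URL = "https://nlp.example.org/api/annotate"  # placeholder endpoint

def annotate(text):
    """Submit clinical text, return the service's concept annotations."""
    resp = requests.post(NLP_URL, json={"text": text}, timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    notes = "Patient denies chest pain; reports dyspnea on exertion."
    print(annotate(notes))
```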
Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun
2015-12-01
Deep learning algorithms are widely used for various pattern recognition applications such as text recognition, object recognition, and action recognition because of their best-in-class recognition accuracy compared to hand-crafted and shallow-learning-based algorithms. However, the long learning time caused by their complex structure has so far limited their use to high-cost servers or many-core GPU platforms. On the other hand, the demand for customized pattern recognition on personal devices will grow gradually as more deep learning applications are developed. This paper presents an SoC implementation to enable deep learning applications to run on low-cost platforms such as mobile or portable devices. Unlike conventional works that adopt a massively parallel architecture, this work adopts a task-flexible architecture and exploits multiple levels of parallelism to cover the complex functions of the convolutional deep belief network, one of the popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power, and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.
Semantic interoperability--HL7 Version 3 compared to advanced architecture standards.
Blobel, B G M E; Engel, K; Pharow, P
2006-01-01
To meet the challenge of high-quality and efficient care, highly specialized and distributed healthcare establishments have to communicate and co-operate in a semantically interoperable way. Information and communication technology must be open, flexible, scalable, knowledge-based and service-oriented as well as secure and safe. For enabling semantic interoperability, a unified process for defining and implementing the architecture, i.e. the structure and functions of the cooperating systems' components, as well as the approach for knowledge representation, i.e. the used information and its interpretation, algorithms, etc., have to be defined in a harmonized way. Deploying the Generic Component Model, systems and their components, underlying concepts and applied constraints must be formally modeled, strictly separating platform-independent from platform-specific models. As HL7 Version 3 claims to represent the most successful standard for semantic interoperability, HL7 has been analyzed regarding the requirements for model-driven, service-oriented design of semantically interoperable information systems, thereby moving from a communication to an architecture paradigm. The approach is compared with advanced architectural approaches for information systems such as OMG's CORBA 3 or EHR systems such as GEHR/openEHR and CEN EN 13606 Electronic Health Record Communication. HL7 Version 3 is maturing towards an architectural approach for semantic interoperability. Despite current differences, there is a close collaboration between the teams involved, guaranteeing a convergence between competing approaches.
Pinheiro, Alexandre; Dias Canedo, Edna; de Sousa Junior, Rafael Timoteo; de Oliveira Albuquerque, Robson; García Villalba, Luis Javier; Kim, Tai-Hoon
2018-03-02
Cloud computing is considered an interesting paradigm due to its scalability, availability and virtually unlimited storage capacity. However, it is challenging to organize a cloud storage service (CSS) that is safe from the client point-of-view and to implement this CSS in public clouds since it is not advisable to blindly consider this configuration as fully trustworthy. Ideally, owners of large amounts of data should trust their data to be in the cloud for a long period of time, without the burden of keeping copies of the original data, nor of accessing the whole content for verifications regarding data preservation. Due to these requirements, integrity, availability, privacy and trust are still challenging issues for the adoption of cloud storage services, especially when losing or leaking information can bring significant damage, be it legal or business-related. With such concerns in mind, this paper proposes an architecture for periodically monitoring both the information stored in the cloud and the service provider behavior. The architecture operates with a proposed protocol based on trust and encryption concepts to ensure cloud data integrity without compromising confidentiality and without overloading storage services. Extensive tests and simulations of the proposed architecture and protocol validate their functional behavior and performance.
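A minimal sketch of the precomputed challenge-response idea behind such integrity monitoring follows, assuming a simple HMAC-per-chunk scheme; the chunking, nonce handling, and names are illustrative, not the paper's protocol.

```python
# Illustrative integrity audit: before upload, the client draws random nonces
# and stores only the expected digests; later, the provider must recompute a
# digest over the data it claims to hold, proving possession of the chunk
# without the client retrieving the whole content.
import hashlib, hmac, os

def digest(chunk: bytes, nonce: bytes) -> bytes:
    return hmac.new(nonce, chunk, hashlib.sha256).digest()

chunk = b"...one block of the archived file..."
challenges = [os.urandom(16) for _ in range(3)]          # one-time nonces
expected = {n: digest(chunk, n) for n in challenges}     # kept by the client

def provider_answer(stored_chunk: bytes, nonce: bytes) -> bytes:
    # Runs on the storage provider, over whatever it actually stores.
    return digest(stored_chunk, nonce)

nonce = challenges[0]
ok = hmac.compare_digest(expected[nonce], provider_answer(chunk, nonce))
print("chunk intact:", ok)
```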
Circuit quantum electrodynamics with a spin qubit.
Petersson, K D; McFaul, L W; Schroer, M D; Jung, M; Taylor, J M; Houck, A A; Petta, J R
2012-10-18
Electron spins trapped in quantum dots have been proposed as basic building blocks of a future quantum processor. Although fast, 180-picosecond, two-quantum-bit (two-qubit) operations can be realized using nearest-neighbour exchange coupling, a scalable, spin-based quantum computing architecture will almost certainly require long-range qubit interactions. Circuit quantum electrodynamics (cQED) allows spatially separated superconducting qubits to interact via a superconducting microwave cavity that acts as a 'quantum bus', making possible two-qubit entanglement and the implementation of simple quantum algorithms. Here we combine the cQED architecture with spin qubits by coupling an indium arsenide nanowire double quantum dot to a superconducting cavity. The architecture allows us to achieve a charge-cavity coupling rate of about 30 megahertz, consistent with coupling rates obtained in gallium arsenide quantum dots. Furthermore, the strong spin-orbit interaction of indium arsenide allows us to drive spin rotations electrically with a local gate electrode, and the charge-cavity interaction provides a measurement of the resulting spin dynamics. Our results demonstrate how the cQED architecture can be used as a sensitive probe of single-spin physics and that a spin-cavity coupling rate of about one megahertz is feasible, presenting the possibility of long-range spin coupling via superconducting microwave cavities.
An FPGA-Based Massively Parallel Neuromorphic Cortex Simulator
Wang, Runchun M.; Thakur, Chetan S.; van Schaik, André
2018-01-01
This paper presents a massively parallel and scalable neuromorphic cortex simulator designed for simulating large and structurally connected spiking neural networks, such as complex models of various areas of the cortex. The main novelty of this work is the abstraction of a neuromorphic architecture into clusters represented by minicolumns and hypercolumns, analogously to the fundamental structural units observed in neurobiology. Without this approach, simulating large-scale fully connected networks needs prohibitively large memory to store look-up tables for point-to-point connections. Instead, we use a novel architecture, based on the structural connectivity in the neocortex, such that all the required parameters and connections can be stored in on-chip memory. The cortex simulator can be easily reconfigured for simulating different neural networks without any change in hardware structure by programming the memory. A hierarchical communication scheme allows one neuron to have a fan-out of up to 200 k neurons. As a proof-of-concept, an implementation on one Altera Stratix V FPGA was able to simulate 20 million to 2.6 billion leaky-integrate-and-fire (LIF) neurons in real time. We verified the system by emulating a simplified auditory cortex (with 100 million neurons). This cortex simulator achieved a low power dissipation of 1.62 μW per neuron. With the advent of commercially available FPGA boards, our system offers an accessible and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks. PMID:29692702
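For readers unfamiliar with the neuron model named above, a toy leaky-integrate-and-fire update is sketched below; the constants are generic textbook values, not parameters of the FPGA design.

```python
# Toy LIF dynamics: leaky integration toward rest, threshold test, reset.
# All parameters are illustrative; the FPGA simulator's fixed-point
# formulation and connectivity scheme are not reproduced here.
import numpy as np

def lif_step(v, i_syn, dt=1e-3, tau=20e-3, v_rest=-65e-3,
             v_thresh=-50e-3, v_reset=-70e-3, r_m=1e8):
    """Advance all membrane potentials by one time step; return (v, spikes)."""
    v = v + (dt / tau) * (v_rest - v + r_m * i_syn)  # leaky integration
    spiked = v >= v_thresh                           # threshold crossing
    v[spiked] = v_reset                              # reset fired neurons
    return v, spiked

v = np.full(1000, -65e-3)                            # 1000 neurons at rest
for _ in range(100):
    v, spikes = lif_step(v, np.random.uniform(0.0, 2e-10, size=v.size))
```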
NASA Astrophysics Data System (ADS)
Clay, M. P.; Buaria, D.; Yeung, P. K.; Gotoh, T.
2018-07-01
This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm for the direct numerical simulation of turbulent mixing at high Schmidt number. The work stems from a recent development (Comput. Phys. Commun., vol. 219, 2017, 313-328), in which a low-communication algorithm was shown to attain high degrees of scalability on the Cray XE6 architecture when overlapping communication and computation via dedicated communication threads. An even higher level of performance has now been achieved using OpenMP 4.5 on the Cray XK7 architecture, where on each node the 16 integer cores of an AMD Interlagos processor share a single Nvidia K20X GPU accelerator. In the new algorithm, data movements are minimized by performing virtually all of the intensive scalar field computations in the form of combined compact finite difference (CCD) operations on the GPUs. A memory layout in departure from usual practices is found to provide much better performance for a specific kernel required to apply the CCD scheme. Asynchronous execution enabled by adding the OpenMP 4.5 NOWAIT clause to TARGET constructs improves scalability when used to overlap computation on the GPUs with computation and communication on the CPUs. On the 27-petaflops supercomputer Titan at Oak Ridge National Laboratory, USA, a GPU-to-CPU speedup factor of approximately 5 is consistently observed at the largest problem size of 8192^3 grid points for the scalar field computed with 8192 XK7 nodes.
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As NSF has indicated, "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is that third pillar of GIScience and the geosciences. With the exponential growth of geodata, the challenge of scalable and high-performance computing for big data analytics has become urgent, because many research activities are constrained by software or tools that cannot even complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and the operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) architecture and the Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions that employ massive parallelism and hardware resources to achieve scalability and high performance for data-intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environments to achieve scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise of achieving scalability and high performance by exploiting task- and data-level parallelism that is not supported by conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data, as proved by our prior works, while the potential of such advanced infrastructure remains unexplored in this domain. In this presentation, our prior and ongoing initiatives are summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs, and MICs, to accelerate geocomputation in different applications.
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2002-01-01
The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique for solving sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0)-preconditioned CG (PCG) using different programming paradigms and architectures. Results show that, for this class of applications, ordering significantly improves overall performance on both distributed and distributed shared-memory systems; cache reuse may be more important than reducing communication; it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution; and a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread-level parallelism.
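For reference, a compact unpreconditioned CG iteration is sketched below in dense numpy purely for clarity; the paper's parallel, sparse, and preconditioned variants build on this same loop.

```python
# Minimal conjugate gradient for a symmetric positive definite system Ax = b;
# dense numpy is used for clarity, not performance.
import numpy as np

def cg(A, b, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x                      # residual
    p = r.copy()                       # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)          # optimal step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p      # conjugate new direction
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # small SPD test case
print(cg(A, np.array([1.0, 2.0])))
```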
Folding Proteins at 500 ns/hour with Work Queue.
Abdul-Wahid, Badi'; Yu, Li; Rajan, Dinesh; Feng, Haoyun; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A
2012-10-01
Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently have poor scalability. A new class of methods is proposed that requires only a large number of short calculations, and for which minimal communication between computer nodes is required. We considered one of the more accurate variants, called Accelerated Weighted Ensemble Dynamics (AWE), for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all-atom protein model (Fip35 WW domain). We can run with excellent scalability by simultaneously utilizing heterogeneous resources from multiple computing platforms such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, and grids, on multiple architectures (CPU/GPU, 32/64-bit), and in a dynamic environment in which processes are regularly added or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour. As a comparison, a single process typically achieves 0.1 ns/hour.
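The many-short-tasks pattern AWE exploits can be sketched as a simple master-worker loop; the sketch below uses Python's standard library rather than the actual Work Queue API, and run_segment is a stand-in for a short MD trajectory segment.

```python
# Master-worker sketch of the short-trajectory pattern: many independent
# segments are farmed out, then gathered for resampling between iterations.
# This is NOT the Work Queue API; it is a stand-in using the stdlib.
from concurrent.futures import ProcessPoolExecutor, as_completed
import random

def run_segment(walker_id, steps=1000):
    """Placeholder for a short, independent MD segment."""
    rng = random.Random(walker_id)
    return walker_id, rng.random()      # e.g. an end-state reaction coordinate

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(run_segment, w) for w in range(64)]
        results = dict(f.result() for f in as_completed(futures))
    # AWE-style resampling/reweighting of walkers would happen here,
    # before the next round of short segments is dispatched.
    print(len(results), "segments completed")
```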
Open release of the DCA++ project
NASA Astrophysics Data System (ADS)
Haehner, Urs; Solca, Raffaele; Staar, Peter; Alvarez, Gonzalo; Maier, Thomas; Summers, Michael; Schulthess, Thomas
We present the first open release of the DCA++ project, a highly scalable and efficient research code to solve quantum many-body problems with cutting edge quantum cluster algorithms. The implemented dynamical cluster approximation (DCA) and its DCA+ extension with a continuous self-energy capture nonlocal correlations in strongly correlated electron systems thereby allowing insight into high-Tc superconductivity. With the increasing heterogeneity of modern machines, DCA++ provides portable performance on conventional and emerging new architectures, such as hybrid CPU-GPU and Xeon Phi, sustaining multiple petaflops on ORNL's Titan and CSCS' Piz Daint. Moreover, we will describe how best practices in software engineering can be applied to make software development sustainable and scalable in a research group. Software testing and documentation not only prevent productivity collapse, but more importantly, they are necessary for correctness, credibility and reproducibility of scientific results. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF) awarded by the INCITE program, and of the Swiss National Supercomputing Center. OLCF is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application
NASA Technical Reports Server (NTRS)
Liu, Zhuo; Wang, Bin; Wang, Teng; Tian, Yuan; Xu, Cong; Wang, Yandong; Yu, Weikuan; Cruz, Carlos A.; Zhou, Shujia; Clune, Tom;
2013-01-01
Exascale computing systems are soon to emerge, and they will pose great challenges due to the huge gap between computing and I/O performance. Many large-scale scientific applications play an important role in our daily life. The huge amounts of data generated by such applications require highly parallel and efficient I/O management policies. In this paper, we adopt a mission-critical scientific application, GEOS-5, as a case to profile and analyze the communication and I/O issues that are preventing applications from fully utilizing the underlying parallel storage systems. Through in-detail architectural and experimental characterization, we observe that current legacy I/O schemes incur significant network communication overheads and are unable to fully parallelize the data access, thus degrading applications' I/O performance and scalability. To address these inefficiencies, we redesign its I/O framework along with a set of parallel I/O techniques to achieve high scalability and performance. Evaluation results on the NASA Discover cluster show that our optimization of GEOS-5 with ADIOS has led to significant performance improvements compared to the original GEOS-5 implementation.
Efficient data management tools for the heterogeneous big data warehouse
NASA Astrophysics Data System (ADS)
Alekseev, A. A.; Osipova, V. V.; Ivanov, M. A.; Klimentov, A.; Grigorieva, N. V.; Nalamwar, H. S.
2016-09-01
Traditional RDBMS technology is built around normalized data structures. RDBMSs have served well for decades, but the technology is not optimal for data processing and analysis in data-intensive fields like social networks, the oil and gas industry, experiments at the Large Hadron Collider, etc. Several challenges have recently been raised concerning the scalability of data-warehouse-like workloads against a transactional schema, in particular for the analysis of archived data or the aggregation of data for summary and accounting purposes. The paper evaluates new database technologies such as HBase, Cassandra, and MongoDB, commonly referred to as NoSQL databases, for handling messy, varied, and large amounts of data. The evaluation considers the performance, throughput, and scalability of these technologies for several scientific and industrial use cases. This paper outlines the technologies and architectures needed for processing Big Data, as well as a description of the back-end application that implements data migration from an RDBMS to a NoSQL data warehouse, the NoSQL database organization, and how it can support further data analytics.
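A hedged sketch of the row-to-document migration step such a back-end performs, using sqlite3 and pymongo as stand-ins; the table, database, and collection names are illustrative, not those of the paper's application.

```python
# Illustrative RDBMS-to-NoSQL migration: read normalized rows, denormalize
# each into a document, and bulk-insert into MongoDB in batches.
import sqlite3
from pymongo import MongoClient

src = sqlite3.connect("warehouse.db")          # placeholder source database
src.row_factory = sqlite3.Row
dst = MongoClient("localhost", 27017)["archive"]["events"]

batch = []
for row in src.execute("SELECT * FROM events"):
    batch.append(dict(row))                    # one row becomes one document
    if len(batch) == 1000:                     # batch inserts for throughput
        dst.insert_many(batch)
        batch.clear()
if batch:
    dst.insert_many(batch)
```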
Transportation Network Topologies
NASA Technical Reports Server (NTRS)
Holmes, Bruce J.; Scott, John M.
2004-01-01
A discomforting reality has materialized on the transportation scene: our existing air and ground infrastructures will not scale to meet our nation's 21st century demands and expectations for mobility, commerce, safety, and security. The consequence of inaction is diminished quality of life and economic opportunity in the 21st century. Clearly, new thinking is required for transportation that can scale to meet the realities of a networked, knowledge-based economy in which the value of time is a new coin of the realm. This paper proposes a framework, or topology, for thinking about the problem of scalability of the system of networks that comprise the aviation system. This framework highlights the role of integrated communication-navigation-surveillance (ICNS) systems in enabling scalability of future air transportation networks. Scalability, in this vein, is a goal of the recently formed Joint Planning and Development Office for the Next Generation Air Transportation System. New foundations for 21st century thinking about air transportation are underpinned by several technological developments in the traditional aircraft disciplines as well as in communication, navigation, surveillance, and information systems. Complexity science and modern network theory give rise to one of the technological developments of importance. Scale-free (i.e., scalable) networks represent a promising concept space for modeling airspace system architectures, and for assessing network performance in terms of scalability, efficiency, robustness, resilience, and other metrics. The paper offers an air transportation system topology as a framework for transportation system innovation. Successful outcomes of innovation in air transportation could lay the foundations for new paradigms for aircraft and their operating capabilities, air transportation system architectures, and airspace architectures and procedural concepts. The topology proposed considers air transportation as a system of networks, within which strategies for scalability of the topology may be enabled by technologies and policies. In particular, the effects of scalable ICNS concepts are evaluated within this proposed topology. Alternative business models are appearing on the scene as the old centralized hub-and-spoke model reaches the limits of its scalability. These models include growth of point-to-point scheduled air transportation service (e.g., the RJ phenomenon and the 'Southwest Effect'). Another is a new business model for on-demand, widely distributed air mobility in jet taxi services. The new businesses forming around this vision are targeting personal air mobility to virtually any of the thousands of origins and destinations throughout suburban, rural, and remote communities and regions. Such advancement in air mobility has many implications for requirements for airports, airspace, and consumers. These new paradigms could support scalable alternatives for the expansion of future air mobility to more consumers in more places.
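To make the scale-free notion concrete, the sketch below generates a preferential-attachment graph, whose heavy-tailed degree distribution is the property discussed above; networkx is assumed available, and the generated graph is a stand-in for a transportation network, not data from the paper.

```python
# Degree distribution of a Barabasi-Albert (preferential attachment) graph:
# many low-degree nodes and a few highly connected hubs, the signature of
# scale-free networks such as hub-and-spoke route maps.
import networkx as nx
from collections import Counter

g = nx.barabasi_albert_graph(n=5000, m=2, seed=1)
hist = Counter(d for _, d in g.degree())
for k in sorted(hist)[:10]:
    print(f"degree {k}: {hist[k]} nodes")   # counts fall off heavy-tailed
```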
Emulating short-term synaptic dynamics with memristive devices
NASA Astrophysics Data System (ADS)
Berdan, Radu; Vasilaki, Eleni; Khiat, Ali; Indiveri, Giacomo; Serb, Alexandru; Prodromakis, Themistoklis
2016-01-01
Neuromorphic architectures offer great promise for achieving computation capacities beyond conventional von Neumann machines. The essential elements for achieving this vision are highly scalable synaptic mimics that do not undermine biological fidelity. Here we demonstrate that single solid-state TiO2 memristors can exhibit non-associative plasticity phenomena observed in biological synapses, supported by their metastable memory state transition properties. We show that, contrary to conventional uses of solid-state memory, the existence of rate-limiting volatility is a key feature for capturing short-term synaptic dynamics. We also show how the temporal dynamics of our prototypes can be exploited to implement spatio-temporal computation, demonstrating these memristors' full potential for building biophysically realistic neural processing systems.
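A toy model of the volatile, short-term dynamics described above is sketched below: each input pulse increments a conductance that then decays back toward its resting value; the update rule and constants are illustrative assumptions, not a fitted device model.

```python
# Toy volatile-memristor dynamics: pulse-driven facilitation with exponential
# decay back to the resting conductance (all constants are illustrative).
import numpy as np

def simulate(pulse_times, t_end=1.0, dt=1e-4, tau=0.05, dg=0.2, g_rest=1.0):
    t = np.arange(0.0, t_end, dt)
    g = np.empty_like(t)
    g[0] = g_rest
    pulses = {int(round(p / dt)) for p in pulse_times}
    for i in range(1, len(t)):
        g[i] = g[i-1] + (dt / tau) * (g_rest - g[i-1])  # volatility: decay
        if i in pulses:
            g[i] += dg                                   # facilitation step
    return t, g

# Closely spaced pulses accumulate (short-term facilitation), then fade.
t, g = simulate([0.10, 0.12, 0.14, 0.50])
print(g.max())
```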
Optimizing microwave photodetection: input-output theory
NASA Astrophysics Data System (ADS)
Schöndorf, M.; Govia, L. C. G.; Vavilov, M. G.; McDermott, R.; Wilhelm, F. K.
2018-04-01
High fidelity microwave photon counting is an important tool for various areas from background radiation analysis in astronomy to the implementation of circuit quantum electrodynamic architectures for the realization of a scalable quantum information processor. In this work we describe a microwave photon counter coupled to a semi-infinite transmission line. We employ input-output theory to examine a continuously driven transmission line as well as traveling photon wave packets. Using analytic and numerical methods, we calculate the conditions on the system parameters necessary to optimize measurement and achieve high detection efficiency. With this we can derive a general matching condition depending on the different system rates, under which the measurement process is optimal.
An Open Avionics and Software Architecture to Support Future NASA Exploration Missions
NASA Technical Reports Server (NTRS)
Schlesinger, Adam
2017-01-01
The presentation describes an avionics and software architecture that has been developed through NASA's Advanced Exploration Systems (AES) division. The architecture is open-source, highly reliable with fault tolerance, and utilizes standard capabilities and interfaces, which are scalable and customizable to support future exploration missions. Specific focus areas of discussion include command and data handling, software, human interfaces, communication and wireless systems, and systems engineering and integration.
A novel digital pulse processing architecture for nuclear instrumentation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moline, Yoann; Thevenin, Mathieu; Corre, Gwenole
The field of nuclear instrumentation covers a wide range of applications, including counting, spectrometry, pulse shape discrimination, and multi-channel coincidence. These applications are the topic of much research, and new algorithms and implementations are constantly proposed thanks to advances in digital signal processing. However, these improvements are not yet implemented in instrumentation devices. This is especially true for neutron-gamma discrimination applications, which traditionally use the charge comparison method, while the literature proposes other algorithms, based on the frequency domain or on wavelet theory, that show better performance. Another example is pileups, which are generally rejected even though pileup correction algorithms exist. These processes are traditionally performed offline, due to two issues. The first is the Poissonian characteristic of the signal, composed of pulses with random arrival times, which requires current architectures to work on the data flow. The second is the real-time requirement, which implies losing pulses when the pulse rate is too high. Despite the possibility of treating pulses independently from each other, current architectures paralyze the acquisition of the signal during the processing of a pulse; this loss is called dead-time. These two issues have led current architectures to use dedicated solutions based on re-configurable components like Field Programmable Gate Arrays (FPGAs) to meet the performance needed to deal with dead-time. However, dedicated hardware implementations of algorithms on re-configurable technologies are complex and time-consuming. For all these reasons, a digital pulse processing (DPP) architecture programmable in a high-level language such as C or C++, and able to reduce dead-time, would be worthwhile for nuclear instrumentation: it would shorten prototyping and testing by reducing the level of hardware expertise required to implement new algorithms. However, today's programmable solutions neither meet the performance needed to operate online nor scale with the number of measurement channels. That is why an innovative DPP architecture is proposed in this paper. This architecture is able to overcome dead-time while being programmable, and it is flexible in the number of measurement channels. The proposed architecture is based on an innovative execution model for pulse processing applications, which can be summarized as follows. The signal is not composed of pulses only; consequently, pulse processing does not have to operate on the entire signal. Therefore, the first step of our proposal is pulse extraction, by means of dedicated components named pulse extractors. The triggering step can be achieved after the analog-to-digital conversion without any signal shaping or filtering stages. Pileup detection and accurate pulse time stamping are done at this stage. Any application downstream of this step can work on adaptive, variable-sized arrays of samples, simplifying pulse processing methods. Then, once the data flow is broken, it is possible to distribute pulses over Functional Units (FUs) which perform the processing. As the time stamp of each pulse is known, pulses can be processed individually and out of order. To manage the pulse distribution, a scheduler and an interconnection network are used: pulses are distributed to the first FU that is not busy, without congesting the interconnection network. For this reason, the processing duration no longer results in dead-time, provided there are enough FUs.
FUs are designed to be standalone and to comprise at least a programmable general-purpose processor (ARM, MicroBlaze), allowing the implementation of complex algorithms without any modification of the hardware. An acquisition chain is composed of a succession of algorithms, which leads us to organize the FUs as a software macro-pipeline; a simple approach consists of assigning one algorithm per FU. Consequently, the global latency becomes the worst-case latency of algorithm execution on an FU. Moreover, as algorithms are executed locally, i.e., on an FU, this approach limits the shared-memory requirement. To handle multiple channels, we propose sharing FUs; this approach maximizes the chance of finding a non-busy FU to process an incoming pulse. This is possible since each channel receives random events independently, so the pulse extractors associated with the channels do not all need simultaneous access to the computing resources to distribute their pulses. The major contribution of this paper is the proposal of an execution model and its associated programmable hardware architecture for digital pulse processing, able to handle multiple acquisition channels while maintaining scalability thanks to the use of shared resources. The execution model and associated architecture are validated by simulation of a cycle-accurate SystemC model of the architecture. The proposed architecture shows promising results in terms of scalability while maintaining zero dead-time. This work also permits sizing the hardware resources required for a predefined set of applications. Future work will focus on the interconnection network and on a scheduling policy that can exploit the variable length of pulses. Then, the hardware implementation of this architecture will be performed and tested on a representative set of applications.
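The pulse-extractor stage lends itself to a short software sketch: a threshold trigger cuts a variable-length window around each pulse and time-stamps it; the thresholding logic and margins below are illustrative assumptions, not the paper's hardware design.

```python
# Threshold-triggered pulse extraction from a digitized stream: each pulse
# becomes an independent, variable-sized array of samples plus a time stamp,
# which downstream functional units can process out of order.
import numpy as np

def extract_pulses(samples, threshold, pre=4, post=12):
    """Yield (start_index, window) for each above-threshold pulse."""
    above = samples > threshold
    rising = np.flatnonzero(above[1:] & ~above[:-1]) + 1   # trigger points
    for start in rising:
        end = start
        while end < len(samples) and samples[end] > threshold:
            end += 1                                       # follow pulse down
        yield start, samples[max(0, start - pre):min(len(samples), end + post)]

sig = np.zeros(200)
sig[50:60] = np.hanning(10) * 5.0                          # one synthetic pulse
for t0, window in extract_pulses(sig, threshold=0.5):
    print("pulse at sample", t0, "with", len(window), "samples")
```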
A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers
NASA Technical Reports Server (NTRS)
Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)
1997-01-01
The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from robust implementations that are portable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented into the widely used NASA multi-block CFD packages ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning as well as data coalescing to obtain the desired load-balance characteristics on the available computer platforms. This multi-level parallelism implementation introduces no changes to the numerical results; hence the original fidelity of the packages is identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory accessing. By choosing an appropriate combination of the available partitioning and coalescing capabilities only during the execution stage, the PENS solver becomes adaptable to different computer architectures, from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed-memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains on up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft achieve 75 percent load-balanced execution using data coalescing and the two levels of parallelism. The SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms on which the robustness of the implementation is tested. The performance behavior on the other computer platforms with a variety of realistic problems will be included as this ongoing study progresses.
Jefferson Lab Mass Storage and File Replication Services
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ian Bird; Ying Chen; Bryan Hess
Jefferson Lab has implemented a scalable, distributed, high-performance mass storage system - JASMine. The system is entirely implemented in Java, provides access to robotic tape storage, and includes disk cache and stage manager components. The disk manager subsystem may be used independently to manage stand-alone disk pools. The system includes a scheduler to provide policy-based access to the storage systems. Security is provided by pluggable authentication modules and is implemented at the network socket level. The tape and disk cache systems have well-defined interfaces in order to provide integration with grid-based services. The system is in production and being used to archive 1 TB per day from the experiments, and currently moves over 2 TB per day in total. This paper describes the architecture of JASMine, discusses the rationale for building the system, and presents a transparent third-party file replication service to move data to collaborating institutes using JASMine, XML, and servlet technology interfacing to grid-based file transfer mechanisms.
Reconfigurable Hardware for Compressing Hyperspectral Image Data
NASA Technical Reports Server (NTRS)
Aranki, Nazeeh; Namkung, Jeffrey; Villapando, Carlos; Kiely, Aaron; Klimesh, Matthew; Xie, Hua
2010-01-01
High-speed, low-power, reconfigurable electronic hardware has been developed to implement ICER-3D, an algorithm for compressing hyperspectral-image data. The algorithm and parts thereof have been the topics of several NASA Tech Briefs articles, including Context Modeler for Wavelet Compression of Hyperspectral Images (NPO-43239) and ICER-3D Hyperspectral Image Compression Software (NPO-43238), which appear elsewhere in this issue of NASA Tech Briefs. As described in more detail in those articles, the algorithm includes three main subalgorithms: one for computing wavelet transforms, one for context modeling, and one for entropy encoding. For the purpose of designing the hardware, these subalgorithms are treated as modules to be implemented efficiently in field-programmable gate arrays (FPGAs). The design takes advantage of industry-standard, commercially available FPGAs. The implementation targets the Xilinx Virtex-II Pro architecture, which has embedded PowerPC processor cores with a flexible on-chip bus architecture. It incorporates an efficient parallel and pipelined architecture to compress the three-dimensional image data. The design provides for internal buffering to minimize intensive input/output operations while making efficient use of off-chip memory. The design is scalable in that the subalgorithms are implemented as independent hardware modules that can be combined in parallel to increase throughput. The on-chip processor manages the overall operation of the compression system, including execution of the top-level control functions as well as scheduling, initiating, and monitoring processes. The design prototype has been demonstrated to be capable of compressing hyperspectral data at a rate of 4.5 megasamples per second at a conservative clock frequency of 50 MHz, with a potential for substantially greater throughput at a higher clock frequency. The power consumption of the prototype is less than 6.5 W. The reconfigurability (by means of reprogramming) of the FPGAs makes it possible to effectively alter the design to some extent to satisfy different requirements without adding hardware. The implementation could be easily propagated to future FPGA generations and/or to custom application-specific integrated circuits.
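As a minimal illustration of the wavelet-transform stage named above, here is a one-level 1-D Haar transform; ICER-3D actually uses a different, three-dimensional wavelet, so this is purely a conceptual sketch.

```python
# One-level 1-D Haar transform: split the signal into low-pass averages and
# high-pass details. Smooth regions yield near-zero details, which is what
# the downstream context modeling and entropy coding exploit.
import numpy as np

def haar_1d(x):
    x = np.asarray(x, dtype=float)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # pairwise averages
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # pairwise differences
    return lo, hi

lo, hi = haar_1d([4, 6, 10, 12, 8, 8, 0, 2])
print(lo, hi)   # details are small wherever the signal is locally smooth
```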
The TOTEM DAQ based on the Scalable Readout System (SRS)
NASA Astrophysics Data System (ADS)
Quinto, Michele; Cafagna, Francesco S.; Fiergolski, Adrian; Radicioni, Emilio
2018-02-01
The TOTEM (TOTal cross section, Elastic scattering and diffraction dissociation Measurement at the LHC) experiment at the LHC has been designed to measure the total proton-proton cross-section and study elastic and diffractive scattering at LHC energies. In order to cope with the increased machine luminosity and the higher statistics required by the extension of the TOTEM physics program approved for the LHC's Run Two phase, the previous VME-based data acquisition system has been replaced with a new one based on the Scalable Readout System. The system features an aggregated data throughput of 2 GB/s towards the online storage system. This makes it possible to sustain a maximum trigger rate of ~24 kHz, to be compared with the 1 kHz rate of the previous system. The trigger rate is further improved by implementing zero-suppression and second-level hardware algorithms in the Scalable Readout System. The new system fulfils the requirements for increased efficiency, providing higher bandwidth and increasing the purity of the recorded data. Moreover, full compatibility has been guaranteed with the legacy front-end hardware, as well as with the DAQ interface of the CMS experiment and with the LHC's Timing, Trigger and Control distribution system. In this contribution we describe in detail the architecture of the full system and its performance as measured during the commissioning phase at the LHC interaction point.
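The zero-suppression step mentioned above can be sketched in a few lines: channels below a pedestal-plus-threshold level are dropped before readout, shrinking the event payload; the threshold logic and record layout here are illustrative assumptions, not the SRS firmware.

```python
# Illustrative zero suppression: keep only channels whose ADC value exceeds
# pedestal + n_sigma * noise, and ship (channel, value) pairs downstream.
import numpy as np

def zero_suppress(adc, pedestal, noise, n_sigma=3.0):
    keep = adc > pedestal + n_sigma * noise
    return list(zip(np.flatnonzero(keep).tolist(), adc[keep].tolist()))

adc = np.array([3, 2, 47, 3, 1, 30, 2])
print(zero_suppress(adc, pedestal=2.0, noise=1.5))   # -> [(2, 47), (5, 30)]
```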
BowMapCL: Burrows-Wheeler Mapping on Multiple Heterogeneous Accelerators.
Nogueira, David; Tomas, Pedro; Roma, Nuno
2016-01-01
The computational demand of exact-search procedures has pressed the exploitation of parallel processing accelerators to reduce the execution time of many applications. However, this often imposes strict restrictions in terms of problem size and implementation effort, mainly due to the accelerators' possibly distinct architectures. To circumvent this limitation, a new exact-search alignment tool (BowMapCL) based on the Burrows-Wheeler Transform and the FM-Index is presented. In contrast to other alternatives, BowMapCL is based on a unified implementation using OpenCL, allowing the exploitation of multiple and possibly different devices (e.g., NVIDIA, AMD/ATI, and Intel GPUs/APUs). Furthermore, to efficiently exploit such heterogeneous architectures, BowMapCL incorporates several techniques to promote its performance and scalability, including multiple buffering, work-queue task distribution, and dynamic load balancing, together with index partitioning, bit-encoding, and sampling. When compared with state-of-the-art tools, the attained results showed that BowMapCL (using a single GPU) is 2× to 7.5× faster than mainstream multi-threaded CPU BWT-based aligners, like Bowtie, BWA, and SOAP2, and up to 4× faster than the best-performing state-of-the-art GPU implementations (namely, SOAP3 and HPG-BWT). When multiple and completely distinct devices are considered, BowMapCL efficiently scales the offered throughput, ensuring a convenient load balance of the processing across the several distinct devices.
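For readers unfamiliar with the core primitive BowMapCL parallelizes, the sketch below counts exact occurrences of a pattern with FM-Index backward search over a naively built BWT; it favors clarity over speed and is not BowMapCL's implementation.

```python
# FM-Index backward search on a plain-Python BWT (clarity over speed).

def bwt(s):
    s += "$"                                   # unique terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)   # last column

def count_occurrences(bw, pattern):
    """Count occurrences of pattern via LF-mapping on the BWT string."""
    chars = sorted(set(bw))
    # C[c]: number of symbols in bw strictly smaller than c.
    C = {c: sum(bw.count(d) for d in chars if d < c) for c in chars}
    occ = lambda c, i: bw[:i].count(c)         # rank of c in bw[:i]
    lo, hi = 0, len(bw)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

print(count_occurrences(bwt("abracadabra"), "abra"))   # -> 2
```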
Quantum Devices Bonded Beneath a Superconducting Shield: Part 2
NASA Astrophysics Data System (ADS)
McRae, Corey Rae; Abdallah, Adel; Bejanin, Jeremy; Earnest, Carolyn; McConkey, Thomas; Pagel, Zachary; Mariantoni, Matteo
The next-generation quantum computer will rely on physical quantum bits (qubits) organized into arrays to form error-robust logical qubits. In the superconducting quantum circuit implementation, this architecture will require the use of larger and larger chip sizes. In order for on-chip superconducting quantum computers to be scalable, various issues found in large chips must be addressed, including the suppression of box modes (due to the sample holder) and the suppression of slot modes (due to fractured ground planes). By bonding a metallized shield layer over a superconducting circuit using thin-film indium as a bonding agent, we have demonstrated proof of concept of an extensible circuit architecture that holds the key to the suppression of spurious modes. Microwave characterization of shielded transmission lines and measurement of superconducting resonators were compared to identical unshielded devices. The elimination of box modes was investigated, as well as bond characteristics including bond homogeneity and the presence of a superconducting connection.
Celesti, Antonio; Fazio, Maria; Romano, Agata; Bramanti, Alessia; Bramanti, Placido; Villari, Massimo
2018-05-01
The Open Archive Information System (OAIS) is a reference model for organizing people and resources in a system, already adopted in care centers and medical systems to efficiently manage clinical data, medical personnel, and patients. Archival storage systems are typically implemented using traditional relational database systems, but relation-oriented technology strongly limits the efficiency of managing huge amounts of patients' clinical data, especially in emerging cloud-based systems, which are distributed. In this paper, we present an OAIS healthcare architecture for managing a huge amount of HL7 clinical documents in a scalable way. Specifically, it is based on a NoSQL column-oriented Data Base Management System deployed in the cloud, thus benefiting from big tables and wide rows available over a virtual distributed infrastructure. We developed a prototype of the proposed architecture at the IRCCS, and we evaluated its efficiency in a real case study.
NASA Astrophysics Data System (ADS)
Baranowski, Z.; Canali, L.; Toebbicke, R.; Hrivnac, J.; Barberis, D.
2017-10-01
This paper reports on activities aimed at improving the architecture and performance of the ATLAS EventIndex implementation in Hadoop. The EventIndex contains tens of billions of event records, each of which consists of ∼100 bytes, all having the same probability of being searched or counted. Data formats are one important area for optimizing the performance and storage footprint of Hadoop-based applications. This work reports on production usage and on tests using several data formats, including Map Files, Apache Parquet, Avro, and various compression algorithms. The query engine also plays a critical role in the architecture. We also report on the use of HBase for the EventIndex, focussing on the optimizations performed in production and on scalability tests. Additional engines that have been tested include Cloudera Impala, in particular for its SQL interface and its optimizations for data warehouse workloads and reports.
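To make the data-format trade-off concrete, the sketch below writes a few toy event records to Apache Parquet with Snappy compression using the pyarrow library and reads back a single column; the schema is invented for illustration and is not the EventIndex's actual record layout.

```python
# Columnar storage in miniature: write toy event records, read one column back.
import pyarrow as pa
import pyarrow.parquet as pq

# Toy event records; in the real EventIndex each record is ~100 bytes.
table = pa.table({
    "run_number": [284500, 284500, 284501],
    "event_number": [1001, 1002, 17],
    "lumi_block": [55, 55, 3],
    "guid": ["A1B2", "A1B2", "C3D4"],
})
pq.write_table(table, "eventindex_sample.parquet", compression="snappy")

# Columnar reads touch only the columns a query needs.
events = pq.read_table("eventindex_sample.parquet", columns=["event_number"])
print(events.to_pydict())   # {'event_number': [1001, 1002, 17]}
```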
Convolutional networks for fast, energy-efficient neuromorphic computing
Esser, Steven K.; Merolla, Paul A.; Arthur, John V.; Cassidy, Andrew S.; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J.; McKinstry, Jeffrey L.; Melano, Timothy; Barch, Davis R.; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D.; Modha, Dharmendra S.
2016-01-01
Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware’s underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer. PMID:27651489
Silicon CMOS architecture for a spin-based quantum computer.
Veldhorst, M; Eenink, H G J; Yang, C H; Dzurak, A S
2017-12-15
Recent advances in quantum error correction codes for fault-tolerant quantum computing and physical realizations of high-fidelity qubits in multiple platforms give promise for the construction of a quantum computer based on millions of interacting qubits. However, the classical-quantum interface remains a nascent field of exploration. Here, we propose an architecture for a silicon-based quantum computer processor based on complementary metal-oxide-semiconductor (CMOS) technology. We show how a transistor-based control circuit together with charge-storage electrodes can be used to operate a dense and scalable two-dimensional qubit system. The qubits are defined by the spin state of a single electron confined in quantum dots, coupled via exchange interactions, controlled using a microwave cavity, and measured via gate-based dispersive readout. We implement a spin qubit surface code, showing the prospects for universal quantum computation. We discuss the challenges and focus areas that need to be addressed, providing a path for large-scale quantum computing.
Morales-Navarrete, Hernán; Segovia-Miranda, Fabián; Klukowski, Piotr; Meyer, Kirstin; Nonaka, Hidenori; Marsico, Giovanni; Chernykh, Mikhail; Kalaidzidis, Alexander; Zerial, Marino; Kalaidzidis, Yannis
2015-01-01
A prerequisite for the systems biology analysis of tissues is an accurate digital three-dimensional reconstruction of tissue structure based on images of markers covering multiple scales. Here, we designed a flexible pipeline for the multi-scale reconstruction and quantitative morphological analysis of tissue architecture from microscopy images. Our pipeline includes newly developed algorithms that address specific challenges of thick dense tissue reconstruction. Our implementation allows for a flexible workflow, scalable to high-throughput analysis and applicable to various mammalian tissues. We applied it to the analysis of liver tissue and extracted quantitative parameters of sinusoids, bile canaliculi and cell shapes, recognizing different liver cell types with high accuracy. Using our platform, we uncovered an unexpected zonation pattern of hepatocytes with different size, nuclei and DNA content, thus revealing new features of liver tissue organization. The pipeline also proved effective to analyse lung and kidney tissue, demonstrating its generality and robustness. DOI: http://dx.doi.org/10.7554/eLife.11214.001 PMID:26673893
The Gaia Archive at ESAC: a VO-inside archive
NASA Astrophysics Data System (ADS)
Gonzalez-Nunez, J.
2015-12-01
The ESDC (ESAC Science Data Center) is one of the active members of the IVOA (International Virtual Observatory Alliance), which has defined a set of standards, libraries and concepts that allow the creation of flexible, scalable and interoperable architectures for data archive development. For astronomy missions that involve large catalogues, such as Gaia or Euclid, the TAP, UWS and VOSpace standards can be used to create an architecture that allows the community to exploit these valuable data. New challenges also arise, such as the implementation of the new paradigm "move the code close to the data", which can be partially achieved by extending the protocols (TAP+, UWS+, etc.) or the languages (ADQL). We explain how we have used VO standards and libraries for the Gaia Archive, which has not only produced an open and interoperable archive but also minimized development in certain areas. We also explain how we have extended these protocols, and outline future plans.
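As a concrete illustration of the TAP/ADQL access pattern described above, the hedged sketch below queries the Gaia Archive's TAP endpoint through the astroquery package (assuming it is installed; the catalogue table name is illustrative of the archive's published releases, not a prescription).

```python
# Synchronous TAP query against the Gaia Archive using ADQL via astroquery.
from astroquery.gaia import Gaia

job = Gaia.launch_job(
    "SELECT TOP 5 source_id, ra, dec, phot_g_mean_mag "
    "FROM gaiadr2.gaia_source "          # illustrative table name
    "ORDER BY phot_g_mean_mag"
)
results = job.get_results()              # returned as an astropy Table
print(results)
```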
Towards scalable Byzantine fault-tolerant replication
NASA Astrophysics Data System (ADS)
Zbierski, Maciej
2017-08-01
Byzantine fault-tolerant (BFT) replication is a powerful technique, enabling distributed systems to remain available and correct even in the presence of arbitrary faults. Unfortunately, existing BFT replication protocols are mostly load-unscalable, i.e. they fail to respond with adequate performance increase whenever new computational resources are introduced into the system. This article proposes a universal architecture facilitating the creation of load-scalable distributed services based on BFT replication. The suggested approach exploits parallel request processing to fully utilize the available resources, and uses a load balancer module to dynamically adapt to the properties of the observed client workload. The article additionally provides a discussion on selected deployment scenarios, and explains how the proposed architecture could be used to increase the dependability of contemporary large-scale distributed systems.
A Numerical Study of Scalable Cardiac Electro-Mechanical Solvers on HPC Architectures
Colli Franzone, Piero; Pavarino, Luca F.; Scacchi, Simone
2018-01-01
We introduce and study some scalable domain decomposition preconditioners for cardiac electro-mechanical 3D simulations on parallel HPC (High Performance Computing) architectures. The electro-mechanical model of the cardiac tissue is composed of four coupled sub-models: (1) the static finite elasticity equations for the transversely isotropic deformation of the cardiac tissue; (2) the active tension model describing the dynamics of the intracellular calcium, cross-bridge binding and myofilament tension; (3) the anisotropic Bidomain model describing the evolution of the intra- and extra-cellular potentials in the deforming cardiac tissue; and (4) the ionic membrane model describing the dynamics of ionic currents, gating variables, ionic concentrations and stretch-activated channels. This strongly coupled electro-mechanical model is discretized in time with a splitting semi-implicit technique and in space with isoparametric finite elements. The resulting scalable parallel solver is based on Multilevel Additive Schwarz preconditioners for the solution of the Bidomain system and on BDDC preconditioned Newton-Krylov solvers for the non-linear finite elasticity system. The results of several 3D parallel simulations show the scalability of both linear and non-linear solvers and their application to the study of both physiological excitation-contraction cardiac dynamics and re-entrant waves in the presence of different mechano-electrical feedbacks. PMID:29674971
The PMS project: Poor man's supercomputer
NASA Astrophysics Data System (ADS)
Csikor, F.; Fodor, Z.; Hegedüs, P.; Horváth, V. K.; Katz, S. D.; Piróth, A.
2001-02-01
We briefly describe the Poor Man's Supercomputer (PMS) project carried out at Eötvös University, Budapest. The goal was to construct a cost-effective, scalable, fast parallel computer to perform numerical calculations of physical problems that can be implemented on a lattice with nearest-neighbour interactions. To this end we developed the PMS architecture using PC components and designed a special, low-cost communication hardware and the driver software for Linux OS. Our first implementation of PMS includes 32 nodes (PMS1). The performance of PMS1 was tested by Lattice Gauge Theory simulations. Using pure SU(3) gauge theory or the bosonic part of the minimal supersymmetric extension of the standard model (MSSM) on PMS1, we obtained price-to-sustained-performance ratios of $3/Mflops and $0.60/Mflops for double and single precision operations, respectively. The design of the special hardware and the communication driver are freely available upon request for non-profit organizations.
NASA Astrophysics Data System (ADS)
Barone, F.; Giordano, G.
2018-03-01
The UNISA Folded Pendulum technological platform is very promising for the implementation of highly sensitive, large-band miniaturized mechanical seismometers and accelerometers in different materials. In fact, the symmetry of its mechanical architecture allows one to take full advantage of one of the most relevant properties of the folded pendulum, namely its scalability. This property is very useful for the design of folded pendulums of small size and weight, provided with a suitable combination of physical and geometrical parameters. Using a simplified Lagrangian model of the folded pendulum, we present and discuss this idea, showing different possible approaches that may lead to the miniaturization of a folded pendulum. Finally, we present a first prototype of a miniaturized folded pendulum, discussing its characteristics and limitations in connection with scientific ground, marine and space applications.
Model-based Executive Control through Reactive Planning for Autonomous Rovers
NASA Technical Reports Server (NTRS)
Finzi, Alberto; Ingrand, Felix; Muscettola, Nicola
2004-01-01
This paper reports on the design and implementation of a real-time executive for a mobile rover that uses a model-based, declarative approach. The control system is based on the Intelligent Distributed Execution Architecture (IDEA), an approach to planning and execution that provides a unified representational and computational framework for an autonomous agent. The basic hypothesis of IDEA is that a large control system can be structured as a collection of interacting agents, each with the same fundamental structure. We show that planning and real-time response are compatible if the executive minimizes the size of the planning problem. We detail the implementation of this approach on an exploration rover (Gromit, an RWI ATRV Junior at NASA Ames), presenting different IDEA controllers for the same domain and comparing them with more classical approaches. We demonstrate that the approach is scalable to the complex coordination of functional modules needed for autonomous navigation and exploration.
Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; ...
2015-12-21
This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptops to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.
Production experience with the ATLAS Event Service
NASA Astrophysics Data System (ADS)
Benjamin, D.; Calafiura, P.; Childers, T.; De, K.; Guan, W.; Maeno, T.; Nilsson, P.; Tsulaia, V.; Van Gemmeren, P.; Wenaus, T.; ATLAS Collaboration
2017-10-01
The ATLAS Event Service (AES) has been designed and implemented for efficient running of ATLAS production workflows on a variety of computing platforms, ranging from conventional Grid sites to opportunistic, often short-lived resources such as spot-market commercial clouds, supercomputers and volunteer computing. The Event Service architecture allows real-time delivery of fine-grained workloads to running payload applications, which process dispatched events or event ranges and immediately stream the outputs to highly scalable Object Stores. Thanks to its agile and flexible architecture, the AES is currently being used by grid sites for assigning low-priority workloads to otherwise idle computing resources, for harvesting HPC resources in an efficient back-fill mode, and for massively scaling out to the 50-100k concurrent-core level on the Amazon spot market to efficiently utilize those transient resources for peak production needs. Platform ports in development include ATLAS@Home (BOINC), the Google Compute Engine, and a growing number of HPC platforms. After briefly reviewing the concept and the architecture of the Event Service, we report the status and experience gained in AES commissioning and production operations on supercomputers, and our plans for extending ES applications beyond Geant4 simulation to other workflows, such as reconstruction and data analysis.
Multi-link laser interferometry architecture for interspacecraft displacement metrology
NASA Astrophysics Data System (ADS)
Francis, Samuel P.; Lam, Timothy T.-Y.; McClelland, David E.; Shaddock, Daniel A.
2018-03-01
Targeting a future Gravity Recovery and Climate Experiment (GRACE) mission, we present a new laser interferometry architecture that can be used to recover the displacement between two spacecraft from multiple interspacecraft measurements. We show it is possible to recover the displacement between the spacecraft centers of mass in post-processing by forming linear combinations of multiple, spatially offset, interspacecraft measurements. By canceling measurement error due to angular misalignment of the spacecraft, we remove the need for precise placement or alignment of the interferometer, potentially simplifying spacecraft integration. To realize this multi-link architecture, we propose an all-fiber interferometer, removing the need for any ultrastable optical components such as the GRACE Follow-On mission's triple mirror assembly. Using digitally enhanced heterodyne interferometry, the number of links is readily scalable, adding redundancy to our measurement. We present the concept, an example multi-link implementation and the signal processing required to recover the center-of-mass displacement from multiple link measurements. Finally, in a simulation, we analyze the limiting noise sources in a 9-link interferometer and ultimately show we can recover the 80 nm/√Hz displacement sensitivity required by the GRACE Follow-On laser ranging interferometer.
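The following sketch illustrates, under our own simplified first-order model (not the authors' signal-processing chain), how combining spatially offset link measurements cancels angular misalignment: each link is modeled as d_i = d_com + L_i·θ, and a least-squares fit over the links recovers the center-of-mass displacement while absorbing the tilt term. Offsets and noise levels are invented.

```python
# Recover center-of-mass displacement from multiple offset link readouts.
import numpy as np

rng = np.random.default_rng(0)
offsets = np.array([-0.3, 0.0, 0.3])        # metres; assumed link positions
d_com, theta = 1.25e-6, 3e-7                # true COM displacement, tilt (rad)

# Per-link readout: COM term plus tilt leverage plus measurement noise.
links = d_com + offsets * theta + rng.normal(0, 1e-9, size=3)

A = np.column_stack([np.ones_like(offsets), offsets])   # model matrix [1, L_i]
(est_d, est_theta), *_ = np.linalg.lstsq(A, links, rcond=None)
print(f"recovered COM displacement: {est_d:.3e} m (true {d_com:.3e} m)")
```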
NASA Technical Reports Server (NTRS)
LaValley, Brian W.; Little, Phillip D.; Walter, Chris J.
2011-01-01
This report documents the capabilities of the EDICT tools for error modeling and error propagation analysis when operating with models defined in the Architecture Analysis & Design Language (AADL). We discuss our experience using the EDICT error analysis capabilities on a model of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER) architecture that uses the Reliable Optical Bus (ROBUS). Based on these experiences we draw some initial conclusions about model based design techniques for error modeling and analysis of highly reliable computing architectures.
Design of Complex BPF with Automatic Digital Tuning Circuit for Low-IF Receivers
NASA Astrophysics Data System (ADS)
Kondo, Hideaki; Sawada, Masaru; Murakami, Norio; Masui, Shoichi
This paper describes the architecture and implementation of an automatic digital tuning circuit for a complex bandpass filter (BPF) in a low-power, low-cost transceiver for applications such as personal authentication and wireless sensor network systems. The architectural design analysis demonstrates that an active RC filter in a low-IF architecture can be at least 47.7% smaller in area than a conventional gm-C filter; in addition, it features a simple implementation of the associated tuning circuit. The principle of simultaneous tuning of both the center frequency and the bandwidth through calibration of a capacitor array is illustrated, based on an analysis of the filter characteristics, and a scalable automatic digital tuning circuit with simple analog blocks and control logic of only 835 gates is introduced. The developed capacitor tuning technique achieves a tuning error of less than ±3.5% and reduces peaking in the passband filter characteristics. An experimental complex BPF using 0.18 µm CMOS technology successfully reduces the tuning error from an initial value of -20% to less than ±2.5% after tuning. The filter block dimensions are 1.22 mm × 1.01 mm; in measurements of the developed complex BPF with the automatic digital tuning circuit, the current consumption is 705 µA and the image rejection ratio is 40.3 dB. Complete evaluation of the BPF indicates that this technique can be applied to low-power, low-cost transceivers.
Ground System Architectures Workshop GMSEC SERVICES SUITE (GSS): an Agile Development Story
NASA Technical Reports Server (NTRS)
Ly, Vuong
2017-01-01
The GMSEC (Goddard Mission Services Evolution Center) Services Suite (GSS) is a collection of tools and software services, along with a robust, customizable web-based portal, that enables the user to capture, monitor, report, and analyze system-wide GMSEC data. Given our plug-and-play architecture and the need for rapid system development, we opted to follow the Scrum Agile methodology for software development. Being one of the first few projects to implement the Agile methodology at NASA GSFC, in this presentation we present our approaches, tools, successes, and challenges in implementing it. The GMSEC architecture provides a scalable, extensible ground and flight system for existing and future missions. GMSEC comes with a robust Application Programming Interface (GMSEC API) and a core set of Java-based GMSEC components that facilitate the development of a GMSEC-based ground system. Over the past few years, we have seen an uptick in the number of customers who are moving from a native desktop application environment to a web-based environment, particularly for data monitoring and analysis. We also see a need to separate the business logic from the GUI display in our Java-based components and to consolidate all the GUI displays into one interface. This combination of separation and consolidation brings immediate value to a GMSEC-based ground system through increased ease of data access via a uniform interface, built-in security measures, centralized configuration management, and ease of feature extensibility.
A framework for plasticity implementation on the SpiNNaker neural architecture.
Galluppi, Francesco; Lagorce, Xavier; Stromatias, Evangelos; Pfeiffer, Michael; Plana, Luis A; Furber, Steve B; Benosman, Ryad B
2014-01-01
Many of the precise biological mechanisms of synaptic plasticity remain elusive, but simulations of neural networks have greatly enhanced our understanding of how specific global functions arise from the massively parallel computation of neurons and local Hebbian or spike-timing dependent plasticity rules. For simulating large portions of neural tissue, this has created an increasingly strong need for large-scale simulations of plastic neural networks on special-purpose hardware platforms, because synaptic transmissions and updates are badly matched to the computing style supported by current architectures. Because of the great diversity of biological plasticity phenomena and the corresponding diversity of models, there is a great need for testing various hypotheses about plasticity before committing to one hardware implementation. Here we present a novel framework for investigating different plasticity approaches on the SpiNNaker distributed digital neural simulation platform. The key innovation of the proposed architecture is to exploit the reconfigurability of the ARM processors inside SpiNNaker, dedicating a subset of them exclusively to process synaptic plasticity updates, while the rest perform the usual neural and synaptic simulations. We demonstrate the flexibility of the proposed approach by showing the implementation of a variety of spike- and rate-based learning rules, including standard Spike-Timing dependent plasticity (STDP), voltage-dependent STDP, and the rate-based BCM rule. We analyze their performance and validate them by running classical learning experiments in real time on a 4-chip SpiNNaker board. The result is an efficient, modular, flexible and scalable framework, which provides a valuable tool for the fast and easy exploration of learning models of very different kinds on the parallel and reconfigurable SpiNNaker system.
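As an example of the kind of learning rule such a framework lets users swap in, the sketch below implements a standard pair-based STDP weight update in plain Python; the constants, trace shapes, and clipping are illustrative textbook choices, not SpiNNaker's implementation.

```python
# Pair-based STDP: potentiate when pre fires before post, depress otherwise.
import math

A_PLUS, A_MINUS = 0.01, 0.012     # potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # time constants in ms

def stdp_dw(t_pre, t_post):
    """Weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:    # pre before post: potentiate
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    elif dt < 0:  # post before pre: depress
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0

w = 0.5
for t_pre, t_post in [(10, 15), (30, 28), (50, 52)]:
    w = min(1.0, max(0.0, w + stdp_dw(t_pre, t_post)))  # clip weight to [0, 1]
print(f"final weight: {w:.4f}")
```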
HACC: Extreme Scaling and Performance Across Diverse Architectures
NASA Astrophysics Data System (ADS)
Habib, Salman; Morozov, Vitali; Frontiere, Nicholas; Finkel, Hal; Pope, Adrian; Heitmann, Katrin
2013-11-01
Supercomputing is evolving towards hybrid and accelerator-based architectures with millions of cores. The HACC (Hardware/Hybrid Accelerated Cosmology Code) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. We demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining unprecedented levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.
NASA Astrophysics Data System (ADS)
Prasad, Guru; Jayaram, Sanjay; Ward, Jami; Gupta, Pankaj
2004-08-01
In this paper, Aximetric proposes a decentralized Command and Control (C2) architecture for distributed control of a cluster of on-board health monitoring and software-enabled control systems, called SimBOX, that reuses some of the real-time infrastructure (RTI) functionality from current military real-time simulation architectures. The uniqueness of the approach is to provide a "plug and play environment" for various system components that run at various data rates (Hz), and the ability to replicate or transfer C2 operations to various subsystems in a scalable manner. This is made possible by a communication bus called the "Distributed Shared Data Bus" and a distributed computing environment that scales to the control needs by providing a self-contained computing, data-logging and control-function module that can be rapidly reconfigured to perform different functions. This kind of software-enabled control is very much needed to meet the needs of future aerospace command and control functions.
Scalable quantum computer architecture with coupled donor-quantum dot qubits
Schenkel, Thomas; Lo, Cheuk Chi; Weis, Christoph; Lyon, Stephen; Tyryshkin, Alexei; Bokor, Jeffrey
2014-08-26
A quantum bit computing architecture includes a plurality of single spin memory donor atoms embedded in a semiconductor layer, a plurality of quantum dots arranged with the semiconductor layer and aligned with the donor atoms, wherein a first voltage applied across at least one pair of the aligned quantum dot and donor atom controls a donor-quantum dot coupling. A method of performing quantum computing in a scalable architecture quantum computing apparatus includes arranging a pattern of single spin memory donor atoms in a semiconductor layer, forming a plurality of quantum dots arranged with the semiconductor layer and aligned with the donor atoms, applying a first voltage across at least one aligned pair of a quantum dot and donor atom to control a donor-quantum dot coupling, and applying a second voltage between one or more quantum dots to control a Heisenberg exchange J coupling between quantum dots and to cause transport of a single spin polarized electron between quantum dots.
NASA Astrophysics Data System (ADS)
Navlakha, Nupur; Kranti, Abhinav
2017-11-01
The work reports on the use of a planar tri-gate tunnel field-effect transistor (TFET) operated as a dynamic memory at 85 °C with an enhanced sense margin (SM). Two symmetric gates (G1), aligned to the source over a partial region of the intrinsic film, result in better electrostatic control that regulates the read mechanism based on band-to-band tunneling, while the other gate (G2), positioned adjacent to the first front gate, is responsible for charge storage and sustenance. The proposed architecture results in an enhanced SM of ∼1.2 μA μm⁻¹ along with a longer retention time (RT) of ∼1.8 s at 85 °C, for a total length of 600 nm. The double-gate architecture towards the source increases the tunneling current and also reduces short-channel effects, enhancing SM and scalability, thereby overcoming the critical bottleneck faced by TFET-based dynamic memories. The work also discusses the impact of overlap/underlap and interface charges on the performance of TFET-based dynamic memory. Insights into device operation demonstrate that the choice of appropriate architecture and biases not only limits the trade-off between SM and RT, but also results in improved scalability, with the drain voltage and total length being scaled down to 0.8 V and 115 nm, respectively.
Scalable 3D bicontinuous fluid networks: polymer heat exchangers toward artificial organs.
Roper, Christopher S; Schubert, Randall C; Maloney, Kevin J; Page, David; Ro, Christopher J; Yang, Sophia S; Jacobsen, Alan J
2015-04-17
A scalable method for fabricating architected materials well-suited for heat and mass exchange is presented. These materials exhibit unprecedented combinations of small hydraulic diameters (13.0-0.09 mm) and large hydraulic-diameter-to-thickness ratios (5.0-30,100). This process expands the range of material architectures achievable starting from photopolymer waveguide lattices or additive manufacturing. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
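For readers interpreting the hydraulic-diameter ranges quoted above, the snippet below applies the standard definition D_h = 4A/P (four times the cross-sectional area over the wetted perimeter); the example channel dimensions are invented for illustration.

```python
# Hydraulic diameter of a flow channel, D_h = 4A / P.
def hydraulic_diameter(area_mm2, wetted_perimeter_mm):
    """Return D_h in mm given cross-sectional area (mm^2) and perimeter (mm)."""
    return 4 * area_mm2 / wetted_perimeter_mm

# A 0.1 mm x 0.1 mm square channel: D_h equals the side length.
print(hydraulic_diameter(0.01, 0.4))   # -> 0.1 (mm)
```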
The ATLAS Event Service: A new approach to event processing
NASA Astrophysics Data System (ADS)
Calafiura, P.; De, K.; Guan, W.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Tsulaia, V.; Van Gemmeren, P.; Wenaus, T.
2015-12-01
The ATLAS Event Service (ES) implements a new fine-grained approach to HEP event processing, designed to be agile and efficient in exploiting transient, short-lived resources such as HPC hole-filling, spot-market commercial clouds, and volunteer computing. Input and output control and data flows, bookkeeping, monitoring, and data storage are all managed at the event level in an implementation capable of supporting ATLAS-scale distributed processing throughputs (about 4M CPU-hours/day). Input data flows utilize remote data repositories with no data locality or pre-staging requirements, minimizing the use of costly storage in favor of strongly leveraging powerful networks. Object stores provide a highly scalable means of remotely storing the quasi-continuous, fine-grained outputs that give ES-based applications a very light data footprint on a processing resource, and ensure negligible losses should the resource suddenly vanish. We will describe the motivations for the ES system, its unique features and capabilities, its architecture and the highly scalable tools and technologies employed in its implementation, and its applications in ATLAS processing on HPCs, commercial cloud resources, volunteer computing, and grid resources. Notice: This manuscript has been authored by employees of Brookhaven Science Associates, LLC under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy. The publisher by accepting the manuscript for publication acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
A real-time architecture for time-aware agents.
Prouskas, Konstantinos-Vassileios; Pitt, Jeremy V
2004-06-01
This paper describes the specification and implementation of a new three-layer time-aware agent architecture. This architecture is designed for applications and environments where societies of humans and agents play equally active roles, but interact and operate in completely different time frames. The architecture consists of three layers: the April real-time run-time (ART) layer, the time-aware layer (TAL), and the application agents layer (AAL). The ART layer forms the underlying real-time agent platform. An original online, real-time, dynamic priority-based scheduling algorithm is described for scheduling the computation time of agent processes, and it is shown that the algorithm's O(n) complexity and scalable performance are sufficient for application in real-time domains. The TAL layer forms an abstraction layer through which human and agent interactions are temporally unified, that is, handled in a common way irrespective of their temporal representation and scale. A novel O(n²) interaction scheduling algorithm is described for predicting and guaranteeing interactions' initiation and completion times. The time-aware predicting component of a workflow management system is also presented as an instance of the AAL layer. The described time-aware architecture addresses two key challenges in enabling agents to be effectively configured and applied in environments where humans and agents play equally active roles. It provides flexibility and adaptability in its real-time mechanisms while placing them under direct agent control, and it temporally unifies human and agent interactions.
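To make priority-driven dispatch concrete, here is a toy scheduler in Python: it orders tasks by (priority, deadline) with a heap, which costs O(log n) per operation. It is purely illustrative of the scheduling idea, not the paper's O(n) ART algorithm.

```python
# Toy dynamic-priority scheduler: lower (priority, deadline) tuples run first.
import heapq
import itertools

class Scheduler:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker so tasks never compare

    def submit(self, priority, deadline_ms, task):
        heapq.heappush(self._heap,
                       (priority, deadline_ms, next(self._counter), task))

    def run(self):
        while self._heap:
            _, _, _, task = heapq.heappop(self._heap)
            task()

s = Scheduler()
s.submit(1, 50, lambda: print("agent interaction"))
s.submit(0, 10, lambda: print("real-time control tick"))
s.run()   # control tick dispatches before the interaction
```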
NASA Astrophysics Data System (ADS)
Baynes, K.; Gilman, J.; Pilone, D.; Mitchell, A. E.
2015-12-01
The NASA EOSDIS (Earth Observing System Data and Information System) Common Metadata Repository (CMR) is a continuously evolving metadata system that merges all existing capabilities and metadata from the EOS ClearingHOuse (ECHO) and the Global Change Master Directory (GCMD) systems. This flagship catalog has been developed with several key requirements: fast search and ingest performance; the ability to integrate heterogeneous external inputs and outputs; high availability and resiliency; scalability; and evolvability and expandability. This talk will focus on the advantages and potential challenges of tackling these requirements using a microservices architecture, which decomposes system functionality into smaller, loosely-coupled, individually-scalable elements that communicate via well-defined APIs. In addition, time will be spent examining specific elements of the CMR architecture and identifying opportunities for future integrations.
A Generic Ground Framework for Image Expertise Centres and Small-Sized Production Centres
NASA Astrophysics Data System (ADS)
Sellé, A.
2009-05-01
Initiated by the Pleiades Earth Observation Program, the CNES (French Space Agency) has developed a generic collaborative framework for its image quality centre, highly customisable for any upcoming expertise centre. This collaborative framework has been designed to be used by a group of experts or scientists who want to share data and processing tools and manage interfaces with external entities. Its flexible and scalable architecture complies with the core requirements: defining a user data model with no impact on the software (generic data access), integrating user processing tools with a GUI builder and built-in APIs, and offering a scalable architecture that fits any performance requirement and accompanies growing projects. The CNES has granted licenses to two software companies that will be able to redistribute this framework to any customer.
Marek, A; Blum, V; Johanni, R; Havu, V; Lang, B; Auckenthaler, T; Heinecke, A; Bungartz, H-J; Lederer, H
2014-05-28
Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N³) with the size of the investigated problem, N (e.g. the electron count in electronic structure theory), and thus often defines the system-size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus only on a few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the Eigenvalue soLvers for Petascale Applications (ELPA) library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices that have real-valued and complex-valued matrix entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well-documented matrix layout of the Scalable Linear Algebra PACKage (ScaLAPACK) library but replacing all actual parallel solution steps with subroutines of its own. For these steps, ELPA significantly outperforms the corresponding ScaLAPACK routines and proprietary libraries that implement the ScaLAPACK interface (e.g. Intel's MKL). The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding back-transformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient especially towards larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPI-OpenMP implementation available as well. Scalability beyond 10,000 CPU cores for problem sizes arising in the field of electronic structure theory is demonstrated for current high-performance computer architectures such as Cray or Intel/Infiniband. For a matrix of dimension 260,000, scalability up to 295,000 CPU cores has been shown on BlueGene/P.
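The O(N³) scaling quoted above is easy to observe empirically. The sketch below times NumPy's LAPACK-backed dense symmetric solver as a single-node stand-in for ELPA/ScaLAPACK (which distribute the same computation across nodes); the matrix sizes are illustrative.

```python
# Empirical check of cubic scaling for the dense symmetric eigenproblem.
import time
import numpy as np

rng = np.random.default_rng(1)
for n in (500, 1000, 2000):
    a = rng.standard_normal((n, n))
    a = (a + a.T) / 2                    # symmetrize
    t0 = time.perf_counter()
    w, v = np.linalg.eigh(a)             # all eigenvalues and eigenvectors
    print(f"N={n:5d}: {time.perf_counter() - t0:.3f} s")
# Doubling N should raise the runtime by roughly 8x once N is large enough.
```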
Cotič, Živa; Rees, Rebecca; Wark, Petra A; Car, Josip
2016-10-19
In 2013, there was a shortage of approximately 7.2 million health workers worldwide, which is larger among family physicians than among specialists. eLearning could provide a potential solution to some of these global workforce challenges. However, there is little evidence on factors facilitating or hindering implementation, adoption, use, scalability and sustainability of eLearning. This review aims to synthesise results from qualitative and mixed methods studies to provide insight on factors influencing implementation of eLearning for family medicine specialty education and training. Additionally, this review aims to identify the actions needed to increase effectiveness of eLearning and identify the strategies required to improve eLearning implementation, adoption, use, sustainability and scalability for family medicine speciality education and training. A systematic search will be conducted across a range of databases for qualitative studies focusing on experiences, barriers, facilitators, and other factors related to the implementation, adoption, use, sustainability and scalability of eLearning for family medicine specialty education and training. Studies will be synthesised by using the framework analysis approach. This study will contribute to the evaluation of eLearning implementation, adoption, use, sustainability and scalability for family medicine specialty training and education and the development of eLearning guidelines for postgraduate medical education. PROSPERO http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42016036449.
GMR biosensor arrays: a system perspective.
Hall, D A; Gaster, R S; Lin, T; Osterfeld, S J; Han, S; Murmann, B; Wang, S X
2010-05-15
Giant magnetoresistive biosensors are becoming more prevalent for sensitive, quantifiable biomolecular detection. However, in order for magnetic biosensing to become competitive with current optical protein microarray technology, there is a need to increase the number of sensors while maintaining the high sensitivity and fast readout time characteristic of smaller arrays (1-8 sensors). In this paper, we present a circuit architecture scalable for larger sensor arrays (64 individually addressable sensors) while maintaining a high readout rate (scanning the entire array in less than 4s). The system utilizes both time domain multiplexing and frequency domain multiplexing in order to achieve this scan rate. For the implementation, we propose a new circuit architecture that does not use a classical Wheatstone bridge to measure the small change in resistance of the sensor. Instead, an architecture designed around a transimpedance amplifier is employed. A detailed analysis of this architecture including the noise, distortion, and potential sources of errors is presented, followed by a global optimization strategy for the entire system comprising the magnetic tags, sensors, and interface electronics. To demonstrate the sensitivity, quantifiable detection of two blindly spiked samples of unknown concentrations has been performed at concentrations below the limit of detection for the enzyme-linked immunosorbent assay. Lastly, the multiplexing capability and reproducibility of the system was demonstrated by simultaneously monitoring sensors functionalized with three unique proteins at different concentrations in real-time. 2010 Elsevier B.V. All rights reserved.
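The frequency-domain multiplexing idea above can be demonstrated in miniature: excite each sensor with a distinct carrier tone, sum them onto one readout channel, and recover each amplitude by lock-in-style demodulation. The sketch below does this with NumPy; all frequencies and amplitudes are invented, not the system's actual parameters.

```python
# Two sensors on one readout channel, separated by carrier frequency.
import numpy as np

fs, T = 100_000, 0.1                      # sample rate (Hz), duration (s)
t = np.arange(0, T, 1 / fs)
f1, f2 = 1_000, 1_500                     # carrier tones for sensors 1 and 2
a1, a2 = 0.8, 0.3                         # stand-ins for resistance changes

shared = a1 * np.sin(2 * np.pi * f1 * t) + a2 * np.sin(2 * np.pi * f2 * t)

def lock_in(signal, f):
    """Recover the amplitude at carrier f by multiply-and-average."""
    return 2 * np.mean(signal * np.sin(2 * np.pi * f * t))

print(f"sensor 1: {lock_in(shared, f1):.3f}, sensor 2: {lock_in(shared, f2):.3f}")
```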
Architectures Toward Reusable Science Data Systems
NASA Astrophysics Data System (ADS)
Moses, J. F.
2014-12-01
Science Data Systems (SDS) comprise an important class of data processing systems that support product generation from remote sensors and in-situ observations. These systems enable research into new science data products, replication of experiments and verification of results. NASA has been building ground systems for satellite data processing since the first Earth observing satellites launched and is continuing development of systems to support NASA science research, NOAA's weather satellites and USGS's Earth observing satellite operations. The basic data processing workflows and scenarios continue to be valid for remote sensor observations research as well as for the complex multi-instrument operational satellite data systems being built today. System functions such as ingest, product generation and distribution need to be configured and performed in a consistent and repeatable way with an emphasis on scalability. This paper will examine the key architectural elements of several NASA satellite data processing systems currently in operation and under development that make them suitable for scaling and reuse. Examples of architectural elements that have become attractive include virtual machine environments, standard data product formats, metadata content and file naming, workflow and job management frameworks, data acquisition, search, and distribution protocols. By highlighting key elements and implementation experience the goal is to recognize architectures that will outlast their original application and be readily adaptable for new applications. Concepts and principles are explored that lead to sound guidance for SDS developers and strategists.
PathCase-SB architecture and database design
2011-01-01
Background: Integration of metabolic pathway resources and regulatory metabolic network models, and deploying new tools on the integrated platform, can help perform more effective and more efficient systems biology research on understanding regulation in metabolic networks. Therefore, the tasks of (a) integrating regulatory metabolic networks and existing models under a single database environment, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description: PathCase Systems Biology (PathCase-SB) has been built and released. The PathCase-SB database provides data and an API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools for facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data from selected biological data sources on the web (currently, the BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes the architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions: PathCase-SB's architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889
Braa, Jørn; Kanter, Andrew S.; Lesh, Neal; Crichton, Ryan; Jolliffe, Bob; Sæbø, Johan; Kossi, Edem; Seebregts, Christopher J.
2010-01-01
We address the problem of how to integrate health information systems in low-income African countries in which technical infrastructure and human resources vary wildly within countries. We describe a set of tools to meet the needs of different service areas, including managing aggregate indicators, patient-level record systems, and mobile tools for community outreach. We present the case of Sierra Leone and use this case to motivate and illustrate an architecture that allows us to provide services at each level of the health system (national, regional, facility and community) and to provide different configurations of the tools as appropriate for each individual area. Finally, we present a collaborative implementation of this approach in Sierra Leone. PMID:21347003
Direct diode pumped Ti:sapphire ultrafast regenerative amplifier system
Backus, Sterling; Durfee, Charles; Lemons, Randy; ...
2017-02-10
Here, we report on a direct diode-pumped Ti:sapphire ultrafast regenerative amplifier laser system producing multi-µJ energies at repetition rates from 50 to 250 kHz. By combining cryogenic cooling of Ti:sapphire with high-brightness fiber-coupled 450 nm laser diodes, we demonstrate for the first time a power-scalable CW-pumped architecture that can be directly applied to demanding ultrafast applications, such as coherent high-harmonic EUV generation, without any complex post-amplification pulse compression. Initial results promise a new era for Ti:sapphire amplifiers, not only for ultrafast laser applications but also for tunable CW sources. We discuss the unique challenges to implementation, as well as the solutions to these challenges.
Cloud Computing Boosts Business Intelligence of Telecommunication Industry
NASA Astrophysics Data System (ADS)
Xu, Meng; Gao, Dan; Deng, Chao; Luo, Zhiguo; Sun, Shaoling
Business Intelligence has become an attractive topic in today's data-intensive applications, especially in the telecommunication industry. Meanwhile, Cloud Computing, which provides a supporting IT infrastructure with excellent scalability, large-scale storage, and high performance, has become an effective way to implement parallel data processing and data mining algorithms. BC-PDM (Big Cloud based Parallel Data Miner) is a new MapReduce-based parallel data mining platform developed by CMRI (China Mobile Research Institute) to meet the urgent requirements of business intelligence in the telecommunication industry. In this paper, the architecture, functionality and performance of BC-PDM are presented, together with an experimental evaluation and case studies of its applications. The evaluation results demonstrate both the usability and the cost-effectiveness of a Cloud Computing based Business Intelligence system in telecommunication industry applications.
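As a miniature of the MapReduce-style jobs a platform like BC-PDM runs at scale, the sketch below counts calls per subscriber from toy CDR-like records in plain Python; the record layout is invented, and a real job would run distributed on a Hadoop-style cluster rather than in-process.

```python
# MapReduce in miniature: map emits (key, count) pairs, reduce sums per key.
from collections import defaultdict
from itertools import chain

records = ["alice,2min", "bob,5min", "alice,1min", "carol,3min", "alice,4min"]

def mapper(record):
    subscriber, _ = record.split(",")
    yield subscriber, 1                       # emit one pair per call record

def reducer(pairs):
    totals = defaultdict(int)
    for key, value in pairs:                  # shuffle/merge phase in miniature
        totals[key] += value
    return dict(totals)

print(reducer(chain.from_iterable(mapper(r) for r in records)))
# -> {'alice': 3, 'bob': 1, 'carol': 1}
```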
Parallel Programming Strategies for Irregular Adaptive Applications
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance for such computations. In this work, we examine two typical irregular adaptive applications, Dynamic Remeshing and N-Body, under competing programming methodologies and across various parallel architectures. The Dynamic Remeshing application simulates flow over an airfoil, and refines localized regions of the underlying unstructured mesh. The N-Body experiment models two neighboring Plummer galaxies that are about to undergo a merger. Both problems demonstrate dramatic changes in processor workloads and interprocessor communication with time; thus, dynamic load balancing is a required component.
The Efficiency and the Scalability of an Explicit Operator on an IBM POWER4 System
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We present an evaluation of the efficiency and the scalability of an explicit CFD operator on an IBM POWER4 system. The POWER4 architecture exhibits a common trend in HPC architectures: boosting CPU processing power by increasing the number of functional units, while hiding the latency of memory access by increasing the depth of the memory hierarchy. The overall machine performance depends on the ability of the caches-buses-fabric-memory to feed the functional units with the data to be processed. In this study we evaluate the efficiency and scalability of one explicit CFD operator on an IBM POWER4. This operator performs computations at the points of a Cartesian grid and involves a few dozen floating point numbers and on the order of 100 floating point operations per grid point. The computations in all grid points are independent. Specifically, we estimate the efficiency of the RHS operator (from the SP benchmark of the NPB) on a single processor as the observed/peak performance ratio. Then we estimate the scalability of the operator on a single chip (2 CPUs), a single MCM (8 CPUs), 16 CPUs, and the whole machine (32 CPUs). We then perform the same measurements for a cache-optimized version of the RHS operator. For our measurements we use the HPM (Hardware Performance Monitor) counters available on the POWER4. These counters allow us to analyze the obtained performance results.
A feasibility study on porting the community land model onto accelerators using OpenACC
Wang, Dali; Wu, Wei; Winkler, Frank; ...
2014-01-01
As environmental models (such as the Accelerated Climate Model for Energy (ACME), the Parallel Reactive Flow and Transport Model (PFLOTRAN), and the Arctic Terrestrial Simulator (ATS)) become more and more complicated, we face enormous challenges in porting these applications onto hybrid computing architectures. OpenACC appears to be a very promising technology; therefore, we have conducted a feasibility analysis of porting the Community Land Model (CLM), a terrestrial ecosystem model within the Community Earth System Model (CESM). Specifically, we used an automatic function testing platform to extract a small computing kernel out of CLM, applied this kernel to the actual CLM dataflow procedure, and investigated data-parallelization strategies and the benefit of the data movement provided by the current implementation of OpenACC. Even though it is a non-intensive kernel, on a single 16-core computing node the OpenACC implementation (based on the actual computation time using one GPU) is 2.3 times faster than the OpenMP implementation using a single OpenMP thread, but 2.8 times slower than the OpenMP implementation using 16 threads. On multiple nodes, the MPI+OpenACC implementation demonstrated very good scalability on up to 128 GPUs on 128 computing nodes. This study also provides useful information for looking into the potential benefits of the "deep copy" capability and "routine" feature of the OpenACC standard. In conclusion, we believe that our experience with the environmental model CLM can benefit many other scientific research programs interested in porting their large-scale scientific codes onto high-end computers empowered by hybrid computing architectures using OpenACC.
Authentication and Authorization of End User in Microservice Architecture
NASA Astrophysics Data System (ADS)
He, Xiuyu; Yang, Xudong
2017-10-01
As the market and business continue to expand, the traditional monolithic architecture faces more and more challenges. The development of cloud computing and container technology has made the microservice architecture more popular. While the low coupling, fine granularity, scalability, flexibility and independence of the microservice architecture bring convenience, the inherent complexity of distributed systems makes the security of a microservice architecture both important and difficult. This paper studies the authentication and authorization of the end user under the microservice architecture. By comparing with traditional measures and surveying existing technology, this paper puts forward a set of authentication and authorization strategies suitable for the microservice architecture, such as distributed sessions, SSO solutions, client-side JSON Web Tokens (JWT) and JWT + API Gateway, and summarizes the advantages and disadvantages of each method.
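As a rough illustration of the client-side JSON Web Token strategy discussed above, the following Python sketch issues and verifies an HMAC-signed (HS256-style) token using only the standard library; the secret, claim names and lifetime are illustrative assumptions, and a production gateway would use a vetted JWT library.

    import base64, hashlib, hmac, json, time

    SECRET = b"shared-gateway-secret"        # hypothetical key held by the gateway

    def b64url(data):
        return base64.urlsafe_b64encode(data).rstrip(b"=")

    def issue_token(user, ttl=300):
        header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
        payload = b64url(json.dumps({"sub": user, "exp": time.time() + ttl}).encode())
        sig = b64url(hmac.new(SECRET, header + b"." + payload, hashlib.sha256).digest())
        return header + b"." + payload + b"." + sig

    def verify_token(token):
        header, payload, sig = token.split(b".")
        expected = b64url(hmac.new(SECRET, header + b"." + payload, hashlib.sha256).digest())
        if not hmac.compare_digest(sig, expected):
            raise ValueError("bad signature")
        claims = json.loads(base64.urlsafe_b64decode(payload + b"=" * (-len(payload) % 4)))
        if claims["exp"] < time.time():
            raise ValueError("token expired")
        return claims                        # any service can verify statelessly

    print(verify_token(issue_token("alice")))

The appeal of this scheme for microservices is that each service can verify the token with the shared (or public) key alone, with no session store to scale.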
Evaluating a NoSQL Alternative for Chilean Virtual Observatory Services
NASA Astrophysics Data System (ADS)
Antognini, J.; Araya, M.; Solar, M.; Valenzuela, C.; Lira, F.
2015-09-01
Currently, the standards and protocols for data access in the Virtual Observatory architecture (DAL) are generally implemented with relational databases based on SQL. In particular, the Astronomical Data Query Language (ADQL), the language used by the IVOA to represent queries to VO services, was created to satisfy the different data access protocols, such as Simple Cone Search. ADQL is based on SQL92, with extra functionality implemented using PgSphere. An emergent alternative to SQL is the family of so-called NoSQL databases, which can be classified into several categories such as Column, Document, Key-Value, Graph and Object stores, each recommended for different scenarios. Among their notable characteristics are schema-free design, easy replication support, simple APIs and Big Data readiness. The Chilean Virtual Observatory (ChiVO) is developing a functional prototype based on the IVOA architecture, weighing the following relevant factors: performance, scalability, flexibility, complexity, and functionality. Currently, it is very difficult to compare these factors, due to a lack of alternatives. The objective of this paper is to compare NoSQL alternatives with SQL through the implementation of a REST Web API that satisfies ChiVO's needs: a SESAME-style name resolver for the data from ALMA. We therefore propose a test scenario: configuring a NoSQL database with data from different sources and evaluating the feasibility and performance of a Simple Cone Search service built on it. This comparison will help pave the way for the application of Big Data databases in the Virtual Observatory.
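For readers unfamiliar with the Simple Cone Search protocol mentioned above, the sketch below shows the core geometric filter in Python over a toy document collection standing in for a NoSQL store; the field names are hypothetical, and a real service would push this predicate into the database layer rather than scan in application code.

    from math import radians, degrees, sin, cos, acos

    def angular_sep(ra1, dec1, ra2, dec2):
        # Great-circle separation in degrees between two sky positions.
        ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
        c = sin(dec1) * sin(dec2) + cos(dec1) * cos(dec2) * cos(ra1 - ra2)
        return degrees(acos(max(-1.0, min(1.0, c))))    # clamp for rounding

    def cone_search(documents, ra, dec, radius):
        # Keep the documents whose position falls inside the search cone.
        return [d for d in documents
                if angular_sep(d["ra"], d["dec"], ra, dec) <= radius]

    # Toy document collection; field names are illustrative.
    catalog = [{"name": "src1", "ra": 83.63, "dec": -5.39},
               {"name": "src2", "ra": 90.00, "dec": 10.00}]
    print(cone_search(catalog, 83.6, -5.4, 0.5))        # only src1 is inside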
Parametric dense stereovision implementation on a system-on chip (SoC).
Gardel, Alfredo; Montejo, Pablo; García, Jorge; Bravo, Ignacio; Lázaro, José L
2012-01-01
This paper proposes a novel hardware implementation of dense recovery of stereovision 3D measurements. Traditionally, 3D stereo systems have imposed a maximum number of stereo correspondences, introducing a large restriction on artificial vision algorithms. The proposed system-on-chip (SoC) provides great performance and efficiency, with a scalable architecture suitable for many different situations, addressing real-time processing of the stereo image flow. Using double-buffering techniques properly combined with pipelined processing, the use of reconfigurable hardware achieves a parametrisable SoC which gives the designer the opportunity to decide its right dimension and features. The proposed architecture does not need any external memory because the processing is done as the image flow arrives. Our SoC provides 3D data directly without the storage of whole stereo images. Our goal is to obtain high processing speed while maintaining the accuracy of the 3D data using minimum resources. Configurable parameters may be controlled by later/parallel stages of the vision algorithm executed on an embedded processor. Considering an FPGA hardware clock of 100 MHz, image flows up to 50 frames per second (fps) of dense stereo maps of more than 30,000 depth points can be obtained for 2 Mpix images, with a minimum initial latency. The implementation of computer vision algorithms on reconfigurable hardware, especially low-level processing, opens up the prospect of their use in autonomous systems, where they can act as a coprocessor to reconstruct 3D images with high-density information in real time.
Control and Measurement of an Xmon with the Quantum Socket
NASA Astrophysics Data System (ADS)
McConkey, T. G.; Bejanin, J. H.; Earnest, C. T.; McRae, C. R. H.; Rinehart, J. R.; Weides, M.; Mariantoni, M.
The implementation of superconducting quantum processors is rapidly reaching scalability limitations. Extensible electronics and wiring solutions for superconducting quantum bits (qubits) are among the most imminent issues to be tackled. The necessity to substitute planar electrical interconnects (e.g., wire bonds) with three-dimensional wires is emerging as a fundamental pillar towards scalability. In a previous work, we have shown that three-dimensional wires housed in a suitable package, named the quantum socket, can be utilized to measure high-quality superconducting resonators. In this work, we set out to test the quantum socket with actual superconducting qubits to verify its suitability as a wiring solution in the development of an extensible quantum computing architecture. To this end, we have designed and fabricated a series of Xmon qubits. The qubits range in frequency from about 6 to 7 GHz with an anharmonicity of 200 MHz and can be tuned by means of Z pulses. Controlling tunable Xmons will allow us to verify whether the contact resistance of the three-dimensional wires is low enough for qubit operation. Qubit T1 and T2 times and single-qubit gate fidelities are compared against current standards in the field.
Parallel peak pruning for scalable SMP contour tree computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.
As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in the architecture of high-performance computing systems necessitate analysis algorithms that make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared-memory SMP algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10x parallel speedup in OpenMP and up to 50x speedup in NVIDIA Thrust.
Scalable and Interactive Segmentation and Visualization of Neural Processes in EM Datasets
Jeong, Won-Ki; Beyer, Johanna; Hadwiger, Markus; Vazquez, Amelio; Pfister, Hanspeter; Whitaker, Ross T.
2011-01-01
Recent advances in scanning technology provide high resolution EM (Electron Microscopy) datasets that allow neuroscientists to reconstruct complex neural connections in a nervous system. However, due to the enormous size and complexity of the resulting data, segmentation and visualization of neural processes in EM data is usually a difficult and very time-consuming task. In this paper, we present NeuroTrace, a novel EM volume segmentation and visualization system that consists of two parts: a semi-automatic multiphase level set segmentation with 3D tracking for reconstruction of neural processes, and a specialized volume rendering approach for visualization of EM volumes. It employs view-dependent on-demand filtering and evaluation of a local histogram edge metric, as well as on-the-fly interpolation and ray-casting of implicit surfaces for segmented neural structures. Both methods are implemented on the GPU for interactive performance. NeuroTrace is designed to be scalable to large datasets and data-parallel hardware architectures. A comparison of NeuroTrace with a commonly used manual EM segmentation tool shows that our interactive workflow is faster and easier to use for the reconstruction of complex neural processes. PMID:19834227
A scalable and flexible hybrid energy storage system design and implementation
NASA Astrophysics Data System (ADS)
Kim, Younghyun; Koh, Jason; Xie, Qing; Wang, Yanzhi; Chang, Naehyuck; Pedram, Massoud
2014-06-01
Energy storage systems (ESS) are becoming one of the most important components that noticeably change overall system performance in various applications, ranging from the power grid infrastructure to electric vehicles (EV) and portable electronics. However, a homogeneous ESS is limited in its characteristics, in terms of cost, efficiency, lifetime, etc., by the single energy storage technology that comprises it. Hybrid ESS (HESS), on the other hand, are a viable solution for a practical ESS with currently available technologies, as they have the potential to overcome such limitations by exploiting only the advantages of heterogeneous energy storage technologies while hiding their drawbacks. However, the HESS concept mandates sophisticated design and control to actually realize these benefits. The HESS architecture should provide controllability of many parts that are often fixed in a homogeneous ESS, and novel management policies should be able to utilize these control features. This paper introduces a complete design practice of a HESS prototype that demonstrates scalability, flexibility, and energy efficiency. It is composed of three heterogeneous energy storage elements: lead-acid batteries, lithium-ion batteries, and supercapacitors. We demonstrate a novel system control methodology and enhanced energy efficiency through this design practice.
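A minimal sketch of the kind of control policy a HESS enables (not the paper's actual methodology): steady demand is served by the battery while transient surges above a threshold are diverted to the supercapacitor, sparing the battery from high-rate cycling. The threshold and wattages are illustrative.

    def route_power(demand_w, threshold_w=500.0):
        # Serve steady load from the battery; divert the surge above the
        # threshold to the supercapacitor bank (values are illustrative).
        battery_w = min(demand_w, threshold_w)
        supercap_w = max(0.0, demand_w - threshold_w)
        return battery_w, supercap_w

    for demand in (200.0, 800.0):
        print(demand, route_power(demand))   # (200.0, 0.0) then (500.0, 300.0)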
A scalable architecture for online anomaly detection of WLCG batch jobs
NASA Astrophysics Data System (ADS)
Kuehn, E.; Fischer, M.; Giffels, M.; Jung, C.; Petzold, A.
2016-10-01
For data centres it is increasingly important to monitor network usage and learn from network usage patterns. In particular, configuration issues or misbehaving batch jobs that prevent smooth operation need to be detected as early as possible. At the GridKa data and computing centre we therefore operate a tool, BPNetMon, for monitoring traffic data and characteristics of WLCG batch jobs and pilots locally on different worker nodes. On the one hand, local information by itself is not sufficient to detect anomalies, for several reasons: e.g., the underlying job distribution on a single worker node might change, or there might be a local misconfiguration. On the other hand, a centralised anomaly detection approach does not scale with regard to network communication or computational cost. We therefore propose a scalable architecture based on concepts of a super-peer network.
Simplified Parallel Domain Traversal
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erickson III, David J
2011-01-01
Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users and scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO₂ and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.
Scalable and Resilient Middleware to Handle Information Exchange during Environment Crisis
NASA Astrophysics Data System (ADS)
Tao, R.; Poslad, S.; Moßgraber, J.; Middleton, S.; Hammitzsch, M.
2012-04-01
The EU FP7 TRIDEC project focuses on enabling real-time, intelligent information management of collaborative, complex, critical decision processes for earth management. A key challenge is to provide a communication infrastructure that facilitates interoperable environment information services during environmental events and crises such as tsunamis and drilling incidents, during which increasing volumes and dimensionality of disparate information sources, including sensor-based and human-based ones, arise and need to be managed. Such a system needs to support: scalable, distributed messaging; asynchronous messaging; open messaging to handle changing clients, such as new and retired automated systems and human information sources coming online or going offline; flexible data filtering; and heterogeneous access networks (e.g., GSM, WLAN and LAN). In addition, the system needs to be resilient to ICT system failures, e.g. outright failure, degradation and overload, during environmental events. There are several system middleware choices for TRIDEC based upon a Service-Oriented Architecture (SOA), Event-Driven Architecture (EDA), Cloud Computing, and an Enterprise Service Bus (ESB). In an SOA, everything is a service (e.g. data access, processing and exchange); clients can request on demand or subscribe to services registered by providers; more often, interaction is synchronous. In an EDA system, events that represent significant changes in state can be processed simply, as streams, or more complexly. Cloud computing is a virtualized, interoperable and elastic resource allocation model. An ESB, a fundamental component for enterprise messaging, supports synchronous and asynchronous message exchange models and has inbuilt resilience against ICT failure. Our middleware proposal is an ESB-based hybrid architecture model: an SOA extension supports more synchronous workflows; the EDA assists the ESB in handling more complex event processing; and Cloud computing can be used to increase and decrease the ESB resources on demand. To reify this hybrid ESB-centric architecture, we will adopt two complementary approaches: an open-source one for scalability and resilience improvement and a commercial one for ultra-fast messaging, with a bridge between the two to support interoperability. In TRIDEC, to manage such a hybrid messaging system, overlay and underlay management techniques will be adopted. The managers (both global and local) will collect, store and update status information (e.g. CPU utilization, free space, number of clients) and balance usage, throughput, and delays to improve resilience and scalability. The expected resilience improvements include dynamic failover, self-healing, pre-emptive load balancing, and bottleneck prediction, while the expected scalability improvements include capacity estimation, an HTTP bridge, and automatic configuration and reconfiguration (e.g. adding or deleting clients and servers).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorentla Venkata, Manjunath; Shamis, Pavel; Graham, Richard L
2013-01-01
Many scientific simulations using the Message Passing Interface (MPI) programming model are sensitive to the performance and scalability of reduction collective operations such as MPI_Allreduce and MPI_Reduce. These operations are the most widely used abstractions to perform mathematical operations over all processes that are part of the simulation. In this work, we propose a hierarchical design to implement the reduction operations on multicore systems. This design aims to improve the efficiency of reductions by 1) tailoring the algorithms and customizing the implementations for the various communication mechanisms in the system, 2) providing the ability to configure the depth of the hierarchy to match the system architecture, and 3) providing the ability to progress each level of this hierarchy independently. Using this design, we implement MPI_Allreduce and MPI_Reduce operations (and their nonblocking variants MPI_Iallreduce and MPI_Ireduce) for all message sizes, and evaluate them on multiple architectures including InfiniBand and the Cray XT5. We leverage and enhance our existing infrastructure, Cheetah, a framework for implementing hierarchical collective operations, to implement these reductions. The experimental results show that the Cheetah reduction operations outperform production-grade MPI implementations such as Open MPI default, Cray MPI, and MVAPICH2, demonstrating their efficiency, flexibility and portability. On InfiniBand systems, with a microbenchmark, a 512-process Cheetah nonblocking Allreduce and Reduce achieve speedups of 23x and 10x, respectively, compared to the default Open MPI reductions. The blocking variants of the reduction operations show similar performance benefits. A 512-process nonblocking Cheetah Allreduce achieves a speedup of 3x compared to the default MVAPICH2 Allreduce implementation. On a Cray XT5 system, a 6144-process Cheetah Allreduce outperforms the Cray MPI by 145%. The evaluation with an application kernel, a Conjugate Gradient solver, shows that the Cheetah reductions speed up the total time to solution by 195%, demonstrating the potential benefits for scientific simulations.
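A minimal mpi4py sketch of the hierarchical idea, assuming mpi4py is available: reduce within each node over shared memory, reduce across one leader per node, then broadcast the result back. Production designs such as Cheetah tailor the algorithm per level and progress the levels independently; this shows only the structure.

    from mpi4py import MPI

    def hierarchical_allreduce(value):
        world = MPI.COMM_WORLD
        # Level 1: communicator of the ranks sharing a node (shared memory).
        node = world.Split_type(MPI.COMM_TYPE_SHARED)
        local_sum = node.reduce(value, op=MPI.SUM, root=0)

        # Level 2: one leader per node reduces across the network.
        is_leader = node.Get_rank() == 0
        leaders = world.Split(0 if is_leader else 1, key=world.Get_rank())
        total = leaders.allreduce(local_sum, op=MPI.SUM) if is_leader else None

        # Fan the result back out within each node.
        return node.bcast(total, root=0)

    if __name__ == "__main__":
        # With value=1 on every rank, each rank prints the world size.
        print(hierarchical_allreduce(1))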
Optically Driven Spin Based Quantum Dots for Quantum Computing - Research Area 6 Physics 6.3.2
2015-12-15
quantum dots (SAQD) in Schottky diodes. Based on spins in these dots, a scalable architecture has been proposed [Adv. in Physics, 59, 703 (2010)] by us ... housed in two coupled quantum dots with tunneling between them, as described above, may not be scalable but can serve as a node in a quantum network. The ... tunneling-coupled two-electron spin ground states in the vertically coupled quantum dots for "universal computation" two spin qubits within the universe of
Engineering PFLOTRAN for Scalable Performance on Cray XT and IBM BlueGene Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mills, Richard T; Sripathi, Vamsi K; Mahinthakumar, Gnanamanika
We describe PFLOTRAN - a code for simulation of coupled hydro-thermal-chemical processes in variably saturated, non-isothermal, porous media - and the approaches we have employed to obtain scalable performance on some of the largest scale supercomputers in the world. We present detailed analyses of I/O and solver performance on Jaguar, the Cray XT5 at Oak Ridge National Laboratory, and Intrepid, the IBM BlueGene/P at Argonne National Laboratory, that have guided our choice of algorithms.
A Mobile IPv6 based Distributed Mobility Management Mechanism of Mobile Internet
NASA Astrophysics Data System (ADS)
Yan, Shi; Jiayin, Cheng; Shanzhi, Chen
A flatter architecture is one of the trends of the mobile Internet. Traditional centralized mobility management mechanisms face challenges such as scalability and UE reachability. A MIPv6-based distributed mobility management mechanism is proposed in this paper. Some important network entities and signaling procedures are defined. UE reachability is also addressed through an extension to DNS servers. Simulation results show that the proposed approach can overcome the scalability problem of the centralized scheme.
A Distributed Architecture for Tsunami Early Warning and Collaborative Decision-support in Crises
NASA Astrophysics Data System (ADS)
Moßgraber, J.; Middleton, S.; Hammitzsch, M.; Poslad, S.
2012-04-01
The presentation will describe work on the system architecture that is being developed in the EU FP7 project TRIDEC on "Collaborative, Complex and Critical Decision-Support in Evolving Crises". The challenges for a Tsunami Early Warning System (TEWS) are manifold, and the success of a system depends crucially on its architecture. A modern warning system following a system-of-systems approach has to integrate various components and sub-systems, such as different information sources, services and simulation systems. Furthermore, it has to take into account the distributed and collaborative nature of warning systems. In order to create an architecture that supports the whole spectrum of a modern, distributed and collaborative warning system, one must deal with multiple challenges. Obviously, one cannot expect to tackle these challenges adequately with a monolithic system or with a single technology. Therefore, a system architecture providing the blueprints to implement the system-of-systems approach has to combine multiple technologies and architectural styles. At the bottom layer it has to reliably integrate a large set of conventional sensors, such as seismic sensors and sensor networks, buoys and tide gauges, and also innovative and unconventional sensors, such as streams of messages from social media services. At the top layer it has to support collaboration on high-level decision processes and facilitate information sharing between organizations. In between, the system has to process all data and integrate information on a semantic level in a timely manner. This complex communication follows an event-driven mechanism allowing events to be published, detected and consumed by various applications within the architecture. Therefore, at the upper layer the event-driven architecture (EDA) aspects are combined with principles of service-oriented architectures (SOA) using standards for communication and data exchange. The most prominent challenges on this layer include providing a framework for information integration on a syntactic and semantic level, leveraging distributed processing resources for a scalable data processing platform, and automating data processing and decision support workflows.
A simple modern correctness condition for a space-based high-performance multiprocessor
NASA Technical Reports Server (NTRS)
Probst, David K.; Li, Hon F.
1992-01-01
A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see the design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and to alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
The NASA Space Communications Data Networking Architecture
NASA Technical Reports Server (NTRS)
Israel, David J.; Hooke, Adrian J.; Freeman, Kenneth; Rush, John J.
2006-01-01
The NASA Space Communications Architecture Working Group (SCAWG) has recently been developing an integrated agency-wide space communications architecture in order to provide the necessary communication and navigation capabilities to support NASA's new Exploration and Science Programs. A critical element of the space communications architecture is the end-to-end Data Networking Architecture, which must provide a wide range of services required for missions ranging from planetary rovers to human spaceflight, and from sub-orbital space to deep space. Requirements for a higher degree of user autonomy and interoperability between a variety of elements must be accommodated within an architecture that necessarily features minimum operational complexity. The architecture must also be scalable and evolvable to meet mission needs for the next 25 years. This paper will describe the recommended NASA Data Networking Architecture, present some of the rationale for the recommendations, and will illustrate an application of the architecture to example NASA missions.
Highly parallel implementation of non-adiabatic Ehrenfest molecular dynamics
NASA Astrophysics Data System (ADS)
Kanai, Yosuke; Schleife, Andre; Draeger, Erik; Anisimov, Victor; Correa, Alfredo
2014-03-01
While the adiabatic Born-Oppenheimer approximation tremendously lowers computational effort, many questions in modern physics, chemistry, and materials science require an explicit description of coupled non-adiabatic electron-ion dynamics. Electronic stopping, i.e. the energy transfer of a fast projectile atom to the electronic system of the target material, is a notorious example. We recently implemented real-time time-dependent density functional theory based on the plane-wave pseudopotential formalism in the Qbox/qb@ll codes. We demonstrate that explicit integration using a fourth-order Runge-Kutta scheme is very suitable for modern highly parallelized supercomputers. Applying the new implementation to systems with hundreds of atoms and thousands of electrons, we achieved excellent performance and scalability on a large number of nodes, both on the BlueGene-based "Sequoia" system at LLNL and on the Cray architecture of "Blue Waters" at NCSA. As an example, we discuss our work on computing the electronic stopping power of aluminum and gold for hydrogen projectiles, showing excellent agreement with experiment. These first-principles calculations allow us to gain important insight into the fundamental physics of electronic stopping.
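The fourth-order Runge-Kutta scheme named above is, in its classical scalar form, straightforward; here is a generic Python sketch (the actual implementation integrates the time-dependent Kohn-Sham equations over plane-wave coefficients, not a scalar ODE).

    def rk4_step(f, t, y, dt):
        # One classical fourth-order Runge-Kutta step for dy/dt = f(t, y).
        k1 = f(t, y)
        k2 = f(t + dt/2, y + dt/2 * k1)
        k3 = f(t + dt/2, y + dt/2 * k2)
        k4 = f(t + dt, y + dt * k3)
        return y + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

    # Example: dy/dt = -y, whose exact solution is exp(-t).
    y, t, dt = 1.0, 0.0, 0.01
    for _ in range(100):
        y = rk4_step(lambda t, y: -y, t, y, dt)
        t += dt
    print(y)   # ~0.36788, close to exp(-1)

Because each of the four stages is a large independent application of the Hamiltonian, the scheme maps well onto massively parallel machines, which is the suitability claim made in the abstract.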
Input-independent, Scalable and Fast String Matching on the Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Chavarría-Miranda, Daniel; Maschhoff, Kristyn J
2009-05-25
String searching is at the core of many security and network applications like search engines, intrusion detection systems, virus scanners and spam filters. The growing size of on-line content and increasing wire speeds push the need for fast, and often real-time, string searching solutions. For these conditions, many software implementations (if not all) targeting conventional cache-based microprocessors do not perform well. They either exhibit overall low performance or exhibit highly variable performance depending on the types of inputs. For this reason, state-of-the-art real-time solutions rely on the use of either custom hardware or Field-Programmable Gate Arrays (FPGAs) at the expense of overall system flexibility and programmability. This paper presents a software-based implementation of the Aho-Corasick string searching algorithm on the Cray XMT multithreaded shared memory machine. Our solution relies on the particular features of the XMT architecture and on several algorithmic strategies: it is fast, scalable and its performance is virtually content-independent. On a 128-processor Cray XMT, it reaches a scanning speed of ≈ 28 Gbps with a performance variability below 10%. In the 10 Gbps performance range, variability is below 2.5%. By comparison, an Intel dual-socket, 8-core system running at 2.66 GHz achieves a peak performance which varies from 500 Mbps to 10 Gbps depending on the type of input and dictionary size.
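A compact pure-Python sketch of the Aho-Corasick automaton used above; the property that makes throughput content-independent is visible in the search loop, which performs a bounded amount of work per input character regardless of which patterns match.

    from collections import deque

    def build_automaton(patterns):
        # Phase 1: build the trie of patterns.
        goto, fail, out = [{}], [0], [set()]
        for pat in patterns:
            s = 0
            for ch in pat:
                if ch not in goto[s]:
                    goto.append({}); fail.append(0); out.append(set())
                    goto[s][ch] = len(goto) - 1
                s = goto[s][ch]
            out[s].add(pat)
        # Phase 2: breadth-first computation of failure links.
        q = deque(goto[0].values())
        while q:
            s = q.popleft()
            for ch, t in goto[s].items():
                q.append(t)
                f = fail[s]
                while f and ch not in goto[f]:
                    f = fail[f]
                fail[t] = goto[f].get(ch, 0)
                out[t] |= out[fail[t]]             # inherit suffix matches
        return goto, fail, out

    def search(text, tables):
        goto, fail, out = tables
        s, hits = 0, []
        for i, ch in enumerate(text):              # bounded work per character
            while s and ch not in goto[s]:
                s = fail[s]
            s = goto[s].get(ch, 0)
            hits.extend((i, p) for p in out[s])
        return hits

    tables = build_automaton(["he", "she", "his", "hers"])
    print(search("ushers", tables))   # matches end at index 3 ('she', 'he') and 5 ('hers')

The XMT implementation additionally replicates and lays out these tables to suit the machine's flat multithreaded memory system, which this sketch does not attempt to model.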
A Comparison of Different Database Technologies for the CMS AsyncStageOut Transfer Database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ciangottini, D.; Balcas, J.; Mascheroni, M.
AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB) that manages users' transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user's output data to a single remote site was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate. Currently ASO manages up to 600k files of various sizes per day from more than 500 users per month, spread over more than 100 sites. ASO uses a NoSQL database (CouchDB) for internal bookkeeping and as a way to communicate with other CRAB components. Since ASO/CRAB were put in production in 2014, the number of transfers has constantly increased up to a point where the pressure on the central CouchDB instance became critical, creating new challenges for the system's scalability, performance, and monitoring. This forced a re-engineering of the ASO application to increase its scalability and lower its operational effort. In this contribution we present a comparison of the performance of the current NoSQL implementation and a new SQL implementation, and how their different strengths and features influenced the design choices and operational experience. We also discuss other architectural changes introduced in the system to handle the increasing load and latency in delivering output to the user.
A comparison of different database technologies for the CMS AsyncStageOut transfer database
NASA Astrophysics Data System (ADS)
Ciangottini, D.; Balcas, J.; Mascheroni, M.; Rupeika, E. A.; Vaandering, E.; Riahi, H.; Silva, J. M. D.; Hernandez, J. M.; Belforte, S.; Ivanov, T. T.
2017-10-01
AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB) that manages users' transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user's output data to a single remote site was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate. Currently ASO manages up to 600k files of various sizes per day from more than 500 users per month, spread over more than 100 sites. ASO uses a NoSQL database (CouchDB) for internal bookkeeping and as a way to communicate with other CRAB components. Since ASO/CRAB were put in production in 2014, the number of transfers has constantly increased up to a point where the pressure on the central CouchDB instance became critical, creating new challenges for the system's scalability, performance, and monitoring. This forced a re-engineering of the ASO application to increase its scalability and lower its operational effort. In this contribution we present a comparison of the performance of the current NoSQL implementation and a new SQL implementation, and how their different strengths and features influenced the design choices and operational experience. We also discuss other architectural changes introduced in the system to handle the increasing load and latency in delivering output to the user.
NASA Technical Reports Server (NTRS)
McGuire, Tim
1998-01-01
In this paper, we report the results of our recent research on the application of a multiprocessor Cray T916 supercomputer in modeling super-thermal electron transport in the earth's magnetic field. In general, this mathematical model requires the numerical solution of a system of partial differential equations. The code we use for this model is moderately vectorized. By using Amdahl's Law for vector processors, it can be verified that the code is about 60% vectorized on a Cray computer. Speedup factors on the order of 2.5 were obtained compared to the unvectorized code. In the following sections, we discuss the methodology of improving the code. In addition to our goal of optimizing the code for solution on the Cray computer, we had the goal of scalability in mind. Scalability combines the concepts of portability with near-linear speedup. Specifically, a scalable program is one whose performance is portable across many different architectures with differing numbers of processors for many different problem sizes. Though we have access to a Cray at this time, the goal was also to have code which would run well on a variety of architectures.
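The quoted figures are mutually consistent: by Amdahl's law for vector processors, a code with vectorized fraction f sped up by a vector-unit factor v attains an overall speedup of 1/((1-f) + f/v), which saturates at 1/(1-f) = 2.5 for f = 0.6 as v grows. A quick check in Python:

    def amdahl_speedup(f, v):
        # Overall speedup when a fraction f of the work runs v times faster.
        return 1.0 / ((1.0 - f) + f / v)

    # With ~60% of the code vectorized, the speedup approaches 2.5
    # no matter how fast the vector units become.
    for v in (2, 4, 8, 1e6):
        print(v, round(amdahl_speedup(0.6, v), 3))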
LDRD project final report : hybrid AI/cognitive tactical behavior framework for LVC.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Djordjevich, Donna D.; Xavier, Patrick Gordon; Brannon, Nathan Gregory
This Lab-Directed Research and Development (LDRD) project sought to develop technology that enhances scenario construction speed, entity behavior robustness, and scalability in Live-Virtual-Constructive (LVC) simulation. We investigated issues in both simulation architecture and behavior modeling. We developed path-planning technology that improves the ability to express intent in the planning task while still permitting an efficient search algorithm. An LVC simulation demonstrated how this enables 'one-click' layout of squad tactical paths, as well as dynamic re-planning for simulated squads and for real and simulated mobile robots. We identified human response latencies that can be exploited in parallel/distributed architectures. We performed an experimental study to determine where parallelization would be productive in Umbra-based force-on-force (FOF) simulations. We developed and implemented a data-driven simulation composition approach that solves entity class hierarchy issues and supports assurance of simulation fairness. Finally, we proposed a flexible framework to enable integration of multiple behavior modeling components that model working memory phenomena with different degrees of sophistication.
Experience with ATLAS MySQL PanDA database service
NASA Astrophysics Data System (ADS)
Smirnov, Y.; Wlodek, T.; De, K.; Hover, J.; Ozturk, N.; Smith, J.; Wenaus, T.; Yu, D.
2010-04-01
The PanDA distributed production and analysis system has been in production use for ATLAS data processing and analysis since late 2005 in the US, and globally throughout ATLAS since early 2008. Its core architecture is based on a set of stateless web services served by Apache and backed by a suite of MySQL databases that are the repository for all PanDA information: active and archival job queues, dataset and file catalogs, site configuration information, monitoring information, system control parameters, and so on. This database system is one of the most critical components of PanDA, and has successfully delivered the functional and scaling performance required by PanDA, currently operating at a scale of half a million jobs per week, with much growth still to come. In this paper we describe the design and implementation of the PanDA database system, its architecture of MySQL servers deployed at BNL and CERN, backup strategy and monitoring tools. The system has been developed, thoroughly tested, and brought to production to provide highly reliable, scalable, flexible and available database services for ATLAS Monte Carlo production, reconstruction and physics analysis.
Enhanced service zone architecture for multiservices over IP
NASA Astrophysics Data System (ADS)
Michaely, Boaz; Mohan, Seshadri
2001-07-01
Recently, the field of IP Telephony has experienced considerable evolution through the specification of new protocols and the introduction of products implementing these protocols. We envision IP Telephony soon evolving to offer multiservices encompassing not only voice, but also data, video and multimedia. While progress has focused on refining protocols and architectures, very little attention has been given to business models for offering these services. This paper introduces the concept of a Service Zone, which from a service provider/network operator perspective fits within the operator's administrative domain, but is viewed as an independent zone with its own management and services, requiring minimal integration with the core network services. Besides its own management, the Enhanced Services Zone may also provide the provisioning and maintenance features needed to deliver the customer services and availability that subscribers expect from a telephony service provider. The platform must provide reliable service over time, be scalable to meet increased capacity demands, and be upgradeable to incorporate advanced services and features as they become available. Signaling flows are illustrated using SIP and H.323.
Argonne Simulation Framework for Intelligent Transportation Systems
DOT National Transportation Integrated Search
1996-01-01
A simulation framework has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed to run on parallel computers and distribu...
Astronomy In The Cloud: Using Mapreduce For Image Coaddition
NASA Astrophysics Data System (ADS)
Wiley, Keith; Connolly, A.; Gardner, J.; Krughoff, S.; Balazinska, M.; Howe, B.; Kwon, Y.; Bu, Y.
2011-01-01
In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computational challenges such as anomaly detection, classification, and moving object tracking. Since such studies require the highest quality data, methods such as image coaddition, i.e., registration, stacking, and mosaicing, will be critical to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources, e.g., asteroids, or transient objects, e.g., supernovae, these data streams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this paper we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data is partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud computing resources, i.e., platforms where Hadoop is offered as a service. We report on our experience implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multi-terabyte imaging dataset provides a good testbed for algorithm development since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image coaddition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results comparing their performance. This work is funded by the NSF and by NASA.
Astronomy in the Cloud: Using MapReduce for Image Co-Addition
NASA Astrophysics Data System (ADS)
Wiley, K.; Connolly, A.; Gardner, J.; Krughoff, S.; Balazinska, M.; Howe, B.; Kwon, Y.; Bu, Y.
2011-03-01
In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computational challenges such as anomaly detection, classification, and moving-object tracking. Since such studies benefit from the highest-quality data, methods such as image co-addition, i.e., astrometric registration followed by per-pixel summation, will be a critical preprocessing step prior to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources such as potentially hazardous asteroids or transient objects such as supernovae, these data streams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this article we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data are partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud computing resources: i.e., platforms where Hadoop is offered as a service. We report on our experience of implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multiterabyte imaging data set provides a good testbed for algorithm development, since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image co-addition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results comparing their performance.
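To sketch how co-addition maps onto MapReduce (a schematic reading of the approach, not the authors' code): the map phase emits (sky-pixel, value) pairs from each astrometrically registered image, and the reduce phase combines all values that landed on the same pixel, here as a mean stack.

    from collections import defaultdict

    def map_image(image):
        # Map phase: emit (sky pixel, value) pairs for one registered image.
        # `image` is a dict of already-registered pixels; a real pipeline
        # performs the astrometric registration here (fields illustrative).
        for pixel, value in image.items():
            yield pixel, value

    def reduce_pixels(pairs):
        # Reduce phase: combine all values that fell on the same sky pixel.
        total, count = defaultdict(float), defaultdict(int)
        for pixel, value in pairs:
            total[pixel] += value
            count[pixel] += 1
        return {p: total[p] / count[p] for p in total}   # mean co-add

    images = [{(0, 0): 1.0, (0, 1): 2.0}, {(0, 0): 3.0}]
    pairs = (kv for img in images for kv in map_image(img))
    print(reduce_pixels(pairs))   # {(0, 0): 2.0, (0, 1): 2.0}

Hadoop's shuffle guarantees that all pairs for a given pixel reach the same reducer, which is exactly the grouping co-addition needs.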
Scalable Conjunction Processing using Spatiotemporally Indexed Ephemeris Data
NASA Astrophysics Data System (ADS)
Budianto-Ho, I.; Johnson, S.; Sivilli, R.; Alberty, C.; Scarberry, R.
2014-09-01
The collision warnings produced by the Joint Space Operations Center (JSpOC) are of critical importance in protecting U.S. and allied spacecraft against destructive collisions and protecting the lives of astronauts during space flight. As the Space Surveillance Network (SSN) improves its sensor capabilities for tracking small and dim space objects, the number of tracked objects increases from thousands to hundreds of thousands, while the number of potential conjunctions increases with the square of the number of tracked objects. Classical filtering techniques such as apogee and perigee filters have proven insufficient. Novel and orders-of-magnitude faster conjunction analysis algorithms are required to find conjunctions in a timely manner. Stellar Science has developed innovative filtering techniques for satellite conjunction processing using spatiotemporally indexed ephemeris data that efficiently and accurately reduce the number of objects requiring high-fidelity and computationally intensive conjunction analysis. Two such algorithms, one based on the k-d tree pioneered in robotics applications and the other based on spatial hash tables used in computer gaming and animation, use, at worst, an initial O(N log N) preprocessing pass (where N is the number of tracked objects) to build large O(N) spatial data structures that substantially reduce the required number of O(N^2) computations, substituting linear memory usage for quadratic processing time. The filters have been implemented as Open Services Gateway initiative (OSGi) plug-ins for the Continuous Anomalous Orbital Situation Discriminator (CAOS-D) conjunction analysis architecture. We have demonstrated the effectiveness, efficiency, and scalability of the techniques using a catalog of 100,000 objects and an analysis window of one day, on a 64-core computer with 1 TB of shared memory. Each algorithm can process the full catalog in 6 minutes or less, almost a twenty-fold performance improvement over the baseline implementation running on the same machine. We will present an overview of the algorithms and results that demonstrate the scalability of our concepts.
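A minimal Python sketch of the spatial-hash idea (a simplification of the CAOS-D filters, which also index time): objects are bucketed into grid cells, and only pairs in the same or adjacent cells survive as conjunction candidates, trading O(N) memory for the avoided O(N^2) all-pairs sweep. Positions and cell size are illustrative.

    from collections import defaultdict
    from itertools import product

    def candidate_pairs(positions, cell_size):
        # Bucket each object into the grid cell containing its position.
        grid = defaultdict(list)
        for obj, (x, y, z) in positions.items():
            cell = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
            grid[cell].append(obj)
        # Only objects in the same or neighbouring cells can be close
        # enough to warrant high-fidelity conjunction analysis.
        pairs = set()
        for (cx, cy, cz), members in grid.items():
            for dx, dy, dz in product((-1, 0, 1), repeat=3):   # 27 neighbours
                for other in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    for obj in members:
                        if obj < other:
                            pairs.add((obj, other))
        return pairs

    sats = {"a": (0.0, 0.0, 0.0), "b": (0.5, 0.0, 0.0), "c": (100.0, 0.0, 0.0)}
    print(candidate_pairs(sats, cell_size=10.0))   # {('a', 'b')}; 'c' is pruned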
Back to the future: virtualization of the computing environment at the W. M. Keck Observatory
NASA Astrophysics Data System (ADS)
McCann, Kevin L.; Birch, Denny A.; Holt, Jennifer M.; Randolph, William B.; Ward, Josephine A.
2014-07-01
Over its two decades of science operations, the W.M. Keck Observatory computing environment has evolved to contain a distributed hybrid mix of hundreds of servers, desktops and laptops of multiple different hardware platforms, O/S versions and vintages. Supporting the growing computing capabilities to meet the observatory's diverse, evolving computing demands within fixed budget constraints presents many challenges. This paper describes the significant role that virtualization is playing in addressing these challenges while improving the level and quality of service as well as realizing significant savings across many cost areas. Starting in December 2012, the observatory embarked on an ambitious plan to incrementally test and deploy a migration to virtualized platforms to address a broad range of specific opportunities. Implementation to date has been surprisingly glitch-free, progressing well and yielding tangible benefits much faster than many expected. We describe here the general approach, starting with the initial identification of some low-hanging fruit, which also provided an opportunity to gain experience and build confidence among both the implementation team and the user community. We describe the range of challenges, opportunities and cost savings potential. Very significant among these was the substantial power savings, which resulted in strong broad support for moving forward. We go on to describe the phasing plan, the evolving scalable architecture, some of the specific technical choices, as well as some of the individual technical issues encountered along the way. The phased implementation spans Windows and Unix servers for scientific, engineering and business operations, and virtualized desktops for typical office users as well as the more demanding graphics-intensive CAD users. Other areas discussed in this paper include staff training, load balancing, redundancy, scalability, remote access, disaster readiness and recovery.
Electrical and computer architecture of an autonomous Mars sample return rover prototype
NASA Astrophysics Data System (ADS)
Leslie, Caleb Thomas
Space truly is the final frontier. As man looks to explore beyond the confines of our planet, we use the lessons learned from traveling to the Moon and orbiting in the International Space Station, and we set our sights upon Mars. For decades, Martian probes consisting of orbiters, landers, and even robotic rovers have been sent to study Mars. Their discoveries have yielded a wealth of new scientific knowledge regarding the Martian environment and the secrets it holds. Armed with this knowledge, NASA and others have begun preparations to send humans to Mars with the ultimate goal of colonization and permanent human habitation. The ultimate success of any long term manned mission to Mars will require in situ resource utilization techniques and technologies to both support their stay and make a return trip to Earth viable. A sample return mission to Mars will play a pivotal role in developing these necessary technologies to ensure such an endeavor to be a successful one. This thesis describes an electrical and computer architecture for autonomous robotic applications. The architecture is one that is modular, scalable, and adaptable. These traits are achieved by maximizing commonality and reusability within modules that can be added, removed, or reconfigured within the system. This architecture, called the Modular Architecture for Autonomous Robotic Systems (MAARS), was implemented on the University of Alabama's Collection and Extraction Rover for Extraterrestrial Samples (CERES). The CERES rover competed in the 2016 NASA Sample Return Robot Challenge where robots were tasked with autonomously finding, collecting, and returning samples to the landing site.
A flexible software architecture for scalable real-time image and video processing applications
NASA Astrophysics Data System (ADS)
Usamentiaga, Rubén; Molleda, Julio; García, Daniel F.; Bulnes, Francisco G.
2012-06-01
Real-time image and video processing applications require skilled architects, and recent trends in hardware platforms make the design and implementation of these applications increasingly complex. Many frameworks and libraries have been proposed or commercialized to simplify the design and tuning of real-time image processing applications. However, they tend to lack flexibility because they are normally oriented towards particular types of applications, or they impose specific data processing models such as the pipeline. Other issues include large memory footprints, difficulty of reuse and inefficient execution on multicore processors. This paper presents a novel software architecture for real-time image and video processing applications which addresses these issues. The architecture is divided into three layers: the platform abstraction layer, the messaging layer, and the application layer. The platform abstraction layer provides a high-level application programming interface for the rest of the architecture. The messaging layer provides a message passing interface based on a dynamic publish/subscribe pattern. A topic-based filtering scheme, in which messages are published to topics, is used to route messages from the publishers to the subscribers interested in a particular type of message. The application layer provides a repository of reusable application modules designed for real-time image and video processing applications. These modules, which include acquisition, visualization, communication, user interface and data processing modules, take advantage of the power of other well-known libraries such as OpenCV, Intel IPP, or CUDA. Finally, we present different prototypes and applications to show the possibilities of the proposed architecture.
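A minimal Python sketch of the topic-based publish/subscribe routing in the messaging layer described above (class, module and topic names are illustrative): publishers and subscribers never reference each other directly, which is what keeps acquisition, processing and visualization modules decoupled.

    from collections import defaultdict

    class TopicBroker:
        # Minimal topic-based publish/subscribe hub: messages are routed
        # purely by topic name, never by direct module references.
        def __init__(self):
            self._subscribers = defaultdict(list)

        def subscribe(self, topic, callback):
            self._subscribers[topic].append(callback)

        def publish(self, topic, message):
            for callback in self._subscribers[topic]:
                callback(message)

    broker = TopicBroker()
    broker.subscribe("frames/raw", lambda m: print("processing", m))
    broker.subscribe("frames/raw", lambda m: print("display", m))
    broker.publish("frames/raw", {"id": 42})    # both modules receive the frame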
Back-end and interface implementation of the STS-XYTER2 prototype ASIC for the CBM experiment
NASA Astrophysics Data System (ADS)
Kasinski, K.; Szczygiel, R.; Zabolotny, W.
2016-11-01
Each front-end readout ASIC for High-Energy Physics experiments requires a robust and effective hit-data streaming and control mechanism. The new STS-XYTER2 full-size prototype chip for the Silicon Tracking System and Muon Chamber detectors of the Compressed Baryonic Matter experiment at the Facility for Antiproton and Ion Research (FAIR, Germany) is a 128-channel time- and amplitude-measuring solution for silicon microstrip and gas detectors. It operates at a 250 kHit/s/channel hit rate, each hit producing 27 bits of information (5-bit amplitude, 14-bit timestamp, position and diagnostics data). The chip back-end implements fast front-end channel read-out, timestamp-wise hit sorting, and data streaming via a scalable interface implementing a dedicated protocol (STS-HCTSP) for chip control and hit transfer, with data bandwidth from 9.7 MHit/s up to 47 MHit/s. It also includes multiple options for link diagnostics, failure detection, and throttling. The back-end is designed to operate with the data acquisition architecture based on the CERN GBTx transceivers. This paper presents the details of the back-end and interface design and its implementation in the UMC 180 nm CMOS process.
A LabVIEW® based generic CT scanner control software platform.
Dierick, M; Van Loo, D; Masschaele, B; Boone, M; Van Hoorebeke, L
2010-01-01
UGCT, the Centre for X-ray tomography at Ghent University (Belgium) does research on X-ray tomography and its applications. This includes the development and construction of state-of-the-art CT scanners for scientific research. Because these scanners are built for very different purposes they differ considerably in their physical implementations. However, they all share common principle functionality. In this context a generic software platform was developed using LabVIEW® in order to provide the same interface and functionality on all scanners. This article describes the concept and features of this software, and its potential for tomography in a research setting. The core concept is to rigorously separate the abstract operation of a CT scanner from its actual physical configuration. This separation is achieved by implementing a sender-listener architecture. The advantages are that the resulting software platform is generic, scalable, highly efficient, easy to develop and to extend, and that it can be deployed on future scanners with minimal effort.
A flexible architecture for advanced process control solutions
NASA Astrophysics Data System (ADS)
Faron, Kamyar; Iourovitski, Ilia
2005-05-01
Advanced Process Control (APC) is now mainstream practice in the semiconductor manufacturing industry. Over the past decade and a half, APC has evolved from a "good idea" and "wouldn't it be great" concept to mandatory manufacturing practice. APC developments have primarily dealt with two major thrusts, algorithms and infrastructure, and often the line between them has been blurred. The algorithms have evolved from very simple single-variable solutions to sophisticated and cutting-edge adaptive multivariable (input and output) solutions. Spending patterns in recent times have demanded that the economics of a comprehensive APC infrastructure be completely justified for any and all cost-conscious manufacturers. There are studies suggesting integration costs as high as 60% of the total APC solution costs. Such cost-prohibitive figures clearly diminish the return on APC investments. This has limited the acceptance and development of pure APC infrastructure solutions for many fabs. Modern APC solution architectures must satisfy a wide array of requirements, from very manual R&D environments to very advanced and automated "lights out" manufacturing facilities. A majority of commercially available control solutions, and most in-house developed solutions, lack the important attributes of scalability, flexibility, and adaptability, and hence require significant resources for integration, deployment, and maintenance. Many APC improvement efforts have been abandoned or delayed due to legacy systems and inadequate architectural design. Recent advancements in the software industry (Service Oriented Architectures) have delivered ideal technologies for delivering scalable, flexible, and reliable solutions that can seamlessly integrate into any fab's existing systems and business practices. In this publication we evaluate the various attributes of the architectures required by fabs and illustrate the benefits of a Service Oriented Architecture in satisfying these requirements. Blue Control Technologies has developed an advanced service-oriented-architecture Run-to-Run Control System which addresses these requirements.
NASA Technical Reports Server (NTRS)
Luke, Edward Allen
1993-01-01
Two algorithms capable of computing a transonic 3-D inviscid flow field about rotating machines are considered for parallel implementation. During the study of these algorithms, a significant new method of measuring the performance of parallel algorithms is developed. The theory that supports this new method creates an empirical definition of scalable parallel algorithms that is used to produce quantifiable evidence that a scalable parallel application was developed. The implementation of the parallel application and an automated domain decomposition tool are also discussed.
The CMS Data Management System
NASA Astrophysics Data System (ADS)
Giffels, M.; Guo, Y.; Kuznetsov, V.; Magini, N.; Wildish, T.
2014-06-01
The data management elements in CMS are scalable, modular, and designed to work together. The main components are PhEDEx, the data transfer and location system; the Data Bookkeeping Service (DBS), a metadata catalog; and the Data Aggregation Service (DAS), designed to aggregate views and provide them to users and services. Tens of thousands of samples have been cataloged and petabytes of data have been moved since the run began. The modular system has allowed the optimal use of appropriate underlying technologies. In this contribution we discuss the use of both Oracle and NoSQL databases to implement the data management elements, as well as the individual architectures chosen. We discuss how the data management system functioned during the first run, and what improvements are planned in preparation for 2015.
Development of Mission Enabling Infrastructure — Cislunar Autonomous Positioning System (CAPS)
NASA Astrophysics Data System (ADS)
Cheetham, B. W.
2017-10-01
Advanced Space, LLC is developing the Cislunar Autonomous Positioning System (CAPS) which would provide a scalable and evolvable architecture for navigation to reduce ground congestion and improve operations for missions throughout cislunar space.
Job Scheduling in a Heterogeneous Grid Environment
NASA Technical Reports Server (NTRS)
Shan, Hong-Zhang; Smith, Warren; Oliker, Leonid; Biswas, Rupak
2004-01-01
Computational grids have the potential for solving large-scale scientific problems using heterogeneous and geographically distributed resources. However, a number of major technical hurdles must be overcome before this potential can be realized. One problem that is critical to effective utilization of computational grids is the efficient scheduling of jobs. This work addresses this problem by describing and evaluating a grid scheduling architecture and three job migration algorithms. The architecture is scalable and does not assume control of local site resources. The job migration policies use the availability and performance of computer systems, the network bandwidth available between systems, and the volume of input and output data associated with each job. An extensive performance comparison is presented using real workloads from leading computational centers. The results, based on several key metrics, demonstrate that the performance of our distributed migration algorithms is significantly greater than that of a local scheduling framework and comparable to a non-scalable global scheduling approach.
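As a rough illustration of the migration-policy inputs named above (system availability and performance, network bandwidth between sites, and per-job data volume), the following sketch scores candidate sites with a hypothetical cost model; all field names and the weighting are invented for illustration, not the paper's algorithms:

```python
def migration_cost(job, site):
    """Estimate turnaround if `job` migrates to `site` (hypothetical model
    combining queue wait, relative CPU speed, and data-transfer time)."""
    transfer_s = (job["input_gb"] + job["output_gb"]) * 8 / site["bandwidth_gbps"]
    compute_s = job["work_hours"] * 3600 / site["relative_speed"]
    return site["queue_wait_s"] + transfer_s + compute_s

def pick_site(job, sites):
    # Migrate to whichever site minimizes estimated turnaround.
    return min(sites, key=lambda s: migration_cost(job, s))

sites = [
    {"name": "A", "queue_wait_s": 600, "relative_speed": 1.0, "bandwidth_gbps": 1.0},
    {"name": "B", "queue_wait_s": 60, "relative_speed": 0.5, "bandwidth_gbps": 0.1},
]
job = {"input_gb": 10, "output_gb": 2, "work_hours": 4}
print(pick_site(job, sites)["name"])
```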
Scalable Lunar Surface Networks and Adaptive Orbit Access
NASA Technical Reports Server (NTRS)
Wang, Xudong
2015-01-01
Teranovi Technologies, Inc., has developed an innovative network architecture, protocols, and algorithms for both lunar surface and orbit access networks. A key component of the overall architecture is a medium access control (MAC) protocol that includes a novel mechanism for overlaying time division multiple access (TDMA) and carrier sense multiple access with collision avoidance (CSMA/CA), ensuring scalable throughput and quality of service. The new MAC protocol is compatible with legacy Institute of Electrical and Electronics Engineers (IEEE) 802.11 networks. Advanced features include efficient power management, adaptive channel-width adjustment, and error control capability. A hybrid routing protocol combines the advantages of ad hoc on-demand distance vector (AODV) routing and disruption/delay-tolerant network (DTN) routing. Performance is significantly better than AODV or DTN alone and will be particularly effective for wireless networks with intermittent links, such as lunar and planetary surface networks and orbit access networks.
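A minimal sketch of how a TDMA reservation can be overlaid on CSMA/CA-style contention, in the spirit of the mechanism described; the slot-ownership logic and parameters are assumptions for illustration, not the protocol's actual design:

```python
import random

def medium_access(node_id, slot_owner, channel_busy, cw=16):
    """Decide how a node behaves in the current slot (hypothetical overlay):
    the slot owner transmits immediately (TDMA); others contend via CSMA/CA."""
    if node_id == slot_owner:
        return "transmit"            # contention-free, reserved slot
    if channel_busy:
        return "defer"               # carrier sensed busy
    return f"backoff {random.randrange(cw)} slots"  # random backoff before sending

print(medium_access(node_id=3, slot_owner=3, channel_busy=False))
print(medium_access(node_id=5, slot_owner=3, channel_busy=False))
```

The overlay gives reserved traffic deterministic access while letting opportunistic traffic reuse idle slots, which is one way such a hybrid can keep throughput scalable.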
Evaluating Discovery Services Architectures in the Context of the Internet of Things
NASA Astrophysics Data System (ADS)
Polytarchos, Elias; Eliakis, Stelios; Bochtis, Dimitris; Pramatari, Katerina
As the "Internet of Things" is expected to grow rapidly in the following years, the need to develop and deploy efficient and scalable Discovery Services in this context is very important for its success. Thus, the ability to evaluate and compare the performance of different Discovery Services architectures is vital if we want to allege that a given design is better at meeting requirements of a specific application. The purpose of this chapter is to provide a paradigm for the evaluation of different Discovery Services for the Internet of Things in terms of efficiency, scalability and performance through the use of simulations. The methodology presented uses the application of Discovery Services to a supply chain with the Service Lookup Service Discovery Service using OMNeT++, an open source network simulation suite. Then, we delve into the simulation design and the details of our findings.
Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS): Architecture
Mandl, Kenneth D; Kohane, Isaac S; McFadden, Douglas; Weber, Griffin M; Natter, Marc; Mandel, Joshua; Schneeweiss, Sebastian; Weiler, Sarah; Klann, Jeffrey G; Bickel, Jonathan; Adams, William G; Ge, Yaorong; Zhou, Xiaobo; Perkins, James; Marsolo, Keith; Bernstam, Elmer; Showalter, John; Quarshie, Alexander; Ofili, Elizabeth; Hripcsak, George; Murphy, Shawn N
2014-01-01
We describe the architecture of the Patient Centered Outcomes Research Institute (PCORI) funded Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS, http://www.SCILHS.org) clinical data research network, which leverages the $48 billion federal investment in health information technology (IT) to enable a queryable semantic data model across 10 health systems covering more than 8 million patients, plugging universally into the point of care, generating evidence and discovery, and thereby enabling clinician and patient participation in research during the patient encounter. Central to the success of SCILHS is the development of innovative 'apps' to improve PCOR research methods and capacitate point-of-care functions such as consent, enrollment, randomization, and outreach for patient-reported outcomes. SCILHS adapts and extends an existing national research network formed on an advanced IT infrastructure built with open source, free, modular components. PMID:24821734
Diskless supercomputers: Scalable, reliable I/O for the Tera-Op technology base
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Ousterhout, John K.; Patterson, David A.
1993-01-01
Computing is seeing an unprecedented improvement in performance; over the last five years there has been an order-of-magnitude improvement in the speeds of workstation CPUs. At least another order of magnitude seems likely in the next five years, to machines with 500 MIPS or more. The goal of the ARPA Teraop program is to realize even larger, more powerful machines, executing as many as a trillion operations per second. Unfortunately, we have seen no comparable breakthroughs in I/O performance; the speeds of I/O devices and the hardware and software architectures for managing them have not changed substantially in many years. We have completed a program of research to demonstrate hardware and software I/O architectures capable of supporting the kinds of internetworked 'visualization' workstations and supercomputers that will appear in the mid 1990s. The project had three overall goals: high performance, high reliability, and a scalable, multipurpose system.
Scalable synthesis of sequence-defined, unimolecular macromolecules by Flow-IEG
Leibfarth, Frank A.; Johnson, Jeremiah A.; Jamison, Timothy F.
2015-01-01
We report a semiautomated synthesis of sequence and architecturally defined, unimolecular macromolecules through a marriage of multistep flow synthesis and iterative exponential growth (Flow-IEG). The Flow-IEG system performs three reactions and an in-line purification in a total residence time of under 10 min, effectively doubling the molecular weight of an oligomeric species in an uninterrupted reaction sequence. Further iterations using the Flow-IEG system enable an exponential increase in molecular weight. Incorporating a variety of monomer structures and branching units provides control over polymer sequence and architecture. The synthesis of a uniform macromolecule with a molecular weight of 4,023 g/mol is demonstrated. The user-friendly nature, scalability, and modularity of Flow-IEG provide a general strategy for the automated synthesis of sequence-defined, unimolecular macromolecules. Flow-IEG is thus an enabling tool for theory validation, structure–property studies, and advanced applications in biotechnology and materials science. PMID:26269573
Networking and AI systems: Requirements and benefits
NASA Technical Reports Server (NTRS)
1988-01-01
The price/performance benefits of network systems are well documented. The ability to share expensive resources sold timesharing for mainframes, departmental clusters of minicomputers, and now local area networks of workstations and servers. In the process, other fundamental system requirements emerged. These have now been generalized with open system requirements for hardware, software, applications and tools. The ability to interconnect a variety of vendor products has led to a specification of interfaces that allow new techniques to extend existing systems for new and exciting applications. As an example of a message-passing system, local area networks provide a testbed for many of the issues addressed by future concurrent architectures: synchronization, load balancing, fault tolerance and scalability. Gold Hill has been working with a number of vendors on distributed architectures that range from a network of workstations to a hypercube of microprocessors with distributed memory. Results from early applications are promising in terms of both performance and scalability.
Sentinel-1 Interferometry from the Cloud to the Scientist
NASA Astrophysics Data System (ADS)
Garron, J.; Stoner, C.; Johnston, A.; Arko, S. A.
2017-12-01
Big data problems and solutions are growing in the technological and scientific sectors daily. Cloud computing is a vertically and horizontally scalable solution available now for archiving and processing large volumes of data quickly, without significant on-site computing hardware costs. Be that as it may, the conversion of scientific data processors to these powerful platforms requires not only proof of concept, but a demonstration of credibility in an operational setting. The Alaska Satellite Facility (ASF) Distributed Active Archive Center (DAAC), in partnership with NASA's Jet Propulsion Laboratory, is exploring the functional architecture of the Amazon Web Services (AWS) cloud computing environment for the processing, distribution and archival of Synthetic Aperture Radar data in preparation for the NASA-ISRO Synthetic Aperture Radar (NISAR) mission. Leveraging built-in AWS services for logging, monitoring and dashboarding, the GRFN (Getting Ready for NISAR) team has built a scalable processing, distribution and archival system for Sentinel-1 L2 interferograms produced using the ISCE algorithm. This cloud-based functional prototype provides interferograms over selected global land deformation features (volcanoes, land subsidence, seismic zones), which are accessible to scientists via NASA's EarthData Search client and the ASF DAAC's primary SAR interface, Vertex, for direct download. The interferograms are produced using nearest-neighbor logic for identifying pairs of granules for interferometric processing, creating deep stacks of BETA products from almost every satellite orbit for scientists to explore. This presentation highlights the functional lessons learned to date from this exercise, including a cost analysis of various data lifecycle policies as implemented through AWS. While demonstrating the architecture choices in support of efficient big science data management, we invite feedback and questions about the process and products from the InSAR community.
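The "nearest-neighbor logic for identifying pairs of granules" could look roughly like the following sketch, which pairs each granule with its nearest-in-time predecessor from the same relative orbit; the pairing rule, field names, and granule IDs are assumptions for illustration:

```python
def pair_granules(granules):
    """Pair each granule with its nearest-in-time predecessor on the same
    relative orbit (hypothetical reading of the nearest-neighbor logic)."""
    pairs = []
    latest_by_orbit = {}
    for g in sorted(granules, key=lambda g: g["time"]):
        prev = latest_by_orbit.get(g["orbit"])
        if prev is not None:
            pairs.append((prev["id"], g["id"]))  # (reference, secondary)
        latest_by_orbit[g["orbit"]] = g
    return pairs

granules = [
    {"id": "S1A_001", "orbit": 87, "time": 0},
    {"id": "S1A_002", "orbit": 87, "time": 12},
    {"id": "S1A_003", "orbit": 87, "time": 24},
]
print(pair_granules(granules))
# [('S1A_001', 'S1A_002'), ('S1A_002', 'S1A_003')]
```

Chaining consecutive acquisitions this way naturally builds the "deep stacks" of interferograms mentioned above.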
Samal, Lipika; D'Amore, John D; Bates, David W; Wright, Adam
2017-11-01
Clinical decision support tools for risk prediction are readily available, but they typically require workflow interruptions and manual data entry and so are rarely used. Due to new data interoperability standards for electronic health records (EHRs), other options are available. As a clinical case study, we sought to build a scalable, web-based system that would automate the calculation of kidney failure risk and display clinical decision support to users in primary care practices. We developed a single-page application, web server, database, and application programming interface to calculate and display kidney failure risk. Data were extracted from the EHR using the Consolidated Clinical Document Architecture interoperability standard for Continuity of Care Documents (CCDs). EHR users were presented with a noninterruptive alert on the patient's summary screen and a hyperlink to details and recommendations provided through a web application. Clinic schedules and CCDs were retrieved using existing application programming interfaces to the EHR, and we provided a clinical decision support hyperlink to the EHR as a service. We debugged a series of terminology and technical issues. The application was validated with data from 255 patients and subsequently deployed to 10 primary care clinics where, over the course of 1 year, 569 533 CCD documents were processed. We validated the use of interoperable documents and open-source components to develop a low-cost tool for automated clinical decision support. Since Consolidated Clinical Document Architecture-based data extraction extends to any certified EHR, this demonstrates a successful modular approach to clinical decision support. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Scalable Implementation of Finite Elements by NASA - Implicit (ScIFEi)
NASA Technical Reports Server (NTRS)
Warner, James E.; Bomarito, Geoffrey F.; Heber, Gerd; Hochhalter, Jacob D.
2016-01-01
Scalable Implementation of Finite Elements by NASA (ScIFEN) is a parallel finite element analysis code written in C++. ScIFEN is designed to provide scalable solutions to computational mechanics problems. It supports a variety of finite element types, nonlinear material models, and boundary conditions. This report provides an overview of ScIFEi ("Sci-Fi"), the implicit solid mechanics driver within ScIFEN. A description of ScIFEi's capabilities is provided, including an overview of the tools and features that accompany the software as well as a description of the input and output file formats. Results from several problems are included, demonstrating the efficiency and scalability of ScIFEi by comparing to finite element analysis using a commercial code.
Collective Framework and Performance Optimizations to Open MPI for Cray XT Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ladd, Joshua S; Gorentla Venkata, Manjunath; Shamis, Pavel
2011-01-01
The performance and scalability of collective operations play a key role in the performance and scalability of many scientific applications. Within the Open MPI code base we have developed a general-purpose hierarchical collective operations framework called Cheetah, and applied it at large scale on the Oak Ridge Leadership Computing Facility's (OLCF) Jaguar platform, obtaining better performance and scalability than the native MPI implementation. This paper discusses Cheetah's design and implementation, and optimizations to the framework for Cray XT5 platforms. Our results show that Cheetah's Broadcast and Barrier perform better than the native MPI implementation. For medium data, Cheetah's Broadcast outperforms the native MPI implementation by 93% at a problem size of 49,152 processes. For small and large data, it outperforms the native MPI implementation by 10% and 9%, respectively, at a problem size of 24,576 processes. Cheetah's Barrier performs 10% better than the native MPI implementation at a problem size of 12,288 processes.
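The hierarchical idea behind a framework like Cheetah can be pictured with a two-level broadcast sketch in mpi4py (an inter-node stage among node leaders, then an intra-node stage); this is a toy analogue under an assumed fixed processes-per-node layout, not Cheetah's actual implementation:

```python
from mpi4py import MPI

def hierarchical_bcast(data, comm, procs_per_node=16):
    """Two-level broadcast: node leaders receive first, then re-broadcast locally."""
    rank = comm.Get_rank()
    node = rank // procs_per_node
    is_leader = rank % procs_per_node == 0
    # Leaders form their own communicator for the inter-node stage.
    leader_comm = comm.Split(0 if is_leader else MPI.UNDEFINED, rank)
    # All ranks on a node form a communicator for the intra-node stage.
    local_comm = comm.Split(node, rank)
    if is_leader:
        data = leader_comm.bcast(data, root=0)  # inter-node stage
    return local_comm.bcast(data, root=0)       # intra-node stage (leader is local root)

comm = MPI.COMM_WORLD
result = hierarchical_bcast("payload" if comm.Get_rank() == 0 else None, comm)
```

Splitting the collective this way keeps inter-node traffic proportional to the number of nodes rather than the number of processes, which is the usual source of the scalability gain.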
High-Performance Monitoring Architecture for Large-Scale Distributed Systems Using Event Filtering
NASA Technical Reports Server (NTRS)
Maly, K.
1998-01-01
Monitoring is an essential process for observing and improving the reliability and performance of large-scale distributed (LSD) systems. In an LSD environment, a large number of events is generated by the system components during execution or interaction with external objects (e.g. users or processes). Monitoring such events is necessary for observing the run-time behavior of LSD systems and providing the status information required for debugging, tuning and managing such applications. However, correlated events are generated concurrently and can be distributed across various locations in the application environment, which complicates the management decision process and thereby makes monitoring LSD systems an intricate task. We propose a scalable, high-performance monitoring architecture for LSD systems to detect and classify interesting local and global events and disseminate the monitoring information to the corresponding end-point management applications, such as debugging and reactive control tools, to improve application performance and reliability. A large volume of events may be generated due to the extensive demands of the monitoring applications and the high interaction of LSD systems. The monitoring architecture employs a high-performance event filtering mechanism to efficiently process the large volume of event traffic generated by LSD systems and to minimize the intrusiveness of the monitoring process by reducing the event traffic flow in the system and distributing the monitoring computation. Our architecture also supports dynamic and flexible reconfiguration of the monitoring mechanism via its instrumentation and subscription components. As a case study, we show how our monitoring architecture can be utilized to improve the reliability and performance of the Interactive Remote Instruction (IRI) system, a large-scale distributed system for collaborative distance learning. The filtering mechanism is an intrinsic component integrated with the monitoring architecture to reduce the volume of event traffic flow in the system, and thereby reduce the intrusiveness of the monitoring process. We are developing an event filtering architecture to efficiently process the large volume of event traffic generated by LSD systems (such as distributed interactive applications). This filtering architecture is used to monitor a collaborative distance learning application for obtaining debugging and feedback information. Our architecture supports the dynamic (re)configuration and optimization of event filters in large-scale distributed systems. Our work represents a major contribution by (1) surveying and evaluating existing event filtering mechanisms for monitoring LSD systems and (2) devising an integrated, scalable, high-performance event filtering architecture that spans several key application domains, presenting techniques to improve functionality, performance and scalability. This paper describes the primary characteristics and challenges of developing high-performance event filtering for monitoring LSD systems. We survey existing event filtering mechanisms and explain the key characteristics of each technique. In addition, we discuss the limitations of existing event filtering mechanisms and outline how our architecture improves key aspects of event filtering.
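A minimal sketch of source-side event filtering of the kind described, where only events matching a subscription are forwarded to the monitor; the predicate style and event fields are hypothetical:

```python
class EventFilter:
    """Client-side filter (hypothetical): forward only events that match a
    subscription, reducing monitoring traffic at its source."""
    def __init__(self):
        self.predicates = []
    def add(self, predicate):
        self.predicates.append(predicate)
    def accept(self, event):
        return any(p(event) for p in self.predicates)

f = EventFilter()
f.add(lambda e: e["type"] == "error")
f.add(lambda e: e["type"] == "latency" and e["value_ms"] > 100)

events = [{"type": "heartbeat"}, {"type": "latency", "value_ms": 250}]
forwarded = [e for e in events if f.accept(e)]  # only the slow-latency event
print(forwarded)
```

Evaluating predicates where events originate, instead of shipping everything to a central monitor, is what reduces both traffic and intrusiveness.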
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2014-01-01
With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
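To make the scheduling targets concrete, the following toy sketch picks a VM type under different objectives; these stand-ins are illustrative assumptions only and are not the paper's four heuristics:

```python
def schedule(request, vm_types, target="balanced"):
    """Pick a VM type for a workflow request under a given objective
    (hypothetical stand-ins for performance-, cost-, and ratio-driven targets)."""
    if target == "performance":
        # Minimize runtime.
        return min(vm_types, key=lambda v: request["work_hours"] / v["speed"])
    if target == "cost":
        # Minimize total execution cost.
        return min(vm_types,
                   key=lambda v: request["work_hours"] / v["speed"] * v["price_per_hour"])
    # balanced: minimize price/performance ratio.
    return min(vm_types, key=lambda v: v["price_per_hour"] / v["speed"])

vm_types = [
    {"name": "small", "speed": 1.0, "price_per_hour": 0.10},
    {"name": "large", "speed": 3.5, "price_per_hour": 0.40},
]
print(schedule({"work_hours": 8}, vm_types, target="cost")["name"])  # -> small
```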
Layered Architectures for Quantum Computers and Quantum Repeaters
NASA Astrophysics Data System (ADS)
Jones, Nathan C.
This chapter examines how to organize quantum computers and repeaters using a systematic framework known as layered architecture, where machine control is organized in layers associated with specialized tasks. The framework is flexible and could be used for analysis and comparison of quantum information systems. To demonstrate the design principles in practice, we develop architectures for quantum computers and quantum repeaters based on optically controlled quantum dots, showing how a myriad of technologies must operate synchronously to achieve fault-tolerance. Optical control makes information processing in this system very fast, scalable to large problem sizes, and extendable to quantum communication.
Requirements for an Integrated UAS CNS Architecture
NASA Technical Reports Server (NTRS)
Templin, Fred L.; Jain, Raj; Sheffield, Greg; Taboso-Ballesteros, Pedro; Ponchak, Denise
2017-01-01
Communications, Navigation and Surveillance (CNS) requirements must be developed in order to establish a CNS architecture supporting Unmanned Air Systems integration in the National Air Space (UAS in the NAS). These requirements must address cybersecurity, future communications, satellite-based navigation and APNT, and scalable surveillance and situational awareness. CNS integration, consolidation and miniaturization requirements are also important to support the explosive growth in small UAS deployment. Air Traffic Management (ATM) must also be accommodated to support critical Command and Control (C2) for Air Traffic Controllers (ATC). This document therefore presents UAS CNS requirements that will guide the architecture.
Automation Hooks Architecture for Flexible Test Orchestration - Concept Development and Validation
NASA Technical Reports Server (NTRS)
Lansdowne, C. A.; Maclean, John R.; Winton, Chris; McCartney, Pat
2011-01-01
The Automation Hooks Architecture Trade Study for Flexible Test Orchestration sought a standardized data-driven alternative to conventional automated test programming interfaces. The study recommended composing the interface using multicast DNS (mDNS/SD) service discovery, Representational State Transfer (Restful) Web Services, and Automatic Test Markup Language (ATML). We describe additional efforts to rapidly mature the Automation Hooks Architecture candidate interface definition by validating it in a broad spectrum of applications. These activities have allowed us to further refine our concepts and provide observations directed toward objectives of economy, scalability, versatility, performance, severability, maintainability, scriptability and others.
Mit castor satellite: Design, implementation, and testing of the communication system
NASA Astrophysics Data System (ADS)
Babuscia, Alessandra; McCormack, Matthew Michael; Munoz, Michael; Parra, Spencer; Miller, David W.
2012-12-01
Cathode Anode Satellite Thruster for Orbital Reposition (CASTOR) is an orbital manoeuvre and transfer micro-satellite bus developed at the MIT Space Systems Laboratory. The technical objective of the mission is achieving 1 km/s of delta-V over a 1-year mission in Low Earth Orbit (LEO). This will be accomplished using a novel electric propulsion system, the Diverging Cusped Field Thruster (DCFT), which enables high-efficiency orbital changes of the ESPA-ring class satellite. CASTOR is capable of improving rapid access to space capabilities by providing an orbital transfer platform with a very high performance-to-mass ratio, thus greatly reducing launch costs and allowing for highly efficient orbital manoeuvres. Furthermore, CASTOR is highly scalable and modular, allowing it to be adapted to a wide range of scales and applications. CASTOR is developed as part of the University Nanosatellite Program (UNP) funded by the Air Force Research Laboratory (AFRL). In order to accomplish the CASTOR mission objective, a highly optimized, scalable, lightweight, and low-cost communication system needed to be developed. These constraints imply the development of trade studies to select the final communication system architecture able to maximize the amount of data transmitted, while guaranteeing reliability and redundancy with limited mass, power consumption, and cost. Special attention is also required to guarantee a reliable communication system in cases of tumbling, or in case of the strong Doppler shift which is inevitable due to the high delta-V capabilities of the vehicle. In order to accomplish all the mission requirements, different features have been introduced in the design of the communication system for this mission. Specifically, customized patch antennas have been realized, and a customized communication protocol has been designed and implemented. The communication subsystem has been validated through an intense testing campaign which included software tests in the laboratory, hardware tests in an anechoic chamber, and in-flight tests through a balloon experiment. The article presents an overview of the CASTOR mission, the trade study analysis and the final communication architecture selected, a description of the customized antennas developed and the customized protocol designed, and the results of the tests performed.
Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J
2004-09-01
We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy-class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudorandom number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster, consisting of 800 MHz Intel Pentium III processors, shows an almost linear speedup up to 32 processors for simulating 1 × 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors, based on availability), which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors, on increasing the problem size up to 8 × 10^8 histories. For a smaller number of histories (1 × 10^8), the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 × 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture, Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster than the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
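The parallelization pattern described, independent random streams per MPI rank with a final reduction, can be sketched as follows; numpy's SeedSequence.spawn stands in for the SPRNG library, and the transport kernel is reduced to a toy calculation:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

total_histories = 10**6
local_n = total_histories // size  # each rank simulates a share of the histories

# Independent, non-overlapping random streams per rank (SPRNG analogue).
rng = np.random.default_rng(np.random.SeedSequence(42).spawn(size)[rank])

# Toy "dose deposition"; a real code would run the particle transport kernel here.
local_dose = rng.exponential(scale=1.0, size=local_n).sum()

total_dose = comm.reduce(local_dose, op=MPI.SUM, root=0)
if rank == 0:
    print("mean deposited energy per history:", total_dose / total_histories)
```

Because histories are statistically independent, the only communication is the final reduction, which is why efficiency stays high until per-rank work shrinks below the communication cost, as observed above.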
Carrillo, Snaider; Harkin, Jim; McDaid, Liam; Pande, Sandeep; Cawley, Seamus; McGinley, Brian; Morgan, Fearghal
2012-09-01
The brain is highly efficient in how it processes information and tolerates faults. Arguably, the basic processing units are neurons and synapses that are interconnected in a complex pattern. Computer scientists and engineers aim to harness this efficiency and build artificial neural systems that can emulate the key information processing principles of the brain. However, existing approaches cannot provide the dense interconnect for the billions of neurons and synapses that are required. Recently a reconfigurable and biologically inspired paradigm based on network-on-chip (NoC) and spiking neural networks (SNNs) has been proposed as a new method of realising an efficient, robust computing platform. However, the use of the NoC as an interconnection fabric for large-scale SNNs demands a good trade-off between scalability, throughput, neuron/synapse ratio and power consumption. This paper presents a novel traffic-aware, adaptive NoC router, which forms part of a proposed embedded mixed-signal SNN architecture called EMBRACE (EMulating Biologically-inspiRed ArChitectures in hardwarE). The proposed adaptive NoC router provides the inter-neuron connectivity for EMBRACE, maintaining router communication and avoiding dropped router packets by adapting to router traffic congestion. Results are presented on throughput, power and area performance analysis of the adaptive router using a 90 nm CMOS technology which outperforms existing NoCs in this domain. The adaptive behaviour of the router is also verified on a Stratix II FPGA implementation of a 4 × 2 router array with real-time traffic congestion. The presented results demonstrate the feasibility of using the proposed adaptive NoC router within the EMBRACE architecture to realise large-scale SNNs on embedded hardware. Copyright © 2012 Elsevier Ltd. All rights reserved.
Diamond, Alan; Nowotny, Thomas; Schmuker, Michael
2016-01-01
Neuromorphic computing employs models of neuronal circuits to solve computing problems. Neuromorphic hardware systems are now becoming more widely available and “neuromorphic algorithms” are being developed. As they are maturing toward deployment in general research environments, it becomes important to assess and compare them in the context of the applications they are meant to solve. This should encompass not just task performance, but also ease of implementation, speed of processing, scalability, and power efficiency. Here, we report our practical experience of implementing a bio-inspired, spiking network for multivariate classification on three different platforms: the hybrid digital/analog Spikey system, the digital spike-based SpiNNaker system, and GeNN, a meta-compiler for parallel GPU hardware. We assess performance using a standard hand-written digit classification task. We found that whilst a different implementation approach was required for each platform, classification performances remained in line. This suggests that all three implementations were able to exercise the model's ability to solve the task rather than exposing inherent platform limits, although differences emerged when capacity was approached. With respect to execution speed and power consumption, we found that for each platform a large fraction of the computing time was spent outside of the neuromorphic device, on the host machine. Time was spent in a range of combinations of preparing the model, encoding suitable input spiking data, shifting data, and decoding spike-encoded results. This is also where a large proportion of the total power was consumed, most markedly for the SpiNNaker and Spikey systems. We conclude that the simulation efficiency advantage of the assessed specialized hardware systems is easily lost in excessive host-device communication, or non-neuronal parts of the computation. These results emphasize the need to optimize the host-device communication architecture for scalability, maximum throughput, and minimum latency. Moreover, our results indicate that special attention should be paid to minimize host-device communication when designing and implementing networks for efficient neuromorphic computing. PMID:26778950
Research of future network with multi-layer IP address
NASA Astrophysics Data System (ADS)
Li, Guoling; Long, Zhaohua; Wei, Ziqiang
2018-04-01
The shortage of IP addresses and the scalability of routing systems [1] are challenges for the Internet. Dividing existing IP addresses between identities and locations is one of the important research directions. This paper proposes a new decimal network architecture based on IPv9 [11]. The IP address of the decimal network draws on the E.164 principle of the traditional telecommunication network: IP addresses are hierarchically divided, which helps to realize the separation of identification and location of IP addresses and forms a multi-layer IP address network structure, easing the scalability problems of the routing system while offering a way out of IPv4 address exhaustion. In addition to modifying the DNS [10] simply and adding the function of a digital domain, forming a DDNS [12], a gateway device is added, namely the IPV9 gateway. The original backbone network and user networks are unchanged.
Scalable digital hardware for a trapped ion quantum computer
NASA Astrophysics Data System (ADS)
Mount, Emily; Gaultney, Daniel; Vrijsen, Geert; Adams, Michael; Baek, So-Young; Hudek, Kai; Isabella, Louis; Crain, Stephen; van Rynbach, Andre; Maunz, Peter; Kim, Jungsang
2016-12-01
Many of the challenges of scaling quantum computer hardware lie at the interface between the qubits and the classical control signals used to manipulate them. Modular ion trap quantum computer architectures address scalability by constructing individual quantum processors interconnected via a network of quantum communication channels. Successful operation of such quantum hardware requires a fully programmable classical control system capable of frequency stabilizing the continuous wave lasers necessary for loading, cooling, initialization, and detection of the ion qubits, stabilizing the optical frequency combs used to drive logic gate operations on the ion qubits, providing a large number of analog voltage sources to drive the trap electrodes, and a scheme for maintaining phase coherence among all the controllers that manipulate the qubits. In this work, we describe scalable solutions to these hardware development challenges.
Interconnection network architectures based on integrated orbital angular momentum emitters
NASA Astrophysics Data System (ADS)
Scaffardi, Mirco; Zhang, Ning; Malik, Muhammad Nouman; Lazzeri, Emma; Klitis, Charalambos; Lavery, Martin; Sorel, Marc; Bogoni, Antonella
2018-02-01
Novel architectures for two-layer interconnection networks based on concentric OAM emitters are presented. A scalability analysis is done in terms of device characteristics, power budget and optical signal-to-noise ratio by exploiting experimentally measured parameters. The analysis shows that by exploiting optical amplification, the proposed interconnection networks can support more than 100 ports. The OAM crosstalk-induced penalty, evaluated through an experimental characterization, does not significantly affect the interconnection network performance.
3D structural patterns in scalable, elastomeric scaffolds guide engineered tissue architecture.
Kolewe, Martin E; Park, Hyoungshin; Gray, Caprice; Ye, Xiaofeng; Langer, Robert; Freed, Lisa E
2013-08-27
Microfabricated elastomeric scaffolds with 3D structural patterns are created by semiautomated layer-by-layer assembly of planar polymer sheets with through-pores. The mesoscale interconnected pore architectures governed by the relative alignment of layers are shown to direct cell and muscle-like fiber orientation in both skeletal and cardiac muscle, enabling scale up of tissue constructs towards clinically relevant dimensions. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Disparity : scalable anomaly detection for clusters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Desai, N.; Bradshaw, R.; Lusk, E.
2008-01-01
In this paper, we describe disparity, a tool that does parallel, scalable anomaly detection for clusters. Disparity uses basic statistical methods and scalable reduction operations to perform data reduction on client nodes and uses these results to locate node anomalies. We discuss the implementation of disparity and present results of its use on a SiCortex SC5832 system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel W.
Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed-memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the best optimized shared-memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, BlueGene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
Entangling spin-spin interactions of ions in individually controlled potential wells
NASA Astrophysics Data System (ADS)
Wilson, Andrew; Colombe, Yves; Brown, Kenton; Knill, Emanuel; Leibfried, Dietrich; Wineland, David
2014-03-01
Physical systems that cannot be modeled with classical computers appear in many different branches of science, including condensed-matter physics, statistical mechanics, high-energy physics, atomic physics and quantum chemistry. Despite impressive progress on the control and manipulation of various quantum systems, implementation of scalable devices for quantum simulation remains a formidable challenge. As one approach to scalability in simulation, here we demonstrate an elementary building-block of a configurable quantum simulator based on atomic ions. Two ions are trapped in separate potential wells that can individually be tailored to emulate a number of different spin-spin couplings mediated by the ions' Coulomb interaction together with classical laser and microwave fields. We demonstrate deterministic tuning of this interaction by independent control of the local wells and emulate a particular spin-spin interaction to entangle the internal states of the two ions with 0.81(2) fidelity. Extension of the building-block demonstrated here to a 2D-network, which ion-trap micro-fabrication processes enable, may provide a new quantum simulator architecture with broad flexibility in designing and scaling the arrangement of ions and their mutual interactions. This research was funded by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), ONR, and the NIST Quantum Information Program.
NASA Astrophysics Data System (ADS)
Haemisch, York; Frach, Thomas; Degenhardt, Carsten; Thon, Andreas
Silicon Photomultipliers (SiPMs) have emerged as a promising alternative to fast vacuum photomultiplier tubes (PMTs). A fully digital implementation of the Silicon Photomultiplier (dSiPM) has been developed in order to overcome the deficiencies and limitations of the hitherto analog-only SiPMs (aSiPMs). Our sensor is based on arrays of single photon avalanche photodiodes (SPADs) integrated in a standard CMOS process. Photons are detected directly by sensing the voltage at the SPAD anode using a dedicated cell electronics block next to each diode. This block also contains active quenching and recharge circuits as well as a one-bit memory for the selective inhibit of detector cells. A balanced trigger network is used to propagate the trigger signal from all cells to the integrated time-to-digital converter. In consequence, photons are detected and counted as digital signals, thus making the sensor less susceptible to temperature variations and electronic noise. The integration with CMOS logic provides the added benefit of low power consumption and possible integration of data post-processing directly in the sensor. In this overview paper, we discuss the sensor architecture together with its characteristics, with a focus on scalability and practicability aspects for applications in medical imaging, high-energy physics, and astrophysics.
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin
2013-01-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, clusters, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of a layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10 times performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
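The core data structure named above, size-adaptive distributable block volumes, can be pictured with a minimal sketch that tiles a 3D volume into blocks for parallel processing; the block size, helper name, and volume dimensions are illustrative assumptions:

```python
import numpy as np

def block_views(volume, block_shape):
    """Yield (origin, block) views tiling a 3D volume (hypothetical analogue
    of the platform's size-adaptive, distributable block volumes)."""
    bz, by, bx = block_shape
    for z in range(0, volume.shape[0], bz):
        for y in range(0, volume.shape[1], by):
            for x in range(0, volume.shape[2], bx):
                # Edge blocks are automatically smaller thanks to slicing.
                yield (z, y, x), volume[z:z + bz, y:y + by, x:x + bx]

volume = np.zeros((512, 512, 400), dtype=np.int16)  # e.g. a CT volume
blocks = list(block_views(volume, (128, 128, 128)))
print(len(blocks), "blocks; first block shape:", blocks[0][1].shape)
```

Each block can then be dispatched to a core, cluster node, or cloud worker independently, which is what makes load distribution and balancing straightforward.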
Machine Learning in the Big Data Era: Are We There Yet?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sukumar, Sreenivas Rangan
In this paper, we discuss the machine learning challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context, we pose and debate the question - Are machine learning algorithms scaling with the ability to store and compute? If yes, how? If not, why not? We survey recent developments in the state-of-the-art to discuss emerging and outstanding challenges in the design and implementation of machine learning algorithms at scale. We leverage experience from real-world Big Data knowledge discovery projects across the domains of national security and healthcare to suggest that our efforts be focused along the following axes: (i) the data science challenge - designing scalable and flexible computational architectures for machine learning (beyond just data retrieval); (ii) the science of data challenge - the ability to understand characteristics of data before applying machine learning algorithms and tools; and (iii) the scalable predictive functions challenge - the ability to construct, learn and infer with increasing sample size, dimensionality, and categories of labels. We conclude with a discussion of opportunities and directions for future research.
Synthetic cognitive development. Where intelligence comes from
NASA Astrophysics Data System (ADS)
Weinbaum (Weaver), D.; Veitas, V.
2017-01-01
The human cognitive system is a remarkable exemplar of a general intelligent system whose competence is not confined to a specific problem domain. Evidently, general cognitive competences are a product of a prolonged and complex process of cognitive development. Therefore, the process of cognitive development is a primary key to understanding the emergence of intelligent behavior. This paper develops the theoretical foundations for a model that generalizes the process of cognitive development. The model aims to provide a realistic scheme for the synthesis of scalable cognitive systems with an open-ended range of capabilities. Major concepts and theories of human cognitive development are introduced and briefly explored, focusing on the enactive approach to cognition and the concept of sense-making. The initial scheme of human cognitive development is then generalized by introducing the philosophy of individuation and the abstract mechanism of transduction. The theory of individuation provides the ground for the necessary paradigmatic shift from cognitive systems as given products to cognitive development as a formative process of self-organization. Next, the conceptual model is specified as a scalable scheme of networks of agents. The mechanisms of individuation are formulated in context-independent information theoretical terms. Finally, the paper discusses two concrete aspects of the generative model - mechanisms of transduction and value modulating systems. These are topics of further research towards an implementable architecture.
NASA Astrophysics Data System (ADS)
Rodriguez, M.; Brualla, L.
2018-04-01
Monte Carlo simulation of radiation transport is computationally demanding if reasonably low statistical uncertainties of the estimated quantities are to be obtained. Therefore, it can benefit to a large extent from high-performance computing. This work is aimed at assessing the performance of the first generation of the many-integrated-core architecture (MIC) Xeon Phi coprocessor with respect to that of a CPU consisting of a double 12-core Xeon processor in Monte Carlo simulation of coupled electron-photon showers. The comparison was made twofold: first, through a suite of basic tests including parallel versions of the random number generators Mersenne Twister and a modified implementation of RANECU; these tests were addressed to establish a baseline comparison between both devices. Secondly, through the pDPM code developed in this work. pDPM is a parallel version of the Dose Planning Method (DPM) program for fast Monte Carlo simulation of radiation transport in voxelized geometries. A variety of techniques addressed to obtain a large scalability on the Xeon Phi were implemented in pDPM. Maximum scalabilities of 84.2× and 107.5× were obtained in the Xeon Phi for simulations of electron and photon beams, respectively. Nevertheless, in none of the tests involving radiation transport did the Xeon Phi perform better than the CPU. The disadvantage of the Xeon Phi with respect to the CPU is due to the low performance of the single core of the former. A single core of the Xeon Phi was more than 10 times less efficient than a single core of the CPU for all radiation transport simulations.
Developing a modular architecture for creation of rule-based clinical diagnostic criteria.
Hong, Na; Pathak, Jyotishman; Chute, Christopher G; Jiang, Guoqian
2016-01-01
With recent advances in computerized patient record systems, there is an urgent need for producing computable and standards-based clinical diagnostic criteria. Notably, constructing rule-based clinical diagnosis criteria has become one of the goals of the International Classification of Diseases (ICD)-11 revision. However, few studies have been done on building a unified architecture to support the need for diagnostic criteria computerization. In this study, we present a modular architecture for enabling the creation of rule-based clinical diagnostic criteria leveraging Semantic Web technologies. The architecture consists of two modules: an authoring module that utilizes a standards-based information model and a translation module that leverages the Semantic Web Rule Language (SWRL). In a prototype implementation, we created a diagnostic criteria upper ontology (DCUO) that integrates the ICD-11 content model with the Quality Data Model (QDM). Using the DCUO, we developed a transformation tool that converts QDM-based diagnostic criteria into SWRL representation. We evaluated the domain coverage of the upper ontology model using randomly selected diagnostic criteria from broad domains (n = 20). We also tested the transformation algorithms using 6 QDM templates for ontology population and 15 QDM-based criteria data for rule generation. As a result, the first draft of the DCUO contains 14 root classes, 21 subclasses, 6 object properties and 1 data property. Investigation Findings, and Signs and Symptoms are the two most commonly used element types. All 6 HQMF templates were successfully parsed and populated into their corresponding domain-specific ontologies, and 14 rules (93.3%) passed the rule validation. Our efforts in developing and prototyping a modular architecture provide useful insight into how to build a scalable solution to support diagnostic criteria representation and computerization.
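A drastically simplified sketch of the translation step, rendering a structured criterion as an if-then rule in SWRL-like syntax; the element names, predicates, and rule template are assumptions for illustration, not the DCUO's actual vocabulary:

```python
def criterion_to_rule(criterion):
    """Render a structured criterion as an if-then rule string (hypothetical
    simplification of the QDM-to-SWRL translation step)."""
    conditions = " ^ ".join(
        f'{c["element"]}(?p, "{c["value"]}")' for c in criterion["conditions"]
    )
    return f'{conditions} -> Diagnosis(?p, "{criterion["diagnosis"]}")'

criterion = {
    "diagnosis": "ExampleDisease",
    "conditions": [
        {"element": "hasSign", "value": "fever"},
        {"element": "hasFinding", "value": "elevated_crp"},
    ],
}
print(criterion_to_rule(criterion))
# hasSign(?p, "fever") ^ hasFinding(?p, "elevated_crp") -> Diagnosis(?p, "ExampleDisease")
```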
NASA Astrophysics Data System (ADS)
Mazurov, Alexander; Couturier, Ben; Popov, Dmitry; Farley, Nathanael
2017-10-01
Any time you modify an implementation within a program, or change the compiler version or operating system, you should also do regression testing. You can do regression testing by rerunning existing tests against the changes to determine whether this breaks anything that worked prior to the change, and by writing new tests where necessary. At LHCb we have a huge codebase which is maintained by many people and can be run within different setups. This makes it crucial to guide refactoring with a central profiling system that helps to run tests and find the impact of changes. In our work we present a software architecture and tools for running a profiling system. This system is responsible for systematically running regression tests and for collecting and comparing the results of these tests, so that changes between different setups can be observed and reported. The main feature of our solution is that it is based on a microservices architecture. Microservices break a large project into loosely coupled modules, which communicate with each other through simple APIs. Such a modular architectural style helps us avoid the general pitfalls of monolithic architectures, such as a codebase that is hard to understand and maintain, and ineffective scalability. Our solution also avoids the complexity of the microservices deployment process by using software containers and service management tools. Containers and service managers let us quickly deploy linked modules in development, production or any other environment. Most of the developed modules are generic, which means that the proposed architecture and tools can be used not only at LHCb but can be adopted by other experiments and companies.
A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data.
Delussu, Giovanni; Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi
2016-01-01
This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR's formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called "Constant Load" and "Constant Number of Records", with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes.
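The common driver interface pattern described can be sketched as follows; the class and method names are hypothetical, and the backends are stubs rather than real MongoDB/Elasticsearch clients:

```python
from abc import ABC, abstractmethod

class Driver(ABC):
    """Common driver interface (hypothetical analogue of PyEHR's driver layer)."""
    @abstractmethod
    def save(self, record: dict) -> str: ...
    @abstractmethod
    def search(self, structure_id: str, query: dict) -> list: ...

class MongoDriver(Driver):
    def save(self, record):  # a real backend would delegate to pymongo here
        return "mongo-id"
    def search(self, structure_id, query):
        return []

class ElasticsearchDriver(Driver):
    def save(self, record):  # a real backend would delegate to the ES client here
        return "es-id"
    def search(self, structure_id, query):
        return []

def get_driver(backend: str) -> Driver:
    return {"mongodb": MongoDriver, "elasticsearch": ElasticsearchDriver}[backend]()

driver = get_driver("mongodb")  # swapping backends leaves caller code unchanged
```

Because callers program against the interface, the persistence technology can be swapped without touching the data management logic, which is the decoupling the abstract describes.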
A Core Plug and Play Architecture for Reusable Flight Software Systems
NASA Technical Reports Server (NTRS)
Wilmot, Jonathan
2006-01-01
The Flight Software Branch, at Goddard Space Flight Center (GSFC), has been working on a run-time approach to facilitate a formal software reuse process. The reuse process is designed to enable rapid development and integration of high-quality software systems and to more accurately predict development costs and schedule. Previous reuse practices have been somewhat successful when the same teams are moved from project to project. But this typically requires taking the software system in an all-or-nothing approach where useful components cannot be easily extracted from the whole. As a result, the system is less flexible and scalable with limited applicability to new projects. This paper will focus on the rationale behind, and implementation of the run-time executive. This executive is the core for the component-based flight software commonality and reuse process adopted at Goddard.
Designing, programming, and optimizing a (small) quantum computer
NASA Astrophysics Data System (ADS)
Svore, Krysta
In 1982, Richard Feynman proposed to use a computer founded on the laws of quantum physics to simulate physical systems. In the more than thirty years since, quantum computers have shown promise to solve problems in number theory, chemistry, and materials science that would otherwise take longer than the lifetime of the universe to solve on an exascale classical machine. The practical realization of a quantum computer requires understanding and manipulating subtle quantum states while experimentally controlling quantum interference. It also requires an end-to-end software architecture for programming, optimizing, and implementing a quantum algorithm on the quantum device hardware. In this talk, we will introduce recent advances in connecting abstract theory to present-day real-world applications through software. We will highlight recent advancement of quantum algorithms and the challenges in ultimately performing a scalable solution on a quantum device.
NASA Astrophysics Data System (ADS)
Aonishi, Toru; Mimura, Kazushi; Utsunomiya, Shoko; Okada, Masato; Yamamoto, Yoshihisa
2017-10-01
The coherent Ising machine (CIM) has attracted attention as one of the most effective Ising computing architectures for solving large-scale optimization problems because of its scalability and high-speed computational ability. However, implementing Ising computation on the CIM is difficult because the theories and techniques of classical thermodynamic-equilibrium Ising spin systems cannot be applied to it directly and must first be adapted. Here we focus on a ferromagnetic model and a finite-loading Hopfield model, which are canonical models sharing a common mathematical structure with almost all other Ising models. We derive macroscopic equations to capture nonequilibrium phase transitions in these models. The statistical mechanical methods developed here constitute a basis for constructing evaluation methods for other Ising computation models.
Modelling multimedia teleservices with OSI upper layers framework: Short paper
NASA Astrophysics Data System (ADS)
Widya, I.; Vanrijssen, E.; Michiels, E.
The paper presents the use of the concepts and modelling principles of the Open Systems Interconnection (OSI) upper layers structure in the modelling of multimedia teleservices. It puts emphasis on the revised Application Layer Structure (OSI/ALS). OSI/ALS is an object-based reference model which intends to coordinate the development of application-oriented services and protocols in a consistent and modular way. It enables the rapid deployment and integrated use of these services. The paper further emphasizes the nesting structure defined in OSI/ALS, which allows the design of scalable and user-tailorable/controllable teleservices. OSI/ALS-consistent teleservices are moreover implementable on communication platforms of different capabilities. An analysis of distributed multimedia architectures found in the literature confirms the ability of the OSI/ALS framework to model the interworking functionalities of teleservices.
Cloud Computing in Support of Synchronized Disaster Response Operations
2010-09-01
scalable, Web application based on cloud computing technologies to facilitate communication between a broad range of public and private entities without... requiring them to compromise security or competitive advantage. The proposed design applies the unique benefits of cloud computing architectures such as
A Collaborative Web-Based Architecture For Sharing ToxCast Data
Collaborative Drug Discovery (CDD) has created a scalable platform that combines traditional drug discovery informatics with Web 2.0 features. Traditional drug discovery capabilities include substructure and similarity searching and export to Excel or SDF formats. Web 2.0 features inc...
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2017-01-01
With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies. PMID:29399237
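The abstract does not spell out the four heuristics; as a generic illustration of price/performance-driven scheduling of the kind described, here is a minimal greedy sketch in which the VM types, prices, and speedups are invented for the example:

```python
# Illustrative cost/performance workflow placement (not one of the
# paper's four WFaaS heuristics; VM types and numbers are invented).
from dataclasses import dataclass

@dataclass
class VMType:
    name: str
    price_per_hour: float  # hypothetical on-demand price
    speedup: float         # speed relative to the "small" type

VM_TYPES = [VMType("small", 0.10, 1.0), VMType("large", 0.45, 3.5)]

def place(workflow_hours_on_small: float) -> VMType:
    """Pick the VM type minimizing the cost x runtime product,
    a simple scalarization of the price/performance trade-off."""
    def cost_delay(vm: VMType) -> float:
        runtime = workflow_hours_on_small / vm.speedup
        return (vm.price_per_hour * runtime) * runtime
    return min(VM_TYPES, key=cost_delay)

print(place(8.0).name)  # "large": faster execution wins despite the price
```

Real schedulers of this kind would additionally account for queueing on already-running VM instances and deadline constraints.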
Scalable Quantum Networks for Distributed Computing and Sensing
2016-04-01
probabilistic measurement, so we developed quantum memories and guided-wave implementations of same, demonstrating controlled delay of a heralded single... Second, fundamental scalability requires a method to synchronize protocols based on quantum measurements, which are inherently probabilistic. To meet...
Takeda, Shuntaro; Furusawa, Akira
2017-09-22
We propose a scalable scheme for optical quantum computing using measurement-induced continuous-variable quantum gates in a loop-based architecture. Here, time-bin-encoded quantum information in a single spatial mode is deterministically processed in a nested loop by an electrically programmable gate sequence. This architecture can process any input state and an arbitrary number of modes with almost minimum resources, and offers a universal gate set for both qubits and continuous variables. Furthermore, quantum computing can be performed fault tolerantly by a known scheme for encoding a qubit in an infinite-dimensional Hilbert space of a single light mode.
The architecture of the High Performance Storage System (HPSS)
NASA Technical Reports Server (NTRS)
Teaff, Danny; Watson, Dick; Coyne, Bob
1994-01-01
The rapid growth in the size of datasets has caused a serious imbalance in I/O and storage system performance and functionality relative to application requirements and the capabilities of other system components. The High Performance Storage System (HPSS) is a scalable, next-generation storage system that will meet the functionality and performance requirements of large-scale scientific and commercial computing environments. Our goal is to improve the performance and capacity of storage by two orders of magnitude or more over what is available in the general or mass marketplace today. We are also providing corresponding improvements in architecture and functionality. This paper describes the architecture and functionality of HPSS.
RAIN: A Bio-Inspired Communication and Data Storage Infrastructure.
Monti, Matteo; Rasmussen, Steen
2017-01-01
We summarize the results and perspectives from a companion article, where we presented and evaluated an alternative architecture for data storage in distributed networks. We name the bio-inspired architecture RAIN, and it offers a file storage service that, in contrast with current centralized cloud storage, has privacy by design, is open source, is more secure, is scalable, is more sustainable, has community ownership, is inexpensive, and is potentially faster, more efficient, and more reliable. We propose that a RAIN-style architecture could form the backbone of the Internet of Things, which will likely integrate multiple current and future infrastructures ranging from online services and cryptocurrency to parts of government administration.
Moradi, Saber; Qiao, Ning; Stefanini, Fabio; Indiveri, Giacomo
2018-02-01
Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here, we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multicore neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.
Serial Back-Plane Technologies in Advanced Avionics Architectures
NASA Technical Reports Server (NTRS)
Varnavas, Kosta
2005-01-01
Current back-plane technologies such as VME, and current personal computer back planes such as PCI, are shared-bus systems that can exhibit nondeterministic latencies. This means a card can take control of the bus and use resources indefinitely, affecting the ability of other cards in the back plane to acquire the bus, which significantly degrades system reliability. Additionally, these parallel busses only have bandwidths in the hundreds-of-megahertz range, and EMI and noise effects get worse as the bandwidth increases. To provide scalable, fault-tolerant, advanced computing systems, more applicable to today's connected computing environment and better matched to future requirements for advanced space instruments and vehicles, serial back-plane technologies should be implemented in advanced avionics architectures. Serial back-plane technologies eliminate the problem of one card acquiring the bus and never relinquishing it, or one minor problem on the backplane bringing the whole system down. Being serial instead of parallel significantly improves reliability by reducing many of the signal integrity issues associated with parallel back planes. The increased speeds associated with a serial backplane are an added bonus.
Local wavelet transform: a cost-efficient custom processor for space image compression
NASA Astrophysics Data System (ADS)
Masschelein, Bart; Bormans, Jan G.; Lafruit, Gauthier
2002-11-01
Thanks to its intrinsic scalability features, the wavelet transform has become increasingly popular as a decorrelator in image compression applications. Throughput, memory requirements and complexity are important parameters when developing hardware image compression modules. An implementation of the classical, global wavelet transform requires large memory sizes and implies a large latency between the availability of the input image and the production of minimal data entities for entropy coding. Image tiling methods, as proposed by JPEG2000, reduce the memory sizes and the latency, but inevitably introduce image artefacts. The Local Wavelet Transform (LWT), presented in this paper, is a low-complexity wavelet transform architecture using block-based processing that produces the same transformed images as those obtained by the global wavelet transform. The architecture minimizes the processing latency with a limited amount of memory. Moreover, as the LWT is an instruction-based custom processor, it can be programmed for specific tasks, such as push-broom processing of infinite-length satellite images. The features of the LWT make it appropriate for use in space image compression, where high throughput, low memory sizes, low complexity, low power and push-broom processing are important requirements.
Selecting the Right Courseware for Your Online Learning Program.
ERIC Educational Resources Information Center
O'Mara, Heather
2000-01-01
Presents criteria for selecting courseware for online classes. Highlights include ease of use, including navigation; assessment tools; advantages of Java-enabled courseware; advantages of Oracle databases, including scalability; future possibilities for multimedia technology; and open architecture that will integrate with other systems. (LRW)
Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS): architecture.
Mandl, Kenneth D; Kohane, Isaac S; McFadden, Douglas; Weber, Griffin M; Natter, Marc; Mandel, Joshua; Schneeweiss, Sebastian; Weiler, Sarah; Klann, Jeffrey G; Bickel, Jonathan; Adams, William G; Ge, Yaorong; Zhou, Xiaobo; Perkins, James; Marsolo, Keith; Bernstam, Elmer; Showalter, John; Quarshie, Alexander; Ofili, Elizabeth; Hripcsak, George; Murphy, Shawn N
2014-01-01
We describe the architecture of the Patient Centered Outcomes Research Institute (PCORI) funded Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS, http://www.SCILHS.org) clinical data research network, which leverages the $48 billion federal investment in health information technology (IT) to enable a queryable semantic data model across 10 health systems covering more than 8 million patients, plugging universally into the point of care, generating evidence and discovery, and thereby enabling clinician and patient participation in research during the patient encounter. Central to the success of SCILHS is the development of innovative 'apps' to improve PCOR research methods and capacitate point-of-care functions such as consent, enrollment, randomization, and outreach for patient-reported outcomes. SCILHS adapts and extends an existing national research network formed on an advanced IT infrastructure built with open-source, free, modular components.
BlueSky Cloud Framework: An E-Learning Framework Embracing Cloud Computing
NASA Astrophysics Data System (ADS)
Dong, Bo; Zheng, Qinghua; Qiao, Mu; Shu, Jian; Yang, Jie
E-Learning has grown into a widely accepted way of learning. With the huge growth in users, services, education contents and resources, E-Learning systems face the challenges of optimizing resource allocation, dealing with dynamic concurrency demands, handling rapid storage growth requirements and controlling costs. In this paper, an E-Learning framework based on cloud computing is presented, namely the BlueSky cloud framework. In particular, the architecture and core components of the BlueSky cloud framework are introduced. In the BlueSky cloud framework, physical machines are virtualized and allocated on demand for E-Learning systems. Moreover, the BlueSky cloud framework incorporates traditional middleware functions (such as load balancing and data caching) to serve E-Learning systems as a general architecture. It delivers reliable, scalable and cost-efficient services to E-Learning systems, and E-Learning organizations can establish systems through these services in a simple way. The BlueSky cloud framework addresses the challenges faced by E-Learning and improves the performance, availability and scalability of E-Learning systems.
Technology for On-Chip Qubit Control with Microfabricated Surface Ion Traps
DOE Office of Scientific and Technical Information (OSTI.GOV)
Highstrete, Clark; Scott, Sean Michael; Nordquist, Christopher D.
2013-11-01
Trapped atomic ions are a leading physical system for quantum information processing. However, scalability and operational fidelity remain limiting technical issues often associated with optical qubit control. One promising approach is to develop on-chip microwave electronic control of ion qubits based on the atomic hyperfine interaction. This project developed expertise and capabilities at Sandia toward on-chip electronic qubit control in a scalable architecture. The project developed a foundation of laboratory capabilities, including trapping the 171Yb+ hyperfine ion qubit and developing an experimental microwave coherent control capability. Additionally, the project investigated the integration of microwave device elements with surface ion traps utilizing Sandia's state-of-the-art MEMS microfabrication processing. This effort culminated in a device design for a multi-purpose ion trap experimental platform for investigating on-chip microwave qubit control, laying the groundwork for further funded R&D to develop on-chip microwave qubit control in an architecture that is suitable for engineering development.
Connecting Architecture and Implementation
NASA Astrophysics Data System (ADS)
Buchgeher, Georg; Weinreich, Rainer
Software architectures are still typically defined and described independently from implementation. To avoid architectural erosion and drift, architectural representation needs to be continuously updated and synchronized with system implementation. Existing approaches for architecture representation like informal architecture documentation, UML diagrams, and Architecture Description Languages (ADLs) provide only limited support for connecting architecture descriptions and implementations. Architecture management tools like Lattix, SonarJ, and Sotoarc and UML-tools tackle this problem by extracting architecture information directly from code. This approach works for low-level architectural abstractions like classes and interfaces in object-oriented systems but fails to support architectural abstractions not found in programming languages. In this paper we present an approach for linking and continuously synchronizing a formalized architecture representation to an implementation. The approach is a synthesis of functionality provided by code-centric architecture management and UML tools and higher-level architecture analysis approaches like ADLs.
Performance and Scalability of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.
NASA Technical Reports Server (NTRS)
Stovall, John R.; Wray, Richard B.
1994-01-01
This paper presents a description of a model for a space vehicle operational scenario and the commands for avionics. This model will be used in developing a dynamic architecture simulation model using the Statemate CASE tool for validation of the Space Generic Open Avionics Architecture (SGOAA). The SGOAA has been proposed as an avionics architecture standard to NASA through its Strategic Avionics Technology Working Group (SATWG) and has been accepted by the Society of Automotive Engineers (SAE) for conversion into an SAE Avionics Standard. This architecture was developed for the Flight Data Systems Division (FDSD) of the NASA Johnson Space Center (JSC) by the Lockheed Engineering and Sciences Company (LESC), Houston, Texas. This SGOAA includes a generic system architecture for the entities in spacecraft avionics, a generic processing external and internal hardware architecture, and a nine class model of interfaces. The SGOAA is both scalable and recursive and can be applied to any hierarchical level of hardware/software processing systems.
Design of Power System Architectures for Small Spacecraft Systems
NASA Technical Reports Server (NTRS)
Momoh, James A.; Subramonian, Rama; Dias, Lakshman G.
1996-01-01
The objective of this research is to perform a trade study on several candidate power system architectures for small spacecraft to be used in NASA's New Millennium program. Three initial candidate architectures have been proposed by NASA and two other candidate architectures have been proposed by Howard University. Howard University is currently conducting the analysis, synthesis, and simulation needed to perform the trade studies and arrive at the optimal power system architecture. Statistical, sensitivity, and tolerance studies have been performed on the systems. It is concluded from the present studies that certain components such as the series regulators, buck-boost converters and power converters can be minimized while retaining the desired functionality of the overall architecture. This, in conjunction with battery scalability studies and system efficiency studies, has enabled us to develop more economical architectures. Future studies will include artificial neural networks and fuzzy logic to analyze the performance of the systems. Fault simulation studies and fault diagnosis studies using EMTP and artificial neural networks will also be conducted.
SiC: An Agent Based Architecture for Preventing and Detecting Attacks to Ubiquitous Databases
NASA Astrophysics Data System (ADS)
Pinzón, Cristian; de Paz, Yanira; Bajo, Javier; Abraham, Ajith; Corchado, Juan M.
One of the main attacks on ubiquitous databases is the Structured Query Language (SQL) injection attack, which causes severe damage both in the commercial aspect and in the user's confidence. This chapter proposes the SiC architecture as a solution to the SQL injection attack problem. This is a hierarchical distributed multiagent architecture, which involves an entirely new approach with respect to existing architectures for the prevention and detection of SQL injections. SiC incorporates a kind of intelligent agent, which integrates a case-based reasoning system. This agent, which is the core of the architecture, allows the application of detection techniques based on anomalies as well as those based on patterns, providing a great degree of autonomy, flexibility, robustness and dynamic scalability. The characteristics of the multiagent system allow the architecture to detect attacks from different types of devices, regardless of physical location. The architecture has been tested on a medical database, guaranteeing safe access from various devices such as PDAs and notebook computers.
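For flavor, a pattern-based screen of the kind such an architecture can host might look like the sketch below; the patterns and threshold are illustrative examples, not SiC's actual rule set or its case-based reasoning component:

```python
# Illustrative pattern-based SQL injection screen (not SiC's agents or
# its case-based reasoning system; patterns and threshold are examples).
import re

SUSPICIOUS = [
    r"(?i)\bunion\b.+\bselect\b",            # UNION-based extraction
    r"(?i)\bor\b\s+'?\d+'?\s*=\s*'?\d+'?",   # tautologies like OR 1=1
    r"--|;|/\*",                             # comment/statement separators
]

def injection_score(query: str) -> int:
    """Count how many suspicious patterns the query matches."""
    return sum(1 for p in SUSPICIOUS if re.search(p, query))

def is_suspicious(query: str, threshold: int = 1) -> bool:
    return injection_score(query) >= threshold

print(is_suspicious("SELECT * FROM users WHERE id = '1' OR '1'='1'"))  # True
```

Anomaly-based detection, the other technique the abstract mentions, would instead flag queries that deviate from a learned profile of normal traffic rather than match fixed patterns.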
Performances of multiprocessor multidisk architectures for continuous media storage
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.
1996-03-01
Multimedia interfaces increase the need for large image databases capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes, through bottleneck performance evaluation and simulation, the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400 Mbytes/s) and that an architecture with addressable local memories located close to their respective processors could partially remove this bottleneck. The point-to-point architecture is scalable and able to sustain high throughputs for simultaneous compute-bound and data-bound operations.
A SiGe Quadrature Pulse Modulator for Superconducting Qubit State Manipulation
NASA Astrophysics Data System (ADS)
Kwende, Randy; Bardin, Joseph
Manipulation of the quantum states of microwave superconducting qubits typically requires the generation of coherent modulated microwave pulses. While many off-the-shelf instruments are capable of generating such pulses, a more integrated approach is likely required if fault-tolerant quantum computing architectures are to be implemented. In this work, we present progress towards a pulse generator specifically designed to drive superconducting qubits. The device is implemented in a commercial silicon process and has been designed with energy efficiency and scalability in mind. Pulse generation is carried out by applying modulation directly to the in-phase and quadrature components of a carrier signal in the 1-10 GHz frequency range, through a digital-to-analog conversion process designed specifically for this application. The prototype pulse generator can be digitally programmed through a serial programming interface and supports sequencing of pulses with independent amplitude and phase waveforms. Detailed performance of the pulse generator at room temperature and 4 K will be presented.
Integration of High-Performance Computing into Cloud Computing Services
NASA Astrophysics Data System (ADS)
Vouk, Mladen A.; Sills, Eric; Dreher, Patrick
High-Performance Computing (HPC) projects span a spectrum of computer hardware implementations ranging from peta-flop supercomputers, high-end tera-flop facilities running a variety of operating systems and applications, to mid-range and smaller computational clusters used for HPC application development, pilot runs and prototype staging clusters. What they all have in common is that they operate as a stand-alone system rather than a scalable and shared user re-configurable resource. The advent of cloud computing has changed the traditional HPC implementation. In this article, we will discuss a very successful production-level architecture and policy framework for supporting HPC services within a more general cloud computing infrastructure. This integrated environment, called Virtual Computing Lab (VCL), has been operating at NC State since fall 2004. Nearly 8,500,000 HPC CPU-Hrs were delivered by this environment to NC State faculty and students during 2009. In addition, we present and discuss operational data that show that integration of HPC and non-HPC (or general VCL) services in a cloud can substantially reduce the cost of delivering cloud services (down to cents per CPU hour).
OpenARC: Extensible OpenACC Compiler Framework for Directive-Based Accelerator Programming Study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Seyong; Vetter, Jeffrey S
2014-01-01
Directive-based, accelerator programming models such as OpenACC have arisen as an alternative solution to program emerging Scalable Heterogeneous Computing (SHC) platforms. However, the increased complexity in the SHC systems incurs several challenges in terms of portability and productivity. This paper presents an open-sourced OpenACC compiler, called OpenARC, which serves as an extensible research framework to address those issues in the directive-based accelerator programming. This paper explains important design strategies and key compiler transformation techniques needed to implement the reference OpenACC compiler. Moreover, this paper demonstrates the efficacy of OpenARC as a research framework for directive-based programming study, by proposing and implementing OpenACC extensions in the OpenARC framework to 1) support hybrid programming of the unified memory and separate memory and 2) exploit architecture-specific features in an abstract manner. Porting thirteen standard OpenACC programs and three extended OpenACC programs to CUDA GPUs shows that OpenARC performs similarly to a commercial OpenACC compiler, while it serves as a high-level research framework.
Empirical study of parallel LRU simulation algorithms
NASA Technical Reports Server (NTRS)
Carr, Eric; Nicol, David M.
1994-01-01
This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. The other two algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from the execution of three SPEC benchmark programs.
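For reference, the serial computation that these algorithms parallelize can be sketched as follows (a straightforward LRU stack-distance routine; the variable names are ours):

```python
# Serial LRU stack-distance computation (the baseline the paper's
# parallel SIMD/MIMD algorithms accelerate). For each reference, the
# stack distance is the tag's depth in the LRU stack, or infinity on
# a cold miss; a reference hits in any cache larger than its distance.
def stack_distances(trace):
    stack = []            # most recently used at index 0
    distances = []
    for tag in trace:
        if tag in stack:
            d = stack.index(tag)   # depth in the stack = stack distance
            stack.pop(d)
        else:
            d = float("inf")       # cold miss
        stack.insert(0, tag)       # tag becomes most recently used
        distances.append(d)
    return distances

print(stack_distances(["a", "b", "a", "c", "b", "a"]))
# [inf, inf, 1, inf, 2, 2]
```

Computing all stack distances in one pass yields hit ratios for every cache size simultaneously, which is why the paper's algorithms focus on this quantity.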
NASA Astrophysics Data System (ADS)
Gutzwiller, David; Gontier, Mathieu; Demeulenaere, Alain
2014-11-01
Multi-block structured solvers hold many advantages over their unstructured counterparts, such as a smaller memory footprint and efficient serial performance. Historically, multi-block structured solvers have not been easily adapted for use in a High Performance Computing (HPC) environment, and the recent trend towards hybrid GPU/CPU architectures has further complicated the situation. This paper will elaborate on developments and innovations applied to the NUMECA FINE/Turbo solver that have allowed near-linear scalability with real-world problems on over 250 hybrid GPU/CPU cluster nodes. Discussion will focus on the implementation of virtual partitioning and load balancing algorithms using a novel meta-block concept. This implementation is transparent to the user, allowing all pre- and post-processing steps to be performed using a simple, unpartitioned grid topology. Additional discussion will elaborate on developments that have improved parallel performance, including fully parallel I/O with the ADIOS API and the GPU porting of the computationally heavy CPUBooster convergence acceleration module.
A CMOS ASIC Design for SiPM Arrays
Dey, Samrat; Banks, Lushon; Chen, Shaw-Pin; Xu, Wenbin; Lewellen, Thomas K.; Miyaoka, Robert S.; Rudell, Jacques C.
2012-01-01
Our lab has previously reported on novel board-level readout electronics for an 8×8 silicon photomultiplier (SiPM) array featuring a row/column summation technique to reduce the hardware requirements for signal processing. We are taking the next step by implementing a monolithic CMOS chip based on the row-column architecture. In addition, this paper explores the option of using diagonal summation as well as calibration to compensate for temperature and process variations. A timing pickoff signal that aligns all of the positioning (spatial-channel) pulses in the array is also described. The ASIC design is targeted to be scalable with the detector size and flexible enough to accommodate detectors from different vendors. This paper focuses on circuit implementation issues associated with the design of the ASIC to interface our Phase II MiCES FPGA board with a SiPM array. Moreover, a discussion is provided of strategies to eventually integrate all the analog and mixed-signal electronics with the SiPM, on either a single silicon substrate or a multi-chip module (MCM). PMID:24825923
Semantically Enhanced Online Configuration of Feedback Control Schemes.
Milis, Georgios M; Panayiotou, Christos G; Polycarpou, Marios M
2018-03-01
Recent progress toward the realization of the "Internet of Things" has improved the ability of physical and soft/cyber entities to operate effectively within large-scale, heterogeneous systems. It is important that such capacity be accompanied by feedback control capabilities sufficient to ensure that the overall systems behave according to their specifications and meet their functional objectives. To achieve this, such systems require new architectures that facilitate the online deployment, composition, interoperability, and scalability of control system components. Most current control systems lack scalability and interoperability because their design is based on a fixed configuration of specific components, with knowledge of their individual characteristics only implicitly passed through the design. This paper addresses the need for flexibility when replacing components or installing new components, which might occur when an existing component is upgraded or when a new application requires a new component, without the need to readjust or redesign the overall system. A semantically enhanced feedback control architecture is introduced for a class of systems, aimed at accommodating new components into a closed-loop control framework by exploiting the semantic inference capabilities of an ontology-based knowledge model. This architecture supports continuous operation of the control system, a crucial property for large-scale systems for which interruptions have negative impact on key performance metrics that may include human comfort and welfare or economy costs. A case-study example from the smart buildings domain is used to illustrate the proposed architecture and semantic inference mechanisms.
VPLS: an effective technology for building scalable transparent LAN services
NASA Astrophysics Data System (ADS)
Dong, Ximing; Yu, Shaohua
2005-02-01
Virtual Private LAN Service (VPLS) is generating considerable interest with enterprises and service providers as it offers multipoint transparent LAN service (TLS) over MPLS networks. This paper describes an effective technology, VPLS, which links virtual switch instances (VSIs) through MPLS to form an emulated Ethernet switch and build scalable transparent LAN services. It first focuses on the architecture of VPLS, with Ethernet bridging at the edge and MPLS at the core; it then elucidates the data forwarding mechanism within a VPLS domain, including learning and aging MAC addresses on a per-LSP basis, flooding of unknown frames, and replication of unknown, multicast, and broadcast frames. The loop-avoidance mechanism, known as split-horizon forwarding, is also analyzed. Another important aspect of the VPLS service, its basic operation, including autodiscovery and signaling, is discussed as well. From the perspective of efficiency and scalability, the paper compares two important signaling mechanisms, BGP and LDP, which are used to set up a PW between the PEs and bind the PWs to a particular VSI. As VPLS deployments extend and the full mesh of PWs between PE devices grows (n*(n-1)/2 PWs in all, i.e., O(n^2) growth), a VPLS instance can have a large number of remote PE associations, resulting in inefficient use of network bandwidth and system resources, since the ingress PE has to replicate each frame and append MPLS labels for each remote PE. So the latter part of this paper focuses on the scalability issue: Hierarchical VPLS (HVPLS). Within the HVPLS architecture, this paper addresses two ways to cope with a possibly large number of MAC addresses, which make VPLS operate more efficiently.
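The full-mesh scaling problem above is easy to quantify; the toy calculation below compares the pseudowire count of flat VPLS with a two-tier hub-and-spoke HVPLS (the hub count is an invented example parameter):

```python
# Toy pseudowire-count comparison: flat full-mesh VPLS vs. a two-tier
# HVPLS with hub PEs (the hub count is an invented example parameter).
def flat_vpls_pws(n_pe: int) -> int:
    return n_pe * (n_pe - 1) // 2          # full mesh: O(n^2) growth

def hvpls_pws(n_pe: int, n_hubs: int) -> int:
    core = n_hubs * (n_hubs - 1) // 2      # full mesh among hubs only
    spokes = n_pe - n_hubs                 # one spoke PW per edge PE
    return core + spokes

for n in (10, 100, 1000):
    print(n, flat_vpls_pws(n), hvpls_pws(n, n_hubs=5))
# 10 -> 45 vs 15;  100 -> 4950 vs 105;  1000 -> 499500 vs 1005
```

The quadratic-versus-linear gap is the motivation for the hierarchical design discussed in the paper.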
Protein alignment algorithms with an efficient backtracking routine on multiple GPUs.
Blazewicz, Jacek; Frohmberg, Wojciech; Kierzynka, Michal; Pesch, Erwin; Wojciechowski, Pawel
2011-05-20
Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show the great potential of the GPU platform, but in most cases address the problem of sequence database scanning and compute only the alignment score, whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch and the Smith-Waterman algorithms with a backtracking procedure, which is needed to construct the alignment. In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods as well as for DNA recognition at the DNA assembly stage. Performed tests show that the implementation, with performance up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU- and GPU-based solutions. Moreover, multi-GPU support with load balancing makes the application very scalable. The article shows that the backtracking procedure of the sequence alignment algorithms may be designed to fit the GPU architecture. Therefore, our algorithm, apart from scores, is able to compute pairwise alignments. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. Tests confirm that the efficiency of the implementation is excellent, and the speed of our GPU-based algorithms increases almost linearly when using more than one graphics card.
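For readers unfamiliar with the backtracking step, here is a compact CPU reference of global Needleman-Wunsch with traceback; a linear gap penalty is used for brevity, whereas the paper's GPU version supports affine gaps:

```python
# Reference (CPU, linear-gap) Needleman-Wunsch with traceback, showing
# the backtracking step the paper maps onto GPUs. The paper's version
# uses affine gap penalties and runs on CUDA; this is a simplification.
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # DP score matrix
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i-1][j-1] + s, H[i-1][j] + gap, H[i][j-1] + gap)
    # Backtracking: walk from (n, m) to (0, 0) reconstructing the alignment
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
           H[i][j] == H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch):
            out_a.append(a[i-1]); out_b.append(b[j-1]); i -= 1; j -= 1
        elif i > 0 and H[i][j] == H[i-1][j] + gap:
            out_a.append(a[i-1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j-1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("GATTACA", "GCATGCU"))
```

The traceback's data-dependent, sequential walk through the matrix is precisely what makes it harder to parallelize than the score computation, which is the challenge the paper addresses.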
An overview of the heterogeneous telescope network system: Concept, scalability and operation
NASA Astrophysics Data System (ADS)
White, R. R.; Allan, A.
2008-03-01
In the coming decade there will be an avalanche of data streams devoted to astronomical exploration, opening new windows of scientific discovery. The sheer volume of data and the diversity of event types (Kantor 2006; Kaiser 2004; Vestrand & Theiler & Wozniak 2004) will necessitate a move to a common language for the communication of event data, and the enabling of telescope systems with the ability to not just simply respond, but to act independently in order to take full advantage of available resources in a timely manner. Developed over the past three years, the Virtual Observatory Event (VOEvent) provides the best format for carrying these diverse event messages (White et al. 2006a; Seaman & Warner 2006). However, in order for the telescopes to be able to act independently, a system of interoperable network nodes must be in place that will allow the astronomical assets to not only issue event notifications, but to coordinate and request specific observations. The Heterogeneous Telescope Network (HTN) is a network architecture that can achieve these goals and provide a scalable design to match both fully autonomous and manual telescope system needs (Allan et al. 2006a; White et al. 2006b; Hessman 2006b). In this paper we show the design concept of this meta-network and its nodes, their scalable architecture and complexity, and how this concept can meet the needs of institutions in the near future.
EnerCage: A Smart Experimental Arena With Scalable Architecture for Behavioral Experiments
Uei-Ming Jow; Peter McMenamin; Mehdi Kiani; Manns, Joseph R.; Ghovanloo, Maysam
2014-01-01
Wireless power, when coupled with miniaturized implantable electronics, has the potential to provide a solution to several challenges facing neuroscientists during basic and preclinical studies with freely behaving animals. The EnerCage system is one such solution as it allows for uninterrupted electrophysiology experiments over extended periods of time and vast experimental arenas, while eliminating the need for bulky battery payloads or tethering. It has a scalable array of overlapping planar spiral coils (PSCs) and three-axis magnetic sensors for focused wireless power transmission to devices on freely moving subjects. In this paper, we present the first fully functional EnerCage system, in which the number of PSC drivers and magnetic sensors was reduced to one-third of the number used in our previous design via multicoil coupling. The power transfer efficiency (PTE) has been improved to 5.6% at a 120 mm coupling distance and a 48.5 mm lateral misalignment (worst case) between the transmitter (Tx) array and receiver (Rx) coils. The new EnerCage system is equipped with an Ethernet backbone, further supporting its modular/scalable architecture, which, in turn, allows experimental arenas with arbitrary shapes and dimensions. A set of experiments on a freely behaving rat were conducted by continuously delivering 20 mW to the electronics in the animal headstage for more than one hour in a powered 3538 cm2 experimental area. PMID:23955695
Final Report for Project DE-FC02-06ER25755 [Pmodels2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panda, Dhabaleswar; Sadayappan, P.
2014-03-12
In this report, we describe the research accomplished by the OSU team under the Pmodels2 project. The team has worked from various angles: designing high performance MPI implementations on modern networking technologies (Mellanox InfiniBand (including the new ConnectX2 architecture and Quad Data Rate), QLogic InfiniPath, the emerging 10GigE/iWARP and RDMA over Converged Enhanced Ethernet (RoCE), and Obsidian IB-WAN), studying MPI scalability issues for multi-thousand node clusters using XRC transport, scalable job start-up, dynamic process management support, efficient one-sided communication, protocol offloading, and designing scalable collective communication libraries for emerging multi-core architectures. New designs conforming to Argonne's Nemesis interface have also been carried out. All of these solutions have been integrated into the open-source MVAPICH/MVAPICH2 software. This software is currently being used by more than 2,100 organizations worldwide (in 71 countries). As of January '14, more than 200,000 downloads have taken place from the OSU Web site. In addition, many InfiniBand vendors, server vendors, system integrators and Linux distributors have been incorporating MVAPICH/MVAPICH2 into their software stacks and distributing it. Several InfiniBand systems using MVAPICH/MVAPICH2 have obtained positions in the TOP500 ranking of supercomputers in the world. The latest November '13 ranking includes the following systems: the 7th ranked Stampede system at TACC with 462,462 cores; the 11th ranked Tsubame 2.5 system at Tokyo Institute of Technology with 74,358 cores; and the 16th ranked Pleiades system at NASA with 81,920 cores. Work on PGAS models has proceeded in multiple directions. The Scioto framework, which supports task-parallelism in one-sided and global-view parallel programming, has been extended to allow multi-processor tasks that are executed by processor groups. A quantum Monte Carlo application is being ported onto the extended Scioto framework. A public release of Global Trees (GT) has been made, along with the Global Chunks (GC) framework on which GT is built. The Global Chunks (GC) layer is also being used as the basis for the development of a higher-level Global Graphs (GG) layer. The Global Graphs (GG) system will provide a global address space view of distributed graph data structures on distributed memory systems.
A Cloud-based Infrastructure and Architecture for Environmental System Research
NASA Astrophysics Data System (ADS)
Wang, D.; Wei, Y.; Shankar, M.; Quigley, J.; Wilson, B. E.
2016-12-01
The present availability of high-capacity networks, low-cost computers and storage devices, and the widespread adoption of hardware virtualization and service-oriented architecture provide a great opportunity to enable data and computing infrastructure sharing between closely related research activities. By taking advantage of these approaches, along with the world-class high-performance computing and data infrastructure located at Oak Ridge National Laboratory, a cloud-based infrastructure and architecture has been developed to efficiently deliver essential data and informatics services and utilities to the environmental system research community, and to provide unique capabilities that allow terrestrial ecosystem research projects to share their software utilities (tools), data, and even data submission workflows in a straightforward fashion. The infrastructure will minimize disruptions to current project-based data submission workflows, for better acceptance from existing projects, since many ecosystem research projects already have their own requirements or preferences for data submission and collection. The infrastructure will eliminate the scalability problems of current project silos by providing unified data services and infrastructure. The infrastructure consists of two key components: (1) a collection of configurable virtual computing environments and user management systems that expedite data submission and collection from the environmental system research community, and (2) scalable data management services and systems, originated and developed by ORNL data centers.
Scalable Metadata Management for a Large Multi-Source Seismic Data Repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaylord, J. M.; Dodge, D. A.; Magana-Zook, S. A.
In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system, and to position it for anticipated growth in volume and complexity.
Research on high availability architecture of SQL and NoSQL
NASA Astrophysics Data System (ADS)
Wang, Zhiguo; Wei, Zhiqiang; Liu, Hao
2017-03-01
With the advent of the era of big data, the amount and importance of data have increased dramatically. SQL databases continue to develop in performance and scalability, but more and more companies tend to use NoSQL databases, because NoSQL databases have a simpler data model and greater extensibility than SQL databases. Almost all database designers, for SQL and NoSQL databases alike, aim to improve performance and ensure availability through a reasonable architecture that can reduce the effects of software and hardware failures, so that they can provide a better experience for their customers. In this paper, we mainly discuss the architectures of MySQL, MongoDB, and Redis, which are highly available and have been deployed in practical application environments, and design a hybrid architecture.
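As a concrete taste of one such high-availability setup, the redis-py client can discover the current Redis master through Sentinel; the sketch below assumes a Sentinel quorum is already running locally and that the monitored master group is named "mymaster" (a common default, but an assumption here):

```python
# Minimal client-side sketch of Redis high availability via Sentinel,
# using the redis-py Sentinel API. Assumes Sentinels listen on
# localhost:26379 and the monitored master group is named "mymaster".
from redis.sentinel import Sentinel

sentinel = Sentinel([("localhost", 26379)], socket_timeout=0.5)

# Writes go to whichever node Sentinel currently reports as master;
# after a failover, the same call transparently finds the new master.
master = sentinel.master_for("mymaster", socket_timeout=0.5)
master.set("visits", 1)

# Reads can be spread over replicas to offload the master.
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)
print(replica.get("visits"))
```

MySQL replication with a failover proxy and MongoDB replica sets follow the same pattern: clients address a logical service name rather than a fixed node, so failover is handled by the architecture instead of the application.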
Innovative HPC architectures for the study of planetary plasma environments
NASA Astrophysics Data System (ADS)
Amaya, Jorge; Wolf, Anna; Lembège, Bertrand; Zitz, Anke; Alvarez, Damian; Lapenta, Giovanni
2016-04-01
DEEP-ER is a European Commission funded project that develops a new type of High Performance Computing architecture. The system is currently used by KU Leuven to study the effects of the solar wind on the global environments of the Earth and Mercury. The new architecture combines the versatility of Intel Xeon computing nodes with the power of the upcoming Intel Xeon Phi accelerators. Contrary to classical heterogeneous HPC architectures, where CPUs and accelerators customarily share the same computing nodes, in the DEEP-ER system the CPU nodes are grouped together (Cluster) independently from the accelerator nodes (Booster). The system is equipped with a state-of-the-art interconnection network, highly scalable and fast I/O, and a resiliency system for failure recovery. The final objective of the project is to introduce a scalable system that can be used to create the next generation of exascale supercomputers. The code iPic3D from KU Leuven is being adapted to this new architecture. This particle-in-cell code can now compute the electromagnetic fields on the Cluster side while the particles are moved on the Booster side. Using fast and scalable Xeon Phi accelerators in the Booster, we can introduce many more particles per cell into the simulation than is possible in the current generation of HPC systems, allowing fully kinetic plasmas to be calculated with very low interpolation noise. The system will be used to perform fully kinetic, low-noise, 3D simulations of the interaction of the solar wind with the magnetospheres of the Earth and Mercury. Preliminary simulations have been performed in other HPC centers in order to compare the results across systems. In this presentation we show the complexity of the plasma flow around the planets, including the development of hydrodynamic instabilities at the flanks, the presence of the collisionless shock, the magnetosheath, the magnetopause, reconnection zones, the formation of the plasma sheet and the magnetotail, and the variation of ion/electron plasma flows when crossing these frontiers. The simulations also give access to detailed information about the particle dynamics and their velocity distributions at locations that can be used for comparison with satellite data.
NASA Technical Reports Server (NTRS)
Feinberg, Lee; Rioux, Norman; Bolcar, Matthew; Liu, Alice; Guyon, Oliver; Stark, Chris; Arenberg, Jon
2016-01-01
Key challenges for a future large-aperture, segmented Ultraviolet Optical Infrared (UVOIR) telescope capable of performing a spectroscopic survey of hundreds of exoplanets will be sufficient stability to achieve 10^-10 contrast measurements and sufficient throughput and sensitivity for high-yield exo-Earth spectroscopic detection. Our team has collectively assessed an optimized end-to-end architecture including a high-throughput coronagraph capable of working with a segmented telescope, a cost-effective and heritage-based stable segmented telescope, a control architecture that minimizes the amount of new technologies, and an exo-Earth yield assessment to evaluate potential performance. These efforts are combined through integrated modeling, coronagraph evaluations, and exo-Earth yield calculations to assess the potential performance of the selected architecture. In addition, we discuss the scalability of this architecture to larger apertures and the technological tall poles that must be addressed to enable it.
A highly efficient 3D level-set grain growth algorithm tailored for ccNUMA architecture
NASA Astrophysics Data System (ADS)
Mießen, C.; Velinov, N.; Gottstein, G.; Barrales-Mora, L. A.
2017-12-01
A highly efficient simulation model for 2D and 3D grain growth was developed based on the level-set method. The model introduces modern computational concepts to achieve excellent performance on parallel computer architectures. Strong scalability was measured on cache-coherent non-uniform memory access (ccNUMA) architectures. To achieve this, the proposed approach considers the application of local level-set functions at the grain level. Ideal and non-ideal grain growth was simulated in 3D with the objective to study the evolution of statistical representative volume elements in polycrystals. In addition, microstructure evolution in an anisotropic magnetic material affected by an external magnetic field was simulated.
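To make the level-set machinery concrete, the sketch below evolves a single global level-set function by curvature flow in NumPy; this is only the numerical core in its simplest form, whereas the paper's model uses local per-grain level-set functions, anisotropy, and ccNUMA-aware parallelism (the grid size and time step here are arbitrary choices):

```python
# Minimal 2D mean-curvature flow step for one level-set function phi
# (NumPy). Illustrates the numerical core only; the paper uses local
# per-grain level sets and ccNUMA-aware parallelization instead.
import numpy as np

def curvature_flow_step(phi, dt=0.1, eps=1e-8):
    gy, gx = np.gradient(phi)                  # grad(phi)
    norm = np.sqrt(gx**2 + gy**2) + eps
    # curvature kappa = div( grad(phi) / |grad(phi)| )
    ky, _ = np.gradient(gy / norm)
    _, kx = np.gradient(gx / norm)
    kappa = kx + ky
    return phi + dt * kappa * norm             # phi_t = kappa * |grad(phi)|

# Shrinking-circle sanity check: an interface under curvature flow
# contracts, as a grain boundary shrinking under its own curvature does.
y, x = np.mgrid[-1:1:128j, -1:1:128j]
phi = np.sqrt(x**2 + y**2) - 0.5               # zero level set: circle r=0.5
for _ in range(50):
    phi = curvature_flow_step(phi)
print("fraction of domain inside interface:", (phi < 0).mean())
```

Restricting each level-set function to the neighborhood of its own grain, as the paper does, is what keeps memory traffic local and makes the method scale on ccNUMA machines.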
Demonstration of universal parametric entangling gates on a multi-qubit lattice
Reagor, Matthew; Osborn, Christopher B.; Tezak, Nikolas; Staley, Alexa; Prawiroatmodjo, Guenevere; Scheer, Michael; Alidoust, Nasser; Sete, Eyob A.; Didier, Nicolas; da Silva, Marcus P.; Acala, Ezer; Angeles, Joel; Bestwick, Andrew; Block, Maxwell; Bloom, Benjamin; Bradley, Adam; Bui, Catvu; Caldwell, Shane; Capelluto, Lauren; Chilcott, Rick; Cordova, Jeff; Crossman, Genya; Curtis, Michael; Deshpande, Saniya; El Bouayadi, Tristan; Girshovich, Daniel; Hong, Sabrina; Hudson, Alex; Karalekas, Peter; Kuang, Kat; Lenihan, Michael; Manenti, Riccardo; Manning, Thomas; Marshall, Jayss; Mohan, Yuvraj; O’Brien, William; Otterbach, Johannes; Papageorge, Alexander; Paquette, Jean-Philip; Pelstring, Michael; Polloreno, Anthony; Rawat, Vijay; Ryan, Colm A.; Renzas, Russ; Rubin, Nick; Russel, Damon; Rust, Michael; Scarabelli, Diego; Selvanayagam, Michael; Sinclair, Rodney; Smith, Robert; Suska, Mark; To, Ting-Wai; Vahidpour, Mehrnoosh; Vodrahalli, Nagesh; Whyland, Tyler; Yadav, Kamal; Zeng, William; Rigetti, Chad T.
2018-01-01
We show that parametric coupling techniques can be used to generate selective entangling interactions for multi-qubit processors. By inducing coherent population exchange between adjacent qubits under frequency modulation, we implement a universal gate set for a linear array of four superconducting qubits. An average process fidelity of ℱ = 93% is estimated for three two-qubit gates via quantum process tomography. We establish the suitability of these techniques for computation by preparing a four-qubit maximally entangled state and comparing the estimated state fidelity with the expected performance of the individual entangling gates. In addition, we prepare an eight-qubit register in all possible bitstring permutations and monitor the fidelity of a two-qubit gate across one pair of these qubits. Across all these permutations, an average fidelity of ℱ = 91.6 ± 2.6% is observed. These results thus offer a path to a scalable architecture with high selectivity and low cross-talk. PMID:29423443
Accelerating Climate Simulations Through Hybrid Computing
NASA Technical Reports Server (NTRS)
Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark
2009-01-01
Unconventional multi-core processors (e.g., the IBM Cell B/E and NVIDIA GPUs) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for the connection, we identified two challenges: (1) an identical MPI implementation is required on both systems, and (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors and two IBM QS22 Cell blades, connected with InfiniBand), allowing compute-intensive functions to be seamlessly offloaded to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approximately 10% network overhead.
MAC layer security issues in wireless mesh networks
NASA Astrophysics Data System (ADS)
Reddy, K. Ganesh; Thilagam, P. Santhi
2016-03-01
Wireless Mesh Networks (WMNs) have emerged as a promising technology for a broad range of applications due to their self-organizing, self-configuring and self-healing capability, in addition to their low cost and easy maintenance. Securing WMNs is a more challenging and complex issue due to their inherent characteristics, such as the shared wireless medium, multi-hop and inter-network communication, highly dynamic network topology and decentralized architecture. These vulnerable features expose WMNs to several types of attacks at the MAC layer. The existing MAC layer standards and implementations are inadequate to secure these features and fail to provide comprehensive security solutions to protect both the backbone and client mesh. Hence, there is a need for developing efficient, scalable and integrated security solutions for WMNs. In this paper, we classify the MAC layer attacks and analyze the existing countermeasures. Based on the attack classification and countermeasure analysis, we derive research directions to enhance MAC layer security for WMNs.
A heuristic for deriving the optimal number and placement of reconnaissance sensors
NASA Astrophysics Data System (ADS)
Nanda, S.; Weeks, J.; Archer, M.
2008-04-01
A key to mastering asymmetric warfare is the acquisition of accurate intelligence on adversaries and their assets in urban and open battlefields. To achieve this, one needs adequate numbers of tactical sensors placed in locations that optimize coverage, where optimality means covering a given area of interest with the least number of sensors, or covering the largest possible subsection of an area of interest with a fixed set of sensors. Unfortunately, neither problem admits a polynomial-time algorithm as a solution, and therefore the placement of such sensors must rely on intelligent heuristics instead. In this paper, we present a scheme implemented on parallel SIMD processing architectures that yields significantly faster results and is highly scalable with respect to dynamic changes in the area of interest. Furthermore, the solution to the first problem immediately translates into a solution to the latter if and when any sensors are rendered inoperable.
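The abstract does not spell out the heuristic itself; the standard starting point for both NP-hard variants it names is greedy maximum-coverage selection. A minimal sketch, where the boolean visibility matrix and the budget/target values are illustrative assumptions rather than the paper's actual model:

```python
import numpy as np

def greedy_placement(coverage, k=None, target=None):
    """Greedy heuristic for sensor placement.

    coverage: boolean matrix, coverage[s, c] == True if candidate
    sensor location s covers cell c of the area of interest.
    Stops after k sensors (fixed-budget variant) or once `target`
    cells are covered (full-coverage variant).
    """
    n_sensors, n_cells = coverage.shape
    covered = np.zeros(n_cells, dtype=bool)
    chosen = []
    while True:
        gains = (coverage & ~covered).sum(axis=1)  # new cells per candidate
        best = int(gains.argmax())
        if gains[best] == 0:
            break                                   # nothing left to gain
        chosen.append(best)
        covered |= coverage[best]
        if k is not None and len(chosen) >= k:
            break
        if target is not None and covered.sum() >= target:
            break
    return chosen, covered

rng = np.random.default_rng(0)
cov = rng.random((50, 400)) < 0.05                 # random visibility model
sensors, covered = greedy_placement(cov, target=300)
print(len(sensors), "sensors cover", int(covered.sum()), "of 400 cells")
```

For the fixed-budget variant this greedy rule carries the classic (1 - 1/e) approximation guarantee, which is why it is a common baseline before SIMD-style parallelization of the gain computation.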
Ultracoherent operation of spin qubits with superexchange coupling
NASA Astrophysics Data System (ADS)
Rančić, Marko J.; Burkard, Guido
2017-11-01
With the use of nuclear-spin-free materials such as silicon and germanium, spin-based quantum bits (qubits) have evolved to become among the most coherent systems for quantum information processing. The new frontier for spin qubits has therefore shifted to the ubiquitous charge noise and spin-orbit interaction, which are limiting the coherence times and gate fidelities of solid-state qubits. In this paper we investigate superexchange as a means of indirect exchange interaction between two single-electron spin qubits, each embedded in a single semiconductor quantum dot (QD) and mediated by an intermediate, empty QD. Our results suggest the existence of "supersweet spots", in which the qubit operations implemented by the superexchange interaction are simultaneously first-order-insensitive to charge noise and to errors due to spin-orbit interaction. The proposed spin-qubit architecture is scalable and within the manufacturing capabilities of the semiconductor industry.
IVAN: Intelligent Van for the Distribution of Pharmaceutical Drugs
Moreno, Asier; Angulo, Ignacio; Perallos, Asier; Landaluce, Hugo; Zuazola, Ignacio Julio García; Azpilicueta, Leire; Astrain, José Javier; Falcone, Francisco; Villadangos, Jesús
2012-01-01
This paper describes a telematic system based on an intelligent van which is capable of tracing pharmaceutical drugs over delivery routes from a warehouse to pharmacies, without altering the carriers' daily conventional tasks. The intelligent van understands its environment, taking into account its location, the assets and the predefined delivery route, and is capable of reporting incidents to carriers in the event of deviations from the established distribution plan. It is a non-intrusive solution and represents a successful experience of using smart environments and an optimized Radio Frequency Identification (RFID) embedded system in a viable way to resolve a real industrial need in the pharmaceutical industry. The combination of deterministic modeling of the vehicle interior, the implementation of an ad-hoc radiating element and an agile software platform within an overall system architecture leads to a competitive, flexible and scalable solution. PMID:22778659
Scalable fabrication of perovskite solar cells
Li, Zhen; Klein, Talysa R.; Kim, Dong Hoe; ...
2018-03-27
Perovskite materials use earth-abundant elements, have low formation energies for deposition and are compatible with roll-to-roll and other high-volume manufacturing techniques. These features make perovskite solar cells (PSCs) suitable for terawatt-scale energy production with low production costs and low capital expenditure. Demonstrations of performance comparable to that of other thin-film photovoltaics (PVs) and improvements in laboratory-scale cell stability have recently made scale up of this PV technology an intense area of research focus. Here, we review recent progress and challenges in scaling up PSCs and related efforts to enable the terawatt-scale manufacturing and deployment of this PV technology. We discuss common device and module architectures, scalable deposition methods and progress in the scalable deposition of perovskite and charge-transport layers. We also provide an overview of device and module stability, module-level characterization techniques and techno-economic analyses of perovskite PV modules.
Gil-Santos, Eduardo; Baker, Christopher; Lemaître, Aristide; Gomez, Carmen; Leo, Giuseppe; Favero, Ivan
2017-01-01
Photonic lattices of mutually interacting indistinguishable cavities represent a cornerstone of collective phenomena in optics and could become important in advanced sensing or communication devices. The disorder induced by fabrication technologies has so far hindered the development of such resonant cavity architectures, while post-fabrication tuning methods have been limited by complexity and poor scalability. Here we present a new, simple and scalable tuning method for ensembles of microphotonic and nanophotonic resonators, which enables their permanent collective spectral alignment. The method introduces cavity-enhanced photoelectrochemical etching in a fluid, a resonant process triggered by sub-bandgap light that allows for high selectivity and precision. The technique is demonstrated on a gallium arsenide nanophotonic platform and illustrated by finely tuning one, two and up to five resonators. It opens the way to applications requiring large networks of identical resonators and their spectral referencing to external etalons. PMID:28117394
NASA Astrophysics Data System (ADS)
Qiao, Mu
2015-03-01
Service-Oriented Architecture (SOA) is widely used in building flexible and scalable web sites and services. In most of the web and mobile photo-book and gifting business space, the products ordered are highly variable, with no standard template into which text or images can be substituted, unlike commercial variable-data printing. In this paper, the author describes an SOA workflow in a multi-site, multi-product-line fulfillment system in which three major challenges are addressed: utilization of hardware and equipment; a high degree of automation with fault recovery; and scalability and flexibility under order-volume fluctuation.
Modern Gemini-Approach to Technology Development for Human Space Exploration
NASA Technical Reports Server (NTRS)
White, Harold
2010-01-01
In NASA's plan to put men on the moon, there were three sequential programs: Mercury, Gemini, and Apollo. The Gemini program was used to develop and integrate the technologies that would be necessary for the Apollo program to successfully put men on the moon. We would like to present an analogous modern approach that leverages legacy ISS hardware designs and integrates newly developed technologies into a flexible architecture. This new architecture is scalable and sustainable, and can be used to establish a human exploration infrastructure beyond low Earth orbit and into deep space.
An All-Optical Access Metro Interface for Hybrid WDM/TDM PON Based on OBS
NASA Astrophysics Data System (ADS)
Segarra, Josep; Sales, Vicent; Prat, Josep
2007-04-01
A new all-optical access metro network interface based on optical burst switching (OBS) is proposed. A hybrid wavelength-division multiplexing/time-division multiplexing (WDM/TDM) access architecture with reflective optical network units (ONUs), an arrayed-waveguide-grating outside plant, and a tunable laser stack at the optical line terminal (OLT) is presented as a solution for the passive optical network. By means of OBS and a dynamic bandwidth allocation (DBA) protocol that polls the ONUs, the available access bandwidth is managed. All the network intelligence and costly equipment is located at the OLT, where the DBA module is centrally implemented, providing quality of service (QoS). To scale this access network, an optical cross connect (OXC) is then used to serve a large number of ONUs from the same OLT. The hybrid WDM/TDM structure is also extended toward the metropolitan area network (MAN) by introducing the concept of the OBS multiplexer (OBS-M). The OBS-M network element bridges the MAN and access networks by offering all-optical cross connection, wavelength conversion, and data signaling. The proposed innovative OBS-M node yields a full optical data network, interfacing access and metro with geographically distributed access control. The resulting novel access metro architectures are nonblocking and, with improved signaling, provide QoS, scalability, and very low latency. Finally, numerical analysis and simulations demonstrate the traffic performance of the proposed access scheme and of the all-optical access metro interface and architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murphy, Richard C.
2009-09-01
This report details the accomplishments of the 'Building More Powerful Less Expensive Supercomputers Using Processing-In-Memory (PIM)' LDRD ('PIM LDRD', number 105809) for FY07-FY09. Latency dominates all levels of supercomputer design. Within a node, increasing memory latency, relative to processor cycle time, limits CPU performance. Between nodes, the same increase in relative latency impacts scalability. Processing-In-Memory (PIM) is an architecture that directly addresses this problem using enhanced chip fabrication technology and machine organization. PIMs combine high-speed logic and dense, low-latency, high-bandwidth DRAM, and lightweight threads that tolerate latency by performing useful work during memory transactions. This work examines the potential of PIM-based architectures to support mission-critical Sandia applications and an emerging class of more data-intensive informatics applications. This work has resulted in a stronger architecture/implementation collaboration between 1400 and 1700. Additionally, key technology components have impacted vendor roadmaps, and we are in the process of pursuing these new collaborations. This work has the potential to impact future supercomputer design and construction, reducing power and increasing performance. This final report is organized as follows: this summary chapter discusses the impact of the project (Section 1), provides an enumeration of publications and other public discussion of the work (Section 1), and concludes with a discussion of future work and impact from the project (Section 1). The appendix contains reprints of the refereed publications resulting from this work.
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel; ...
2017-03-08
Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMM's to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
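The kernel pattern behind such libraries can be made concrete: a tensor contraction is matricized into a single GEMM call, which is what symmetry-aware libraries like Libtensor then specialize and block. A minimal numpy sketch; the index structure is a generic coupled-cluster-like example invented for illustration, not one of the paper's actual terms:

```python
import numpy as np

# Contract t2[a,b,i,j] with v[b,j,c,k] over (b, j):
# r[a,i,c,k] = sum_{b,j} t2[a,b,i,j] * v[b,j,c,k]
na, nb, ni, nj, nc, nk = 8, 8, 6, 6, 8, 6
t2 = np.random.rand(na, nb, ni, nj)
v = np.random.rand(nb, nj, nc, nk)

# Matricize: move the contracted indices (b, j) to the inner dimension,
# then the whole contraction is a single matrix-matrix multiply.
lhs = t2.transpose(0, 2, 1, 3).reshape(na * ni, nb * nj)  # (a,i) x (b,j)
rhs = v.reshape(nb * nj, nc * nk)                         # (b,j) x (c,k)
r = (lhs @ rhs).reshape(na, ni, nc, nk)

# The same contraction via einsum, as a correctness check.
assert np.allclose(r, np.einsum('abij,bjck->aick', t2, v))
```

In production codes the transpose/reshape steps are themselves a major cost, and the symmetry blocking mentioned in the abstract shrinks both the GEMM operands and the permutation traffic.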
Open Research Challenges with Big Data - A Data Scientist's Perspective
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sukumar, Sreenivas R
In this paper, we discuss data-driven discovery challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical data mining and machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context, we pose and debate the question: are data mining algorithms scaling with the ability to store and compute? If yes, how? If not, why not? We survey recent developments in the state of the art to discuss emerging and outstanding challenges in the design and implementation of machine learning algorithms at scale. We leverage experience from real-world Big Data knowledge discovery projects across the domains of national security, healthcare and manufacturing to suggest that our efforts be focused along the following axes: (i) the data science challenge: designing scalable and flexible computational architectures for machine learning (beyond just data retrieval); (ii) the science of data challenge: the ability to understand characteristics of data before applying machine learning algorithms and tools; and (iii) the scalable predictive functions challenge: the ability to construct, learn and infer with increasing sample size, dimensionality, and categories of labels. We conclude with a discussion of opportunities and directions for future research.
Kelly, Benjamin J; Fitch, James R; Hu, Yangqiu; Corsmeier, Donald J; Zhong, Huachun; Wetzel, Amy N; Nordquist, Russell D; Newsom, David L; White, Peter
2015-01-20
While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for the analysis of these data rely upon parallelization strategies that have limited scalability and complex implementations, and that lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through the implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole-genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.
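Churchill's implementation is not reproduced in this abstract; the following is only a generic sketch of the balanced-regional idea: chunk chromosomes into near-equal regions and fan a per-region pipeline out over worker processes. The chromosome table and placeholder caller are assumptions made for the example:

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical chromosome lengths (GRCh37-like, abbreviated).
CHROMS = {"chr20": 63_025_520, "chr21": 48_129_895}

def make_regions(chroms, n_regions):
    """Split the genome into contiguous regions of near-equal size so
    each worker receives a balanced share of the total work."""
    total = sum(chroms.values())
    step = total // n_regions + 1
    regions = []
    for chrom, length in chroms.items():
        start = 0
        while start < length:
            end = min(start + step, length)
            regions.append((chrom, start, end))
            start = end
    return regions

def call_variants(region):
    chrom, start, end = region
    # Placeholder for the real per-region pipeline (realign, call, ...).
    return f"{chrom}:{start}-{end} done"

if __name__ == "__main__":
    regions = make_regions(CHROMS, n_regions=8)
    with ProcessPoolExecutor(max_workers=4) as pool:
        for result in pool.map(call_variants, regions):
            print(result)
```

The determinism claimed in the abstract comes from fixing the region boundaries up front, so results do not depend on scheduling order.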
A Comprehensive Data Architecture for Multi-Disciplinary Marine Mammal Research
NASA Astrophysics Data System (ADS)
Palacios, D. M.; Follett, T.; Winsor, M.; Mate, B. R.
2016-02-01
The Oregon State University Marine Mammal Institute (MMI) comprises five research laboratories, each with specific research objectives, technological approaches, and data requirements. Among the types of data under management are individual photo-ID and field observations, telemetry (e.g., locations, dive characteristics, temperature, acoustics), genetics (and relatedness), stable isotope and toxicology assays, and remotely sensed environmental data. Coordinating data management in a way that facilitates collaboration and comparative exploration among different researchers has been a longstanding challenge for our groups, as well as for the greater wildlife research community. Research data are commonly stored locally in flat files or spreadsheets, with copies made and analyses performed in various packages without any common standards for interoperability, a potential source of error. Database design, where it exists, is frequently arrived at ad hoc, and new types of data are generally tacked on as technological advances present them. A data management solution that addresses these issues should meet the following requirements: be scalable, be modular (i.e., able to incorporate new types of data as they arise), incorporate spatiotemporal dimensions, and be compliant with existing data standards such as Darwin Core. The MMI has developed a data architecture that allows the incorporation of any type of animal-associated data into a modular and portable format that can be integrated with any other dataset sharing the core format. It allows browsing, querying and visualization across any of the attributes that can be associated with individual animals, groups, sensors, or environmental datasets. We have implemented this architecture in an open-source geo-enabled relational database system (PostgreSQL, PostGIS) and have designed a suite of software tools (Python, R) to load, preprocess, visualize, analyze, and export data. This architecture could benefit organizations with similar data challenges.
Coordinated Transformation among Community Colleges Lacking a State System
ERIC Educational Resources Information Center
Russell, James Thad
2016-01-01
Community colleges face many challenges in the face of demands for increased student success. Institutions continually seek scalable interventions and initiatives focused on improving student achievement. Effectively implementing sustainable change that moves the needle of student success remains elusive. Facilitating systemic, scalable change…
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...
2017-09-21
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems, or linearized systems for non-linear problems, with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems, and though these algorithms exhibit excellent scalability, significant algorithmic and implementation challenges remain in extending them to solve extreme-scale stochastic systems on emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolution in both the spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain-level local systems through multi-level iterative solvers. We also use parallel sparse matrix-vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalability of these algorithms is presented for the diffusion equation with a spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.
Cheetah: A Framework for Scalable Hierarchical Collective Operations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graham, Richard L; Gorentla Venkata, Manjunath; Ladd, Joshua S
2011-01-01
Collective communication operations, used by many scientific applications, tend to limit overall parallel application performance and scalability. Computer systems are becoming more heterogeneous, with increasing node and core-per-node counts. Also, a growing number of data-access mechanisms, of varying characteristics, are supported within a single computer system. We describe a new hierarchical collective communication framework that takes advantage of hardware-specific data-access mechanisms. It is flexible, with run-time hierarchy specification and sharing of collective communication primitives between collective algorithms. Data buffers are shared between levels in the hierarchy, reducing collective communication management overhead. We have implemented several versions of the Message Passing Interface (MPI) collective operations, MPI_Barrier() and MPI_Bcast(), and run experiments using up to 49,152 processes on a Cray XT5 and a small InfiniBand-based cluster. At 49,152 processes our barrier implementation outperforms the optimized native implementation by 75%; 32-byte and one-megabyte broadcasts outperform it by 62% and 11%, respectively, with better scalability characteristics. Improvements relative to the default Open MPI implementation are much larger.
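Cheetah's internals are C-level MPI machinery, but the hierarchy it exploits can be illustrated with mpi4py: split the world communicator into an on-node communicator plus a cross-node communicator of node leaders, then synchronize level by level. A minimal sketch to be run under mpiexec; only the two-level structure is taken from the abstract, the rest is a generic textbook pattern:

```python
from mpi4py import MPI

def hierarchical_barrier(comm):
    """Two-level barrier: synchronize within each node over shared
    memory first, then across node leaders, then release the node."""
    node = comm.Split_type(MPI.COMM_TYPE_SHARED, key=comm.rank)
    # Node leaders (local rank 0) form the inter-node communicator.
    color = 0 if node.rank == 0 else MPI.UNDEFINED
    leaders = comm.Split(color, key=comm.rank)
    node.Barrier()                 # level 1: on-node arrival
    if leaders != MPI.COMM_NULL:
        leaders.Barrier()          # level 2: across nodes
        leaders.Free()
    node.Barrier()                 # level 1 again: release the waiters
    node.Free()

comm = MPI.COMM_WORLD
hierarchical_barrier(comm)
if comm.rank == 0:
    print("all", comm.size, "ranks passed the hierarchical barrier")
```

The point of the split is that the on-node step can use fast shared-memory primitives while only one process per node touches the network, which is the layering the framework generalizes.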
Pedretti, Alessandro; Mazzolari, Angelica; Vistoli, Giulio
2018-05-21
The manuscript describes WarpEngine, a novel platform implemented within the VEGA ZZ suite of software for performing distributed simulations in both local and wide area networks. Despite being tailored for structure-based virtual screening campaigns, WarpEngine possesses the flexibility required to carry out distributed calculations utilizing various pieces of software, which can be easily encapsulated within the platform without changing their source code. WarpEngine takes advantage of all the cheminformatics features implemented in the VEGA ZZ program, as well as of its largely customizable scripting architecture, thus allowing an efficient distribution of various time-demanding simulations. To offer an example of WarpEngine's potential, the manuscript includes a set of virtual screening campaigns based on the ACE data set of the DUD-E collection, using PLANTS as the docking application. Benchmarking analyses revealed a satisfactory linearity of the WarpEngine performance, the speed-up values being roughly equal to the number of utilized cores. Moreover, the computed scalability values emphasized that the vast majority (i.e., >90%) of the performed simulations benefit from the distributed platform presented here. WarpEngine can be freely downloaded along with the VEGA ZZ program at www.vegazz.net.
NASA Astrophysics Data System (ADS)
Rossi, Francesco; Londrillo, Pasquale; Sgattoni, Andrea; Sinigardi, Stefano; Turchetti, Giorgio
2012-12-01
We present `jasmine', an implementation of a fully relativistic, 3D, electromagnetic Particle-In-Cell (PIC) code capable of running simulations in various laser-plasma acceleration regimes on Graphics Processing Unit (GPU) HPC clusters. Standard energy/charge-preserving FDTD-based algorithms have been implemented using double precision and quadratic (or arbitrarily sized) shape functions for the particle weighting. When porting a PIC scheme to the GPU architecture (or, in general, a shared memory environment), the particle-to-grid operations (e.g. the evaluation of the current density) require special care to avoid memory inconsistencies and conflicts. Here we present a robust implementation of this operation that is efficient for any number of particles per cell and particle shape function order. Our algorithm exploits the exposed GPU memory hierarchy and avoids the use of atomic operations, which can hurt performance especially when many particles lie on the same cell. We show the code's multi-GPU scalability results and present a dynamic load-balancing algorithm. The code is written using a Python-based C++ meta-programming technique, which translates into a high level of modularity and allows for easy performance tuning and simple extension of the core algorithms to various simulation schemes.
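The particle-to-grid conflict the abstract describes, and the private-accumulator alternative to atomic operations, can be shown in a simplified serial 1D form. The loop over blocks below stands in for concurrent GPU thread blocks, and the linear (CIC) weighting is an illustrative choice rather than jasmine's quadratic shape functions:

```python
import numpy as np

def deposit_private(positions, weights, n_cells, n_blocks=4):
    """Charge deposition without atomics: each block of particles
    accumulates into its own private grid copy, and the copies are
    reduced afterwards (conflict-free by construction)."""
    private = np.zeros((n_blocks, n_cells))
    for b, (pos, w) in enumerate(zip(np.array_split(positions, n_blocks),
                                     np.array_split(weights, n_blocks))):
        cell = pos.astype(int)
        frac = pos - cell                       # linear (CIC) weighting
        np.add.at(private[b], cell, w * (1 - frac))
        np.add.at(private[b], (cell + 1) % n_cells, w * frac)  # periodic
    return private.sum(axis=0)                  # reduction over copies

rng = np.random.default_rng(1)
pos = rng.random(10_000) * 64                   # particle positions in cells
w = np.ones_like(pos)
rho = deposit_private(pos, w, n_cells=64)
print("total deposited charge:", rho.sum())     # equals particle count
```

On a GPU the private copies typically live in per-block shared memory, so two threads never write the same address concurrently and no atomic adds are needed.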
Lopez-Iturri, Peio; Aguirre, Erik; Trigo, Jesús Daniel; Astrain, José Javier; Azpilicueta, Leyre; Serrano, Luis; Villadangos, Jesús; Falcone, Francisco
2018-01-29
In the context of hospital management and operation, Intensive Care Units (ICUs) are among the most challenging areas in terms of time responsiveness and criticality, in which adequate resource management and signal processing play a key role in overall system performance. In this work, a context-aware Intensive Care Unit is implemented and analyzed to provide scalable signal acquisition capabilities as well as tracking and access control. Wireless channel analysis is performed by means of hybrid optimized 3D ray-launching deterministic simulation to assess the potential interference impact and to provide the required coverage/capacity thresholds for the employed transceivers. Wireless system operation within the ICU, assuming conventional transceiver operation, is feasible in terms of quality of service for the complete scenario. Extensive measurements of overall interference levels have also been carried out, enabling subsequent adequate coverage/capacity estimations for a set of ZigBee-based nodes. Real system operation has been tested with ad-hoc designed ZigBee wireless motes, employing lightweight communication protocols to minimize energy and bandwidth usage. An ICU information-gathering application and a software architecture for visitor access control have been implemented, providing monitoring of the boxes' external doors and the identification of visitors via an RFID system. The results enable a solution providing ICU access control and tracking capabilities not previously exploited, a step forward in the implementation of a Smart Health framework.
Singh, Kunwar; Tiwari, Satish Chandra; Gupta, Maneesha
2014-01-01
The paper introduces novel architectures for the implementation of fully static master-slave flip-flops for low power, high performance, and high density. Based on the proposed structure, a traditional C2MOS latch (tristate inverter/clocked inverter) based flip-flop is implemented with fewer transistors. The modified C2MOS based flip-flop designs mC2MOSff1 and mC2MOSff2 are realized using only sixteen transistors each, while the number of clocked transistors is also reduced in the case of mC2MOSff1. Post-layout simulations indicate that the mC2MOSff1 flip-flop shows a 12.4% improvement in PDAP (power-delay-area product) when compared with the transmission gate flip-flop (TGFF) at a 16X capacitive load, which is considered the best design alternative among the conventional master-slave flip-flops. To validate the correct behaviour of the proposed design, an eight-bit asynchronous counter was designed down to the layout level. LVS and parasitic extraction were carried out in Calibre, whereas layouts were implemented using IC Station (Mentor Graphics). HSPICE simulations were used to characterize the transient response of the flip-flop designs in a 180 nm/1.8 V CMOS technology. Simulations were also performed at 130 nm, 90 nm, and 65 nm to reveal the scalability of both designs at modern process nodes.
SLURM: Simple Linux Utility for Resource Management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jette, M; Grondona, M
2002-12-19
Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, scheduling and stream copy modules. This paper presents an overview of the SLURM architecture and functionality.
SLURM: Simple Linux Utility for Resource Management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jette, M; Grondona, M
2003-04-22
Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, scheduling, and stream copy modules. This paper presents an overview of the SLURM architecture and functionality.
Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters
ERIC Educational Resources Information Center
Younge, Andrew J.
2016-01-01
With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for…
2004-12-01
handling using the X10 home automation protocol. Each 3D graphics client renders its scene according to an assigned virtual camera position. By having...control protocol. DMX is a versatile and robust framework which overcomes limitations of the X10 home automation protocol which we are currently using
Current and Future Development of a Non-hydrostatic Unified Atmospheric Model (NUMA)
2010-09-09
following capabilities: 1. Highly scalable on current and future computer architectures (exascale computing and beyond, and GPUs) 2. Flexibility... Exascale computing: 10 of the Top 500 are already in the petascale range; we should also keep our eyes on GPUs (e.g., Mare Nostrum). 2. Numerical
Wireless Computing Architecture III
2013-09-01
MIMO, Multiple-Input and Multiple-Output; MIMO/CON, MIMO with concurrent channel access and estimation; MU-MIMO, Multiuser MIMO; OFDM, Orthogonal... compressive sensing; a design for concurrent channel estimation in scalable multiuser MIMO networking; and novel networking protocols based on machine... Keywords: Network, Antenna Arrays, UAV networking, Angle of Arrival, Localization, MIMO, Access Point, Channel State Information, Compressive Sensing
The Building of Multimedia Communications Network based on Session Initiation Protocol
NASA Astrophysics Data System (ADS)
Yuexiao, Han; Yanfu, Zhang
In this paper, we present a novel design for a distributed multimedia communications network. We introduce the distribution tactics, flow procedure and particular structure. We also analyze the scalability, stability, robustness, extensibility, and transmission delay of this architecture. Finally, the results show that our framework is suitable for very large scale communications.
Scalable Vector Media-processors for Embedded Systems
2002-05-01
Schematic driven layout of Reed Solomon encoders
NASA Technical Reports Server (NTRS)
Arave, Kari; Canaris, John; Miles, Lowell; Whitaker, Sterling
1992-01-01
Two Reed Solomon error correcting encoders are presented. Schematic driven layout tools were used to create the encoder layouts. Special consideration had to be given to the architecture and logic to provide scalability of the encoder designs. Knowledge gained from these projects was used to create a more flexible schematic driven layout system.
A Stateful Multicast Access Control Mechanism for Future Metro-Area-Networks.
ERIC Educational Resources Information Center
Sun, Wei-qiang; Li, Jin-sheng; Hong, Pei-lin
2003-01-01
Multicasting is a necessity for a broadband metro-area network; however, security problems exist with current multicast protocols. A stateful multicast access control mechanism, based on MAPE, is proposed. The architecture of MAPE is discussed, as well as the states maintained and messages exchanged. The scheme is flexible and scalable. (Author/AEF)
High temperature semiconductor diode laser pumps for high energy laser applications
NASA Astrophysics Data System (ADS)
Campbell, Jenna; Semenic, Tadej; Guinn, Keith; Leisher, Paul O.; Bhunia, Avijit; Mashanovitch, Milan; Renner, Daniel
2018-02-01
Existing thermal management technologies for diode laser pumps place a significant load on the size, weight and power consumption of high-power solid-state and fiber laser systems, making current laser systems very large, heavy, and inefficient in many important practical applications. To mitigate this thermal management burden, it is desirable for diode pumps to operate efficiently at high heat-sink temperatures. In this work, we have developed a scalable cooling architecture, based on jet-impingement technology with industrial coolant, for efficient cooling of diode laser bars. We have demonstrated 60% electrical-to-optical efficiency from a 9xx nm two-bar laser stack operating with propylene-glycol-water coolant at a 50 °C coolant temperature. To our knowledge, this is the highest efficiency achieved from a diode stack using 50 °C industrial fluid coolant. The output power is greater than 100 W per bar. Stacks with additional laser bars are currently in development, as this cooler architecture is scalable to a 1 kW system. This work will enable compact and robust fiber-coupled diode pump modules for high-energy laser applications.
Zou, Lei; Lai, Yanqing; Hu, Hongxing; Wang, Mengran; Zhang, Kai; Zhang, Peng; Fang, Jing; Li, Jie
2017-10-12
A facile and scalable method is realized for the in situ synthesis of N/S co-doped 3D porous carbon nanosheet networks (NSPCNNs) as anode materials for sodium-ion batteries. During the synthesis, NaCl is used as a template to prepare the porous carbon nanosheet networks. In the resultant architecture, the unique 3D porous structure ensures a large specific surface area and fast diffusion paths for both electrons and ions. In addition, the N/S doping produces abundant defects, increased interlayer spacings, more active sites, and high electronic conductivity. The obtained products deliver a high specific capacity and excellent long-term cycling performance; specifically, a capacity of 336.2 mA h g^-1 at 0.05 A g^-1, remaining as large as 214.9 mA h g^-1 after 2000 charge/discharge cycles at 0.5 A g^-1. This material has great prospects for future applications in scalable, low-cost, and environmentally friendly sodium-ion batteries.
Automation Hooks Architecture Trade Study for Flexible Test Orchestration
NASA Technical Reports Server (NTRS)
Lansdowne, Chatwin A.; Maclean, John R.; Graffagnino, Frank J.; McCartney, Patrick A.
2010-01-01
We describe the conclusions of a technology and communities survey, supported by concurrent and follow-on proof-of-concept prototyping, to evaluate the feasibility of defining a durable, versatile, reliable, visible software interface to support strategic modularization of test software development. The objective is that test sets and support software with diverse origins, ages, and abilities can be reliably integrated into test configurations that assemble, tear down and reassemble with scalable complexity in order to conduct both parametric tests and monitored trial runs. The resulting approach is based on the integration of three recognized technologies that are currently gaining acceptance within the test industry and that, when combined, provide a simple, open and scalable test orchestration architecture addressing the objectives of the Automation Hooks task. The technologies are automated discovery using multicast DNS Zero Configuration Networking (zeroconf), commanding and data retrieval using resource-oriented RESTful web services, and XML data transfer formats based on Automatic Test Markup Language (ATML). This open-source, standards-based approach provides direct integration with existing commercial off-the-shelf (COTS) analysis software tools.
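As a rough illustration of how the first two technologies compose, the sketch below browses for a hypothetical zeroconf service type and issues a RESTful GET against each discovered instance, using the python-zeroconf and requests libraries as stand-ins. The service type and the /measurements/latest endpoint are invented for the example, and ATML payload handling is omitted:

```python
import time

import requests
from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

SERVICE_TYPE = "_testset._tcp.local."   # hypothetical test-set service type

class TestSetListener(ServiceListener):
    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        if info is None:
            return
        addr = info.parsed_addresses()[0]
        # RESTful data retrieval against the discovered instrument.
        url = f"http://{addr}:{info.port}/measurements/latest"
        print(name, "->", requests.get(url, timeout=5).json())

    def update_service(self, zc, type_, name):
        pass

    def remove_service(self, zc, type_, name):
        print("lost", name)

zc = Zeroconf()
browser = ServiceBrowser(zc, SERVICE_TYPE, TestSetListener())
time.sleep(10)                           # browse for a while, then shut down
zc.close()
```

Discovery removes the need for a central registry of test assets, which is what lets configurations assemble and tear down without manual reconfiguration.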
NASA Astrophysics Data System (ADS)
Tabik, S.; Romero, L. F.; Mimica, P.; Plata, O.; Zapata, E. L.
2012-09-01
A broad area in astronomy focuses on simulating extragalactic objects based on Very Long Baseline Interferometry (VLBI) radio-maps. Several algorithms in this scope simulate the radio-maps that would be observed if emitted from a predefined extragalactic object. This work analyzes the performance and scaling of this kind of algorithm on multi-socket, multi-core architectures. In particular, we evaluate a sharing approach, a privatizing approach and a hybrid approach on systems with a complex memory hierarchy that includes a shared Last Level Cache (LLC). In addition, we investigate which manual processes can be systematized and then automated in future work. The experiments show that the data-privatizing model scales efficiently on medium-scale multi-socket, multi-core systems (up to 48 cores), while, regardless of algorithmic and scheduling optimizations, the sharing approach is unable to reach acceptable scalability on more than one socket. The hybrid model with a specific level of data-sharing, however, provides the best scalability over all the multi-socket, multi-core systems used.
NASA Astrophysics Data System (ADS)
Fink, Wolfgang; George, Thomas; Tarbell, Mark A.
2007-04-01
Robotic reconnaissance operations are called for in extreme environments: not only in space, including planetary atmospheres, surfaces, and subsurfaces, but also in potentially hazardous or inaccessible operational areas on Earth, such as mine fields, battlefield environments, enemy-occupied territories, terrorist-infiltrated environments, or areas that have been exposed to biochemical agents or radiation. Real-time reconnaissance enables the identification and characterization of transient events. A fundamentally new mission concept for tier-scalable reconnaissance of operational areas, originated by Fink et al., is aimed at replacing the engineering- and safety-constrained mission designs of the past. The tier-scalable paradigm integrates multi-tier (orbit, atmosphere, surface/subsurface) and multi-agent (satellite, UAV/blimp, surface/subsurface sensing platforms) hierarchical mission architectures, introducing not only mission redundancy and safety but also enabling and optimizing intelligent, less constrained, and distributed real-time reconnaissance. Given the mass, size, and power constraints faced by such a multi-platform approach, this is an ideal application scenario for a diverse set of MEMS sensors. To support such mission architectures, a high degree of operational autonomy is required. Essential elements of such operational autonomy are: (1) automatic mapping of an operational area from different vantage points (including vehicle health monitoring); (2) automatic feature extraction and target/region-of-interest identification within the mapped operational area; and (3) automatic target prioritization for close-up examination. These requirements imply the optimal deployment of MEMS sensors and sensor platforms, sensor fusion, and sensor interoperability.
Service-Oriented Architecture for NVO and TeraGrid Computing
NASA Technical Reports Server (NTRS)
Jacob, Joseph; Miller, Craig; Williams, Roy; Steenberg, Conrad; Graham, Matthew
2008-01-01
The National Virtual Observatory (NVO) Extensible Secure Scalable Service Infrastructure (NESSSI) is a Web service architecture and software framework that enables Web-based astronomical data publishing and processing on grid computers such as the National Science Foundation's TeraGrid. Characteristics of this architecture include the following: (1) Services are created, managed, and upgraded by their developers, who are trusted users of computing platforms on which the services are deployed. (2) Service jobs can be initiated by means of Java or Python client programs run on a command line or with Web portals. (3) Access is granted within a graduated security scheme in which the size of a job that can be initiated depends on the level of authentication of the user.
Towards a Standard Mixed-Signal Parallel Processing Architecture for Miniature and Microrobotics
Sadler, Brian M; Hoyos, Sebastian
2014-01-01
The conventional analog-to-digital conversion (ADC) and digital signal processing (DSP) architecture has led to major advances in miniature and micro-systems technology over the past several decades. The outlook for these systems is significantly enhanced by advances in sensing, signal processing, communications and control, and the combination of these technologies enables autonomous robotics on the miniature to micro scales. In this article we look at trends in the combination of analog and digital (mixed-signal) processing, and consider a generalized sampling architecture. Employing a parallel analog basis expansion of the input signal, this scalable approach is adaptable and reconfigurable, and is suitable for a large variety of current and future applications in networking, perception, cognition, and control. PMID:26601042
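A generalized sampling architecture of this kind can be illustrated numerically: mix the input onto a bank of parallel basis functions, integrate over a frame so that each branch contributes one low-rate sample, and reconstruct digitally. A minimal sketch with a Fourier basis; the test signal, frame length, and branch count are illustrative assumptions, not parameters from the article:

```python
import numpy as np

# One frame of a wideband input signal, simulated at high resolution.
fs, T = 1e6, 1e-3                       # simulation rate, frame length
t = np.arange(0, T, 1 / fs)
x = np.sin(2 * np.pi * 3e3 * t) + 0.5 * np.sin(2 * np.pi * 11e3 * t)

# Parallel branches: each mixes x with one basis function and integrates,
# so every branch needs only ONE sample per frame (the c[k] below).
K = 32
basis = np.array([np.exp(-2j * np.pi * k * t / T) for k in range(K)])
c = basis @ x / len(t)                  # branch integrator outputs

# Digital reconstruction from the K low-rate branch samples.
x_hat = np.real(np.conj(basis).T @ c) * 2
x_hat -= np.real(c[0])                  # the k = 0 term was counted twice
err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"relative reconstruction error: {err:.3e}")
```

The appeal for micro-scale systems is that each analog branch runs at the frame rate rather than the Nyquist rate, and branches can be added, dropped, or re-weighted to reconfigure the front end.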
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bylaska, Eric J.; Jacquelin, Mathias; De Jong, Wibe A.
2017-10-20
Ab-initio Molecular Dynamics (AIMD) methods are an important class of algorithms, as they enable scientists to understand the chemistry and dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. Many-core architectures such as the Intel® Xeon Phi™ processor are an interesting and promising target for these algorithms, as they can provide the computational power that is needed to solve interesting problems in chemistry. In this paper, we describe the efforts of refactoring the existing AIMD plane-wave method of NWChem from an MPI-only implementation to a scalable, hybrid code that employs MPI and OpenMP to exploit the capabilities of current and future many-core architectures. We describe the optimizations required to get close to optimal performance for the multiplication of the tall-and-skinny matrices that form the core of the computational algorithm. We present strong scaling results on the complete AIMD simulation for a test case that simulates 256 water molecules and that strong-scales well on a cluster of 1024 nodes of Intel Xeon Phi processors. We compare the performance obtained with a cluster of dual-socket Intel® Xeon® E5-2698v3 processors.
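The tall-and-skinny products mentioned here are of the form C = A^H B, with the long axis running over plane-wave coefficients and the short axes over orbitals. Blocking the long axis turns the product into independent small GEMMs plus a reduction, which is what gives threads or ranks something to do. A minimal numpy sketch with illustrative sizes:

```python
import numpy as np

def tall_skinny_gram(A, B, n_blocks=8):
    """C = A^H B for tall-and-skinny A, B (n_pw x n_orb, n_pw >> n_orb).

    Splitting the long axis into blocks turns the product into a sum
    of small GEMMs that can be farmed out to threads or MPI ranks and
    then reduced, which is the pattern such refactorings optimize.
    """
    C = np.zeros((A.shape[1], B.shape[1]), dtype=np.result_type(A, B))
    for Ab, Bb in zip(np.array_split(A, n_blocks),
                      np.array_split(B, n_blocks)):
        C += Ab.conj().T @ Bb          # partial Gram matrix per block
    return C

n_pw, n_orb = 20_000, 64               # plane waves vs. orbitals (toy sizes)
A = np.random.rand(n_pw, n_orb) + 1j * np.random.rand(n_pw, n_orb)
B = np.random.rand(n_pw, n_orb) + 1j * np.random.rand(n_pw, n_orb)
assert np.allclose(tall_skinny_gram(A, B), A.conj().T @ B)
```

Because the output is only n_orb x n_orb, the reduction is cheap relative to the block GEMMs, so the shape favors parallelizing over the long axis.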
Inertial Motion Capture Costume Design Study
Szczęsna, Agnieszka; Skurowski, Przemysław; Lach, Ewa; Pruszowski, Przemysław; Pęszor, Damian; Paszkuta, Marcin; Słupik, Janusz; Lebek, Kamil; Janiak, Mateusz; Polański, Andrzej; Wojciechowski, Konrad
2017-01-01
The paper describes a scalable, wearable multi-sensor system for motion capture based on inertial measurement units (IMUs). Each unit is composed of an accelerometer, a gyroscope and a magnetometer. The final quality of the captured motion arises from all the individual parts of the described system. The proposed system is a sequence of the following stages: sensor data acquisition, sensor orientation estimation, system calibration, pose estimation and data visualisation. Building the system's architecture on the dataflow programming paradigm makes it easy to add, remove and replace data processing steps. The modular architecture of the system allows an effortless introduction of new sensor orientation estimation algorithms. The original contribution of the paper is the design study of the individual components used in the motion capture system. Two key steps of the system design are explored in this paper: the evaluation of sensors and of algorithms for orientation estimation. The three chosen algorithms have been implemented and investigated as part of the experiment. Because the selection of the sensor has a significant impact on the final result, the sensor evaluation process is also explained and tested. The experimental results confirmed that the choice of sensor and orientation estimation algorithm affects the quality of the final results. PMID:28304337
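The abstract does not name the three orientation-estimation algorithms, so the sketch below uses a generic complementary filter, one of the simplest members of that family, to show the per-IMU accelerometer/gyroscope fusion step such a system performs. The sampling rate, blend factor, and static test signal are assumptions made for the example:

```python
import numpy as np

def complementary_tilt(gyro, accel, dt, alpha=0.98):
    """Fuse gyroscope and accelerometer data into roll/pitch estimates.

    gyro:  (N, 3) angular rates [rad/s];  accel: (N, 3) in units of g.
    The gyro is trusted at short timescales (integration) while the
    accelerometer corrects long-term drift via the gravity reference.
    """
    roll = pitch = 0.0
    out = np.empty((len(gyro), 2))
    for i, (w, a) in enumerate(zip(gyro, accel)):
        # Gravity-derived attitude (valid when acceleration ~ gravity).
        roll_acc = np.arctan2(a[1], a[2])
        pitch_acc = np.arctan2(-a[0], np.hypot(a[1], a[2]))
        # Blend the integrated rate with the accelerometer reference.
        roll = alpha * (roll + w[0] * dt) + (1 - alpha) * roll_acc
        pitch = alpha * (pitch + w[1] * dt) + (1 - alpha) * pitch_acc
        out[i] = roll, pitch
    return out

# Static sensor with a constant gyro bias: the estimate settles near the
# true zero tilt, with a small residual offset from the bias leaking in.
N, dt = 500, 0.01
gyro = np.full((N, 3), 0.02)                 # constant bias [rad/s]
accel = np.tile([0.0, 0.0, 1.0], (N, 1))     # gravity along +z
print("final roll/pitch [rad]:", complementary_tilt(gyro, accel, dt)[-1])
```

Production systems typically replace this with quaternion-based filters that also fold in the magnetometer for heading, but the gyro-fast/reference-slow split is the same.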
CRAB3: Establishing a new generation of services for distributed analysis at CMS
NASA Astrophysics Data System (ADS)
Cinquilli, M.; Spiga, D.; Grandi, C.; Hernàndez, J. M.; Konstantinov, P.; Mascheroni, M.; Riahi, H.; Vaandering, E.
2012-12-01
In CMS Computing, the highest priorities for analysis tools are the improvement of the end users' ability to produce and publish reliable samples and analysis results, as well as a transition to a sustainable development and operations model. To achieve these goals, CMS decided to incorporate analysis processing into the same framework as data and simulation processing. This strategy foresees that all workload tools (Tier0, Tier1, production, analysis) share a common core with long-term maintainability, as well as the standardization of the operator interfaces. The re-engineered analysis workload manager, called CRAB3, makes use of newer technologies, such as RESTful web services and NoSQL databases, aiming to increase the scalability and reliability of the system. As opposed to CRAB2, in CRAB3 all work is centrally injected and managed in a global queue. A pool of agents, which can be geographically distributed, consumes work from the central services, serving the user tasks. The new architecture of CRAB substantially changes the deployment model and operations activities. In this paper we present the implementation of CRAB3, emphasizing how the new architecture improves workflow automation and simplifies maintainability. In particular, we highlight the impact of the new design on daily operations.
Design of the Protocol Processor for the ROBUS-2 Communication System
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.
2005-01-01
The ROBUS-2 Protocol Processor (RPP) is a custom-designed hardware component implementing the functionality of the ROBUS-2 fault-tolerant communication system. The Reliable Optical Bus (ROBUS) is the core communication system of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER), a general-purpose fault-tolerant integrated modular architecture currently under development at NASA Langley Research Center. ROBUS is a time-division multiple access (TDMA) broadcast communication system with medium access control by means of a time-indexed communication schedule. ROBUS-2 is a developmental version of the ROBUS providing guaranteed fault-tolerant services to the attached processing elements (PEs) in the presence of a bounded number of faults. These services include message broadcast (Byzantine agreement), dynamic communication schedule update, time reference (clock synchronization), and distributed diagnosis (group membership). ROBUS also features fault-tolerant startup and restart capabilities. ROBUS-2 tolerates internal as well as PE faults, and incorporates a dynamic self-reconfiguration capability driven by the internal diagnostic system. ROBUS consists of RPPs connected to each other by a lower-level physical communication network. The RPP has a pipelined architecture, and the design is parameterized in the behavioral and structural domains. The design of the RPP enables the bus to achieve a PE-message throughput that approaches the available bandwidth at the physical layer.